Autocorrelation measures the linear relationship between a variable and a lagged version of itself. Just like Pearson’s correlation, it can take values between -1 and 1. A value of 1 would mean a perfect positive linear relationship between a variable and its lagged values. A value of -1 would mean a perfect negative linear relationship between a variable and its lags. Finally, a value of 0 would mean no relationship between a variable and its lags. (James et al.)
\(y_{t}\) | lag 1 (\(y_{t-1}\)) |
---|---|
40.74014 | 33.23334 |
29.17253 | 40.74014 |
53.40970 | 29.17253 |
72.41567 | 53.40970 |
45.74196 | 72.41567 |
36.44378 | 45.74196 |
There appears to be a strong correlation between the lagged values and the current values.
Autocorrelation Statistic for Lag 1:
\(r_{1}\)=\(\frac{\sum_{t=2}^{T} (y_{t-1}-\bar{y})(y_{t}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^{2}}\)
Autocorrelation Statistic for Lag k:
\(r_{k}\)=\(\frac{\sum_{t=k+1}^{T} (y_{t-k}-\bar{y})(y_{t}-\bar{y})}{\sum_{t=1}^{T} (y_{t}-\bar{y})^{2}}\)
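As a quick illustration, here is a minimal NumPy sketch of the lag-k formula above; the `autocorr` helper name and the use of the small table above as input are ours for illustration, not from the text.

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation r_k for lag k, per the formula above."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    # Numerator: sum over t = k+1..T of (y_{t-k} - ybar)(y_t - ybar)
    num = np.sum((y[:-k] - ybar) * (y[k:] - ybar))
    # Denominator: sum over t = 1..T of (y_t - ybar)^2
    den = np.sum((y - ybar) ** 2)
    return num / den

# Values from the small table above
y = np.array([40.74014, 29.17253, 53.40970, 72.41567, 45.74196, 36.44378])
print(autocorr(y, 1))
```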
Statistical tests can also be performed on our autocorrelation statistics. If the autocorrelation at each lag is not statistically significant, then this is evidence of a white noise process. If at least one lag is statistically significant, it is evidence that a model other than the white noise model should be used.
\(H_{0}\): \(\rho_{k}=0\), the autocorrelation is zero, evidence of white noise
\(H_{a}\): \(\rho_{k} \ne 0\), the autocorrelation is not zero, evidence against the white noise model
\(\rho_{k}\) is the population autocorrelation at lag k
The standard error of \(r_{k}\):
\(se_{r_{k}}\) = \(\frac{1}{\sqrt{T}}\)
Reject if:
\(|\frac{r_{k}-0}{se_{r_{k}}}|>t_{\alpha/2,df}\)
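A minimal sketch of this test in Python, assuming \(df = T-1\) (the text leaves df unspecified); the `white_noise_test` helper is hypothetical:

```python
import numpy as np
from scipy import stats

def white_noise_test(y, k, alpha=0.05):
    """Test H0: rho_k = 0 using r_k and se_{r_k} = 1/sqrt(T).

    df = T - 1 is an assumption; the text does not specify df.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    ybar = y.mean()
    r_k = np.sum((y[:-k] - ybar) * (y[k:] - ybar)) / np.sum((y - ybar) ** 2)
    se = 1.0 / np.sqrt(T)
    t_stat = (r_k - 0) / se
    t_crit = stats.t.ppf(1 - alpha / 2, df=T - 1)
    return r_k, t_stat, abs(t_stat) > t_crit  # reject H0 if last value is True
```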
AR(1):
\(y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\epsilon_{t}\)
\(\beta_{0}\): fixed constant
\(\beta_{1}\): value between -1 and 1, not including -1 or 1
\(\epsilon_{t}\): white noise process
AR(1) is an autoregressive model of order 1. An autoregressive model uses the values of lagged observations to predict new ones. An order of 1 means it uses the observations that are 1 lag away to predict future values.
In order to use an autoregressive model, the \(\beta_{1}\) value must be between -1 and 1, and not 0. An autoregressive model is stationary, meaning the mean and variance stay the same over time. If \(\beta_{1}\) is 1, the model becomes a random walk model; if \(\beta_{1}\) is 0, it simplifies to a white noise process.
If \(\beta_{1}\)=1:
\(y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\epsilon_{t}\)
\(y_{t}=\beta_{0}+(1)y_{t-1}+\epsilon_{t}\)
\(y_{t}-y_{t-1}=\beta_{0}+\epsilon_{t}\)
If \(\beta_{1}\)=0:
\(y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\epsilon_{t}\)
\(y_{t}=\beta_{0}+(0)y_{t-1}+\epsilon_{t}\)
\(y_{t}=\beta_{0}+\epsilon_{t}\)
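To see all three regimes side by side, here is a small simulation sketch; the parameter values and the `simulate` helper are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate(beta0, beta1, T=200, sigma=1.0):
    """Simulate y_t = beta0 + beta1 * y_{t-1} + eps_t, starting from y_0 = 0."""
    y = np.zeros(T)
    eps = rng.normal(0.0, sigma, T)
    for t in range(1, T):
        y[t] = beta0 + beta1 * y[t - 1] + eps[t]
    return y

ar1 = simulate(beta0=1.0, beta1=0.8)    # stationary AR(1)
walk = simulate(beta0=1.0, beta1=1.0)   # beta1 = 1: random walk with drift
noise = simulate(beta0=1.0, beta1=0.0)  # beta1 = 0: white noise around beta0
```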
Autoregressive models can also have orders above 1, such as order 2. An AR(2) model would use both the lag 1 values and the lag 2 values to predict the next observation.
AR(2):
\(y_{t}=\beta_{0}+\beta_{1}y_{t-1}+\beta_{2}y_{t-2}+\epsilon_{t}\)
For a stationary AR(1) model, the mean and variance are:
\(E[Y_{t}]\)=\(\frac{\beta_{0}}{1-\beta_{1}}\)
\(Var[Y_{t}]\)=\(\frac{\sigma^{2}}{1-\beta_{1}^{2}}\)
where \(\sigma^{2}\) is the variance of the white noise terms.
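The mean formula follows from stationarity in one line: taking expectations of both sides of the AR(1) equation and using \(E[Y_{t}]=E[Y_{t-1}]=\mu\) gives \(\mu=\beta_{0}+\beta_{1}\mu+0\), so \(\mu=\frac{\beta_{0}}{1-\beta_{1}}\). The variance formula follows the same way from \(Var[Y_{t}]=\beta_{1}^{2}Var[Y_{t-1}]+\sigma^{2}\).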
There should be some relationship between the observations and lagged versions of themselves. For an AR(1) model, there should be some relationship with the lag 1 version of the series. This can be detected with scatter plots and the autocorrelation function.
There is a relationship between the autocorrelation and the \(\beta_{1}\) parameter. The lag k autocorrelation should be nearly equal to \(\beta_{1}^{k}\). This means the autocorrelations should form a geometric sequence, decaying geometrically as the lag increases. (James et al.)
Example:
If we have the equation \(y_{t}=\beta_{0}+(.8)y_{t-1}+\epsilon_{t}\), where our \(\beta_{1}\) value is .8, our autocorrelation at lag 1 will be near .8, at lag 2 near \(.8^{2}=.64\), at lag 3 near \(.8^{3}=.512\), and so on.
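A quick empirical check of this geometric decay, simulating a long AR(1) series with \(\beta_{1}=.8\) (and \(\beta_{0}=0\) for simplicity, our choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a long AR(1) series: y_t = 0.8 * y_{t-1} + eps_t
T = 100_000
y = np.zeros(T)
eps = rng.normal(size=T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + eps[t]

ybar = y.mean()
den = np.sum((y - ybar) ** 2)
for k in (1, 2, 3):
    r_k = np.sum((y[:-k] - ybar) * (y[k:] - ybar)) / den
    print(k, round(r_k, 3), round(0.8 ** k, 3))  # r_k should be close to 0.8^k
```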
The parameters of the AR(1) model can be found in the same way as with simple linear regression (SLR). The equation for \(b_{1}\) is:
\(b_{1}\):
\(\frac{\sum_{t=2}^{n} (y_{t-1}-\bar{y}_{-})(y_{t}-\bar{y}_{+})}{\sum_{t=2}^{n} (y_{t-1}-\bar{y}_{-})^{2}}\)
which is identical to the SLR estimator, with the predictor x replaced by the lagged variable.
\(b_{0}\):
\(\bar{y}_{+}=b_{0}+b_{1}\bar{y}_{-}\), so \(b_{0}=\bar{y}_{+}-b_{1}\bar{y}_{-}\)
\(\bar{y}_{+}\) is simply the mean of the dependent variable, while \(\bar{y}_{-}\) is the mean of the independent variable, which is the lagged variable. (James et al.)
There is also a method to estimate \(b_{0}\) and \(b_{1}\) without fitting a regression. The \(b_{1}\) parameter can be estimated as \(r_{1}\), the autocorrelation at lag 1. Then the \(b_{0}\) parameter can be estimated from the formula \(\bar{y}=b_{0}+r_{1}\bar{y}\), which gives \(b_{0}=\bar{y}(1-r_{1})\). This works because \(\bar{y}_{-}\) and \(\bar{y}_{+}\) differ by only one observation, so both are approximately equal to \(\bar{y}\). (James et al.)
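A minimal sketch of both estimation routes in plain NumPy; the `fit_ar1` and `fit_ar1_shortcut` helper names are ours:

```python
import numpy as np

def fit_ar1(y):
    """Least-squares estimates of b0 and b1, per the SLR formulas above."""
    y = np.asarray(y, dtype=float)
    y_minus, y_plus = y[:-1], y[1:]  # lagged predictor and response
    b1 = (np.sum((y_minus - y_minus.mean()) * (y_plus - y_plus.mean()))
          / np.sum((y_minus - y_minus.mean()) ** 2))
    b0 = y_plus.mean() - b1 * y_minus.mean()
    return b0, b1

def fit_ar1_shortcut(y):
    """Shortcut estimates: b1 ~ r_1 and b0 ~ ybar * (1 - r_1)."""
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    r1 = np.sum((y[:-1] - ybar) * (y[1:] - ybar)) / np.sum((y - ybar) ** 2)
    return ybar * (1 - r1), r1
```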
Predictions of the AR(1) model can be found recursively, with each prediction feeding into the next once observed values run out:
\(\hat{y}_{t}=b_{0}+b_{1}y_{t-1}\)
The standard error of a prediction \(l\) steps ahead is:
\(se_{\hat{y}_{t}}=s\sqrt{1+b_{1}^{2}+b_{1}^{4}+...+b_{1}^{2(l-1)}}\)
Prediction Interval:
\(\hat{y}_{t} \pm se_{\hat{y}_{t}} \cdot t_{\alpha/2,df}\)
s: the standard deviation of the white noise terms, estimated as the square root of the MSE
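Putting the pieces together, a sketch of recursive forecasts with prediction intervals; the `forecast_ar1` helper name, the \(n-2\) degrees of freedom, and the plain mean-squared-error estimate of \(s\) are our assumptions:

```python
import numpy as np
from scipy import stats

def forecast_ar1(y, b0, b1, steps, alpha=0.05):
    """Recursive point forecasts and prediction intervals, per the formulas above."""
    y = np.asarray(y, dtype=float)
    # s: residual standard deviation from the one-step fit
    # (dividing by n is a simplification; an unbiased MSE would divide by n - 2)
    resid = y[1:] - (b0 + b1 * y[:-1])
    s = np.sqrt(np.mean(resid ** 2))
    t_crit = stats.t.ppf(1 - alpha / 2, df=len(y) - 2)  # df choice is an assumption
    preds, last = [], y[-1]
    for l in range(1, steps + 1):
        last = b0 + b1 * last  # recursive point forecast
        # se = s * sqrt(1 + b1^2 + b1^4 + ... + b1^(2(l-1)))
        se = s * np.sqrt(np.sum(b1 ** (2 * np.arange(l))))
        preds.append((last, last - t_crit * se, last + t_crit * se))
    return preds
```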
James, G., Witten, D., Hastie, T., & Tibshirani, R. (n.d.). *An Introduction to Statistical Learning: With Applications in R*. Springer.