Last section we introduced the white noise process, where a time series' predictions and standard errors remain the same over time. This is generally not a realistic assumption in time series analysis. In this section we introduce the random walk model, where the predictions and standard errors change based on how far ahead into the future we are predicting.

Random Walk Model

The random walk model is the partial sum of a white noise process. Each individual increase or decrease from one time period to the next is i.i.d. This means a past period's increase or decrease cannot be used to predict the next period's increase or decrease. However, the value at time t is closely related to the value at time t+1.

An example would be a stock price. If the stock price is currently 50 dollars at time 1, it could either increase to 52 dollars or decrease to 48 dollars with equal probability. The value of the stock at time 2 would then be either 48 or 52 dollars. So the new range of possible values (48-52) is closely tied to the past value of the stock. However, at time 2 the next increase or decrease of the stock is again random.

Forecast formula:

\(y_{t}=y_{t-1}+c_{t}\)

-\(y_{t}\): Value at time t

-\(y_{t-1}\): Value at time t-1

-\(c_{t}\): White noise process
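The forecast formula above can be sketched with a short simulation. This is a minimal illustration, not the only way to generate a random walk; the noise parameters (mean 0, standard deviation 1, 200 observations) are arbitrary choices for the example.

```python
import numpy as np

# A random walk is the cumulative (partial) sum of a white noise process.
rng = np.random.default_rng(42)
c = rng.normal(loc=0.0, scale=1.0, size=200)  # white noise increments c_t
y = np.cumsum(c)                              # y_t = y_{t-1} + c_t

# Each value is the previous value plus a fresh, independent noise term.
assert np.allclose(y[1:], y[:-1] + c[1:])
```

Because each `c[t]` is drawn independently, knowing one increment tells us nothing about the next, yet consecutive values of `y` differ only by that one increment.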

Then if we wanted to predict more than one interval away, we could use the equation:

\(\hat{y}_{n+l}=y_{n}+l\bar{c}\)

-\(y_{n}\): Last recorded observation

-\(\bar{c}\): Mean of white noise terms

-l: Intervals away from last recorded observation
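The multi-step forecast can be computed directly from a series by differencing it to recover the white noise terms. The short series below is hypothetical, chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Forecast l steps beyond the last observation: y_hat_{n+l} = y_n + l * c_bar
y = np.array([50.0, 52.0, 51.0, 53.0, 56.0, 55.0])  # hypothetical observations
c = np.diff(y)          # observed white noise increments: [2, -1, 2, 3, -1]
c_bar = c.mean()        # mean increment (estimated drift) = 1.0
l = 3                   # forecast horizon

forecast = y[-1] + l * c_bar
print(forecast)  # 55 + 3 * 1 = 58.0
```

If the white noise process has zero mean, the forecast for every horizon is simply the last recorded value.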

Standard error for forecasted values:

\(se_{\hat{y}} = se_{c}\sqrt{l}\)

-\(se_{c}\): Standard error of white noise terms

-l: Intervals away from last recorded observation

Prediction interval for forecasted values:

\(y_{n}+l\bar{c}\pm t_{\alpha/2,df}\, se_{c}\sqrt{l}\)
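Putting the point forecast, standard error, and t critical value together, a prediction interval can be sketched as below. The series, horizon, and 95% confidence level are illustrative assumptions, and the degrees of freedom are taken as one less than the number of increments.

```python
import numpy as np
from scipy import stats

y = np.array([50.0, 52.0, 51.0, 53.0, 56.0, 55.0])  # hypothetical observations
c = np.diff(y)                          # white noise increments
c_bar = c.mean()                        # mean increment
se_c = c.std(ddof=1)                    # standard error of white noise terms
l = 3                                   # forecast horizon

point = y[-1] + l * c_bar               # point forecast y_n + l * c_bar
t_crit = stats.t.ppf(0.975, df=len(c) - 1)  # two-sided 95% critical value
half_width = t_crit * se_c * np.sqrt(l)     # se grows with sqrt(l)
lower, upper = point - half_width, point + half_width
```

Note that the interval widens with \(\sqrt{l}\): predicting farther ahead accumulates more independent noise terms, so the uncertainty grows.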

How to Identify if a Series is a Random Walk

First, the series must be non-stationary, meaning its mean or variance changes over time. For a random walk, the variance increases as we make predictions farther and farther into the future. In the stock example the series started at a value of 50. If at each time interval it could either increase or decrease by 1, then at time 10 the range of possible values would be 40-60, and at time 20 the range would be 30-70. As the range of possible values widens, the standard error of the forecast increases.

The next property of a random walk is that the differenced series, which is the change in observation values from one period to the next, should depict a white noise process.

Finally, the standard deviation of the original series should be greater than the standard deviation of the differenced series.
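The three checks above can be sketched numerically. Here we use a simulated walk (an assumed example, not real data); in practice you would also inspect plots and the autocorrelation function of the differenced series.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0, 1, 500))  # simulated random walk
d = np.diff(y)                        # differenced series

# Check 2: the differenced series should look like white noise,
# e.g. near-zero lag-1 autocorrelation.
lag1 = np.corrcoef(d[:-1], d[1:])[0, 1]
print(round(lag1, 3))  # close to 0 for white noise

# Check 3: the original series varies more than its differences.
print(y.std(ddof=1) > d.std(ddof=1))  # True for a random walk
```

The first check, non-stationarity, is usually easiest to see on a plot: the level of `y` wanders over time while `d` stays centered around its mean.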

Random Walk Transformations

We learned with linear regression that taking the natural log of the dependent variable is one way to counter heteroscedasticity, where the variance increases with the value of y. This can also be applied to our random walk model, since the variance increases as we make predictions farther and farther ahead. So if we take the log of \(y_{t}\), it will make the series homoscedastic.

We learned that differencing a random walk yields a white noise process, which has a constant mean. So differencing the series will make the mean constant.

Finally, if we both take the log of \(y_{t}\) and difference the series, it will become a series with a constant mean and constant variance. The reason for doing this is that a stationary series is often much easier to model.
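The log-then-difference transformation can be sketched as below. The exponential random walk here is simulated for illustration; its variance grows with its level, which is exactly the situation the log transform addresses.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated price series: an exponential random walk starting at 50.
price = 50 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))

log_price = np.log(price)          # log counters variance growing with the level
log_returns = np.diff(log_price)   # differencing makes the mean constant

# The transformed series has roughly constant mean and variance.
print(round(log_returns.mean(), 4), round(log_returns.std(ddof=1), 4))
```

In finance these differenced logs are called log returns, and treating them as (approximately) stationary is a common modeling starting point.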

Performance Metrics for Time Series

Just like we split our data to create a validation set for regression or classification, we can create a validation set for our time series. Instead of assigning observations to sets randomly, the first n observations in the series are used to train the model, and the remaining observations are used to test it. This is because the last observations in the series are generally the most closely related to the observations we want to predict, so they give more realistic prediction numbers.

Prediction Scores:

Mean Error = \(\frac{1}{T_{2}} \sum_{t=T_{1}+1}^{T_{1}+T_{2}} e_{t}\)

Mean Percentage Error = \(\frac{100}{T_{2}} \sum_{t=T_{1}+1}^{T_{1}+T_{2}} \frac{e_{t}}{y_{t}}\)

Mean Square Error = \(\frac{1}{T_{2}} \sum_{t=T_{1}+1}^{T_{1}+T_{2}} e_{t}^{2}\)

Mean Absolute Error = \(\frac{1}{T_{2}} \sum_{t=T_{1}+1}^{T_{1}+T_{2}} |e_{t}|\)

Mean Absolute Percentage Error = \(\frac{100}{T_{2}} \sum_{t=T_{1}+1}^{T_{1}+T_{2}} \left|\frac{e_{t}}{y_{t}}\right|\)

\(T_{1}\) = Number of observations in the train set

\(T_{2}\) = Number of observations in the validation set

\(e_{t}\) = Residual (forecast error) at time t
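The five scores can be computed directly from the residuals. The validation observations and forecasts below are hypothetical, chosen so the arithmetic is easy to verify by hand.

```python
import numpy as np

y_true = np.array([50.0, 52.0, 48.0, 51.0])  # validation observations y_t
y_pred = np.array([49.0, 53.0, 47.0, 52.0])  # model forecasts
e = y_true - y_pred                          # residuals e_t: [1, -1, 1, -1]

T2 = len(y_true)                             # validation set size
me   = e.sum() / T2                          # Mean Error
mpe  = 100 * (e / y_true).sum() / T2         # Mean Percentage Error
mse  = (e ** 2).sum() / T2                   # Mean Square Error
mae  = np.abs(e).sum() / T2                  # Mean Absolute Error
mape = 100 * np.abs(e / y_true).sum() / T2   # Mean Absolute Percentage Error

print(me, mse, mae)  # 0.0 1.0 1.0
```

Note how ME and MPE can be near zero even when every forecast is wrong, because positive and negative residuals cancel; MSE, MAE, and MAPE avoid this by squaring or taking absolute values.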