Time-Series Model Building for TSX Stock Prices Using R

Time-series modelling and forecasting was a core concept, and one of the more important technical skills, to pick up in a master's program in economics. Having focused primarily on economic applications of time-series, such as estimating and forecasting Canadian Real GDP and Real Interest Rates, I got a sense of how powerful time-series modelling can be.

One thing I wish these applied econometrics courses had taught me is how to follow some sort of standard procedure for building a good time-series model! It can be misleading to work really hard on a time-series project, knowing that you followed every step in the textbook, only to realize in practice that the model is horrible!

Luckily, I was able to pick up one of the fundamental statistical model-building procedures and apply it to time-series data: splitting the data into a training set and a test set.

In this mini-project, I use a data set of 141 observations from Yahoo Finance Canada, where each observation represents the S&P/TSX stock price for a particular month. I want to demonstrate the importance of building a forecasting model on a training set, using this model to forecast future values, and comparing the forecasts with the values actually observed to see how well the model performed. Take note of the Charizard and Venusaur colour palettes in the graphs as you read!

The following describes the procedures taken to create this model:

1. Decompose the full series of stock prices into seasonal, trend, and remainder components using LOESS.
2. Obtain the remainder, that is, the vector of stock prices left after the seasonal and trend components are removed (referred to in this post as the seasonally adjusted stock prices).
3. Separate this remainder into a training set and a test set, where the training set consists of all seasonally adjusted stock prices from January 2006 to December 2015.
4. Build an ARIMA model on the training set and obtain predicted values up to the present month (October 2016).
5. Compare predicted values to actual values observed and assess model performance.
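
The five steps above can be sketched roughly as follows in base R. This is not the project's actual code: the `prices` series is synthetic data standing in for the Yahoo Finance download, the choice of `stl()` with `s.window = "periodic"` is an assumption about how the LOESS decomposition was run, and the AR order of 5 is simply taken from the model reported in this post.

```r
# ASSUMPTION: synthetic monthly series standing in for the real S&P/TSX data.
set.seed(42)
n <- 130  # January 2006 to October 2016 in this illustration
prices <- ts(12000 + cumsum(rnorm(n, mean = 10, sd = 300)) +
               200 * sin(2 * pi * seq_len(n) / 12),
             start = c(2006, 1), frequency = 12)

# Step 1: LOESS-based decomposition (STL) on the full series.
decomp <- stl(prices, s.window = "periodic")

# Step 2: keep the remainder, i.e. what is left once the seasonal and
# trend components are removed.
remainder <- decomp$time.series[, "remainder"]

# Step 3: training set up to December 2015, test set from January 2016.
train <- window(remainder, end = c(2015, 12))
test  <- window(remainder, start = c(2016, 1))

# Step 4: fit ARIMA(5,0,0) on the training set and forecast the test months.
fit <- arima(train, order = c(5, 0, 0), method = "ML")
fc  <- predict(fit, n.ahead = length(test))

# Step 5: compare forecasts with the held-out values.
errors <- test - fc$pred
rmse   <- sqrt(mean(errors^2))
print(cbind(actual = test, predicted = fc$pred, se = fc$se))
print(rmse)
```

The key design point is that the model only ever sees `train`; the 2016 values are held back so the forecast can be judged against data the model was never fitted on.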

Using the above procedure, I obtained the following graph using an ARIMA(5,0,0) model, or more simply an AR(5) model.

When we compare the estimated model (Fitted line) with the actual behaviour of the stocks, the two are close. The model predictions into 2016 come close to what was actually observed, but they initially over-predict the seasonally adjusted stock prices. About 6 months into the forecast, the predictions start to decrease and under-predict actual stock performance. Forecasts such as these are not meant to extend too far into the future, as they become unreliable. For demonstration purposes, it is nice to see that the model can predict behaviour reasonably well for a few months ahead.

Since I do not want to put complete faith in this one model, I also ran a Simple Exponential Smoothing time-series model using HoltWinters.

The following describes the procedures taken to create this model:

1. Separate the raw time-series data into a training set and a test set, where the training set consists of all raw stock prices from January 2006 to December 2015.
2. Build a Simple Exponential Smoothing (SES) model using Holt-Winters on the training set and obtain predicted values up to the present month (October 2016).
3. Compare predicted values to actual values observed and assess model performance.
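
A minimal sketch of these steps in base R is shown here. It again uses a synthetic `prices_raw` series as a stand-in for the real data, which is an assumption for illustration only. Setting `beta = FALSE` and `gamma = FALSE` in `HoltWinters()` switches off the trend and seasonal terms, which reduces the method to simple exponential smoothing.

```r
# ASSUMPTION: synthetic monthly series standing in for the raw S&P/TSX prices.
set.seed(42)
n <- 130
prices_raw <- ts(12000 + cumsum(rnorm(n, mean = 10, sd = 300)),
                 start = c(2006, 1), frequency = 12)

# Step 1: split the raw series into training and test sets.
train_raw <- window(prices_raw, end = c(2015, 12))
test_raw  <- window(prices_raw, start = c(2016, 1))

# Step 2: simple exponential smoothing (no trend, no seasonal component).
ses <- HoltWinters(train_raw, beta = FALSE, gamma = FALSE)

# Forecast the test months with 95% prediction intervals.
ses_fc <- predict(ses, n.ahead = length(test_raw),
                  prediction.interval = TRUE, level = 0.95)

# Step 3: compare forecasts with the held-out values.
ses_errors <- test_raw - ses_fc[, "fit"]
print(cbind(actual = test_raw, ses_fc))
print(sqrt(mean(ses_errors^2)))
```

With trend and seasonality switched off, the point forecast is the last estimated level carried forward, so the `fit` column is constant while the interval widens with the horizon.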

Using the above procedures, I was able to obtain the following graph:

I am happy with the results in this graph. First, note that this model takes in raw stock prices, so its output should be interpreted on the scale of raw prices. The SES model fits the actual values closely, and the forecasting behaviour is similar to that obtained from the ARIMA(5,0,0) model. The prediction intervals shaded in blue put bounds on the forecast values, which is also very nice.

Further Work

In this post, I addressed the idea of splitting a time-series data set into a training set and a test set and modelling accordingly. In the first model, I used a LOESS-based decomposition to remove the seasonal and trend components, with the aim of making the stock price data stationary. This allowed me to apply an ARIMA model and make subsequent predictions. In the second model, I applied simple exponential smoothing via Holt-Winters directly to the raw prices and obtained similar results.

Although the time-series models appear to fit the training data well and, to a certain degree, predict well, there is much more to be said here. It is much better practice to compare the performance of the two models by calculating their prediction errors on the test set. To take it even further, we could apply more advanced time-series modelling via neural networks.
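
As a rough illustration of what such a comparison could look like, the helper below computes two common prediction-error measures, mean absolute error (MAE) and root mean squared error (RMSE). The function name `forecast_errors` and the toy numbers are hypothetical and not taken from the project. Note also that the two models in this post forecast different quantities (remainder versus raw prices), so their forecasts would first need to be put on a common scale before the numbers are comparable.

```r
# Hypothetical helper: prediction-error summary for one set of forecasts.
forecast_errors <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  e <- as.numeric(actual) - as.numeric(predicted)
  c(MAE = mean(abs(e)), RMSE = sqrt(mean(e^2)))
}

# Toy values for illustration only (not real stock prices or model output).
actual  <- c(100, 102, 101, 105)
model_a <- c(101, 101, 103, 104)
model_b <- c(98, 105, 99, 109)

comparison <- rbind(model_a = forecast_errors(actual, model_a),
                    model_b = forecast_errors(actual, model_b))
print(comparison)
```

Lower values indicate forecasts closer to the held-out observations. RMSE penalizes large misses more heavily than MAE.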

Source Code

This project was done in R. The source code can be found on my GitHub here.