This PhD dissertation comprises four essays on forecasting financial markets with unsupervised predictive analytics techniques, most notably time series extrapolation methods and artificial neural networks. Key objectives of the research were reproducibility and replicability, which are fundamental principles in management science; as such, the implementation of all the suggested algorithms has been fully automated and runs completely unsupervised in R.
As with any predictive analytics exercise, computational intensity is both a significant challenge and a criterion of performance; thus, forecasting accuracy and uncertainty, as well as computational times, are reported in all essays. Multiple horizons, multiple methods and benchmarks, and multiple metrics are employed, as dictated by good practice in empirical forecasting exercises.
The essays evolve in nature, as each one builds on the previous one and tests one additional condition. In sequence, they examine: which method wins overall in a very extensive evaluation over five frequencies (yearly, quarterly, monthly, weekly and daily data) of 18 time series of the largest-capitalization stocks in the FTSE 100 over the last 20 years (first essay); the impact of the forecast horizon on this exercise and how it promotes different winners for different horizons (second essay); the impact of incorporating uncertainty in the form of maximum-minimum values per period, while the quantity of interest remains the mean expected value over the next period (third essay); and the introduction of a second variable capturing other aspects of the behavioural nature of the financial environment – trading volume – together with an evaluation of whether it improves forecasting performance (fourth essay).
The whole endeavour required the use of High Performance Computing Wales (HPC Wales) resources for a significant amount of time, incurring computational costs that ultimately paid off in increased forecasting accuracy for the AI approaches; the full exercise for a single series can nevertheless be repeated on a fast laptop (i7 with 16 GB of memory).
Overall, the principle of (forecasting) horses for (data) courses was once again borne out, and it was once more evidenced that no single method can win under all conditions. The introduction of uncertainty (in the form of a range for every period), as well as of volume as a second variable capturing environmental aspects, was beneficial for forecasting accuracy; overall, the research provided empirical evidence that predictive analytics approaches have a future in such a forecasting context.
Given that this was a predictive analytics exercise, the focus was placed on forecasting levels (monetary values) rather than log-returns; and since out-of-sample forecasting accuracy, rather than causality, was the primary objective, multiple regression models were not considered as benchmarks.
As in any empirical predictive analytics exercise, more time series, more artificial intelligence methods, more metrics and more data could be employed to allow fuller generalization of the results, provided that all of these can be fully automated and forecast unsupervised in a freeware environment – in this thesis, that being R.