---
title: "Getting Started with NNS: Forecasting"
author: "Fred Viole"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with NNS: Forecasting}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r setup, include=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(NNS)
library(data.table)
data.table::setDTthreads(2L)
options(mc.cores = 1)
Sys.setenv("OMP_THREAD_LIMIT" = 2)
```

```{r setup2, message=FALSE, warning = FALSE}
library(NNS)
library(data.table)
require(knitr)
require(rgl)
```

# Forecasting

The underlying assumptions of traditional autoregressive models are well known. The resulting complexity of these models leads to observations such as:

*"We have found that choosing the wrong model or parameters can often yield poor results, and it is unlikely that even experienced analysts can choose the correct model and parameters efficiently given this array of choices."*

`NNS` simplifies the forecasting process. Below are some examples demonstrating **`NNS.ARMA`** and its **assumption-free, minimal-parameter** forecasting method.

## Linear Regression

**`NNS.ARMA`** has the ability to fit a linear regression to the relevant component series, yielding very fast results. For our running example we will use the `AirPassengers` dataset included with base R.

We will forecast 44 periods (`h = 44`) of `AirPassengers` using the first 100 observations (`training.set = 100`), returning estimates of the final 44 observations. We will then test these estimates against our validation set, `tail(AirPassengers, 44)`.

Since this is monthly data, we will try `seasonal.factor = 12`.

Below is the linear fit and associated root mean squared error (RMSE) using `method = "lin"`.

```{r linear,fig.width=5,fig.height=3,fig.align = "center", warning=FALSE}
nns_lin = NNS.ARMA(AirPassengers,
                   h = 44,
                   training.set = 100,
                   method = "lin",
                   plot = TRUE,
                   seasonal.factor = 12,
                   seasonal.plot = FALSE)

# RMSE against the 44-observation validation set
sqrt(mean((nns_lin - tail(AirPassengers, 44)) ^ 2))
```

## Nonlinear Regression

Now we can try using a nonlinear regression on the relevant component series using `method = "nonlin"`.

```{r nonlinear,fig.width=5,fig.height=3,fig.align = "center", eval = FALSE}
nns_nonlin = NNS.ARMA(AirPassengers,
                      h = 44,
                      training.set = 100,
                      method = "nonlin",
                      plot = FALSE,
                      seasonal.factor = 12,
                      seasonal.plot = FALSE)

sqrt(mean((nns_nonlin - tail(AirPassengers, 44)) ^ 2))
```

```{r nonlinearres, eval = FALSE}
[1] 19.21812
```

## Cross-Validation

We can test a series of `seasonal.factor` values and select the best one to fit. The largest period to consider would be `0.5 * length(variable)`, since we need more than 2 points for a regression! Remember, we are testing the first 100 observations of `AirPassengers`, not the full 144 observations.

```{r seasonal test, eval=TRUE}
# One row per candidate period: the period and its validation RMSE
seas = t(sapply(1 : 25, function(i) c(i, sqrt( mean( (NNS.ARMA(AirPassengers, h = 44, training.set = 100, method = "lin", seasonal.factor = i, plot=FALSE) - tail(AirPassengers, 44)) ^ 2) ) ) ) )

colnames(seas) = c("Period", "RMSE")
seas
```

Now that we know `seasonal.factor = 12` is our best fit, we can see if there's any benefit from using a nonlinear regression. Alternatively, we can define our best fit as the `Period` entry of `seas` corresponding to the minimum value in its `RMSE` column (`seas` is a matrix, so its columns are indexed with `[ , ]` rather than `$`).

```{r best fit, eval=TRUE}
a = seas[which.min(seas[ , 2]), 1]
```

Below you will notice the use of `seasonal.factor = a` generates the same output.

```{r best nonlinear,fig.width=5,fig.height=3,fig.align = "center", eval=TRUE}
nns = NNS.ARMA(AirPassengers,
               h = 44,
               training.set = 100,
               method = "nonlin",
               seasonal.factor = a,
               plot = TRUE,
               seasonal.plot = FALSE)

sqrt(mean((nns - tail(AirPassengers, 44)) ^ 2))
```

**Note:** With monthly data, the reported `seasonal.factor` may fall close to, but not exactly on, multiples of 3, 4, 6 or 12. For instance, if the reported `seasonal.factor = {37, 47, 71, 73}`, use `seasonal.factor = c(36, 48, 72)` instead by setting the `modulo` parameter in **`NNS.seas(..., modulo = 12)`**. The same suggestion holds for daily data and multiples of 7, or any other time series with logically inferred cyclical patterns. The nearest periods to that `modulo` will be in the expanded output.

```{r modulo, eval=TRUE}
NNS.seas(AirPassengers, modulo = 12, plot = FALSE)
```

## Cross-Validating All Combinations of `seasonal.factor`

NNS also offers a wrapper function, **`NNS.ARMA.optim()`**, that tests a given vector of `seasonal.factor` values and returns the optimized objective function value (in this case RMSE, written as `obj.fn = expression( sqrt(mean((predicted - actual)^2)) )`), the corresponding periods, and the **`NNS.ARMA`** regression method used. Objective functions from external packages work as well, such as `obj.fn = expression(Metrics::rmse(actual, predicted))`.

**`NNS.ARMA.optim()`** will also test whether to regress the underlying data first, `shrink` the estimates to their subset mean values, include a `bias.shift` based on its internal validation errors, and compare different `weights` of both linear and nonlinear estimates.

Given our monthly dataset, we will try multiple years by setting `seasonal.factor = seq(12, 60, 6)`, stepping every 6 months, based on our **`NNS.seas()`** insights above.

```{r best optim, eval=FALSE}
nns.optimal = NNS.ARMA.optim(AirPassengers,
                             training.set = 100,
                             seasonal.factor = seq(12, 60, 6),
                             obj.fn = expression( sqrt(mean((predicted - actual)^2)) ),
                             objective = "min",
                             pred.int = .95,
                             plot = TRUE)

nns.optimal
```

```{r optimres, eval=FALSE}
[1] "CURRNET METHOD: lin"
[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
[1] "NNS.ARMA(... method = 'lin' , seasonal.factor = c( 12 ) ...)"
[1] "CURRENT lin OBJECTIVE FUNCTION = 35.3996540135277"
[1] "BEST method = 'lin', seasonal.factor = c( 12 )"
[1] "BEST lin OBJECTIVE FUNCTION = 35.3996540135277"
[1] "CURRNET METHOD: nonlin"
[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
[1] "NNS.ARMA(... method = 'nonlin' , seasonal.factor = c( 12 ) ...)"
[1] "CURRENT nonlin OBJECTIVE FUNCTION = 19.2181153361782"
[1] "BEST method = 'nonlin' PATH MEMBER = c( 12 )"
[1] "BEST nonlin OBJECTIVE FUNCTION = 19.2181153361782"
[1] "CURRNET METHOD: both"
[1] "COPY LATEST PARAMETERS DIRECTLY FOR NNS.ARMA() IF ERROR:"
[1] "NNS.ARMA(... method = 'both' , seasonal.factor = c( 12 ) ...)"
[1] "CURRENT both OBJECTIVE FUNCTION = 19.9790337412655"
[1] "BEST method = 'both' PATH MEMBER = c( 12 )"
[1] "BEST both OBJECTIVE FUNCTION = 19.9790337412655"

$periods
[1] 12

$weights
NULL

$obj.fn
[1] 19.21812

$method
[1] "nonlin"

$shrink
[1] FALSE

$nns.regress
[1] FALSE

$bias.shift
[1] 10.3416

$errors
 [1] -12.0495905 -19.5023885 -18.2981119 -30.4665605 -21.9967015 -16.3628298 -12.6732257 -4.2894621 -2.6001984 2.4174837 16.6574755 24.0964052 12.0029210 7.8864972
[15] -0.7526824 -26.4198893 13.6743157 1.1898601 9.1072756 24.6494525 6.3543872 4.5198310 3.7511736 6.9241735 -13.4927319 10.9518474 -12.5758246 -38.6502806
[29] -9.6293956 -16.2385122 -15.9817320 -12.1192381 -21.4941585 -18.2787520 22.4564209 -26.8238096 -33.5539336 -11.7710337 -42.4668107 -43.6993219 -18.9496482 -40.3338256
[43] -17.1561519 -8.7338598

$results
 [1] 364.5996 431.5868 472.7811 463.4085 406.1696 348.7588 311.4594 351.7984 358.0857 341.3710 402.3232 393.3122 400.8951 479.1979 522.1577 511.8352 447.0831 381.2570 341.6514
[20] 385.4619 390.2282 370.9257 437.1106 427.1677 436.0942 526.1620 574.7971 559.7077 488.0471 415.7932 372.1043 421.7103 422.7644 400.6785 473.1211 461.8718 472.1623 573.5495
[39] 625.3008 607.9430 528.9780 448.9954 402.3864 456.2667

$lower.pred.int
 [1] 311.9511 378.9384 420.1327 410.7600 353.5212 296.1104 258.8110 299.1500 305.4372 288.7226 349.6748 340.6638 348.2466 426.5495 469.5092 459.1867 394.4347 328.6086 289.0030
[20] 332.8135 337.5797 318.2773 384.4622 374.5193 383.4458 473.5135 522.1487 507.0593 435.3987 363.1447 319.4559 369.0618 370.1160 348.0301 420.4727 409.2233 419.5139 520.9011
[39] 572.6524 555.2945 476.3296 396.3469 349.7380 403.6183

$upper.pred.int
 [1] 398.9146 465.9018 507.0961 497.7235 440.4846 383.0738 345.7744 386.1134 392.4007 375.6860 436.6382 427.6272 435.2101 513.5129 556.4727 546.1502 481.3981 415.5720 375.9664
[20] 419.7769 424.5432 405.2407 471.4256 461.4827 470.4092 560.4770 609.1121 594.0227 522.3621 450.1082 406.4193 456.0253 457.0794 434.9936 507.4361 496.1868 506.4773 607.8645
[39] 659.6159 642.2580 563.2930 483.3104 436.7015 490.5818
```

![](images/ARMA_optim.png){width="600" height="400"}
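
The `obj.fn` argument used in this section is an ordinary R `expression` written in terms of `predicted` and `actual`. As a minimal base R sketch of how such an expression can be evaluated against a pair of vectors (an illustration only, not the internals of `NNS.ARMA.optim()`; the helper `evaluate_obj_fn()` and the toy vectors are hypothetical and introduced here purely for demonstration):

```{r obj_fn_sketch, eval=FALSE}
# The same RMSE objective passed to NNS.ARMA.optim() in this section
obj.fn = expression( sqrt(mean((predicted - actual)^2)) )

# Hypothetical helper: evaluate an objective expression with the names
# `predicted` and `actual` bound to the supplied vectors
evaluate_obj_fn = function(obj.fn, predicted, actual) {
  eval(obj.fn, list(predicted = predicted, actual = actual))
}

# Toy values for illustration only (not NNS output)
actual    = c(10, 12, 14)
predicted = c(11, 12, 16)

rmse = evaluate_obj_fn(obj.fn, predicted, actual)
rmse
```

Any expression that refers to `predicted` and `actual` and returns a single number can be evaluated the same way, which is why metrics from external packages can be substituted.
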

## Extension of Estimates

We can forecast another 50 periods out-of-sample (`h = 50`) by dropping the `training.set` parameter, while still generating the 95% prediction intervals.

```{r extension,results='hide',fig.width=5,fig.height=3,fig.align = "center", eval=FALSE}
NNS.ARMA.optim(AirPassengers,
               seasonal.factor = seq(12, 60, 6),
               obj.fn = expression( sqrt(mean((predicted - actual)^2)) ),
               objective = "min",
               pred.int = .95,
               h = 50,
               plot = TRUE)
```

![](images/ARMA_optim_h_50.png){width="600" height="400"}
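
Out-of-sample forecasts extend past the last observation of `AirPassengers` (December 1960), so it can help to attach a time index to them. The following is a minimal base R sketch, assuming only that the forecast is available as a numeric vector of length `h`. The seasonal-naive values are a stand-in so the code runs without `NNS`, and `fc`, `fc_start` and `fc_ts` are hypothetical names.

```{r forecast_index_sketch, eval=FALSE}
h = 50

# Stand-in forecast: repeat the last observed year (seasonal naive).
# These are NOT NNS estimates; substitute your own forecast vector.
fc = rep(as.numeric(tail(AirPassengers, 12)), length.out = h)

# The first forecast period is one month after the last observation
fc_start = tsp(AirPassengers)[2] + 1 / frequency(AirPassengers)

# Attach a monthly time index to the forecast values
fc_ts = ts(fc, start = fc_start, frequency = frequency(AirPassengers))

start(fc_ts)
end(fc_ts)
```
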

## Brief Notes on Other Parameters

- `seasonal.factor = c(1, 2, ...)` We included the ability to use any number of specified seasonal periods simultaneously, weighted by their strength of seasonality. Computationally expensive when used with nonlinear regressions and large numbers of relevant periods.

- `weights` Instead of weighting by the `seasonal.factor` strength of seasonality, we offer the ability to weight each period by any user-defined compatible vector summing to 1.\
  Equal weighting would be `weights = "equal"`.

- `pred.int` Provides the values for the specified prediction intervals within [0,1] for each forecasted point and plots the bootstrapped replicates for the forecasted points.

- `seasonal.factor = FALSE` We also included the ability to use all detected seasonal periods simultaneously, weighted by their strength of seasonality. Computationally expensive when used with nonlinear regressions and large numbers of relevant periods.

- `best.periods` This parameter restricts the number of detected seasonal periods to use, again weighted by their strength. To be used in conjunction with `seasonal.factor = FALSE`.

- `modulo` To be used in conjunction with `seasonal.factor = FALSE`. This parameter will ensure logical seasonal patterns (e.g., `modulo = 7` for daily data) are included along with the results.

- `mod.only` To be used in conjunction with `seasonal.factor = FALSE` and a non-`NULL` `modulo`. This parameter will ensure empirical patterns are kept along with the logical seasonal patterns.

- `dynamic = TRUE` This setting generates new seasonal period(s) using the estimated values as continuations of the variable, either with or without a `training.set`. Also computationally expensive due to the recalculation of seasonal periods for each estimated value.

- `plot`, `seasonal.plot` These are the plotting arguments, easily enabled or disabled with `TRUE` or `FALSE`. `seasonal.plot = TRUE` will not plot without `plot = TRUE`. If a seasonal analysis is all that is desired, `NNS.seas` is the function specifically suited for that task.

# Multivariate Time Series Forecasting

The extension to a generalized multivariate instance is provided in the following documentation of the **`NNS.VAR()`** function:

- [Multivariate Time Series Forecasting: Nonparametric Vector Autoregression Using NNS](https://doi.org/10.2139/ssrn.3489550)

# References

If the user is so motivated, detailed arguments and proofs are provided within the following:

- [Nonlinear Nonparametric Statistics: Using Partial Moments](https://github.com/OVVO-Financial/NNS/blob/NNS-Beta-Version/examples/index.md)

- [Forecasting Using NNS](https://doi.org/10.2139/ssrn.3382300)

```{r threads, echo = FALSE}
Sys.setenv("OMP_THREAD_LIMIT" = "")
```