Lake Huron data set
data("LakeHuron")
years <- time(LakeHuron)
fit <- fastTS(LakeHuron, n_lags_max = 3)
fit
#> An endogenous PACF-based fastTS model.
#>
#> PF_gamma AICc_d BIC_d
#> 0.00 *0* 0.54
#> 0.25 <0.01 0.54
#> 0.50 0.01 0.55
#> 1.00 0.05 0.28
#> 2.00 0.66 *0*
#> 4.00 4.46 0.89
#> 8.00 4.46 0.89
#> 16.00 4.46 0.89
#>
#> AICc_d and BIC_d are the difference from the minimum; *0* is best.
#>
#> - Best AICc model: 4 active terms
#> - Best BIC model: 3 active terms
#>
#> Test-set prediction accuracy (20% held-out test set)
#> rmse rsq mae
#> AICc 0.7836646 0.5955089 0.6056737
#> BIC 0.7486619 0.6308355 0.6032140
What does predict
do?
Let \(y_t\) refer to our outcome series, and \(\hat y_t^{(k)}\) refer to the \(k\)-step-ahead prediction for \(y_t\).
The predicted value returned at any time point \(t\) is the model’s prediction for that
point \(\hat y_t\), given the model and
all data up to \(t -\)
n_ahead
. This means that
The 1-step prediction \(\hat y_t^{(1)}\) is computed by using lags of \(y_t\) deemed important by the fitting process.
The 2-step prediction \(\hat y_t^{(2)}\) is computed by using important lags of \(y_t\), but replacing the first lag \(y_{t-1}\) with \(\hat y_{t-1}^{(1)}\).
The 3-step prediction \(\hat y_t^{(3)}\) is computed by replacing the first lag \(y_{t-1}\) with \(\hat y_{t-1}^{(2)}\) and the second lag \(y_{t-2}\) with \(\hat y_{t-2}^{(1)}\).
And so on until the \(k\)-step prediction \(\hat y_t^{(k)}\) is similarly computed by replacing lags of \(y_t\) with predicted values as necessary.
Here is an example with the LakeHuron
data set.
p1 <- predict(fit, n_ahead = 1)
p7 <- predict(fit, n_ahead = 7)
predictions <- tibble(years, LakeHuron, p1, p7)
head(predictions, 10)
#> # A tibble: 10 × 4
#> years LakeHuron p1 p7
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1875 580. NA NA
#> 2 1876 582. NA NA
#> 3 1877 581. NA NA
#> 4 1878 581. 580. NA
#> 5 1879 580. 581. NA
#> 6 1880 580. 579. NA
#> 7 1881 580. 581. NA
#> 8 1882 581. 580. NA
#> 9 1883 581. 581. NA
#> 10 1884 581. 581. 579.
tail(predictions)
#> # A tibble: 6 × 4
#> years LakeHuron p1 p7
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1967 578. 578. 579.
#> 2 1968 579. 579. 579.
#> 3 1969 580. 579. 579.
#> 4 1970 579. 580. 579.
#> 5 1971 580. 579. 578.
#> 6 1972 580. 580. 579.
- The
predict
function returns missing values for the firstn_lags_max
observations for 1-step ahead predictions. The prediction process back-fill real values when necessary for early predictions, but resets to NA before returning predictions. - In 1884, the model’s 1-step prediction, the one that would be made in 1883, is 581.1087408.
- The 7-step prediction for 1884, the one “made” in 1877, is 579.4498549.
Note: there is a “burn-in” component to fastTS
objects
that means the first n_lags_max
observations are
back-filled in.
Forecasting
By default, the predict
function does
not produce forecasts. In order to get forecasts, we
need to set forecast_ahead = TRUE
, which will return
forecasted values at the tail end of the returned vector.
p1 <- predict(fit, n_ahead = 1, forecast_ahead = TRUE)
predictions <- tibble(time = c(1973), p1)
# For 7-step ahead forecasts
p7 <- predict(fit, n_ahead = 7, forecast_ahead = TRUE)
predictions <- tibble(time = c(1973:1979), p7)
predictions
#> # A tibble: 7 × 2
#> time p7
#> <int> <dbl>
#> 1 1973 580.
#> 2 1974 580.
#> 3 1975 579.
#> 4 1976 579.
#> 5 1977 579.
#> 6 1978 579.
#> 7 1979 579.
Finally, the return_intermediate
option allows users to
collect all of the step-ahead predictions up to \(k\):
p1_p7 <- predict(fit, n_ahead = 7, return_intermediate = TRUE)
predictions <- tibble(years, LakeHuron, p1_p7)
tail(predictions)
#> # A tibble: 6 × 9
#> years LakeHuron p1 p2 p3 p4 p5 p6 p7
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1967 578. 578. 578. 578. 578. 579. 579. 579.
#> 2 1968 579. 579. 578. 578. 578. 578. 579. 579.
#> 3 1969 580. 579. 579. 578. 578. 578. 578. 579.
#> 4 1970 579. 580. 579. 579. 578. 578. 578. 579.
#> 5 1971 580. 579. 580. 579. 579. 578. 578. 578.
#> 6 1972 580. 580. 579. 579. 579. 579. 579. 579.