Uses penalized regression to quickly fit time series models with potentially complex seasonal patterns and exogenous variables. Based on methods described in Peterson & Cavanaugh (2024).
Usage
fastTS(
y,
X = NULL,
n_lags_max,
gamma = c(0, 2^(-2:4)),
ptrain = 0.8,
pf_eps = 0.01,
w_endo,
w_exo,
weight_type = c("pacf", "parametric"),
m = NULL,
r = c(rep(0.1, length(m)), 0.01),
plot = FALSE,
ncvreg_args = list(penalty = "lasso", returnX = FALSE, lambda.min = 0.001)
)
# S3 method for fastTS
plot(x, log.l = TRUE, ...)
# S3 method for fastTS
coef(object, choose = c("AICc", "BIC"), ...)
# S3 method for fastTS
print(x, ...)
# S3 method for fastTS
summary(object, choose = c("AICc", "BIC"), ...)
Arguments
- y
univariate time series outcome
- X
matrix of predictors (no intercept)
- n_lags_max
maximum number of lags to consider
- gamma
vector of exponents for the penalty weights; one model is fit for each value
- ptrain
proportion of observations used for training; the remaining 1 - ptrain are held out as test data
- pf_eps
penalty factors below this will be set to zero
- w_endo
optional pre-specified weights for endogenous terms
- w_exo
optional pre-specified weights for exogenous terms (see Details)
- weight_type
type of weights to use for endogenous terms
- m
mode(s) for seasonal lags (used if weight_type = "parametric")
- r
penalty factors for seasonal + local scaling functions (used if weight_type = "parametric")
- plot
logical; whether to plot the penalty functions
- ncvreg_args
additional arguments passed through to ncvreg
- x
a fastTS object
- log.l
logical; should the x-axis (lambda) be on the log scale?
- ...
passed to downstream functions
- object
a fastTS object
- choose
which criterion to use for lambda selection (AICc or BIC)
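To make n_lags_max and ptrain concrete, here is a minimal sketch of the lagged design matrix and train/test split they imply. It assumes the split is chronological (the first ptrain share of rows trains, the rest tests) and that lagging happens before splitting; the helper make_lagged_split is hypothetical and is not part of fastTS.

```r
# Hypothetical helper (not part of fastTS): build the lagged design matrix
# implied by n_lags_max and split it chronologically using ptrain.
make_lagged_split <- function(y, n_lags_max, ptrain = 0.8) {
  y <- as.numeric(y)
  # embed() returns columns y_t, y_{t-1}, ..., y_{t - n_lags_max}
  E <- embed(y, n_lags_max + 1)
  X_lag <- E[, -1, drop = FALSE]
  colnames(X_lag) <- paste0("lag", seq_len(n_lags_max))
  y_out <- E[, 1]
  # The first ptrain share of usable rows is training data; the rest is test data
  n_train <- floor(ptrain * length(y_out))
  train_idx <- seq_len(n_train)
  list(
    y = y_out,
    X = X_lag,
    train_idx = train_idx,
    test_idx = setdiff(seq_along(y_out), train_idx)
  )
}

split_LH <- make_lagged_split(LakeHuron, n_lags_max = 9, ptrain = 0.8)
dim(split_LH$X)             # 89 rows (98 - 9), 9 lag columns
length(split_LH$train_idx)  # floor(0.8 * 89) = 71
```

With the default ptrain = 0.8, roughly 20% of the usable rows end up in the test set, which matches the "20% held-out test set" reported by print().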
Value
A list of class fastTS with elements
- fits
a list of penalized regression fits (lasso by default), one for each value of gamma
- ncvreg_args
arguments passed to ncvreg
- gamma
the (negative) exponent on the penalty weights, one for each fit
- n_lags_max
the maximum number of lags
- y
the time series
- X
the utilized matrix of exogenous features
- oos_results
results on the test data using the best fits (by AICc and by BIC)
- train_idx
index of observations used in training data
- weight_type
the type of weights used for endogenous terms
- m
the mode(s) for seasonal lags (used if weight_type = "parametric")
- r
penalty factors for seasonal + local scaling functions
- ptrain
the proportion used to train the model
For plot, x (invisibly).
For coef, a vector of model coefficients.
For print, x (invisibly).
For summary, the summary object produced by ncvreg, evaluated at the best tuning parameter combination according to choose (best AICc by default).
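The AICc_d and BIC_d columns printed by fastTS are differences from the minimum criterion value. As an illustration only, the sketch below computes the same kind of table for three ordinary lm() AR(p) fits on LakeHuron, standing in for the penalized fits fastTS actually compares; it uses the standard correction AICc = AIC + 2k(k + 1)/(n - k - 1), and how fastTS counts k along a lasso path is not shown here.

```r
# Illustrative only: lm() AR(p) fits stand in for the penalized fits that
# fastTS compares; AICc = AIC + 2k(k + 1) / (n - k - 1).
aicc <- function(fit) {
  k <- attr(logLik(fit), "df")  # coefficients + residual variance
  n <- nobs(fit)
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

# All candidates use the same rows so their criteria are comparable
E <- embed(as.numeric(LakeHuron), 4)  # columns: y_t, lag1, lag2, lag3
dat <- data.frame(y = E[, 1], lag1 = E[, 2], lag2 = E[, 3], lag3 = E[, 4])
fits <- list(
  p1 = lm(y ~ lag1, data = dat),
  p2 = lm(y ~ lag1 + lag2, data = dat),
  p3 = lm(y ~ lag1 + lag2 + lag3, data = dat)
)

crit <- cbind(AICc = sapply(fits, aicc), BIC = sapply(fits, BIC))
# Difference from the column minimum, so the best model shows 0
crit_d <- round(sweep(crit, 2, apply(crit, 2, min)), 2)
crit_d
```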
Details
The default weights for exogenous features are chosen using an approach
similar to the adaptive lasso (based on bivariate OLS estimates). For
lower-dimensional X, setting w_exo = "unpenalized" is advised, because it
allows statistical inference on the exogenous variable coefficients via the
summary function.
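A minimal sketch of adaptive-lasso-style exogenous weights follows, assuming each penalty factor is |b_j|^(-gamma), where b_j is the slope from regressing y on column j alone, and that columns are standardized first. The function exo_penalty_factors is hypothetical; the package's exact scaling may differ.

```r
# Hypothetical sketch (not the package's code): adaptive-lasso-style
# penalty factors for exogenous columns from bivariate OLS slopes.
exo_penalty_factors <- function(y, X, gamma = 1, pf_eps = 0.01) {
  y <- as.numeric(y)
  # Standardize columns so the weights do not depend on units (assumption)
  Xs <- scale(as.matrix(X))
  # Bivariate OLS slope of y on each column separately: cov(x, y) / var(x)
  b <- apply(Xs, 2, function(x) cov(x, y) / var(x))
  # Larger |slope| -> smaller penalty factor
  pf <- abs(b)^(-gamma)
  # Penalty factors below pf_eps are set to zero (left unpenalized)
  pf[pf < pf_eps] <- 0
  pf
}

set.seed(1)
n <- 100
X_demo <- cbind(signal = rnorm(n), noise = rnorm(n))
y_demo <- 3 * X_demo[, "signal"] + rnorm(n, sd = 0.5)
pf_demo <- exo_penalty_factors(y_demo, X_demo, gamma = 1)
pf_demo
```

The column that is strongly associated with y receives a much smaller penalty factor than the noise column, which is the intended adaptive-lasso behavior.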
By default, a seasonal frequency m need not be specified, and the PACF is
used to estimate the weights for endogenous terms. A parametric version is
also available (weight_type = "parametric"), which uses a penalty scaling
function that penalizes seasonal and recent lags less, as described in
Peterson & Cavanaugh (2024). See the penalty_scaler function for more
details and to plot the penalty function for various values of m and r.
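The PACF-based weighting can be sketched as below, assuming each lag's penalty factor is |PACF_k|^(-gamma), consistent with gamma being the negative exponent on the weights. The function pacf_penalty_factors is hypothetical and may differ from the package's internal scaling.

```r
# Hypothetical sketch (not the package's code): PACF-based penalty factors
# for lags 1..n_lags_max, using |PACF|^(-gamma).
pacf_penalty_factors <- function(y, n_lags_max, gamma = 1, pf_eps = 0.01) {
  # Sample partial autocorrelations at lags 1..n_lags_max
  phi <- as.numeric(pacf(y, lag.max = n_lags_max, plot = FALSE)$acf)
  # Strong partial autocorrelation -> small penalty factor
  pf <- abs(phi)^(-gamma)
  pf[pf < pf_eps] <- 0
  names(pf) <- paste0("lag", seq_len(n_lags_max))
  pf
}

# One column of penalty factors per gamma value; gamma = 0 gives all ones
gammas <- c(0, 0.5, 1, 2)
pf_grid <- sapply(gammas, function(g) pacf_penalty_factors(LakeHuron, 9, gamma = g))
colnames(pf_grid) <- paste0("gamma_", gammas)
round(pf_grid, 2)
```

With gamma = 0 every lag gets the same penalty (an ordinary lasso), and larger gamma increasingly favors lags with strong partial autocorrelation. Because |PACF| is at most 1, these unscaled factors are never below 1, so the pf_eps step only matters if the factors are rescaled.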
References
Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist., 5: 232-253.
Peterson, R.A. and Cavanaugh, J.E. (2022). Ranked sparsity: a cogent regularization framework for selecting and estimating feature interactions and polynomials. AStA Adv Stat Anal. https://doi.org/10.1007/s10182-021-00431-7
Peterson, R.A. and Cavanaugh, J.E. (2024). Fast, effective, and coherent time series modeling using the sparsity-ranked lasso. Statistical Modelling (accepted). https://doi.org/10.48550/arXiv.2211.01492
Examples
data("LakeHuron")
fit_LH <- fastTS(LakeHuron)
fit_LH
#> An endogenous PACF-based fastTS model.
#>
#> PF_gamma AICc_d BIC_d
#> 0.00 4.17 6.56
#> 0.25 3.34 3.54
#> 0.50 2.98 3.22
#> 1.00 1.06 1.24
#> 2.00 *0* *0*
#> 4.00 6.79 2.9
#> 8.00 6.79 2.9
#> 16.00 6.79 2.9
#>
#> AICc_d and BIC_d are the difference from the minimum; *0* is best.
#>
#> - Best AICc model: 4 active terms
#> - Best BIC model: 4 active terms
#>
#> Test-set prediction accuracy (20% held-out test set)
#> rmse rsq mae
#> AICc 0.7751 0.6043019 0.5888855
#> BIC 0.7751 0.6043019 0.5888855
coef(fit_LH)
#> 0.00069
#> (Intercept) 111.8740292
#> lag1 1.1003545
#> lag2 -0.4732437
#> lag3 0.1796316
#> lag4 0.0000000
#> lag5 0.0000000
#> lag6 0.0000000
#> lag7 0.0000000
#> lag8 0.0000000
#> lag9 0.0000000
plot(fit_LH)
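A seasonal example with parametric weights might look like the sketch below. It assumes fastTS() is provided by the srlTS package and uses only arguments listed under Usage; the choice of log(AirPassengers), n_lags_max = 24, and m = 12 is illustrative, and the block is skipped if the package is not installed.

```r
# Sketch assuming fastTS() is available from the srlTS package;
# skipped if the package is not installed.
if (requireNamespace("srlTS", quietly = TRUE)) {
  y_AP <- log(AirPassengers)  # monthly series with yearly seasonality
  fit_AP <- srlTS::fastTS(
    y_AP,
    n_lags_max = 24,
    weight_type = "parametric",
    m = 12  # seasonal mode; r keeps its default, c(0.1, 0.01)
  )
  print(fit_AP)
  print(coef(fit_AP, choose = "BIC"))  # coefficients of the best-BIC model
}
```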