Fit a ranked-sparsity model with regularized regression
Usage
sparseR(
formula,
data,
family = c("gaussian", "binomial", "poisson", "coxph"),
penalty = c("lasso", "MCP", "SCAD"),
alpha = 1,
ncvgamma = 3,
lambda.min = 0.005,
k = 1,
poly = 1,
gamma = 0.5,
cumulative_k = FALSE,
cumulative_poly = TRUE,
pool = FALSE,
ia_formula = NULL,
pre_process = TRUE,
model_matrix = NULL,
y = NULL,
poly_prefix = "_poly_",
int_sep = "\\:",
pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
filter = c("nzv", "zv"),
extra_opts = list(),
...
)
Arguments
- formula
A formula naming the response and the terms to consider (e.g. y ~ .)
- data
A data frame containing the variables referenced in formula
- family
The family of the model
- penalty
What penalty should be used (lasso, MCP, or SCAD)
- alpha
The mix of L1 penalty (lower values introduce more L2 ridge penalty)
- ncvgamma
The tuning parameter for ncvreg (for MCP or SCAD)
- lambda.min
The minimum value to be used for lambda (as ratio of max, see ?ncvreg)
- k
The maximum order of interactions to consider
- poly
The maximum order of polynomials to consider
- gamma
The degree of extremity of sparsity rankings (see details)
- cumulative_k
Should penalties be increased cumulatively as the interaction order increases?
- cumulative_poly
Should penalties be increased cumulatively as the polynomial order increases?
- pool
Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty?
- ia_formula
Formula to be passed to step_interact (for interactions, see details)
- pre_process
Should the data be preprocessed (if FALSE, must provide model_matrix)
- model_matrix
A data frame or matrix specifying the full model matrix (used if !pre_process)
- y
A vector of responses (used if !pre_process)
- poly_prefix
If model_matrix is specified, what is the prefix for polynomial terms?
- int_sep
If model_matrix is specified, what is the separator for interaction terms?
- pre_proc_opts
List of preprocessing steps (see details)
- filter
The type of filter applied to main effects + interactions
- extra_opts
A list of options for all preprocess steps (see details)
- ...
Additional arguments (passed to fitting function)
Value
An object of class sparseR containing the following:
- fit
the fit object returned by ncvreg
- srprep
a recipes object used to prep the data
- pen_factors
the factor multiple on penalties for ranked sparsity
- results
all coefficients and penalty factors at minimum CV lambda
- results_summary
a tibble of summary results at minimum CV lambda
- results1se
all coefficients and penalty factors at lambda_1se
- results1se_summary
a tibble of summary results at lambda_1se
- data
the (unprocessed) data
- family
the family argument (for non-normal families, e.g. poisson)
- info
a list containing meta-info about the procedure
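As a rough illustration of the returned object, the sketch below fits a model on the built-in iris data and inspects a few of the components listed above. It assumes the sparseR package is installed. The object name srl is illustrative, and seed is assumed to be forwarded through ... to the cross-validation routine for reproducibility.

```r
library(sparseR)

# Fit a ranked-sparsity lasso with main effects and pairwise
# interactions (k = 1). `seed` is assumed to pass through `...`
# to the underlying fitting function.
srl <- sparseR(Sepal.Width ~ ., data = iris, k = 1, seed = 1)

class(srl)               # expected to include "sparseR"
names(srl)               # components such as fit, srprep, pen_factors
srl$results_summary      # summary at the minimum-CV lambda
srl$results1se_summary   # summary at lambda_1se
```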
Details
Selecting gamma: higher values of gamma will penalize "group" size more. By
default, this is set to 0.5, which yields equal contribution of prior
information across orders of interactions/polynomials (this is a good
default for most settings).
Additionally, setting cumulative_poly or cumulative_k to TRUE increases
the penalty cumulatively based on the order of either polynomial or
interaction.
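To see how gamma and the cumulative options change the penalty weights, one possible check is to fit the same model under different settings and compare the pen_factors component. This is a sketch only: it assumes the sparseR package is installed, the object names are illustrative, and seed is assumed to be forwarded through ... to the fitting function.

```r
library(sparseR)

# Default ranking (gamma = 0.5)
fit_default <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                       gamma = 0.5, seed = 1)

# Stronger penalization of larger "groups" of terms (gamma = 1)
fit_strong <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                      gamma = 1, seed = 1)

# Penalties that also accumulate with interaction order
fit_cumul <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                     cumulative_k = TRUE, seed = 1)

# Compare the spread of penalty factors across the three fits
range(fit_default$pen_factors)
range(fit_strong$pen_factors)
range(fit_cumul$pen_factors)
```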
The options that can be passed to pre_proc_opts are:
- knnImpute
should missing data be imputed?
- scale
should data be standardized?
- center
should data be centered to the mean or another value?
- otherbin
should factors with low prevalence be combined?
- none
should no preprocessing be done? (a null object can also be specified)
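A subset of these preprocessing steps can be requested by passing a shorter character vector to pre_proc_opts. The sketch below assumes the sparseR package is installed and that listing only some options skips the others. The object name srl_cs is illustrative, and seed is assumed to pass through ... to the fitting function.

```r
library(sparseR)

# Center and scale only; imputation and low-prevalence binning are
# left out (iris has no missing values, so knnImpute is not needed).
srl_cs <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                  pre_proc_opts = c("center", "scale"),
                  seed = 1)

srl_cs$results_summary
```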
The options that can be passed to extra_opts are:
- centers
named numeric vector which denotes where each covariate should be centered
- center_fn
alternatively, a function can be specified to calculate the center, such as min or median
- freq_cut, unique_cut
see ?step_nzv; these are used by the filtering steps
- neighbors
the number of neighbors for knnImpute
- one_hot
see ?step_dummy; this defaults to cell-means coding, which can be done in regularized regression (change at your own risk)
- raw
should polynomials not be orthogonal? Defaults to TRUE because variables are centered and scaled already by this point by default
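The centering options in extra_opts might be used as in the following sketch, which centers covariates either with a function or at explicit values. It assumes the sparseR package is installed. The object names and the choice of centering at the minimum are illustrative, and seed is assumed to pass through ... to the fitting function.

```r
library(sparseR)

# Option 1: center each numeric covariate at its median via center_fn
srl_med <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                   extra_opts = list(
                     center_fn = function(x) median(x, na.rm = TRUE)
                   ),
                   seed = 1)

# Option 2: supply explicit centers as a named numeric vector
# (here, the observed minimum of each numeric covariate)
num_vars <- c("Sepal.Length", "Petal.Length", "Petal.Width")
cc <- sapply(iris[, num_vars], min)

srl_min <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                   extra_opts = list(centers = cc),
                   seed = 1)
```

Centering at a meaningful value (rather than the mean) changes what the main-effect coefficients represent when interactions are present, which is the usual reason to reach for these options.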
By default (ia_formula = NULL), all variables are interacted with each
other up to order k. If specified, ia_formula will be passed as the terms
argument to recipes::step_interact, so the help documentation for that
function can be consulted for further assistance in specifying specific
interactions.
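A restricted set of interactions could be requested along these lines. This is a hedged sketch: it assumes the sparseR package is installed, that a one-sided formula in the form accepted by the terms argument of recipes::step_interact is what ia_formula expects, and that the numeric covariates keep their original names after preprocessing. The object name srl_ia is illustrative.

```r
library(sparseR)

# Only consider two specific pairwise interactions between numeric
# covariates, instead of all pairwise interactions.
srl_ia <- sparseR(Sepal.Width ~ ., data = iris, k = 1,
                  ia_formula = ~ Sepal.Length:Petal.Length +
                    Sepal.Length:Petal.Width,
                  seed = 1)

srl_ia$results_summary
```

Interactions involving factors typically need selector helpers (for example starts_with) in the step_interact formula, because factors are expanded into indicator columns before the interaction step. See ?recipes::step_interact for the exact syntax.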