Fit a ranked-sparsity model with regularized regression
Usage
sparseR(
formula,
data,
family = c("gaussian", "binomial", "poisson", "coxph"),
penalty = c("lasso", "MCP", "SCAD"),
alpha = 1,
ncvgamma = 3,
lambda.min = 0.005,
k = 1,
poly = 2,
gamma = 0.5,
cumulative_k = FALSE,
cumulative_poly = TRUE,
pool = FALSE,
ia_formula = NULL,
pre_process = TRUE,
model_matrix = NULL,
y = NULL,
poly_prefix = "_poly_",
int_sep = "\\:",
pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
filter = c("nzv", "zv"),
extra_opts = list(),
...
)
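When pre_process = FALSE, the full model matrix and the response are supplied directly through model_matrix and y, and polynomial and interaction columns have to be told apart by their names. The base-R sketch below builds such a matrix and labels its columns. It rests on several assumptions: all covariates are numeric; squared terms are named "<var>_poly_2" to follow the default poly_prefix; and interaction columns use ":", which is how model.matrix() names them and what the default int_sep regex matches. The helper term_group() is hypothetical. It illustrates the naming convention and is not sparseR's own parsing code.

```r
set.seed(1)
df <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30))
y <- df$x1 - 0.5 * df$x2 * df$x3 + rnorm(30)

# Standardize, then build main effects plus all pairwise interactions;
# model.matrix() names interaction columns with ":"
df_std <- as.data.frame(scale(df))
mm <- model.matrix(~ .^2, data = df_std)[, -1]   # drop the intercept column

# Raw (non-orthogonal) squared terms, named with the default poly_prefix
sq <- sapply(df_std, function(v) v^2)
colnames(sq) <- paste0(names(df_std), "_poly_2")
model_matrix <- cbind(mm, sq)

# Hypothetical helper: label each column by its name (int_sep is a regex)
term_group <- function(cols, poly_prefix = "_poly_", int_sep = "\\:") {
  n_vars <- lengths(strsplit(cols, int_sep))        # variables joined by int_sep
  is_poly <- grepl(poly_prefix, cols, fixed = TRUE)
  degree <- sub(paste0(".*", poly_prefix), "", cols) # text after the prefix
  ifelse(n_vars > 1, paste0("interaction_order", n_vars - 1),
         ifelse(is_poly, paste0("poly_degree", degree), "main"))
}

# 3 main effects, 3 pairwise interactions, 3 squared terms
table(term_group(colnames(model_matrix)))
```

Under the naming assumptions above, model_matrix and y are the kind of inputs meant for the model_matrix and y arguments when pre_process = FALSE.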
Arguments
- formula
A formula giving the response and the names of the main-effect terms (e.g. y ~ .)
- data
A data frame containing the variables named in formula
- family
The model family: one of "gaussian", "binomial", "poisson", or "coxph"
- penalty
The penalty to use: "lasso", "MCP", or "SCAD"
- alpha
The mixing parameter: the share of the L1 (lasso, MCP, or SCAD) penalty; lower values introduce more L2 (ridge) penalty
- ncvgamma
The concavity tuning parameter for MCP or SCAD (passed to ncvreg as its gamma argument)
- lambda.min
The smallest value of lambda, expressed as a fraction of the largest lambda (see ?ncvreg)
- k
The maximum order of interactions to consider (default: 1, i.e. all pairwise interactions)
- poly
The maximum order of polynomials to consider (default: 2)
- gamma
How strongly the sparsity ranking penalizes larger groups of terms (see Details)
- cumulative_k
Should penalties be increased cumulatively as the order of interaction increases?
- cumulative_poly
Should penalties be increased cumulatively as the polynomial order increases?
- pool
Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty?
- ia_formula
A formula passed to recipes::step_interact to specify interactions (see Details)
- pre_process
Should the data be preprocessed? (If FALSE, model_matrix and y must be provided.)
- model_matrix
A data frame or matrix specifying the full model matrix (used only when pre_process = FALSE)
- y
A vector of responses (used only when pre_process = FALSE)
- poly_prefix
If model_matrix is specified, the prefix that identifies polynomial terms in its column names
- int_sep
If model_matrix is specified, the separator (a regular expression) that joins variable names in interaction columns
- pre_proc_opts
A character vector of preprocessing steps (see Details)
- filter
The type of filter applied to main effects and interactions: "nzv" (near-zero variance) or "zv" (zero variance)
- extra_opts
A list of options for all preprocessing steps (see Details)
- ...
Additional arguments (passed to the fitting function)
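The arguments alpha, ncvgamma, and lambda.min shape the penalty and the lambda path. The base-R sketch below illustrates them. It assumes the standard lasso and MCP definitions, and a mixing rule of P(b | lambda * alpha) + (1 - alpha) * lambda / 2 * b^2, which is how we understand ncvreg's alpha. It also assumes a log-spaced path running from lambda_max down to lambda.min * lambda_max. All function names are hypothetical, SCAD is omitted, and the path ignores penalty factors and non-Gaussian families.

```r
# Hypothetical helpers (not sparseR or ncvreg functions)
lasso_pen <- function(b, lambda) lambda * abs(b)

# MCP: behaves like the lasso near zero, then flattens once |b| > gamma * lambda;
# gamma here plays the role of the ncvgamma argument
mcp_pen <- function(b, lambda, gamma = 3) {
  ifelse(abs(b) <= gamma * lambda,
         lambda * abs(b) - b^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# Assumed mixing rule: base penalty at lambda * alpha plus a ridge term
mixed_pen <- function(b, lambda, alpha = 1, base = lasso_pen, ...) {
  base(b, lambda * alpha, ...) + (1 - alpha) * lambda / 2 * b^2
}

# Log-spaced path from lambda_max to lambda.min * lambda_max
# (Gaussian lasso, standardized X, centered y, loss scaled by 1 / (2n))
lambda_path <- function(X, y, lambda.min = 0.005, nlambda = 100) {
  X <- scale(X)
  y <- y - mean(y)
  # smallest lambda at which every lasso coefficient is zero
  lambda_max <- max(abs(crossprod(X, y))) / nrow(X)
  exp(seq(log(lambda_max), log(lambda.min * lambda_max), length.out = nlambda))
}

b <- c(0.5, 2, 5)
mixed_pen(b, lambda = 1)                              # lasso
mixed_pen(b, lambda = 1, alpha = 0.5)                 # lasso + ridge
mixed_pen(b, lambda = 1, base = mcp_pen, gamma = 3)   # MCP

set.seed(42)
X <- matrix(rnorm(200), ncol = 4)
lam <- lambda_path(X, X[, 1] + rnorm(50))
min(lam) / max(lam)   # about 0.005, i.e. lambda.min
```

Larger values of ncvgamma make MCP flatten later, so it behaves more like the lasso.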
Value
an object of class sparseR containing the following:
- fit
the fit object returned by ncvreg
- srprep
a recipes object used to prep the data
- pen_factors
the penalty factors (multipliers on the penalty) used for ranked sparsity
- results
all coefficients and penalty factors at minimum CV lambda
- results_summary
a tibble of summary results at minimum CV lambda
- results1se
all coefficients and penalty factors at lambda_1se
- results1se_summary
a tibble of summary results at lambda_1se
- data
the (unprocessed) data
- family
the family argument (e.g. "poisson" for a non-normal response)
- info
a list containing meta-info about the procedure
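The following minimal usage sketch fits a model and reads a few of the components listed above. It assumes that the sparseR package and its dependencies (such as ncvreg and recipes) are installed; if they are not, the block is skipped. The choice of iris and Sepal.Width as the response is only illustrative.

```r
# Skipped unless sparseR is installed
if (requireNamespace("sparseR", quietly = TRUE)) {
  set.seed(123)                         # cross-validation folds are random
  fit <- sparseR::sparseR(Sepal.Width ~ ., data = iris, k = 1, poly = 2)
  print(fit$results_summary)            # summary at the minimum-CV lambda
  print(fit$results1se_summary)         # summary at lambda_1se
  print(head(fit$pen_factors))          # ranked-sparsity penalty factors
}
```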
Details
Selecting gamma: higher values of gamma will penalize "group" size more. By default, this is set to 0.5, which yields equal contribution of prior information across orders of interactions/polynomials (this is a good default for most settings).
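The sketch below shows how gamma turns group sizes into penalty factors. It assumes that a term's penalty factor is the number of terms in its group raised to the power gamma. sparseR's exact scaling, for example relative to main effects, may differ. The group counts also rest on assumptions: p numeric covariates, order-j interactions that involve j + 1 variables (so k = 1 is pairwise), one term per covariate for each degree from 2 to poly, and no polynomial-by-interaction terms. Both helpers are hypothetical.

```r
# Hypothetical helpers illustrating ranked sparsity under the assumptions above
term_counts <- function(p, k = 1, poly = 2) {
  stopifnot(k >= 1, poly >= 2)
  ia <- choose(p, seq_len(k) + 1)            # order-j interactions use j + 1 variables
  names(ia) <- paste0("order", seq_len(k), "_interactions")
  pl <- rep(p, poly - 1)                     # one term per covariate per degree 2..poly
  names(pl) <- paste0("degree", 2:poly, "_polynomials")
  c(main_effects = p, ia, pl)
}

# Each term's penalty factor is its group's size raised to gamma
rank_penalty <- function(group_sizes, gamma = 0.5) {
  rep(unname(group_sizes^gamma), times = group_sizes)
}

sizes <- term_counts(p = 5)        # 5 main effects, 10 pairwise, 5 squared
unique(rank_penalty(sizes))        # sqrt(5) and sqrt(10)
unique(rank_penalty(sizes, 0))     # 1: gamma = 0 gives no ranking
unique(rank_penalty(sizes, 1))     # 5 and 10: a stronger ranking
```

In this sketch, the 10 pairwise interactions form the largest group, so they receive the largest factor, and raising gamma widens the gap.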
Additionally, setting cumulative_poly or cumulative_k to TRUE increases the penalty cumulatively based on the order of either polynomial or interaction.
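The pool argument changes how groups are formed. The sketch below uses the same hypothetical (group size)^gamma rule as an assumption. It shows how pooling order-k interactions with degree-(k + 1) polynomials gives both sets of terms one shared factor. The cumulative options are not modeled, and the helper name is hypothetical.

```r
# Hypothetical helper; assumes a penalty factor of (group size)^gamma
pooled_penalty <- function(n_ia, n_poly, gamma = 0.5, pool = FALSE) {
  if (pool) {
    w <- (n_ia + n_poly)^gamma         # one shared group
    c(interactions = w, polynomials = w)
  } else {
    c(interactions = n_ia^gamma, polynomials = n_poly^gamma)  # separate groups
  }
}

# 10 pairwise interactions and 5 squared terms (p = 5, k = 1, poly = 2)
pooled_penalty(10, 5)               # sqrt(10) and sqrt(5)
pooled_penalty(10, 5, pool = TRUE)  # sqrt(15) for both
```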
The options that can be passed to pre_proc_opts are:
- knnImpute (should missing data be imputed?)
- scale (should data be standardized?)
- center (should data be centered to the mean or another value?)
- otherbin (should factors with low prevalence be combined?)
- none (should no preprocessing be done? a NULL object can also be specified)
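The base-R sketch below shows what three of these steps do: center, scale, and otherbin. The centering helper accepts either fixed centers or a centering function, mirroring the centers and center_fn entries of extra_opts. The 5% threshold for merging rare factor levels is an assumption, similar in spirit to recipes::step_other. Both helpers are hypothetical and are not sparseR's preprocessing code.

```r
# Hypothetical helpers, not sparseR's preprocessing code
# Center each numeric column at a supplied value (cf. extra_opts centers)
# or at center_fn of the column (cf. extra_opts center_fn), then scale to
# unit standard deviation
center_scale <- function(df, centers = NULL, center_fn = mean) {
  for (v in names(df)) {
    x <- df[[v]]
    ctr <- if (v %in% names(centers)) centers[[v]] else center_fn(x)
    df[[v]] <- (x - ctr) / sd(x)
  }
  df
}

# Merge factor levels whose share falls below `threshold` into "other"
lump_rare <- function(f, threshold = 0.05, other = "other") {
  f <- as.character(f)
  freq <- table(f) / length(f)
  f[f %in% names(freq)[freq < threshold]] <- other
  factor(f)
}

d <- data.frame(a = c(1, 2, 3, 10), b = c(5, 6, 7, 8))
center_scale(d, centers = c(b = 5), center_fn = median)  # a at its median, b at 5

g <- c(rep("a", 50), rep("b", 45), rep("c", 3), rep("d", 2))
table(lump_rare(g))   # a: 50, b: 45, other: 5
```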
The options that can be passed to extra_opts are:
- centers (named numeric vector which denotes where each covariate should be centered)
- center_fn (alternatively, a function can be specified to calculate the center, such as min or median)
- freq_cut, unique_cut (see ?step_nzv; these are used by the filtering steps)
- neighbors (the number of neighbors for knnImpute)
- one_hot (see ?step_dummy); this defaults to cell-means coding, which can be done in regularized regression (change at your own risk)
- raw (should polynomials not be orthogonal? defaults to TRUE because, by default, variables are already centered and scaled by this point)
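The freq_cut and unique_cut options feed the filtering steps. As we understand recipes::step_nzv, a predictor is flagged when two conditions hold: the ratio of its most common value's count to the second most common exceeds freq_cut, and the percentage of distinct values is below unique_cut. Zero-variance columns (the "zv" filter) are flagged outright. The helper below is a hypothetical sketch of that rule, with defaults assumed to match step_nzv.

```r
# Hypothetical helper sketching the near-zero-variance rule
is_nzv <- function(x, freq_cut = 95 / 5, unique_cut = 10) {
  counts <- sort(table(x), decreasing = TRUE)
  if (length(counts) < 2) return(TRUE)      # zero variance: a single value
  freq_ratio <- counts[[1]] / counts[[2]]   # most common vs. second most common
  pct_unique <- 100 * length(counts) / length(x)
  freq_ratio > freq_cut && pct_unique < unique_cut
}

is_nzv(c(rep(0, 99), 1))  # TRUE: ratio 99 > 19 and only 2% distinct values
is_nzv(1:100)             # FALSE: every value is distinct
```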
By default, ia_formula interacts all variables with each other up to order k. If specified, ia_formula is passed as the terms argument to recipes::step_interact, so the help documentation for that function can be consulted for further assistance in specifying particular interactions.
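The sketch below previews what a restricted interaction formula produces by handing it directly to recipes::step_interact. It assumes the recipes package is installed and skips the block otherwise. The iris variables are only illustrative. The "_x_" column names come from recipes' default separator; sparseR may name interaction columns differently (its default int_sep matches ":").

```r
# Skipped unless recipes is installed
if (requireNamespace("recipes", quietly = TRUE)) {
  rec <- recipes::recipe(Sepal.Width ~ ., data = iris)
  # Only these two interactions, instead of all pairs up to order k
  rec <- recipes::step_interact(
    rec, terms = ~ Sepal.Length:Petal.Length + Sepal.Length:Petal.Width
  )
  baked <- recipes::bake(recipes::prep(rec), new_data = NULL)
  print(grep("_x_", names(baked), value = TRUE))  # the two interaction columns
}
```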