Fit a ranked-sparsity model with regularized regression

## Usage

```
sparseR(
  formula,
  data,
  family = c("gaussian", "binomial", "poisson", "coxph"),
  penalty = c("lasso", "MCP", "SCAD"),
  alpha = 1,
  ncvgamma = 3,
  lambda.min = 0.005,
  k = 1,
  poly = 1,
  gamma = 0.5,
  cumulative_k = FALSE,
  cumulative_poly = TRUE,
  pool = FALSE,
  ia_formula = NULL,
  pre_process = TRUE,
  model_matrix = NULL,
  y = NULL,
  poly_prefix = "_poly_",
  int_sep = "\\:",
  pre_proc_opts = c("knnImpute", "scale", "center", "otherbin", "none"),
  filter = c("nzv", "zv"),
  extra_opts = list(),
  ...
)
```
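
As a minimal sketch of a typical call, the following fits a model to the built-in `iris` data. The choice of data set, response, and settings is illustrative only. `seed` is not a `sparseR` argument; it is forwarded through `...`, which assumes the underlying fitting function accepts a `seed` argument (as `ncvreg::cv.ncvreg` does).

```r
library(sparseR)

data(iris)

# Candidate terms: all predictors, interactions up to order k = 1 and
# polynomials up to order poly = 2, using the default lasso penalty.
srl <- sparseR(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  poly = 2,
  seed = 1  # forwarded via `...` (assumption: the fitting function accepts `seed`)
)

class(srl)
```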

## Arguments

- formula
A formula naming the response and the candidate terms (e.g. `y ~ .`)

- data
A data frame containing the variables referenced in `formula`

- family
The family of the model ("gaussian", "binomial", "poisson", or "coxph")

- penalty
What penalty should be used (lasso, MCP, or SCAD)

- alpha
The mixing proportion between the selected penalty and an L2 (ridge) penalty; `alpha = 1` applies no ridge penalty, and lower values introduce more of it

- ncvgamma
The tuning parameter for ncvreg (for MCP or SCAD)

- lambda.min
The minimum value to be used for lambda (as ratio of max, see ?ncvreg)

- k
The maximum order of interactions to consider

- poly
The maximum order of polynomials to consider

- gamma
The degree of extremity of sparsity rankings (see details)

- cumulative_k
Should penalties be increased cumulatively as the interaction order increases?

- cumulative_poly
Should penalties be increased cumulatively as the polynomial order increases?

- pool
Should interactions of order k and polynomials of order k+1 be pooled together for calculating the penalty?

- ia_formula
A formula to be passed to `step_interact` (for interactions, see details)

- pre_process
Should the data be preprocessed (if FALSE, must provide model_matrix)

- model_matrix
A data frame or matrix specifying the full model matrix (used if !pre_process)

- y
A vector of responses (used if !pre_process)

- poly_prefix
If model_matrix is specified, what is the prefix for polynomial terms?

- int_sep
If model_matrix is specified, what is the separator for interaction terms?

- pre_proc_opts
List of preprocessing steps (see details)

- filter
The type of filter applied to main effects + interactions

- extra_opts
A list of options for all preprocess steps (see details)

- ...
Additional arguments (passed to fitting function)

## Value

An object of class `sparseR` containing the following:

- fit
the fit object returned by `ncvreg`

- srprep
a `recipes` object used to prep the data

- pen_factors
the factor multiple on penalties for ranked sparsity

- results
all coefficients and penalty factors at minimum CV lambda

- results_summary
a tibble of summary results at minimum CV lambda

- results1se
all coefficients and penalty factors at lambda_1se

- results1se_summary
a tibble of summary results at lambda_1se

- data
the (unprocessed) data

- family
the family argument (relevant for non-normal families, e.g. poisson)

- info
a list containing meta-info about the procedure
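
As a brief sketch of how the returned components might be inspected, the example below fits a small model on `iris` (an illustrative choice) and pulls out several of the elements listed above by name. `seed` is forwarded through `...` on the assumption that the fitting function accepts it.

```r
library(sparseR)

srl <- sparseR(Sepal.Width ~ ., data = iris, k = 1, seed = 1)

names(srl)               # the components listed above
srl$results_summary      # summary of results at the minimum CV lambda
srl$results1se_summary   # summary of results at lambda_1se
head(srl$pen_factors)    # penalty multipliers used for ranked sparsity
```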

## Details

Selecting `gamma`: higher values of `gamma` will penalize "group" size more.
By default, this is set to 0.5, which yields equal contribution of prior
information across orders of interactions/polynomials (this is a good
default for most settings).
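
To give a rough sense of what "penalizing group size more" can mean, the sketch below assumes that a term's penalty multiplier is its group's size raised to the power `gamma`. That power rule, the helper `penalty_weight()`, and the value `p = 10` are illustrative assumptions; the exact scaling used by the package is not stated here and may differ.

```r
# Hypothetical illustration: assumes penalty weight = (group size)^gamma.
p <- 10  # made-up number of main effects
group_size <- c(main = p, interaction = choose(p, 2))

penalty_weight <- function(size, gamma) size^gamma

gammas <- c(0, 0.5, 1)
weights <- sapply(gammas, function(g) penalty_weight(group_size, g))
colnames(weights) <- paste0("gamma=", gammas)

# Rows are term groups, columns are gamma values
weights
```

Under this assumption, `gamma = 0` treats all terms equally, while larger values penalize the (larger) interaction group more heavily relative to main effects.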

Additionally, setting `cumulative_poly` or `cumulative_k` to `TRUE` increases
the penalty cumulatively based on the order of either polynomial or
interaction.

The options that can be passed to `pre_proc_opts` are:

- knnImpute (should missing data be imputed?)
- scale (should data be standardized?)
- center (should data be centered to the mean or another value?)
- otherbin (should factors with low prevalence be combined?)
- none (should no preprocessing be done? can also specify a null object)
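
As a hedged sketch, a subset of these options could be requested as follows. This assumes that passing a character vector with only some of the option names enables just those steps; the data set and settings are illustrative.

```r
library(sparseR)

# Request centering and scaling only (assumption: omitted steps are skipped).
srl_cs <- sparseR(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  pre_proc_opts = c("center", "scale"),
  seed = 1  # forwarded via `...` (assumption: the fitting function accepts `seed`)
)
```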

The options that can be passed to `extra_opts` are:

- centers (named numeric vector which denotes where each covariate should be
  centered)
- center_fn (alternatively, a function can be specified to calculate the
  center, such as `min` or `median`)
- freq_cut, unique_cut (see ?step_nzv; these get used by the filtering steps)
- neighbors (the number of neighbors for knnImpute)
- one_hot (see ?step_dummy; this defaults to cell-means coding, which can be
  done in regularized regression; change at your own risk)
- raw (should polynomials not be orthogonal? defaults to `TRUE` because
  variables are centered and scaled already by this point by default)
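
The centering options above might be used as in the following sketch, which centers the numeric `iris` predictors at their minimum rather than their mean. The variable selection and the names `num_vars` and `cc` are illustrative.

```r
library(sparseR)

# Named numeric vector of centering values (here, each predictor's minimum)
num_vars <- c("Sepal.Length", "Petal.Length", "Petal.Width")
cc <- sapply(iris[num_vars], min)

srl_centers <- sparseR(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  poly = 2,
  extra_opts = list(centers = cc),
  seed = 1  # forwarded via `...` (assumption: the fitting function accepts `seed`)
)

# Alternatively, supply a function that computes the center
srl_center_fn <- sparseR(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  poly = 2,
  extra_opts = list(center_fn = min),
  seed = 1
)
```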

`ia_formula` will by default interact all variables with each other up
to order k. If specified, `ia_formula` will be passed as the `terms` argument
to `recipes::step_interact`, so the help documentation for that function
can be investigated for further assistance in specifying specific
interactions.
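
As a tentative sketch, a single interaction between two numeric predictors could be requested with a one-sided formula of the kind `recipes::step_interact` accepts for `terms`. The variables chosen are illustrative, and how factor-derived columns should be referenced in `ia_formula` is not covered here.

```r
library(sparseR)

# Only consider the Sepal.Length x Petal.Length interaction
srl_ia <- sparseR(
  Sepal.Width ~ .,
  data = iris,
  k = 1,
  ia_formula = ~ Sepal.Length:Petal.Length,
  seed = 1  # forwarded via `...` (assumption: the fitting function accepts `seed`)
)
```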