Preprocess & create a model matrix with interactions + polynomials
Source: R/sparseR_prep.R
Arguments
- formula
A formula specifying the outcome and the main effects of the model
- data
A required data frame or tibble containing the variables in formula
- k
Maximum order of interactions among numeric variables
- poly
The maximum order of polynomials to consider
- pre_proc_opts
A character vector specifying methods for preprocessing (see details)
- ia_formula
Formula to be passed to recipes::step_interact() to specify interactions (see details)
- filter
Methods used to filter out variables with zero or near-zero variance (see details)
- extra_opts
Extra options to be used for preprocessing
- family
The model family, passed down from sparseR()
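The base-R sketch below illustrates the kind of expansion that an interaction order and a polynomial degree control: all pairwise interactions among three numeric predictors, and a raw degree-2 polynomial in one of them. It is an analogue built with model.matrix() and poly(), not sparseR's own code, and it makes no claim about how particular values of k or poly map onto these terms; the toy data frame and its column names are hypothetical.

```r
# Hypothetical toy data with three numeric predictors
set.seed(1)
toy <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))

# Main effects plus every pairwise interaction, via the formula operator ^
mm_k2 <- model.matrix(~ (x1 + x2 + x3)^2, data = toy)
colnames(mm_k2)
# "(Intercept)" "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"

# Raw (non-orthogonal) linear and quadratic terms for x1
mm_p2 <- model.matrix(~ poly(x1, degree = 2, raw = TRUE), data = toy)
ncol(mm_p2)  # intercept + linear + quadratic = 3
```

With three predictors, pairwise expansion adds three interaction columns; higher interaction orders and polynomial degrees grow the model matrix quickly, which is why the filter step described in the Details section matters.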
Value
An object of class recipe; see recipes::recipe()
Details
The pre_proc_opts argument acts as a wrapper for the corresponding procedures in the recipes package. The currently supported options that can be passed to pre_proc_opts are:
- knnImpute: Should k-nearest-neighbors imputation be performed (if necessary)?
- scale: Should variables be scaled prior to creating interactions? (Does not scale factor variables or dummy variables.)
- center: Should variables be centered? (Does not center factor variables or dummy variables.)
- otherbin:
If ia_formula is not specified, all variables are interacted with each other up to order k by default. If it is specified, ia_formula is passed as the terms argument to recipes::step_interact(), so the help documentation for that function can be consulted for further guidance on specifying particular interactions.
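Because ia_formula is handed to recipes::step_interact() as its terms argument, the sketch below calls that step directly to show what such a formula requests: here, only the single interaction x1:x2. The toy data are hypothetical and sparseR_prep() itself is not called; the new column name x1_x_x2 comes from step_interact()'s default separator "_x_".

```r
library(recipes)

# Hypothetical toy data: outcome y and three numeric predictors
set.seed(1)
toy <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# A one-sided formula requesting only the x1-by-x2 interaction,
# the kind of formula one might supply as ia_formula
rec <- recipe(y ~ ., data = toy)
rec <- step_interact(rec, terms = ~ x1:x2)
rec <- prep(rec, training = toy)

# Processed training data: original columns plus x1_x_x2
baked <- bake(rec, new_data = NULL)
names(baked)
```

Restricting the terms formula this way avoids generating every interaction up to order k when only a few are of scientific interest.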
The methods specified in filter are important: filtering removes extraneous polynomial and interaction terms that carry no useful information. This happens, for instance, when polynomials are formed from dummy variables (a squared 0/1 dummy equals the dummy itself) or when interactions are formed between dummy variables that encode the same categorical variable (such a product is always zero, since each observation falls in only one level).
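A minimal base-R sketch of the two degenerate cases described above, using a hypothetical three-level factor: squaring a dummy reproduces it exactly, and the product of two dummies from the same factor is constant zero, so a zero-variance filter would remove it. This only illustrates why filtering matters; it is not sparseR's filtering code.

```r
# Hypothetical factor with three levels
f <- factor(c("a", "b", "c", "a", "b", "c"))

# Treatment-coded dummies for levels b and c (intercept column dropped)
dummies <- model.matrix(~ f)[, -1]
d_b <- dummies[, "fb"]
d_c <- dummies[, "fc"]

# Squaring a 0/1 dummy returns the same column: no new information
identical(d_b^2, d_b)

# Dummies from the same factor never both equal 1, so their product
# is always 0 and has zero variance
var(d_b * d_c)
```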