recipes implementation
R/step_orderNorm.R
step_orderNorm.Rd
`step_orderNorm` creates a specification of a recipe step (see `recipes` package) that will transform data using the ORQ (orderNorm) transformation, which approximates the "true" normalizing transformation if one exists. This is considerably faster than `step_bestNormalize`.
A formula or recipe
One or more selector functions to choose which variables are affected by the step. See [selections()] for more details. For the `tidy` method, these are not currently used.
Not used by this step since no new variables are created.
For recipes functionality
A numeric vector of transformation values. This (was transform_info) is `NULL` until computed by [prep.recipe()].
Options to be passed to `orderNorm`.
An integer threshold: variables with fewer unique values than this will not be evaluated for a transformation.
For recipes functionality
For recipes functionality
A `step_orderNorm` object.
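The unique-value threshold described in the arguments above can be illustrated with a small base R sketch. This is not the package's internal code: `enough_unique()` is a hypothetical helper, the default of 5 is an assumption, and the "fewer than" comparison is inferred from the argument description.

```r
# Hypothetical helper (not part of recipes or bestNormalize):
# flag the columns that have at least `num_unique` distinct values.
enough_unique <- function(data, num_unique = 5) {
  vapply(
    data,
    function(col) length(unique(col)) >= num_unique,
    logical(1)
  )
}

df <- data.frame(
  continuous = c(0.1, 0.7, 1.3, 2.2, 3.8, 5.1),  # 6 distinct values
  binary     = c(0, 1, 0, 1, 1, 0)               # 2 distinct values
)

# Only `continuous` clears a threshold of 5; `binary` would be skipped.
keep <- enough_unique(df, num_unique = 5)
keep
```

Under these assumptions, a near-discrete column such as an indicator variable would be left untransformed rather than forced onto a normal scale.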
An updated version of `recipe` with the new step added to the sequence of existing steps (if any). For the `tidy` method, a tibble with columns `terms` (the selectors or variables selected), `value` (the fitted orderNorm transformation for each variable, or `NA` before the recipe is prepped), and `id`.
The orderNorm transformation can be used to rescale a variable to be more similar to a normal distribution. See `?orderNorm` for more information; `step_orderNorm` is the implementation of `orderNorm` in the `recipes` context.
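To make the idea concrete, here is a minimal base R sketch of the rank-based mapping behind ORQ on the data it is fitted to, assuming the in-sample transformation is the normal quantile of `(rank - 0.5) / n`. `orq_sketch()` is a hypothetical helper, not the `orderNorm` implementation; in particular it omits how new, unseen values are handled.

```r
# Hypothetical helper (not part of bestNormalize): in-sample normal scores.
orq_sketch <- function(x) {
  n <- sum(!is.na(x))
  # Ranks (ties averaged, NAs kept as NA) are mapped into (0, 1),
  # then passed through the standard normal quantile function.
  qnorm((rank(x, na.last = "keep") - 0.5) / n)
}

set.seed(42)
x <- rexp(200)        # right-skewed input
z <- orq_sketch(x)    # transformed values

# The mapping is monotone, so the ordering of the data is preserved.
stopifnot(all(order(x) == order(z)))

# Summary of the transformed values; compare with a standard normal.
round(c(mean = mean(z), sd = sd(z)), 2)
```

Because only the ranks are used, the shape of the original distribution does not matter for the fitted data; what differs between variables is how values outside the training sample get mapped, which is the part `orderNorm` itself takes care of.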
As of version 1.7, the `butcher` package can be used to (hopefully) improve scalability of this function on bigger data sets.
Ryan A. Peterson (2019). Ordered quantile normalization: a semiparametric transformation built for the cross-validation era. Journal of Applied Statistics, 1-16.
`orderNorm`, `bestNormalize`, [recipe()], [prep.recipe()], [bake.recipe()]
library(recipes)
library(bestNormalize)  # provides step_orderNorm()
rec <- recipe(~ ., data = as.data.frame(iris))
orq_trans <- step_orderNorm(rec, all_numeric())
orq_estimates <- prep(orq_trans, training = as.data.frame(iris))
orq_data <- bake(orq_estimates, as.data.frame(iris))
plot(density(iris[, "Petal.Length"]), main = "before")
plot(density(orq_data$Petal.Length), main = "after")
tidy(orq_trans, number = 1)
#> # A tibble: 1 × 3
#> terms value id
#> <chr> <dbl> <chr>
#> 1 all_numeric() NA orderNorm_EVEeP
tidy(orq_estimates, number = 1)
#> # A tibble: 4 × 3
#> terms value id
#> <chr> <named list> <chr>
#> 1 Sepal.Length <orderNrm> orderNorm_EVEeP
#> 2 Sepal.Width <orderNrm> orderNorm_EVEeP
#> 3 Petal.Length <orderNrm> orderNorm_EVEeP
#> 4 Petal.Width <orderNrm> orderNorm_EVEeP
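The example above prepares and bakes the same data. As a hedged sketch of the more typical workflow, the step can be estimated on a training subset and then applied to held-out rows. The split, the seed, and the object names below are illustrative assumptions; `orderNorm` may emit warnings about ties or about values outside the range seen in training.

```r
library(recipes)
library(bestNormalize)

# Illustrative train/test split of iris (sizes and seed are arbitrary).
set.seed(1)
train_idx <- sample(nrow(iris), 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Define the step on the training data only.
rec <- recipe(~ ., data = train)
rec <- step_orderNorm(rec, all_numeric())

# Estimate the transformations from the training rows...
fit <- prep(rec, training = train)

# ...and apply those same fitted transformations to the held-out rows.
test_orq <- bake(fit, new_data = test)
head(test_orq)
```

Estimating on the training rows and reusing the fitted step on new rows keeps information from the held-out data out of the transformation.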