`step_center_to` generalizes `step_center` by letting you choose a function other than `mean` to calculate the centers. It creates a *specification* of a recipe step that will normalize numeric data so that each selected variable has a 'center' of zero.
Usage
step_center_to(
recipe,
...,
role = NA,
trained = FALSE,
centers = NULL,
center_fn = mean,
na_rm = TRUE,
skip = FALSE,
id = rand_id("center_to")
)
# S3 method for class 'step_center_to'
tidy(x, ...)
Arguments
- recipe
A recipe object. The step will be added to the sequence of operations for this recipe.
- ...
One or more selector functions to choose which variables are affected by the step. See [selections()] for more details. For the `tidy` method, these are not currently used.
- role
Not used by this step since no new variables are created.
- trained
A logical to indicate if the quantities for preprocessing have been estimated.
- centers
A named numeric vector of centers. This is `NULL` until computed by [prep.recipe()]; it may also be possible to supply a named numeric vector directly.
- center_fn
A function used to calculate the center of each selected variable. Defaults to `mean`.
- na_rm
A logical value indicating whether `NA` values should be removed during computations.
- skip
A logical. Should the step be skipped when the recipe is baked by [bake.recipe()]? While all operations are baked when [prep.recipe()] is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using `skip = TRUE` as it may affect the computations for subsequent operations.
- id
A character string that is unique to this step to identify it.
- x
A `step_center_to` object.
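To illustrate how `center_fn` and `na_rm` could interact, here is a minimal base R sketch. It is not the step's actual implementation: `compute_centers` and the `toy` data are hypothetical, and the sketch assumes the center function accepts an `na.rm` argument, as `mean` and `median` do.

```r
# Hypothetical helper, not part of any package: one center per column.
# Assumption: `center_fn` accepts an `na.rm` argument (true for mean and median).
compute_centers <- function(data, center_fn = mean, na_rm = TRUE) {
  vapply(data, function(col) center_fn(col, na.rm = na_rm), numeric(1))
}

# Invented toy data with one missing value in column `a`
toy <- data.frame(a = c(1, 2, NA, 9), b = c(10, 20, 30, 100))

mean_centers   <- compute_centers(toy)                      # means, NAs dropped
median_centers <- compute_centers(toy, center_fn = median)  # medians, NAs dropped
na_centers     <- compute_centers(toy, na_rm = FALSE)       # NA propagates for `a`
```

The result is a named numeric vector with one entry per column, which is the shape the `centers` argument describes.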
Value
An updated version of `recipe` with the new step added to the sequence of existing steps (if any). For the `tidy` method, a tibble with columns `terms` (the selectors or variables selected) and `value` (the centers).
Details
Centering data means that the center of a variable, as computed by `center_fn` (the mean by default), is subtracted from the data. `step_center_to` estimates the variable centers from the data used in the `training` argument of `prep.recipe`. `bake.recipe` then applies the centering to new data sets using these centers.
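The estimate-then-apply logic described above can be sketched in base R without any recipe machinery. This is an illustration under stated assumptions, not the package's code: the `train` and `new` data frames are invented, `center_with` is a hypothetical helper, and `median` stands in for an arbitrary `center_fn`.

```r
# Invented data for illustration only
train <- data.frame(x = c(1, 2, 3, 10), y = c(5, 5, 6, 8))
new   <- data.frame(x = c(4, 0),        y = c(6, 7))

# Analogue of prep(): estimate one center per column from the training data only
centers <- vapply(train, median, numeric(1), na.rm = TRUE)

# Analogue of bake(): subtract the stored centers from any data set
# (hypothetical helper, not part of any package)
center_with <- function(data, centers) {
  cols <- names(centers)
  data[cols] <- Map(function(col, ctr) col - ctr, data[cols], centers)
  data
}

train_centered <- center_with(train, centers)
new_centered   <- center_with(new, centers)
```

Because the centers come from the training data alone, the centered training columns have a median of zero, while the centered new data generally do not.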
Examples
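# Split the biomass data into training and testing sets
data(biomass, package = "modeldata")
biomass_tr <- biomass[biomass$dataset == "Training", ]
biomass_te <- biomass[biomass$dataset == "Testing", ]
rec <- recipes::recipe(
  HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
  data = biomass_tr
)
# Center carbon, oxygen and nitrogen; hydrogen is excluded and sulfur is not selected
center_trans <- rec %>%
  step_center_to(carbon, contains("gen"), -hydrogen)
# Estimate the centers on the training set, then apply them to the test set
center_obj <- recipes::prep(center_trans, training = biomass_tr)
transformed_te <- recipes::bake(center_obj, biomass_te)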
biomass_te[1:10, names(transformed_te)]
#>    carbon hydrogen oxygen nitrogen sulfur    HHV
#> 15  46.35     5.67  47.20     0.30   0.22 18.275
#> 20  43.25     5.50  48.06     2.85   0.34 17.560
#> 26  42.70     5.50  49.10     2.40   0.30 17.173
#> 31  46.40     6.10  37.30     1.80   0.50 18.851
#> 36  48.76     6.32  42.77     0.20   0.00 20.547
#> 41  44.30     5.50  41.70     0.70   0.20 18.467
#> 46  38.94     5.23  54.13     1.19   0.51 15.095
#> 51  42.10     4.66  33.80     0.95   0.20 16.240
#> 55  29.20     4.40  31.10     0.14   4.90 11.147
#> 65  27.80     3.77  23.69     4.63   1.05 10.750
transformed_te
#> # A tibble: 80 × 6
#>     carbon hydrogen oxygen nitrogen sulfur   HHV
#>      <dbl>    <dbl>  <dbl>    <dbl>  <dbl> <dbl>
#>  1  -2.00      5.67   8.68   -0.775   0.22  18.3
#>  2  -5.10      5.5    9.54    1.78    0.34  17.6
#>  3  -5.65      5.5   10.6     1.33    0.3   17.2
#>  4  -1.95      6.1   -1.22    0.725   0.5   18.9
#>  5   0.406     6.32   4.25   -0.875   0     20.5
#>  6  -4.05      5.5    3.18   -0.375   0.2   18.5
#>  7  -9.41      5.23  15.6     0.115   0.51  15.1
#>  8  -6.25      4.66  -4.72   -0.125   0.2   16.2
#>  9 -19.2       4.4   -7.42   -0.935   4.9   11.1
#> 10 -20.6       3.77 -14.8     3.56    1.05  10.8
#> # ℹ 70 more rows
# tidy() on the untrained recipe lists its steps (trained is FALSE)
recipes::tidy(center_trans)
#> # A tibble: 1 × 6
#>   number operation type      trained skip  id
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>
#> 1      1 step      center_to FALSE   FALSE center_to_SwlKL
# After prep(), the same step is reported as trained
recipes::tidy(center_obj)
#> # A tibble: 1 × 6
#>   number operation type      trained skip  id
#>    <int> <chr>     <chr>     <lgl>   <lgl> <chr>
#> 1      1 step      center_to TRUE    FALSE center_to_SwlKL