Performs a suite of normalizing transformations and selects the best one on the basis of the Pearson P test statistic for normality, divided by its degrees of freedom. The transformation with the lowest statistic (calculated on the transformed data) is selected. See details for more information.

```
bestNormalize(
x,
standardize = TRUE,
allow_orderNorm = TRUE,
allow_lambert_s = FALSE,
allow_lambert_h = FALSE,
allow_exp = TRUE,
out_of_sample = TRUE,
cluster = NULL,
k = 10,
r = 5,
loo = FALSE,
warn = FALSE,
quiet = FALSE,
tr_opts = list(),
new_transforms = list(),
norm_stat_fn = NULL,
...
)
# S3 method for bestNormalize
predict(object, newdata = NULL, inverse = FALSE, ...)
# S3 method for bestNormalize
print(x, ...)
# S3 method for bestNormalize
tidy(x, ...)
```

- x
A vector to normalize (for `bestNormalize`), or a `bestNormalize` object (for `print` and `tidy`).

- standardize
If TRUE, the transformed values are also centered and scaled, such that the transformation attempts a standard normal. This will not change the normality statistic.

- allow_orderNorm
set to FALSE if orderNorm should not be applied

- allow_lambert_s
Set to TRUE if the lambertW of type "s" should be applied (see details). Expect about 2-3x elapsed computing time if TRUE.

- allow_lambert_h
Set to TRUE if the lambertW of type "h" should be applied (see details). Expect about 2-3x elapsed computing time.

- allow_exp
Set to TRUE if the exponential transformation should be applied (sometimes this will cause errors with heavy right skew)

- out_of_sample
if FALSE, quickly estimates in-sample performance instead of using cross-validation

- cluster
name of cluster set using `makeCluster`

- k
number of folds

- r
number of repeats

- loo
should leave-one-out CV be used instead of repeated CV? (see details)

- warn
Should bestNormalize warn when a method doesn't work?

- quiet
Should the cross-validation progress bar be suppressed?

- tr_opts
a list (of lists), specifying options to be passed to each transformation (see details)

- new_transforms
a named list of new transformation functions and their predict methods (see details)

- norm_stat_fn
if specified, a function used to calculate the normality statistic (default is the Pearson chi-squared statistic divided by its d.f.)

- ...
not used

- object
an object of class 'bestNormalize'

- newdata
a vector of data to be (reverse) transformed

- inverse
if TRUE, performs reverse transformation
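As a rough illustration of `norm_stat_fn`, the sketch below swaps in a statistic based on the Shapiro-Wilk test from base R. It assumes the supplied function receives a transformed numeric vector and returns a single number for which lower means more normal, matching the direction of the default statistic. `sw_stat` is a hypothetical helper name, not part of the package.

```r
library(bestNormalize)

set.seed(1)
x <- rgamma(100, 1, 1)

# Hypothetical helper: 1 - W from the Shapiro-Wilk test, so that smaller
# values indicate a closer fit to normality (assumed "lower is better").
sw_stat <- function(x) unname(1 - shapiro.test(x)$statistic)

# In-sample estimation keeps the example fast.
BN_sw <- bestNormalize(x, norm_stat_fn = sw_stat, out_of_sample = FALSE)
BN_sw$norm_stats
```

Any statistic can be used this way as long as it is oriented so that smaller values are preferred.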

A list of class `bestNormalize` with elements

- x.t
transformed original data

- x
original data

- norm_stats
Pearson's P / degrees of freedom

- method
out-of-sample or in-sample, number of folds + repeats

- chosen_transform
the chosen transformation (of appropriate class)

- other_transforms
the other transformations (of appropriate class)

- oos_preds
Out-of-sample predictions (if loo == TRUE) or normalization stats
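The elements listed above can be inspected directly on the returned object. The following is a minimal sketch that uses only the element names documented here; the exact printed values depend on the data.

```r
library(bestNormalize)

set.seed(1)
x <- rgamma(100, 1, 1)
BN_obj <- bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE)

class(BN_obj$chosen_transform)  # class of the selected transformation
BN_obj$norm_stats               # normality statistic for each candidate
BN_obj$method                   # how the statistics were estimated
head(BN_obj$x.t)                # transformed data
head(BN_obj$x)                  # original data
```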

The `predict` function returns the numeric value of the transformation
performed on new data, and allows for the inverse transformation as well.

`bestNormalize` estimates the optimal normalizing
transformation. This transformation can be performed on new data, and
inverted, via the `predict` function.

This function currently estimates the Yeo-Johnson transformation,
the Box Cox transformation (if the data is positive), the log_10(x+a)
transformation, the square-root (x+a) transformation, and the arcsinh
transformation. a is set to max(0, -min(x) + eps) by default. If
allow_orderNorm == TRUE and if out_of_sample == FALSE then the ordered
quantile normalization technique will likely be chosen since it essentially
forces the data to follow a normal distribution. More information on the
orderNorm technique can be found in the package vignette, or using
`?orderNorm`.
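To make the default shift a = max(0, -min(x) + eps) concrete, here is a small base R sketch. The value of `eps` used below is an assumption chosen for illustration, not necessarily the package's internal default.

```r
x <- c(-2.5, 0, 1.2, 4)

eps <- 0.001                  # assumed value, for illustration only
a <- max(0, -min(x) + eps)    # shift that makes every x + a strictly positive

log_shifted  <- log10(x + a)  # log_10(x + a)
sqrt_shifted <- sqrt(x + a)   # sqrt(x + a)

a
all(is.finite(log_shifted))
```

When the smallest value of the data already exceeds eps, the shift evaluates to 0 and the data are transformed as they are.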

Repeated cross-validation is used by default to estimate the out-of-sample
performance of each transformation if out_of_sample = TRUE. While this can
take some time, users can speed it up by creating a cluster via the
`parallel` package's `makeCluster` function, and passing the name
of this cluster to `bestNormalize` via the `cluster` argument. For best
performance, we recommend the number of clusters to be set to the number of
repeats r. Care should be taken to account for the number of observations
per fold; with too few, the estimated normality statistic could be
inaccurate, or at least suffer from high variability.
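A minimal sketch of running the cross-validation in parallel follows. The worker count and the values of `k` and `r` are arbitrary choices for illustration; adjust them to the machine and the data.

```r
library(bestNormalize)
library(parallel)

set.seed(1)
x <- rgamma(100, 1, 1)

# Two workers, matching r = 2 repeats (illustrative values only).
cl <- makeCluster(2)
BN_par <- bestNormalize(x, cluster = cl, k = 5, r = 2, quiet = TRUE)
stopCluster(cl)  # always release the workers when done

BN_par$method
```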

As of version 1.3, users can use leave-one-out cross-validation as well for
each method by setting `loo` to `TRUE`. This will take a lot of
time for bigger vectors, but it will have the most accurate estimate of
normalization efficacy. Note that if this method is selected, arguments
`k, r` are ignored. This method will still work in parallel with the
`cluster` argument.

Note that the Lambert transformation of type "h" can be done by setting allow_lambert_h = TRUE; however, this can take significantly longer to run.

Use `tr_opts` in order to set options for each transformation. For
instance, if you want to override the default a selection for `log_x`,
set `tr_opts$log_x = list(a = 1)`.
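The `tr_opts` setting described above can be passed as follows. This is a short sketch; only the `log_x` option named in the text is used, and in-sample estimation is chosen to keep it quick.

```r
library(bestNormalize)

set.seed(1)
x <- rgamma(100, 1, 1)

# A list of lists: one entry per transformation whose options are overridden.
opts <- list(log_x = list(a = 1))

BN_opts <- bestNormalize(
  x,
  tr_opts = opts,
  allow_orderNorm = FALSE,
  out_of_sample = FALSE
)
BN_opts
```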

See the package's vignette on how to use custom functions with
bestNormalize. All it takes is to create an S3 class and predict method for
the new transformation and load it into the environment, then the new
custom function (and its predict method) can be passed to bestNormalize
with `new_transforms`.
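The S3 pattern this refers to can be sketched in base R as below. `cuberoot_x` is a hypothetical transformation invented for illustration, and the fields stored on its object are assumptions; the package vignette documents the fields bestNormalize actually requires (for example, a normality statistic), so treat this as an outline rather than a drop-in transformation.

```r
# Hypothetical cube-root transformation written as an S3 class.
cuberoot_x <- function(x, standardize = TRUE, ...) {
  stopifnot(is.numeric(x))
  x.t <- sign(x) * abs(x)^(1 / 3)       # signed cube root
  mu <- mean(x.t, na.rm = TRUE)
  sigma <- sd(x.t, na.rm = TRUE)
  if (standardize) x.t <- (x.t - mu) / sigma
  structure(
    list(x.t = x.t, x = x, mean = mu, sd = sigma,
         n = sum(!is.na(x)), standardize = standardize),
    class = "cuberoot_x"
  )
}

# Predict method: forward transformation, or its inverse if inverse = TRUE.
predict.cuberoot_x <- function(object, newdata = NULL, inverse = FALSE, ...) {
  if (is.null(newdata)) newdata <- if (inverse) object$x.t else object$x
  if (inverse) {
    if (object$standardize) newdata <- newdata * object$sd + object$mean
    return(newdata^3)
  }
  out <- sign(newdata) * abs(newdata)^(1 / 3)
  if (object$standardize) out <- (out - object$mean) / object$sd
  out
}

set.seed(1)
x <- rgamma(100, 1, 1)
cr <- cuberoot_x(x)
all.equal(predict(cr, newdata = predict(cr), inverse = TRUE), x)

# Assumed shape of the call, following the package vignette (not run here):
# bestNormalize(x, new_transforms = list(
#   cuberoot_x = cuberoot_x,
#   predict.cuberoot_x = predict.cuberoot_x
# ))
```

Checking that the inverse recovers the original data, as in the last executed line, is a useful sanity test before handing a custom transformation to bestNormalize.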

```
x <- rgamma(100, 1, 1)
if (FALSE) {
# With Repeated CV
BN_obj <- bestNormalize(x)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)
all.equal(x2, x)
}
if (FALSE) {
# With leave-one-out CV
BN_obj <- bestNormalize(x, loo = TRUE)
BN_obj
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)
all.equal(x2, x)
}
# Without CV
BN_obj <- bestNormalize(x, allow_orderNorm = FALSE, out_of_sample = FALSE)
BN_obj
#> Best Normalizing transformation with 100 Observations
#> Estimated Normality Statistics (Pearson P / df, lower => more normal):
#> - arcsinh(x): 3.624
#> - Box-Cox: 0.712
#> - Center+scale: 7.68
#> - Double Reversed Log_b(x+a): 15.376
#> - Exp(x): 34.33
#> - Log_b(x+a): 0.582
#> - sqrt(x + a): 1.544
#> - Yeo-Johnson: 1.544
#> Estimation method: In-sample
#>
#> Based off these, bestNormalize chose:
#> Standardized Log_b(x + a) Transformation with 100 nonmissing obs.:
#> Relevant statistics:
#> - a = 0
#> - b = 10
#> - mean (before standardization) = -0.2058717
#> - sd (before standardization) = 0.5173331
p <- predict(BN_obj)
x2 <- predict(BN_obj, newdata = p, inverse = TRUE)
all.equal(x2, x)
#> [1] TRUE
```