  • Working on a new function, bestLogConstant, that uses the same machinery to pick the best constant to use when log-transforming a variable (e.g., the one that makes the distribution look the most normal). This is especially useful for non-positive or zero-inflated data.
  • Add S3 methods that help step_orderNorm() to work with parallel processing.
  • Add S3 methods that help step_best_normalize() to work with parallel processing.
  • Add a new transformation: the double reversed log (@rempsyc #18)
  • Fix issues in CRAN checks
  • Updated print functionality to remain compatible with recipes.
  • Updated term selection machinery to remain compatible with recipes.
  • Improved scalability of boxcox in response to issue 10; thank you to Krzysztof Dyba (kadyb) for the suggestions.
  • Improved scalability of yeojohnson, thanks to Emil Hvitfeldt (EmilHvitfeldt) for his work on this problem for the recipes package.
  • Updated tests to remain compatible with new recipes package (>0.1.16)
  • Updated citation (new R Journal publication!)
  • Fixed and extended the tidy method to work more generally and to provide easy access to chosen transformations (responding to issue 9)
  • Added pkgdown website: https://petersonr.github.io/bestNormalize
  • Implemented GitHub Actions (code coverage and R CMD check) via usethis in response to issue 7
  • Improved scalability of ORQ transformation via n_logit_fit argument, with default of 10000. This should substantially decrease memory use of orderNorm while only minimally affecting the out-of-domain approximations.
  • Updated documentation
  • Changed step_bestNormalize to step_best_normalize, responding to issue 8
  • Fixed error in documentation regarding LambertW transformation types (thank you to Georg M. Goerg, the author of LambertW, for pointing this out).
  • Add center_scale transform as default when standardize == TRUE
  • Added error when trying to use repeated CV with folds that are far too small
  • Changed a few T and F to TRUE and FALSE
  • Added documentation of how one can use scales and ggplot2 to visualize all transformations.
  • Added butcher and axe functionality in order to improve scalability of step_* functions
  • Improved tidy functionality with bestNormalize and step_best_normalize
  • Fixed bug that was causing simple transforms to fail in bestNormalize
  • Updated to new LambertW version in dependencies (request from CRAN)
  • Added ability to supply user-defined transformations and associated vignette
  • Added ability to supply user-defined normalization statistics (documented in the same vignette)
  • Take out standardize option from no_transform so x.t always matches input vector.
  • Minor programming improvements
  • Added step_bestNormalize and step_orderNorm functions for implementation within recipes.
  • Changed default to warn = FALSE when calling bestNormalize. If a transformation doesn’t work, warnings will no longer be shown by default unless warn is set to TRUE.
  • Allow options to be passed through bestNormalize to specific transformation functions
  • Slight bug fix to square root transformation (a = 0 by default, not .001)
  • Slight bug fix in the “quiet” argument for bestNormalize with LOO
  • Slight bug fix to plot.bestNormalize which was improperly labeling transformations
  • exp_x was having trouble with the standardize option, so added the option allow_exp_x to bestNormalize as a workaround. In addition, exp_x now fails if any infinite values are produced during the transformation, so bestNormalize will not include it in its results.
  • Progress bar will now only be displayed if quiet is FALSE and length(x) > 2000
  • Update citation to point to newly published work.
  • Update maintainer email to new address (same person, new affiliation).
  • Correctly subtract 1/2 from ranks in ORQ transformation to make quantile estimation unbiased (this was a bug in 1.3.0, as ranks start at 1, not zero). The ranks are now divided by n instead of n+1.
  • Specify the weights for the GLM in the ORQ transformation to be the number of observations. This doesn’t change the transformation, but it appears to compute slightly faster and is more mathematically tractable.
  • Other various bug fixes to tests and to plotting functions.
  • Add 1/2 to ranks in ORQ transformation to make quantile estimation unbiased (should have minimal impact)
  • Add option loo for leave-one-out cross-validation
  • Add progress bar for cross-validation methods (both with/without parallel)
  • Add “no_transform” function - does the same thing as I(x) but in the syntax of other transformations (this allows the normalization statistics to also be calculated if no transformation is performed).
  • Add support for lambert transforms of type “h” in the bestNormalize function via allow_lambert_h argument.
  • Add “before standardization” to printout of different transforms’ means and sds to clarify output
  • Added other transformations commonly used to normalize a vector
    • exponential, log, square root, arcsinh
  • Lambert WxF is no longer done by default by bestNormalize since it is unstable on certain OS (Linux, Solaris), and does not abide by the CRAN policy.
  • Clarified that the transformations are standardized by default, and provided an option not to standardize in transformations
  • Updated tests to run a bit faster and to use proper S3 classes
  • Added references for original papers (Van der Waerden, Bartlett) that cite the basis for the orderNorm transformation, as well as discussion in Beasley (2009)
  • Edited description to clarify that this procedure is a new adaptation of an older technique rather than a new technique in itself
  • Added feature to estimate out-of-sample normality statistics in bestNormalize instead of in-sample ones via repeated cross-validation
    • Note: set out_of_sample = FALSE to maintain backward-compatibility with prior versions and set allow_orderNorm = FALSE as well so that it isn’t automatically selected
  • Improved extrapolation of the ORQ (orderNorm) method
    • Instead of linear extrapolation, it uses a binomial (logit-link) model on ranks
    • No more issues with Cauchy transformation
  • Added plotting feature for transformation objects
  • Cleared up some documentation
  • Changed the name of the orderNorm technique to “Ordered Quantile normalization”.
  • Made description clearer in response to comments from CRAN

First submission to CRAN
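
For readers unfamiliar with the ORQ (orderNorm) rank adjustment mentioned in the entries above (subtract 1/2 from the ranks, divide by n), the in-sample step can be sketched roughly as follows. This is an illustrative Python sketch, not the package's R code: the function names average_ranks and orq_in_sample are hypothetical, averaging tied ranks is an assumption, and the logit-link extrapolation for out-of-domain values is not shown.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions.

    Tie handling by averaging is an assumption made for this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def orq_in_sample(values):
    """Map each value to the standard normal quantile of (rank - 1/2) / n."""
    n = len(values)
    std_normal = NormalDist()
    # (rank - 0.5) / n always lies strictly between 0 and 1, so inv_cdf is defined.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


if __name__ == "__main__":
    print(orq_in_sample([3.0, 1.0, 2.0]))
```

Because ranks start at 1, subtracting 1/2 and dividing by n keeps every probability strictly inside (0, 1), so the smallest and largest observations still map to finite normal quantiles.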