A Robust Alternative to Cohen’s d via Polynomial Regularization
Authors: Wolfgang Lenhard & Alexandra Lenhard
License: MIT
The standardized mean difference, commonly known as Cohen’s d, is a cornerstone of quantitative research. However, its accuracy is heavily compromised by violations of normality, heavy tails, and outliers—issues prevalent in behavioral and social sciences.
d_reg is a novel, robust estimator that maintains the intuitive interpretation of Cohen's d while being resilient to distributional anomalies. Instead of calculating moments (mean and variance) directly from raw data, d_reg employs a regularization process:
- Transformation: Data is transformed to standard normal quantiles via rank-based inverse normal transformation.
- Smoothing: A polynomial regression is fitted to the inverse quantile function.
- Derivation: Moments are derived analytically from the polynomial coefficients.
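As a rough sketch of this three-step pipeline in base R (the function names and the rank offset here are illustrative, not the package API):

```r
# Illustrative sketch of the d_reg pipeline for one group (not the package API).

# Step 1: transform raw values to standard normal quantiles by rank.
rank_to_normal <- function(x) qnorm(rank(x) / (length(x) + 1))

# Step 2: smooth by fitting a polynomial x = f(z) to the inverse
# quantile function via least squares.
fit_inverse_quantile <- function(x, degree = 5) {
  z <- rank_to_normal(x)
  lm(x ~ poly(z, degree, raw = TRUE))
}

set.seed(1)
x <- rexp(100, rate = 1 / 10)   # skewed, outlier-prone data
fit <- fit_inverse_quantile(x)
b <- coef(fit)                  # step 3 derives the moments from these
length(b)                       # 6 coefficients: b_0 ... b_5
```

Step 3, deriving the mean and variance analytically from `b`, is covered in the derivation section.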
Simulation studies show that d_reg outperforms Cohen's d in 95.3% of tested conditions (varying distribution, sample size, and effect size) in terms of Mean Squared Error (MSE), with the largest gains in small samples.
Currently, this method is implemented as an R script. To use it, simply download the source code and source it into your R environment.
- Download `R/regularizedD.R` from this repository.
- Load it into your R session:

```r
source("R/regularizedD.R")
```

The main function is `d.reg()`. It accepts two numeric vectors representing the groups to be compared.
```r
# Generate sample data (normal distribution)
set.seed(123)
control_group <- rnorm(30, mean = 100, sd = 15)
treatment_group <- rnorm(30, mean = 108, sd = 15)

# Calculate d_reg
result <- d.reg(control_group, treatment_group)

# Print results
print(result)

# Get detailed summary
summary(result)
```

You can specify the method type, polynomial degree, and confidence intervals:
```r
# Use the combined method (switches to Cohen's d for very large N)
# and calculate a 95% confidence interval
result <- d.reg(control_group, treatment_group,
                type = "combined",
                degree = 5,
                CI = 0.95)
```

The robustness of d_reg stems from the smoothing inherent in polynomial approximation, which acts as a form of regularization. The derivation proceeds in four steps:
First, we transform the raw values $x_i$ of each group to standard normal quantiles via a rank-based inverse normal transformation:

$$ z_i = \Phi^{-1}\!\left(\frac{r_i}{n + 1}\right) $$

where $r_i$ is the rank of $x_i$, $n$ is the group size, and $\Phi^{-1}$ is the quantile function of the standard normal distribution.
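As a concrete illustration of the rank-based inverse normal transformation (the exact rank offset used by the package may differ from the $r/(n+1)$ shown here):

```r
# Rank-based inverse normal transformation (illustrative offset r/(n+1)).
x <- c(12, 3, 47, 8, 15)        # raw values on an arbitrary scale
n <- length(x)
z <- qnorm(rank(x) / (n + 1))   # standard normal quantiles
round(z, 3)                     # 0.000 -0.967 0.967 -0.431 0.431
```

The transformation preserves only the ordering of the data, which is what makes the subsequent steps insensitive to outliers.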
By the Stone-Weierstrass approximation theorem, the inverse quantile function (mapping $z$ back to the data scale $x$) can be approximated arbitrarily well by a polynomial:

$$ x \approx f(z) = \sum_{k=0}^{m} b_k z^k $$

where $b_k$ are the coefficients estimated by least squares and $m$ is the polynomial degree (default: 5).
Instead of calculating the mean and variance from the raw $x_i$, we derive them analytically from the polynomial coefficients $b_k$. We utilize Isserlis' theorem (1918), which gives the moments of the standard normal distribution in closed form:

$$ E[Z^j] = \begin{cases} 0 & \text{if } j \text{ is odd} \\ (j-1)!! & \text{if } j \text{ is even} \end{cases} $$

Note: $(j-1)!!$ denotes the double factorial (the product of odd integers up to $j-1$).
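The moment formula is straightforward to implement; this helper is a sketch for illustration, not the package's internal code:

```r
# E[Z^j] for Z ~ N(0,1): 0 for odd j, double factorial (j-1)!! for even j.
normal_moment <- function(j) {
  if (j %% 2 == 1) return(0)
  if (j == 0) return(1)
  prod(seq(1, j - 1, by = 2))   # (j-1)!! = 1 * 3 * ... * (j-1)
}
sapply(0:6, normal_moment)      # 1 0 1 0 3 0 15
```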
The Mean ($\mu$): By linearity of expectation,

$$ \mu = E[f(Z)] = \sum_{k=0}^{m} b_k \, E[Z^k] $$

Since odd moments of the standard normal distribution vanish, this simplifies to summing only the even terms:

$$ \mu = b_0 + \sum_{\substack{k=2 \\ k \text{ even}}}^{m} b_k \,(k-1)!! $$
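In code, the analytic mean is a dot product of the coefficients with the normal moments (a self-contained sketch; `poly_mean` is a hypothetical helper name):

```r
# Analytic mean of f(Z) = sum_k b_k Z^k for Z ~ N(0,1).
# Odd moments vanish; even moments are double factorials (k-1)!!.
poly_mean <- function(b) {      # b[k + 1] corresponds to b_k
  k <- seq_along(b) - 1
  mom <- sapply(k, function(j) {
    if (j %% 2 == 1) 0
    else if (j == 0) 1
    else prod(seq(1, j - 1, by = 2))
  })
  sum(b * mom)
}
# f(z) = 2 + 5z + z^2 + 0.5 z^4: mu = 2 + 1*1 + 0.5*3 = 4.5
poly_mean(c(2, 5, 1, 0, 0.5))   # 4.5
```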
The Variance ($\sigma^2$): To find $\sigma^2$, we first expand the second raw moment,

$$ E[f(Z)^2] = \sum_{j=0}^{m} \sum_{k=0}^{m} b_j b_k \, E[Z^{j+k}], $$

again evaluating each $E[Z^{j+k}]$ with Isserlis' theorem, and then apply

$$ \sigma^2 = E[f(Z)^2] - \mu^2. $$
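The double sum over coefficient pairs translates directly into code (a self-contained sketch; the helper names are hypothetical):

```r
# Analytic variance of f(Z) for Z ~ N(0,1): sigma^2 = E[f(Z)^2] - mu^2.
dfact_moment <- function(j) {   # E[Z^j]: 0 if odd, (j-1)!! if even
  if (j %% 2 == 1) 0 else if (j == 0) 1 else prod(seq(1, j - 1, by = 2))
}
poly_var <- function(b) {
  k <- seq_along(b) - 1
  mu <- sum(b * sapply(k, dfact_moment))
  # E[f(Z)^2] = sum_j sum_k b_j b_k E[Z^(j+k)]
  mom <- outer(k, k, function(j, l) {
    mapply(function(a, b2) dfact_moment(a + b2), j, l)
  })
  sum(outer(b, b) * mom) - mu^2
}
poly_var(c(3, 2))               # f(z) = 3 + 2z: variance 2^2 = 4
poly_var(c(0, 0, 1))            # f(z) = z^2: variance E[Z^4] - 1 = 2
```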
Finally, we reassemble the effect size using the smoothed means and variances, preserving the scale and interpretation of Cohen's $d$:

$$ d_{\text{reg}} = \frac{\mu_2 - \mu_1}{s_{\text{pooled}}}, \qquad s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,\sigma_1^2 + (n_2 - 1)\,\sigma_2^2}{n_1 + n_2 - 2}} $$

where $\mu_g$ and $\sigma_g^2$ are the analytically derived mean and variance of group $g$, and $n_g$ is its sample size.
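Assembling the effect size from the per-group smoothed moments is then one line; the pooled-SD weighting shown here is the standard Cohen's d form, and the actual package may weight differently:

```r
# Combine per-group smoothed moments into an effect size (sketch).
d_from_moments <- function(mu1, var1, n1, mu2, var2, n2) {
  s_pooled <- sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
  (mu2 - mu1) / s_pooled
}
# Two groups of 30 with sd 15 (variance 225) and an 8-point mean shift:
d_from_moments(100, 225, 30, 108, 225, 30)   # (108 - 100) / 15 = 0.533
```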
- `d.reg(x1, x2, ...)`: The primary function to compute the robust effect size.
- `fit_polynomial(x, ...)`: Helper function that fits the quantile regression $x = f(z)$.
- `get_moments(model)`: Analytically derives mean and variance from the polynomial model.
- `check_monotonicity(model)`: Verifies that the fitted quantile function is monotonic.
- `d.cohen(x1, x2)`: Standard implementation of Cohen's d (Hedges' g or Glass's $\Delta$) included for comparison.
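A quantile function must be non-decreasing, so a monotonicity check can evaluate the fitted polynomial's derivative on a grid. This is a sketch of what a check like `check_monotonicity()` might do, not the package's actual implementation:

```r
# Check that f(z) = sum_k b_k z^k is increasing on a plausible z range.
is_monotone <- function(b, z_range = c(-3, 3), grid = 1000) {
  k <- seq_along(b) - 1
  db <- (b * k)[-1]             # derivative coefficients: k * b_k
  z <- seq(z_range[1], z_range[2], length.out = grid)
  fprime <- sapply(z, function(zi) sum(db * zi^(seq_along(db) - 1)))
  all(fprime > 0)
}
is_monotone(c(0, 1, 0, 0.1))    # f'(z) = 1 + 0.3 z^2 > 0 -> TRUE
is_monotone(c(0, 1, 0, -0.5))   # f'(z) = 1 - 1.5 z^2 -> FALSE
```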
Lenhard, W., & Lenhard, A. (submitted). Distribution-Free Effect Size Estimation: A Robust Alternative to Cohen’s d.