Distribution-Free Effect Size Estimation (`d_reg`)

A Robust Alternative to Cohen’s d via Polynomial Regularization

Authors: Wolfgang Lenhard & Alexandra Lenhard
License: MIT

Overview

The standardized mean difference, commonly known as Cohen’s d, is a cornerstone of quantitative research. However, its accuracy is heavily compromised by violations of normality, heavy tails, and outliers, all of which are common in the behavioral and social sciences.

d_reg is a novel, robust estimator that maintains the intuitive interpretation of Cohen's d while being resilient to distributional anomalies. Instead of calculating moments (mean and variance) directly from raw data, d_reg employs a regularization process:

1. Transformation: Data is transformed to standard normal quantiles via rank-based inverse normal transformation.
2. Smoothing: A polynomial regression is fitted to the inverse quantile function.
3. Derivation: Moments are derived analytically from the polynomial coefficients.

Simulation studies show that d_reg achieves a lower Mean Squared Error (MSE) than Cohen's d in 95.3% of tested conditions (varying distribution, sample size, and effect size), with the largest gains for sample sizes $n < 100$ and non-normal distributions.

Installation

Currently, this method is implemented as an R script. To use it, simply download the source code and source it into your R environment.

Download R/regularizedD.R from this repository.
Load it into your R session:

source("R/regularizedD.R")

Usage

The main function is d.reg(). It accepts two numeric vectors representing the groups to be compared.

Basic Example

# Generate sample data (Normal distribution)
set.seed(123)
control_group <- rnorm(30, mean = 100, sd = 15)
treatment_group <- rnorm(30, mean = 108, sd = 15)

# Calculate d_reg
result <- d.reg(control_group, treatment_group)

# Print results
print(result)

# Get detailed summary
summary(result)

Advanced Options

You can specify the method type, polynomial degree, and confidence intervals:

# Use the combined method (switches to Cohen's d for very large N)
# and calculate a 95% Confidence Interval
result <- d.reg(control_group, treatment_group, 
                type = "combined", 
                degree = 5, 
                CI = 0.95)

Statistical Derivation

The robustness of d_reg stems from the smoothing inherent in polynomial approximation, which acts as a form of regularization. The derivation proceeds in four steps:

1. Rank-Based Inverse Normal Transformation

First, we transform the raw values $x_i$ into probabilities $p_i$ using their ranks. Plotting positions keep every $p_i$ strictly inside $(0, 1)$, so the subsequent probit transform stays finite. The probabilities are then converted into standard normal $z$-scores:

$$ z_i = \Phi^{-1}(p_i) \approx \Phi^{-1}\left(\frac{\text{rank}(x_i) - 0.5}{n}\right) $$

where $\Phi^{-1}$ is the probit function (inverse CDF of the standard normal distribution).
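As a minimal sketch of this step, the transformation can be written in a few lines of base R. The helper name `rank_to_z` is hypothetical and is not part of `regularizedD.R`; averaging tied ranks is likewise an assumption made for illustration.

```r
# Hypothetical helper (not from regularizedD.R): rank-based inverse normal transformation.
rank_to_z <- function(x) {
  n <- length(x)
  # Plotting positions (rank - 0.5) / n lie strictly between 0 and 1.
  # Averaging tied ranks is an assumption for this sketch.
  p <- (rank(x, ties.method = "average") - 0.5) / n
  qnorm(p)  # probit: inverse CDF of the standard normal distribution
}

x <- c(12, 7, 30, 18, 9)
z <- rank_to_z(x)
round(z, 3)
# 0.000 -1.282  1.282  0.524 -0.524
```

The ordering of the data is preserved, and the median observation maps to $z = 0$.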

2. Polynomial Approximation

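By the Stone-Weierstrass approximation theorem, any continuous function on a bounded interval can be approximated arbitrarily well by a polynomial. We therefore approximate the inverse quantile function (mapping $z$ back to raw $x$) over the observed range of $z$ by a polynomial of degree $k$ and fit this relationship: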

$$ X \approx f(Z) = \beta_0 + \beta_1 Z + \beta_2 Z^2 + \dots + \beta_k Z^k = \sum_{j=0}^k \beta_j Z^j $$

where $Z \sim \mathcal{N}(0,1)$. The coefficients $\beta$ are estimated via Ordinary Least Squares (OLS) regression.
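The fit can be sketched with base R's `lm()` and raw (non-orthogonal) polynomial terms, so that the estimated coefficients correspond directly to $\beta_0, \dots, \beta_k$. The data, the degree `k = 3`, and all variable names below are illustrative assumptions, not the defaults of `d.reg()`.

```r
set.seed(1)
x <- rexp(50)                          # skewed example data (illustrative only)
n <- length(x)
z <- qnorm((rank(x) - 0.5) / n)        # normal scores as in Step 1

k <- 3                                 # polynomial degree, chosen for illustration
# raw = TRUE regresses on Z, Z^2, ..., Z^k directly, so the coefficients
# are the beta_j of the formula above.
fit <- lm(x ~ poly(z, degree = k, raw = TRUE))
beta <- unname(coef(fit))              # beta[1] = beta_0, ..., beta[k + 1] = beta_k
beta
```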

3. Analytical Moment Computation

Instead of calculating the mean and variance from the raw $x$ values (which may be noisy or influenced by outliers), we calculate the expected value and variance of the polynomial function $f(Z)$.

We use Isserlis' theorem (1918), which gives the moments of the standard normal distribution in closed form:

$$ E[Z^j] = \begin{cases} 0 & \text{if } j \text{ is odd} \\ (j-1)!! & \text{if } j \text{ is even} \end{cases} $$

Note: $(j-1)!!$ denotes the double factorial (the product of all odd integers up to $j-1$), with $(-1)!! = 1$ so that $E[Z^0] = 1$.
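For illustration, these moments can be computed with a small helper. The name `normal_moment` is hypothetical and does not refer to a function in this repository.

```r
# Hypothetical helper: E[Z^j] for Z ~ N(0, 1).
normal_moment <- function(j) {
  if (j %% 2 == 1) return(0)     # odd moments vanish
  if (j == 0) return(1)          # E[Z^0] = 1
  prod(seq(1, j - 1, by = 2))    # (j - 1)!! = 1 * 3 * ... * (j - 1)
}

moments <- sapply(0:8, normal_moment)
moments
# 1 0 1 0 3 0 15 0 105
```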

The Mean ($\hat{\mu}$):

$$ \hat{\mu} = E[X] = E\left[\sum_{j=0}^k \beta_j Z^j\right] = \sum_{j=0}^k \beta_j E[Z^j] $$

Since odd moments of the standard normal distribution vanish, this simplifies to summing only even terms:

$$ \hat{\mu} = \beta_0 + \beta_2(1) + \beta_4(3) + \beta_6(15) + \dots $$
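A short sketch of this sum, using made-up coefficients rather than a fitted model. The helper names `normal_moment` and `poly_mean` are hypothetical.

```r
# Hypothetical helper: E[Z^j] for Z ~ N(0, 1).
normal_moment <- function(j) {
  if (j %% 2 == 1) return(0)
  if (j == 0) return(1)
  prod(seq(1, j - 1, by = 2))
}

# Hypothetical helper: mean of f(Z) = sum_j beta_j Z^j.
poly_mean <- function(beta) {
  j <- seq_along(beta) - 1               # exponents 0, 1, ..., k
  sum(beta * sapply(j, normal_moment))   # only even exponents contribute
}

beta <- c(10, 2, 0.5, 0, 0.1)            # beta_0 ... beta_4 (made-up values)
mu_hat <- poly_mean(beta)
mu_hat
# 10 + 0.5 * 1 + 0.1 * 3 = 10.8
```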

The Variance ($\hat{\sigma}^2$):

$$ \hat{\sigma}^2 = E[X^2] - (E[X])^2 $$

To find $E[X^2]$, we expand the square of the polynomial:

$$ E[X^2] = E\left[\left(\sum_{i=0}^k \beta_i Z^i\right)\left(\sum_{j=0}^k \beta_j Z^j\right)\right] = \sum_{i=0}^k \sum_{j=0}^k \beta_i \beta_j E[Z^{i+j}] $$
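The double sum can be evaluated directly once the normal moments up to order $2k$ are available. The following is a minimal sketch with hypothetical helper names and made-up coefficients. For $f(Z) = 5 + 2Z + Z^2$ the result can be checked by hand: $E[X] = 6$, $E[X^2] = 42$, so the variance is $6$.

```r
# Hypothetical helper: E[Z^j] for Z ~ N(0, 1).
normal_moment <- function(j) {
  if (j %% 2 == 1) return(0)
  if (j == 0) return(1)
  prod(seq(1, j - 1, by = 2))
}

# Hypothetical helper: variance of f(Z) = sum_j beta_j Z^j.
poly_var <- function(beta) {
  j <- seq_along(beta) - 1
  m <- sapply(0:(2 * max(j)), normal_moment)   # E[Z^0], ..., E[Z^(2k)]
  ex <- sum(beta * m[j + 1])                   # E[X]
  # Matrix with entry (i, j) equal to E[Z^(i + j)].
  m_mat <- matrix(m[as.vector(outer(j, j, "+")) + 1], nrow = length(beta))
  ex2 <- sum(outer(beta, beta) * m_mat)        # E[X^2] as the double sum
  ex2 - ex^2
}

v <- poly_var(c(5, 2, 1))
v
# 6
```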

4. The Effect Size

Finally, we reassemble the effect size using the smoothed means and variances, preserving the scale and interpretation of Cohen's $d$:

$$ \hat{d}_{reg} = \frac{\hat{\mu}_2 - \hat{\mu}_1}{\hat{\sigma}_{pooled}} $$

where $\hat{\sigma}_{pooled}$ is the weighted pooled standard deviation derived from the analytic variances calculated in Step 3.
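The four steps can be combined into a compact sketch. This is not the implementation in `regularizedD.R`: the function names are hypothetical, the degree `k = 3` is an arbitrary choice, and the pooled standard deviation uses the conventional weights $(n_1 - 1)$ and $(n_2 - 1)$, which is an assumption about the weighting. It also omits the options of `d.reg()`, such as `type` and `CI`.

```r
# Hypothetical helper: E[Z^j] for Z ~ N(0, 1).
normal_moment <- function(j) {
  if (j %% 2 == 1) return(0)
  if (j == 0) return(1)
  prod(seq(1, j - 1, by = 2))
}

# Hypothetical helper: smoothed mean and variance of one group (Steps 1 to 3).
smoothed_moments <- function(x, k = 3) {
  z <- qnorm((rank(x) - 0.5) / length(x))                      # Step 1
  beta <- unname(coef(lm(x ~ poly(z, degree = k, raw = TRUE)))) # Step 2
  j <- 0:k
  m <- sapply(0:(2 * k), normal_moment)
  mu <- sum(beta * m[j + 1])                                   # Step 3: mean
  m_mat <- matrix(m[as.vector(outer(j, j, "+")) + 1], nrow = k + 1)
  ex2 <- sum(outer(beta, beta) * m_mat)
  c(mean = mu, var = ex2 - mu^2)                               # Step 3: variance
}

# Hypothetical sketch of Step 4; the pooling weights are an assumption.
sketch_d_reg <- function(x1, x2, k = 3) {
  m1 <- smoothed_moments(x1, k)
  m2 <- smoothed_moments(x2, k)
  n1 <- length(x1)
  n2 <- length(x2)
  s_pooled <- sqrt(((n1 - 1) * m1["var"] + (n2 - 1) * m2["var"]) / (n1 + n2 - 2))
  unname((m2["mean"] - m1["mean"]) / s_pooled)
}

set.seed(123)
control_group <- rnorm(30, mean = 100, sd = 15)
treatment_group <- rnorm(30, mean = 108, sd = 15)
d_sketch <- sketch_d_reg(control_group, treatment_group)
d_sketch
```

Swapping the two groups flips the sign of the result, as with Cohen's $d$.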

Functions

d.reg(x1, x2, ...): The primary function to compute the robust effect size.
fit_polynomial(x, ...): Helper function that fits the polynomial regression $x = f(z)$ to the quantile function.
get_moments(model): Analytically derives mean and variance from the polynomial model.
check_monotonicity(model): Verifies that the fitted quantile function is monotonic.
d.cohen(x1, x2): Standard implementation of Cohen's d (Hedges' g or Glass's $\Delta$) included for comparison.
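A monotonicity check matters because a quantile function must be non-decreasing, whereas a fitted polynomial need not be. As a rough sketch of the idea behind such a check (not the code of `check_monotonicity()`, whose internals are not shown here), one can evaluate the polynomial on a grid of $z$ values. The function name, the grid range, and the grid size below are all assumptions.

```r
# Hypothetical check: is f(z) = sum_j beta_j z^j non-decreasing on a grid of z values?
is_monotone_on_grid <- function(beta, z_range = c(-3, 3), n_grid = 601) {
  z <- seq(z_range[1], z_range[2], length.out = n_grid)
  j <- seq_along(beta) - 1
  fz <- sapply(z, function(zi) sum(beta * zi^j))  # evaluate the polynomial
  all(diff(fz) >= 0)                              # no decrease between grid points
}

ok  <- is_monotone_on_grid(c(0, 1, 0, 0.1))   # z + 0.1 z^3, derivative always positive
bad <- is_monotone_on_grid(c(0, 1, 0, -0.5))  # z - 0.5 z^3, decreasing for large |z|
c(ok, bad)
# TRUE FALSE
```

A grid check can miss violations between grid points or outside the chosen range, so it is only a heuristic.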

Citation

Lenhard, W., & Lenhard, A. (submitted). Distribution-Free Effect Size Estimation: A Robust Alternative to Cohen’s d.
