Dear Dr. Wu,
The estimation errors of the outcome and exposure GWASs are very often correlated, most commonly because of sample overlap. This introduces a second source of bias in Mendelian randomization beyond the usual winner’s curse, and it also affects the Rao-Blackwellization procedure. I would like to raise the following points:
- Joint distribution of hat(Gamma_j) and hat(gamma_j)
Suppose for a given SNP j, the GWAS estimators for outcome and exposure follow a bivariate normal distribution:
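$$
(\hat{\Gamma}_j, \hat{\gamma}_j)^\top \sim N\big( (\Gamma_j, \gamma_j)^\top, \Sigma \big)
$$
where the covariance matrix Σ is:
$$
\Sigma = \begin{pmatrix}
\mathrm{Var}(\hat{\Gamma}_j) & \mathrm{Cov}(\hat{\Gamma}_j, \hat{\gamma}_j) \\
\mathrm{Cov}(\hat{\Gamma}_j, \hat{\gamma}_j) & \mathrm{Var}(\hat{\gamma}_j)
\end{pmatrix}
$$
That is,
$$
\Sigma = \begin{pmatrix}
\sigma_\Gamma^2 & \rho \, \sigma_\Gamma \sigma_\gamma \\
\rho \, \sigma_\Gamma \sigma_\gamma & \sigma_\gamma^2
\end{pmatrix}
$$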
Here, rho represents the correlation between the estimation errors, typically induced by sample overlap.
- Selection region S is implicitly conditioned on hat(Gamma_j)
In the Rao-Blackwell procedure, instruments are selected using randomized values of hat(gamma_j). However, because hat(Gamma_j) is correlated with hat(gamma_j), conditioning only on hat(gamma_j) (and simulating based on it) does not remove the selection bias in estimating Gamma_j. This violates the key assumption that hat(Gamma_j) is independent of hat(gamma_j), and hence of the selection event S.
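To make the mechanism explicit, here is a short derivation under the bivariate normal model above. It assumes (my assumption, for illustration) that S is determined by hat(gamma_j) together with randomization noise that is independent of hat(Gamma_j). The first line is the standard conditional mean of a bivariate normal; the second follows from the law of total expectation.

$$
\begin{aligned}
E[\hat{\Gamma}_j \mid \hat{\gamma}_j] &= \Gamma_j + \rho \, \frac{\sigma_\Gamma}{\sigma_\gamma} \, (\hat{\gamma}_j - \gamma_j), \\
E[\hat{\Gamma}_j \mid S] &= \Gamma_j + \rho \, \frac{\sigma_\Gamma}{\sigma_\gamma} \, \big( E[\hat{\gamma}_j \mid S] - \gamma_j \big).
\end{aligned}
$$

So whenever rho is nonzero, the winner's-curse bias in hat(gamma_j) is passed on to hat(Gamma_j), scaled by rho * sigma_Gamma / sigma_gamma, and a correction applied to hat(gamma_j) alone leaves this term in place.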
- Empirical illustration
The first boxplot shows causal effect estimates using all true IVs (true value = 0.2).
The second boxplot uses only IVs selected at p < 0.05.
The third boxplot applies Rao-Blackwellization to the selected IVs.
The estimator used is MRBEE (a debiased univariable/multivariable MR method). In this simulation, the correlation between exposure and outcome GWAS estimation errors is rho = 0.5, which introduces substantial bias. Only when this correlation is removed does the RB-corrected estimate become unbiased.
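The effect can also be seen in a much simpler setting than the simulation above. The following is a minimal single-SNP Monte Carlo sketch, not the MRBEE simulation behind the boxplots: it uses plain thresholding at p < 0.05 on hat(gamma_j) without any randomization, and the function name `selection_bias` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def selection_bias(rho, n_rep=200_000, Gamma=0.0, gamma=0.02,
                   sigma_Gamma=0.01, sigma_gamma=0.01, z_cut=1.96, seed=0):
    """Return (bias of hat_Gamma, bias of hat_gamma) among selected draws.

    All default values are illustrative assumptions, not taken from the issue.
    """
    rng = np.random.default_rng(seed)
    # Covariance matrix Sigma of (hat_Gamma, hat_gamma) with error correlation rho.
    cov = rho * sigma_Gamma * sigma_gamma
    Sigma = np.array([[sigma_Gamma**2, cov],
                      [cov, sigma_gamma**2]])
    draws = rng.multivariate_normal([Gamma, gamma], Sigma, size=n_rep)
    hat_Gamma, hat_gamma = draws[:, 0], draws[:, 1]
    # Select on the exposure estimate only: |z| > 1.96, i.e. two-sided p < 0.05.
    selected = np.abs(hat_gamma) / sigma_gamma > z_cut
    return (hat_Gamma[selected].mean() - Gamma,
            hat_gamma[selected].mean() - gamma)

for rho in (0.0, 0.5):
    bias_Gamma, bias_gamma = selection_bias(rho)
    print(f"rho={rho}: bias(hat_Gamma)={bias_Gamma:.5f}, "
          f"bias(hat_gamma)={bias_gamma:.5f}")
```

With equal standard errors, the selected hat(Gamma_j) should show a bias of roughly rho times the bias in the selected hat(gamma_j), and essentially none when rho = 0.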
- Implications for Rao-Blackwellization in practice
One possible solution is to correct both hat(gamma_j) and hat(Gamma_j) simultaneously under their joint conditional distribution given the selection event. However, this becomes challenging in multivariable MR, because:
  - Each exposure selects its own set of instruments;
  - The outcome model is fitted on the union of all selected instruments;
  - There is no unique way to determine how to conditionally adjust hat(Gamma_j) unless we know which exposure’s selection caused its inclusion.
Best regards,
Yihe Yang
