|
89 | 89 | }, |
90 | 90 | { |
91 | 91 | "cell_type": "code", |
92 | | - "execution_count": 2, |
| 92 | + "execution_count": null, |
93 | 93 | "metadata": {}, |
94 | 94 | "outputs": [], |
95 | 95 | "source": [ |
|
110 | 110 | "\n", |
111 | 111 | "def simulate_data(n=2500, alpha_true=3.0, rho=0.6, cate_estimation=False):\n", |
112 | 112 | " # Exclusion restrictions:\n", |
113 | | - " # X[0], X[1] affect both Y and D (confounders)\n", |
114 | | - " # X[2], X[3] affect ONLY D (instruments for D)\n", |
| 113 | + " # X[0], X[1] affect both Y and T (confounders)\n", |
| 114 | + " # X[2], X[3] affect ONLY T (instruments for T)\n", |
115 | 115 | " # X[4] affects ONLY Y (predictor of Y only)\n", |
116 | 116 | "\n", |
117 | 117 | " betaY = np.array([0.5, -0.3, 0.0, 0.0, 0.4, 0, 0, 0, 0]) # X[2], X[3] excluded\n", |
|
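A minimal numpy sketch of this exclusion-restriction DGP may help make the structure concrete. Only `betaY` is visible in the diff above, so the treatment-equation coefficients (`betaT`) and the bivariate-normal error structure here are illustrative assumptions, not the notebook's exact code:

```python
import numpy as np

def simulate_data(n=2500, alpha_true=3.0, rho=0.6, seed=0):
    """Sketch of the exclusion-restriction DGP; betaT values are assumed."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 9))
    # Outcome equation: X[2], X[3] excluded (instruments for T)
    betaY = np.array([0.5, -0.3, 0.0, 0.0, 0.4, 0, 0, 0, 0])
    # Treatment equation: X[4] excluded (predicts Y only); values assumed
    betaT = np.array([0.4, 0.3, 0.6, -0.5, 0.0, 0, 0, 0, 0])
    # Correlated disturbances: rho > 0 confounds treatment and outcome
    u = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    T = X @ betaT + u[:, 0]
    Y = alpha_true * T + X @ betaY + u[:, 1]
    return X, T, Y

X, T, Y = simulate_data()
```

Because the disturbances share correlation `rho`, a naive regression of `Y` on `T` and `X` will overstate `alpha_true`, which is the behaviour the model comparisons below probe.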
1608 | 1608 | "\n", |
1609 | 1609 | "#### Comparing Treatment Estimates\n", |
1610 | 1610 | "\n", |
1611 | | - "The comparison of models is a form of robustness checks. We want to inspect how consistent our parameter estimates are across different model specifications. " |
| 1611 | + "The comparison of models is a form of robustness check: we want to inspect how consistent our parameter estimates are across different model specifications. Here we see how strongly informative priors on $\\rho$ can bias the treatment effect estimate. " |
1612 | 1612 | ] |
1613 | 1613 | }, |
1614 | 1614 | { |
|
1664 | 1664 | "cell_type": "markdown", |
1665 | 1665 | "metadata": {}, |
1666 | 1666 | "source": [ |
1667 | | - "In the plot we can see that the majority of models accurately estimate the true treatment effect $\\alpha$ except in the cases where we have explicitly placed an opinionated prior on the $\\rho$ parameter in the model. These priors pull the $\\alpha$ estimate away from the true data generating process. " |
1668 | | - ] |
1669 | | - }, |
1670 | | - { |
1671 | | - "cell_type": "markdown", |
1672 | | - "metadata": {}, |
1673 | | - "source": [ |
1674 | | - "Our Bayesian setup here is intentionally structural. We specify how both treatment and outcome arise from common covariates and latent confounding structures. However, the boundary between structural and reduced-form reasoning becomes fluid when we begin to treat latent variables or exclusion restrictions as data-driven “instruments.” In that sense, the structural Bayesian approach can emulate reduced-form logic within a generative model — an idea we’ll develop when we move from unconfounded to confounded data and later when we impute potential outcomes directly. But for now let's continue to examine the relationships between these structural parameters. " |
| 1667 | + "In the plot we can see that the majority of models accurately estimate the true treatment effect $\\alpha$, except in the cases where we have explicitly placed an opinionated prior on the $\\rho$ parameter in the model. These priors pull the $\\alpha$ estimate away from the true data generating process. The variable selection priors considerably shrink the uncertainty in the treatment estimates, seemingly picking out the implicit instrument structure and mimicking the application of instrumental variables. \n", |
| 1668 | + "\n", |
| 1669 | + "Our Bayesian setup here is intentionally structural. We specify how both treatment and outcome arise from common covariates and latent confounding structures. However, the boundary between structural and reduced-form reasoning becomes fluid when we begin to treat latent variables or exclusion restrictions as data-driven “instruments.” In that sense, the structural Bayesian approach can emulate reduced-form logic within a generative model — an idea we’ll develop further when we move from unconfounded to confounded data and later when we impute potential outcomes directly. \n", |
| 1670 | + "\n", |
| 1671 | + "But for now let's continue to examine the relationships between these structural parameters." |
1675 | 1672 | ] |
1676 | 1673 | }, |
1677 | 1674 | { |
|
2776 | 2773 | "#### Comparing Treatment Estimates\n", |
2777 | 2774 | "\n", |
2778 | 2775 | "The forest plot below compares posterior estimates of the treatment effect ($\\alpha$) and the confounding correlation ($\\rho$) across model specifications when \n", |
2779 | | - "$\\rho = .6$ in the data-generating process. The baseline normal model (which places diffuse priors on all parameters) clearly reflects the presence of endogeneity. Its posterior mean for $\\alpha$ is biased upward relative to the true value of 3, and the estimated $\\rho$ is positive, confirming that the model detects correlation between treatment and outcome disturbances. This behaviour mirrors the familiar bias of OLS under confounding: without structural constraints or informative priors, the model attributes part of the outcome variation caused by unobserved factors to the treatment itself. This inflates and corrupts our treatment effect estimate. \n", |
2780 | | - "\n", |
2781 | | - "By contrast, models that introduce structure through priors—either by tightening the prior range on $\\rho$ or imposing shrinkage on the regression coefficients—perform noticeably better. The tight-$\\rho$ models regularize the latent correlation, effectively limiting the extent to which endogeneity can distort inference, while spike-and-slab and horseshoe priors perform selective shrinkage on the covariates, allowing the model to emphasize variables that genuinely predict the treatment. This helps isolate more valid “instrument-like” components of variation, pulling the posterior of $\\alpha$ closer to the true causal effect. \n", |
2782 | | - "\n", |
2783 | | - "The exclusion-restriction specification, which enforces prior beliefs about which covariates affect only the treatment or only the outcome, performs well too. The imposed restrictions recover both the correct treatment effect and a tight estimate of residual correlation. It may be wishful thinking that this precise instrument structure is available to an analyst in the applied setting, but instrument variable designs and their imposed exclusion restrictions should be motivated by theory. Where that theory is plausible we can hope for such precise estimates.\n", |
2784 | | - "\n", |
2785 | | - "Together, these results illustrate the power of Bayesian joint modelling: even in the presence of confounding, appropriate prior structure enables partial recovery of causal effects. Importantly, the priors do not simply “fix” the bias—they make explicit the trade-offs between flexibility and identification. This transparency is one of the key advantages of Bayesian causal inference over traditional reduced-form methods." |
| 2776 | + "$\\rho = 0.6$ in the data-generating process. The baseline normal model (which places diffuse priors on all parameters) clearly reflects the presence of endogeneity. Its posterior mean for $\\alpha$ is biased upward relative to the true value of 3, and the estimated $\\rho$ is positive, confirming that the model detects correlation between treatment and outcome disturbances. This behaviour mirrors the familiar bias of OLS under confounding: without structural constraints or informative priors, the model attributes part of the outcome variation caused by unobserved factors to the treatment itself. This inflates and corrupts our treatment effect estimate. " |
2786 | 2777 | ] |
2787 | 2778 | }, |
2788 | 2779 | { |
|
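The OLS bias described here, and the way exclusion restrictions repair it, can be illustrated outside the Bayesian model with a short two-stage least squares sketch. The treatment-equation coefficients are assumed values consistent with the simulated DGP, not the notebook's exact code:

```python
import numpy as np

rng = np.random.default_rng(42)
n, alpha_true, rho = 2500, 3.0, 0.6
X = rng.normal(size=(n, 9))
betaY = np.array([0.5, -0.3, 0.0, 0.0, 0.4, 0, 0, 0, 0])
betaT = np.array([0.4, 0.3, 0.6, -0.5, 0.0, 0, 0, 0, 0])  # assumed values
u = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
T = X @ betaT + u[:, 0]
Y = alpha_true * T + X @ betaY + u[:, 1]

# Naive OLS of Y on [T, X]: the residual variation in T is u[:, 0],
# which correlates with the outcome disturbance, so alpha is biased up
alpha_ols = np.linalg.lstsq(np.column_stack([T, X]), Y, rcond=None)[0][0]

# 2SLS using the exclusion restrictions: X[2], X[3] enter the first
# stage but are excluded from the outcome equation
W = X[:, [0, 1, 4]]                      # controls that appear in Y
Z = np.column_stack([W, X[:, [2, 3]]])   # controls plus instruments
T_hat = Z @ np.linalg.lstsq(Z, T, rcond=None)[0]  # first-stage fit
alpha_2sls = np.linalg.lstsq(np.column_stack([T_hat, W]), Y, rcond=None)[0][0]
```

With $\rho = 0.6$ the OLS coefficient lands well above the true $\alpha = 3$, while the 2SLS estimate falls back close to it: the same "instrument-like" use of excluded variation that the shrinkage priors appear to exploit.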
2838 | 2829 | "cell_type": "markdown", |
2839 | 2830 | "metadata": {}, |
2840 | 2831 | "source": [ |
| 2832 | + "By contrast, models that introduce structure through priors—either by tightening the prior range on $\\rho$ or imposing shrinkage on the regression coefficients—perform noticeably better. The tight-$\\rho$ models regularize the latent correlation, effectively limiting the extent to which endogeneity can distort inference, while spike-and-slab and horseshoe priors perform selective shrinkage on the covariates, allowing the model to emphasize variables that genuinely predict the treatment. This helps isolate more valid “instrument-like” components of variation, pulling the posterior of $\\alpha$ closer to the true causal effect. \n", |
| 2833 | + "\n", |
| 2834 | + "The exclusion-restriction specification, which enforces prior beliefs about which covariates affect only the treatment or only the outcome, performs well too. The imposed restrictions recover both the correct treatment effect and a tight estimate of residual correlation. It may be wishful thinking that this precise instrument structure is available to an analyst in the applied setting, but instrumental variable designs and their imposed exclusion restrictions should be motivated by theory. Where that theory is plausible we can hope for similarly precise estimates.\n", |
| 2835 | + "\n", |
| 2836 | + "Together, these results illustrate the power of Bayesian joint modelling: even in the presence of confounding, appropriate prior structure enables partial recovery of causal effects. Importantly, the priors do not simply “fix” the bias—they make explicit the trade-offs between flexibility and identification. This transparency is one of the key advantages of Bayesian causal inference over traditional reduced-form methods.\n", |
| 2837 | + "\n", |
2841 | 2838 | "We can see similar patterns in the pair plots below." |
2842 | 2839 | ] |
2843 | 2840 | }, |
|
5573 | 5570 | ":::{admonition} Advice for the Practitioner\n", |
5574 | 5571 | ":class: tip\n", |
5575 | 5572 | "\n", |
5576 | | - "We have seen a number of ways in which to model the structural relationships between treatment and outcome for causal inference. In `CausalPy` we will add a flexible API to capture some of these options, but no API can be fully robust for each and every niche problem. You may wish to prioritise one or more of these components in your own modelling. Our main advice here is to model the parameters that matter - the ones that give insight into the structure of your problem. Use them to diagnose the degree of confounding. Use variable selection priors with care as a diagnostic aid for theoretical instruments. Assess your model in context with a range of reasonable alternatives and report the variation honestly. This process, the careful craft of statistical modelling, underwrites contemporary Bayesian workflow and sound causal inference.\n", |
| 5573 | + "We have seen a number of ways in which to model the structural relationships between treatment and outcome for causal inference. In `CausalPy` we will add a flexible API to capture some of these options, but no API can be fully robust for each and every niche problem. You may wish to prioritise one or more of these components in your own modelling. Our main advice here is to model the parameters that matter: the ones that give insight into _the structure of your problem_. Use the structural parameters to diagnose the degree of confounding. Use variable selection priors with care as a diagnostic aid for theoretical instruments. Assess your model in context with a range of reasonable alternatives and report the variation honestly. This process, the careful craft of statistical modelling, underwrites contemporary Bayesian workflow and sound causal inference.\n", |
5577 | 5574 | "\n", |
5578 | 5575 | ":::" |
5579 | 5576 | ] |
|