|
84 | 84 | "\n", |
85 | 85 | "### Simulating the Source of Truth\n", |
86 | 86 | "\n", |
87 | | - "Before we fit any models, we need data whose causal structure is known. Simulation gives us a controlled environment where we can specify how treatments and outcomes are generated, introduce confounding deliberately, and then test whether our methods recover the truth. The function below constructs such a dataset." |
| 87 | + "Every causal claim rests on untestable assumptions about the data-generating process. Before we can trust our methods in the wild, we must test them in controlled conditions where truth is known. The simulation below constructs such a laboratory: we specify the causal structure explicitly, introduce confounding deliberately, and then ask whether our Bayesian models recover what we seeded in the data."
88 | 88 | ] |
89 | 89 | }, |
90 | 90 | { |
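
The data-generating process described above can be sketched as follows. This is a minimal illustration, not the notebook's actual simulation: the function name `simulate_confounded_data`, the coefficient values, and the use of a single observed confounder plus one instrument are all assumptions made for the example.

```python
import numpy as np

def simulate_confounded_data(n=5000, alpha=3.0, seed=0):
    """Hypothetical sketch of a confounded data-generating process."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # observed confounder: drives both t and y
    z = rng.normal(size=n)  # instrument-like variable: drives t only
    t = 1.5 * x + 2.0 * z + rng.normal(size=n)    # treatment equation
    y = alpha * t + 2.5 * x + rng.normal(size=n)  # outcome equation
    return x, z, t, y

x, z, t, y = simulate_confounded_data()

# Naive regression of y on t alone absorbs the confounder's effect
# and is biased away from the seeded alpha = 3.
b_naive = (t @ y) / (t @ t)

# Adjusting for the confounder recovers the seeded effect.
A = np.column_stack([t, x, np.ones_like(t)])
b_adj = np.linalg.lstsq(A, y, rcond=None)[0][0]
```

Because truth is seeded (here, a treatment effect of 3), we can score any estimator against it: the naive coefficient drifts upward with the confounding, while the adjusted one recenters on 3.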
|
391 | 391 | "\n", |
392 | 392 | "The model is built in PyMC and wrapped in the function `make_joint_model()`. Each version shares the same generative logic but differs in how the priors handle variable selection and identification. We can think of these as different “dial settings” for how strongly the model shrinks irrelevant coefficients or searches for valid instruments. Four prior configurations are explored:\n",
393 | 393 | "\n", |
394 | | - "- A normal prior, serving as a baseline regularized regression with weakly informative priors on all coefficients.\n", |
| 394 | + "- A normal prior, weak regularization with no variable selection. If the model succeeds here, the causal structure is identified through the joint modeling alone.\n", |
395 | 395 | "\n", |
396 | 396 | "- A spike-and-slab prior, which aggressively prunes away variables unlikely to matter, allowing the model to discover which features are true confounders or instruments.\n", |
397 | 397 | "\n", |
398 | | - "- A horseshoe prior, offering continuous shrinkage that downweights noise while preserving large signals.\n", |
| 398 | + "- A horseshoe prior, offering continuous shrinkage: a middle path that downweights weak predictors without forcing them exactly to zero, while preserving large signals.\n",
399 | 399 | "\n", |
400 | 400 | "- An exclusion-restriction prior, explicitly encoding which variables are allowed to influence the treatment but not the outcome, mimicking an instrumental-variable design.\n", |
401 | 401 | "\n", |
402 | | - "In the unconfounded case, the treatment and outcome errors are independent, so the joint model effectively decomposes into two connected regressions. The treatment effect $\\alpha$ then captures the causal impact of the treatment on the outcome, and under this setting, its posterior should center around the true value of 3. The goal is not to solve confounding yet but to show that when the world is simple and well-behaved, the Bayesian model recovers the truth just as OLS does—but with richer uncertainty quantification and a coherent probabilistic structure.\n", |
| 402 | + "Each prior embodies a different epistemological stance on how much structure the data can learn versus how much the analyst must impose. In the unconfounded case, the treatment and outcome errors are independent, so the joint model effectively decomposes into two connected regressions. The treatment effect $\\alpha$ then captures the causal impact of the treatment on the outcome, and under this setting, its posterior should center around the true value of 3. The goal is not to solve confounding yet but to show that when the world is simple and well-behaved, the Bayesian model recovers the truth just as OLS does—but with richer uncertainty quantification and a coherent probabilistic structure.\n", |
403 | 403 | "\n", |
404 | 404 | "The following code defines the model and instantiates it under several prior choices. The model’s graphical representation, produced by `pm.model_to_graphviz()`, visualizes its structure: covariates feed into both the treatment and the outcome equations, the treatment coefficient $\\alpha$ links them, and the two residuals \n", |
405 | 405 | "$U$ and $V$ are connected through a correlation parameter $\\rho$, which we can freely set to zero or more substantive values. Varying these parameterisations lets us probe how the estimated treatment effect responds to assumptions about the error structure of the causal system under study. \n",
|