|
26 | 26 | "source": [ |
27 | 27 | "## Bayesian Structural Causal Inference\n", |
28 | 28 | "\n", |
29 | | - "When we ask \"What is the effect of a medical treatment?\" or \"Does quitting smoking cause weight gain?\" or \"Do job training programs increase earnings?\", we are not simply asking about the treatment itself. We are asking: What world are we operating in? This perspective is more easily seen if you imagine a causal analyst as a pet-shop owner introducing a new fish to one of their many acquariums. The new fish's survival and behavior depend less on its intrinsic properties than on how it fits within this complex, interconnected system. In which tank will the new fish thrive? \n", |
| 29 | + "When we ask \"What is the effect of a medical treatment?\" or \"Does quitting smoking cause weight gain?\" or \"Do job training programs increase earnings?\", we are not simply asking about the treatment itself. We are asking: What world are we operating in? This perspective is more easily seen if you imagine a causal analyst as a pet-shop owner introducing a new fish to one of their many aquariums. The new fish's survival and behavior depend less on its intrinsic properties than on how it fits within this complex, interconnected system of pH balances and predators. In which tank will the new fish thrive?\n",
30 | 30 | "\n", |
31 | | - "There are a number of complementary paradigms in the causal inference literature, and in `CausalPy` we have not sought to be advocate for any one view over another. There are valuable lessons to be learned from econometrics, psychology and statistics whether they adopt a Pearlian or Potential outcomes framing for their causal work. Where the methods are statistically sound and practical we will seek to adopt and evangelise their usage. See {cite:t}`pearl2000causality` or {cite:t}`angrist2009mostly` for more detailed distinctions. In this article we want to focus on the idea of a causal model as a probabilistic program. An inferential routine designed to explicitly yield insights into the effect of some intervention or treatment on an outcome of interest. \n", |
| 31 | + "Different causal methods make different choices about how much of this system to model explicitly. Some methods succeed by not modeling the full system: instrumental variables isolate causal effects through credible exclusion restrictions; difference-in-differences leverages parallel trends; interrupted time-series assumes stationarity. These design-based approaches gain power by minimizing modeling assumptions about the data-generating process. See {cite:t}`pearl2000causality` or {cite:t}`angrist2009mostly` for more detailed distinctions. The unifying thread between these diverse methods is the idea of a causal model as a _probabilistic program_: an inferential routine designed to explicitly yield insights into the effect of some intervention or treatment on the system of interest. Whether design-based or model-based, causal inference methods assume a data-generating process; the distinction between them is how explicitly that system is rendered.\n",
32 | 32 | "\n", |
33 | 33 | "#### Modelling Worlds and Counterfactual Worlds\n", |
34 | 34 | "\n", |
35 | | - "Some of the algorithms and routines available in `CausalPy` focus on deriving insight from a known environment e.g. stationarity with interrupted time-series, parallel trends in difference-in-differences, positivity in propensity score weighting and strong instruments with instrumental variable designs. These methods rely on stated assumptions or facts about the environment in which the treatment takes place to justify their conclusions as causal claims. In Bayesian structural causal inference the focus is slightly different in that we wish to model both the treatment but also the environment i.e. the fish and the fishtank. In this article we'll outline a species of modelling that tries to infer structural attributes of the environment to underwrite causal claims. \n", |
| 35 | + "Bayesian structural modeling attempts to parameterize the system itself. Where design-based methods answer \"what is the causal effect under these identification assumptions?\", structural models ask \"what is the most plausible data-generating process, and how do interventions propagate through it?\". In Bayesian structural causal inference we therefore model both the treatment and the environment, i.e. the fish and the fishtank. The trade-off is transparency for complexity. You must specify more of the data-generating process, which creates more opportunities for model misspecification. But every assumption becomes an explicit, testable model component rather than an implicit background condition.\n",
36 | 36 | "\n", |
37 | | - "This is a two step move in the Bayesian paradigm. First we infer \"backwards\" what is the most plausible state of the world $w$ conditioned on the observable data. Then we assess the probabilistic predictive distribution of treatment and outcome at the plausible range of worlds. \n", |
| 37 | + "This is a two-step move in the Bayesian paradigm. First we infer \"backwards\" what is the most plausible state of the world $w$ conditioned on the observable data. The \"world\" of the model is defined by: (1) a causal graph relating variables, (2) likelihood functions specifying how each variable depends on its causes, and (3) prior distributions over parameters. Optionally, this may include latent confounders, measurement models, and selection mechanisms, each adding structural detail but also complexity. With this world in place, we then assess the probabilistic predictive distribution of treatment and outcome across the plausible range of counterfactual worlds.\n",
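The two-step move can be sketched with a deliberately small toy model (all names and numbers below are invented for illustration, not the notebook's actual model): the "world" is reduced to a single slope $\alpha$ with a conjugate normal prior, inferred backwards from data and then pushed forwards through interventions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: infer "backwards" a posterior over the world w (here just the slope alpha).
# Toy world (illustrative): y = alpha * t + noise, noise sd known, Normal(0, 10) prior.
alpha_true, sigma = 3.0, 1.0
t = rng.normal(size=200)
y = alpha_true * t + rng.normal(scale=sigma, size=200)

prior_var = 10.0 ** 2
post_var = 1.0 / (1.0 / prior_var + (t @ t) / sigma ** 2)
post_mean = post_var * (t @ y) / sigma ** 2  # conjugate normal posterior for alpha

# Step 2: push posterior draws "forwards" through the interventions do(t=1), do(t=0).
alpha_draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)
y_do1 = alpha_draws * 1.0 + rng.normal(scale=sigma, size=5000)
y_do0 = alpha_draws * 0.0 + rng.normal(scale=sigma, size=5000)
ate_draws = y_do1 - y_do0  # a full posterior distribution over the treatment effect
print(round(float(post_mean), 2))
```

Every quantity downstream of the posterior, including the treatment effect, arrives as a distribution rather than a point estimate.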
38 | 38 | "\n", |
39 | 39 | "\n", |
40 | 40 | "\n", |
41 | | - "The important point is that we characterise the plausible worlds by how much structure we learn about in the model specification. The more structure we seek to infer, the more we risk model misspecification, but simultaneously, the more structure we learn the more useful and transparent our conclusions. The \"world\" of the model is defined by the graph structure, latent confounders, link functions, measurement models and the implementation of selection mechanisms. A full picture of the data generating process. Contrast this with the simpler case. \n", |
| 41 | + "The important point is that we characterise the plausible worlds by how much structure we learn about in the model specification. The more structure we seek to infer, the more we risk model misspecification, but simultaneously, the more structure we learn, the more useful and transparent our conclusions become. This structural commitment contrasts sharply with reduced-form approaches that minimize explicit modeling.\n",
42 | 42 | "\n", |
43 | | - "#### Not mere Association\n", |
| 43 | + "#### Reduced-Form Minimalism and Structural Maximalism\n",
| 44 | + "\n", |
| 45 | + "The term \"reduced form\" originates from econometric simultaneous equations models. Early economists wanted to model supply and demand as functions of price, but faced a problem: quantities also determine price in competitive markets. Because these structural relationships are mutually determined, the system is hard to solve directly. The solution was algebraic transformation: solve for the 'reduced form' that expresses endogenous variables purely as functions of exogenous ones.\n", |
| 46 | + "\n", |
| 47 | + "Reduced-form systems transform the system of interest so that the focal parameters can be estimated from observable, tractable data. These approaches eschew \"theory driven\" model specifications in favour of models with precise _identifiable estimands_. This approach, transforming complex structural systems into tractable estimating equations, reflects a broader methodological commitment to minimalist assumptions. It is for this minimalist preference that they are typically contrasted with structural models that aim to express the \"fuller\" data generating process. Design-based causal inference methods typically adopt this focus on identifiability within a regression framework. For richer discussion in this vein see {cite:t}`hansenEconometrics` or {cite:t}`aronowFoundations`.\n",
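The algebraic move can be made concrete in a few lines (all coefficients and shock values below are invented for illustration): solve the simultaneous supply-demand system numerically and confirm it matches the hand-derived reduced form, which expresses the endogenous quantities purely in terms of exogenous ones.

```python
import numpy as np

# Hypothetical structural supply/demand system (all values invented for illustration):
#   demand: q = a - b*p + u        supply: q = c + d*p + v
a, b, c, d = 10.0, 1.5, 2.0, 0.5
u, v = 0.3, -0.2  # exogenous demand and supply shocks

# Solve the simultaneous system for the equilibrium (q, p):
#   q + b*p = a + u
#   q - d*p = c + v
A = np.array([[1.0, b], [1.0, -d]])
q_eq, p_eq = np.linalg.solve(A, np.array([a + u, c + v]))

# The algebraic reduced form expresses p and q purely via exogenous quantities:
p_rf = (a - c + u - v) / (b + d)
q_rf = (d * a + b * c + d * u + b * v) / (b + d)
print(np.isclose(p_eq, p_rf), np.isclose(q_eq, q_rf))  # → True True
```

The structural slopes $b$ and $d$ never appear alone in the reduced form, only in combinations; this is exactly why reduced-form estimation can identify some quantities without recovering every structural primitive.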
44 | 48 | "\n", |
45 | 49 | "When we regress an outcome $Y$ on a treatment $T$ and a set of covariates $X$,\n", |
46 | 50 | "\n", |
47 | 51 | "$$Y = \\alpha T + X \\beta + \\epsilon$$\n", |
48 | 52 | "\n", |
49 | | - "the coefficient $\\alpha$ captures the average change in Y associated with a one-unit change in $T$ — but only under strong assumptions can it be interpreted as a causal effect. In real-world settings, those assumptions (like exogeneity of $T$) are fragile:\n", |
| 53 | + "the coefficient $\\alpha$ captures the average change in $Y$ associated with a one-unit change in $T$. Only under strong assumptions, however, can we interpret this as a causal effect. In real-world settings, those assumptions (like exogeneity of $T$) are fragile:\n",
50 | 54 | "\n", |
51 | 55 | "- Confounding: Unobserved or omitted variables affect both \n", |
52 | 56 | "$T$ and $Y$.\n", |
|
55 | 59 | "\n", |
56 | 60 | "- Measurement uncertainty: Model parameters and predictions have uncertainty not captured by point estimates.\n", |
57 | 61 | "\n", |
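A quick simulation makes the exogeneity failure concrete (a minimal sketch with invented coefficients, not the notebook's dataset): a hidden variable `w` drives both treatment and outcome, and the naive regression absorbs its influence into the treatment coefficient.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical data (coefficients invented): an unobserved confounder w
# drives both the treatment and the outcome.
w = rng.normal(size=n)
t = 0.8 * w + rng.normal(size=n)             # treatment depends on w
y = 3.0 * t + 2.0 * w + rng.normal(size=n)   # true causal effect of t is 3

# Regressing y on t alone: exogeneity fails and the estimate absorbs w's influence.
X_naive = np.column_stack([np.ones(n), t])
alpha_naive = np.linalg.lstsq(X_naive, y, rcond=None)[0][1]

# If w were observed and included, the bias would disappear.
X_full = np.column_stack([np.ones(n), t, w])
alpha_adjusted = np.linalg.lstsq(X_full, y, rcond=None)[0][1]
print(round(alpha_naive, 2), round(alpha_adjusted, 2))
```

The naive estimate lands near $3 + \mathrm{cov}(t, w) \cdot 2 / \mathrm{var}(t) \approx 3.98$, while the adjusted estimate recovers the true effect of 3; the trouble, of course, is that in practice `w` is not available to adjust for.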
58 | | - "Many methods conceived in _the credibility revolution_ aimed to overcome these limitations by placing constraints to improve parameter identification under threat of confounding. See See {cite:t}`angrist2009mostly`. Bayesian probabilistic causal inference addresses these challenges by explicitly modelling the data-generating process and quantifying all sources of uncertainty. Rather than point estimates, we infer full posterior distributions over causal parameters and even over counterfactual outcomes. Rather than isolating the outcome equation from the treatment equation, we model them together as parts of a single generative system. This approach mirrors how interventions occur in the real world: treatments have causes, and outcomes respond to both those treatments and shared confounders. When we fit such a model, we learn about every component simultaneously—the effect of the treatment, the influence of confounders, and the uncertainty that ties them together. Once fitted, Bayesian models can generate posterior predictive draws for “what if” scenarios. This capacity lets us compute causal estimands like the ATE or individual treatment effects directly from the posterior.\n", |
| 62 | + "The innovative methods of inference (like two-stage least squares, propensity score weighting or difference-in-differences designs) that came to define the _credibility revolution_ in the social sciences sought to overcome this risk of confounding with constraints or assumptions that bolster identification of the causal parameters. See {cite:t}`angrist2009mostly`. Bayesian probabilistic causal inference addresses these challenges by explicitly modelling the data-generating process and quantifying all sources of uncertainty. Rather than point estimates and design assumptions, we infer full posterior distributions over causal parameters and even over counterfactual outcomes. Rather than isolating the outcome equation from the treatment equation, we model them together as parts of a single generative system. This approach mirrors how interventions occur in the real world. The propensity for adopting a treatment can be predicted by the same factors which determine treatment outcomes. This structure creates the risk of confounding because the efficacy of the treatment is obscured by the influence of these shared predictors. When we fit such a model, we learn about every component simultaneously: the effect of the treatment, the influence of confounders, and the uncertainty that ties them together. Once fitted, Bayesian models can generate posterior predictive draws for “what if” scenarios. This capacity lets us compute causal estimands like the ATE or individual treatment effects directly from the posterior.\n",
59 | 63 | "\n", |
60 | | - "In this tutorial, we’ll move step by step from data simulation to Bayesian estimation:\n", |
| 64 | + "In this tutorial, we’ll move step by step from data simulation to Bayesian structural causal models:\n",
61 | 65 | "\n", |
62 | 66 | ":::{admonition} The Structure of the Document\n", |
63 | 67 | ":class: tip\n", |
|
339 | 343 | "source": [ |
340 | 344 | "This loop re-simulates the dataset ten times, each with a different value of $\rho$, ranging from –1 to 1. For each dataset, it fits two OLS regressions: one for the continuous treatment, and another for the binary treatment, both controlling for all observed covariates. The estimated coefficient on the treatment variable, `T_cont` or `T_bin`, represents what OLS believes to be the causal effect. By collecting these estimates in `df_params`, we can plot them against the true correlation to see how endogeneity distorts inference.\n",
341 | 345 | "\n", |
342 | | - "When $\\rho = 0$ the treatment and outcome errors are independent, and OLS recovers the true causal effect of 3. But as $\\rho$ grows, the estimates drift away from the truth, sometimes dramatically. The direction of bias depends on the sign of if unobserved factors push both treatment and outcome in the same direction, OLS overstates the effect; if they push in opposite directions, it understates it. Even though we’ve controlled for all observed features, the unobserved correlation sneaks bias into our estimates." |
| 346 | + "When $\\rho = 0$ the treatment and outcome errors are independent, and OLS recovers the true causal effect of 3. But as $\\rho$ grows, the estimates drift away from the truth, sometimes dramatically. The direction of bias depends on the sign of the unobserved relationship. If hidden factors push both treatment and outcome the same way, OLS overstates the effect. If they act in opposite directions, it understates it. Even though we’ve controlled for all observed features, the unobserved correlation sneaks bias into our estimates." |
343 | 347 | ] |
344 | 348 | }, |
345 | 349 | { |
|
1602 | 1606 | "In econometric terms, what we’ve done so far sits squarely within the structural modelling tradition. We’ve written down a joint model for both the treatment and the outcome, specified their stochastic dependencies explicitly, and interpreted the slope $\\alpha$ as a structural parameter — a feature of the data-generating process itself. This parameter has a causal meaning only insofar as the model is correctly specified: if the structural form reflects how the world actually works, \n", |
1603 | 1607 | "$\alpha$ recovers the true causal effect. By contrast, reduced-form econometrics focuses less on modelling the underlying mechanisms and more on identifying causal effects through observable associations and research design: instrumental variables, difference-in-differences, or randomization. Reduced-form approaches avoid the need to specify the joint distribution of unobservables but often sacrifice interpretability: they estimate relationships that are valid for specific interventions or designs, not necessarily structural primitives.\n",
1604 | 1608 | "\n", |
1605 | | - ":::{admonition} Reduced Forms and Structural Modelling\n", |
1606 | | - ":class: tip\n", |
1607 | | - "\n", |
1608 | | - "The canonical example of a reduced form modelling strategy stems from econometrics where the system of interest was a set of simultaneous equations. The equations tried to model supply and demand as a function of price. However the quantities of supply and demand also determine the price value in a competitive market place. The \"structural relationships\" between supply and price are the focal interest of the economist, but because they are mututally determined the system is hard to solve as stated. Instead, the economists realised that an algebraic transformation would allow them to solve the \"reduced form\" of the simultaneous equations. The usage of the term is somewhat muddled, but generally when we use it we mean to say that we've transformed the system of interest to estimate the focal parameters by primarily leveraging observable data. Eschewing \"theory driven\" model specifications in favour of models with precise _identifiable estimands_, reduced form modelling has an aesthetic and conservative commitment to minimalist assumptions. It is for this minimalist preference that they are typically contrasted with structural models that aim to express the \"fuller\" data generating process. For richer discussion in this vein see {cite:t}`hansenEconometrics` or {cite:t}`aronowFoundations`. \n", |
1609 | | - "\n", |
1610 | | - "\n", |
1611 | | - ":::\n", |
1612 | | - "\n", |
1613 | 1609 | "#### Comparing Treatment Estimates\n", |
1614 | 1610 | "\n", |
1615 | 1611 | "The comparison of models is a form of robustness checks. We want to inspect how consistent our parameter estimates are across different model specifications. " |
|