Merged
31 changes: 31 additions & 0 deletions .claude/skills/designing-experiments/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
---
name: designing-experiments
description: Selects the appropriate quasi-experimental method (DiD, ITS, SC) based on data structure and research questions. Use when the user is unsure which method to apply.
---

# Designing Experiments

Helps select the appropriate causal inference method.

## Decision Framework

1. **Control Group?**
    * **Yes**: Go to Step 2.
    * **No**: Consider **Interrupted Time Series (ITS)**.

2. **Unit Structure?**
    * **Single Treated Unit**:
        * With multiple controls: **Synthetic Control (SC)**.
        * No controls: **ITS**.
    * **Multiple Treated Units**:
        * With control group: **Difference-in-Differences (DiD)**.

3. **Time Structure?**
    * **Panel Data** (Multiple units over time): Required for DiD and SC.
    * **Time Series** (Single unit over time): Required for ITS.
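
The first two steps of the framework can be sketched as a small function. This is only an illustration: `choose_method` and its arguments are hypothetical names, not part of CausalPy, and the result is a starting suggestion rather than a substitute for checking each method's assumptions.

```python
def choose_method(has_control_group: bool, n_treated_units: int) -> str:
    """Suggest a method name following the decision framework (hypothetical helper)."""
    if n_treated_units < 1:
        raise ValueError("at least one treated unit is required")
    if not has_control_group:
        # Step 1: without a control group, fall back to a before/after design
        return "ITS"
    if n_treated_units == 1:
        # Step 2: a single treated unit with several controls
        return "SC"
    # Step 2: several treated units compared against a control group
    return "DiD"


for has_controls, n_treated in [(False, 1), (True, 1), (True, 5)]:
    print(has_controls, n_treated, choose_method(has_controls, n_treated))
```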

## Method Quick Reference

* **Difference-in-Differences (DiD)**: Compares the change in outcomes over time between treated and control groups. Assumes **Parallel Trends** (without treatment, both groups would have followed the same trend).
* **Interrupted Time Series (ITS)**: Analyzes the change in trend or level for a single unit after the intervention. Assumes **Trend Continuity** (without the intervention, the pre-intervention trend would have continued).
* **Synthetic Control (SC)**: Constructs a synthetic counterfactual from weighted control units. Assumes **Convex Hull** (the treated unit lies within the range of the controls).
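
As a rough illustration of the convex hull condition, the sketch below checks that the treated unit stays between the smallest and largest control value in every period. The helper `within_control_range` is a hypothetical name, not a CausalPy function, and the per-period range check is a necessary condition only: a unit can pass it and still not be a convex combination of the controls.

```python
def within_control_range(treated, controls):
    """Return True if each treated value lies between the min and max control value
    observed in the same period (hypothetical diagnostic, necessary but not sufficient)."""
    for t, y in enumerate(treated):
        period_values = [unit[t] for unit in controls]
        if not min(period_values) <= y <= max(period_values):
            return False
    return True


controls = [[1, 2, 3], [3, 4, 5]]  # two control units over three periods
print(within_control_range([2, 3, 4], controls))
print(within_control_range([2, 3, 9], controls))
```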
30 changes: 30 additions & 0 deletions .claude/skills/loading-datasets/SKILL.md
@@ -0,0 +1,30 @@
---
name: loading-datasets
description: Loads internal CausalPy example datasets. Use when the user needs example data or asks about available demos.
---

# Loading Datasets

Loads example datasets provided with CausalPy.

## Usage

```python
import causalpy as cp
df = cp.load_data("dataset_name")
```

## Available Datasets

| Key | Description |
| :--- | :--- |
| `did` | Generic Difference-in-Differences |
| `its` | Generic Interrupted Time Series |
| `sc` | Generic Synthetic Control |
| `banks` | DiD (Banks) |
| `brexit` | Synthetic Control (Brexit) |
| `covid` | ITS (Covid) |
| `drinking` | Regression Discontinuity (Drinking Age) |
| `rd` | Generic Regression Discontinuity |
| `geolift1` | GeoLift (Single cell) |
| `geolift_multi_cell` | GeoLift (Multi cell) |
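
One way to guard against mistyped keys is to validate them against the table before loading. This is a sketch under assumptions: `EXAMPLE_DATASETS` and `load_example` are hypothetical names that simply mirror the table above, and other CausalPy versions may ship a different set of keys.

```python
# Keys and descriptions copied from the table above (not queried from CausalPy)
EXAMPLE_DATASETS = {
    "did": "Generic Difference-in-Differences",
    "its": "Generic Interrupted Time Series",
    "sc": "Generic Synthetic Control",
    "banks": "DiD (Banks)",
    "brexit": "Synthetic Control (Brexit)",
    "covid": "ITS (Covid)",
    "drinking": "Regression Discontinuity (Drinking Age)",
    "rd": "Generic Regression Discontinuity",
    "geolift1": "GeoLift (Single cell)",
    "geolift_multi_cell": "GeoLift (Multi cell)",
}


def load_example(key: str):
    """Validate the key against the table, then delegate to cp.load_data."""
    if key not in EXAMPLE_DATASETS:
        valid = ", ".join(sorted(EXAMPLE_DATASETS))
        raise KeyError(f"unknown dataset {key!r}; expected one of: {valid}")
    import causalpy as cp  # imported lazily so the key check runs without CausalPy

    return cp.load_data(key)
```

Calling `load_example("did")` then behaves like `cp.load_data("did")`, while an unknown key fails early with the list of valid keys.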
28 changes: 28 additions & 0 deletions .claude/skills/performing-causal-analysis/SKILL.md
@@ -0,0 +1,28 @@
---
name: performing-causal-analysis
description: Fits causal models, estimates impacts, and plots results using CausalPy. Use when performing analysis with DiD, ITS, SC, or RD.
---

# Performing Causal Analysis

Executes causal analysis using CausalPy experiment classes.

## Workflow

1. **Load Data**: Ensure data is in a pandas DataFrame.
2. **Initialize Experiment**: Use the appropriate class (see References).
3. **Fit Model**: The model is fitted automatically when the experiment object is created; no separate fit call is needed.
4. **Analyze Results**: Use `summary()`, `print_coefficients()`, and `plot()`.

## Core Methods

* `experiment.summary()`: Prints model summary and main results.
* `experiment.plot()`: Visualizes observed vs. counterfactual.
* `experiment.print_coefficients()`: Shows model coefficients.

## References

Detailed usage for specific methods:
* [Difference-in-Differences](reference/diff_in_diff.md)
* [Interrupted Time Series](reference/interrupted_time_series.md)
* [Synthetic Control](reference/synthetic_control.md)
@@ -0,0 +1,50 @@
# Causal Difference-in-Differences (DiD)

Difference-in-Differences (DiD) estimates the causal effect of a treatment by comparing the changes in outcomes over time between a treatment group and a control group.

## Class: `DifferenceInDifferences`

```python
causalpy.experiments.DifferenceInDifferences(
    data,
    formula,
    time_variable_name,
    group_variable_name,
    post_treatment_variable_name="post_treatment",
    model=None,
    **kwargs
)
```

### Parameters
* **`data`** (`pd.DataFrame`): Input dataframe containing panel data.
* **`formula`** (`str`): Statistical formula (e.g., `"y ~ 1 + group * post_treatment"`).
* **`time_variable_name`** (`str`): Column name for the time variable.
* **`group_variable_name`** (`str`): Column name for the group indicator (0=Control, 1=Treated). **Must be dummy coded**.
* **`post_treatment_variable_name`** (`str`): Column name indicating the post-treatment period (0=Pre, 1=Post). Default is `"post_treatment"`.
* **`model`**: A PyMC model (e.g., `cp.pymc_models.LinearRegression`) or a Scikit-Learn Regressor.

### How it Works
1. **Fit**: The model fits all available data (pre/post, treatment/control).
2. **Counterfactual**: Predicted by setting the interaction term between `group` and `post_treatment` to 0.
3. **Impact**: The causal impact is the difference between observed and counterfactual.
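
For the simplest two-group, two-period case, these steps reduce to arithmetic on four cell means. The sketch below uses hypothetical toy data and plain Python rather than CausalPy; in this saturated 2x2 setting the cell-mean differences coincide with the least-squares coefficients of `y ~ 1 + group * post_treatment`.

```python
from statistics import mean

# Hypothetical toy data: (group, post_treatment, outcome)
rows = [
    (0, 0, 1.0), (0, 0, 1.2),  # control, pre
    (0, 1, 2.0), (0, 1, 2.2),  # control, post
    (1, 0, 2.0), (1, 0, 2.2),  # treated, pre
    (1, 1, 4.0), (1, 1, 4.2),  # treated, post
]


def cell_mean(group, post):
    """Mean outcome for one group/period cell."""
    return mean(y for g, p, y in rows if g == group and p == post)


intercept = cell_mean(0, 0)            # control group, pre-treatment
b_group = cell_mean(1, 0) - intercept  # baseline gap between groups
b_post = cell_mean(0, 1) - intercept   # time effect seen in the control group

# Counterfactual for treated/post: the prediction with the interaction term set to 0
counterfactual = intercept + b_group + b_post

# Impact: observed treated/post mean minus the counterfactual
impact = cell_mean(1, 1) - counterfactual
print(round(counterfactual, 6), round(impact, 6))
```

The same number falls out of the classic formula: (treated post minus treated pre) minus (control post minus control pre).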

### Example

```python
import causalpy as cp
import causalpy.pymc_models as cp_pymc

# Load the bundled generic DiD example dataset
df = cp.load_data("did")

# The model is fitted when the experiment object is created
result = cp.DifferenceInDifferences(
    df,
    formula="y ~ 1 + group*post_treatment",
    time_variable_name="t",
    group_variable_name="group",
    model=cp_pymc.LinearRegression(sample_kwargs={"target_accept": 0.9}),
)

result.summary()
result.plot()
```
@@ -0,0 +1,51 @@
# Causal Interrupted Time Series (ITS)

Interrupted Time Series (ITS) analyzes the effect of an intervention on a single time series by comparing the trend before and after the intervention.

## Class: `InterruptedTimeSeries`

```python
causalpy.experiments.InterruptedTimeSeries(
data,
    treatment_time,
    formula,
    model=None,
    **kwargs
)
```

### Parameters
* **`data`** (`pd.DataFrame`): Input dataframe. Index should ideally be a `pd.DatetimeIndex`.
* **`treatment_time`** (`Union[int, float, pd.Timestamp]`): The point in time when the intervention occurred.
* **`formula`** (`str`): Statistical formula (e.g., `"y ~ 1 + t + C(month)"`).
* **`model`**: A PyMC model (e.g., `cp.pymc_models.LinearRegression`) or a Scikit-Learn Regressor.

### How it Works
1. **Split**: Data is split into pre- and post-intervention.
2. **Fit**: Model is trained **only on pre-intervention data**.
3. **Predict**: Fitted model predicts the outcome for the post-intervention period.
4. **Impact**: Difference between observed post-intervention data and counterfactual predictions.
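
The four steps can be sketched without CausalPy using an ordinary least-squares line in place of the Bayesian model. This is an illustration under assumptions: `fit_line` and `its_impact` are hypothetical helpers, the data are made up, and observations at or after `treatment_time` are treated here as post-intervention.

```python
def fit_line(ts, ys):
    """Closed-form least-squares fit of y = intercept + slope * t."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    return y_bar - slope * t_bar, slope


def its_impact(ts, ys, treatment_time):
    """Fit on pre-intervention data only, then compare post data to the forecast."""
    pre = [(t, y) for t, y in zip(ts, ys) if t < treatment_time]     # 1. split
    post = [(t, y) for t, y in zip(ts, ys) if t >= treatment_time]
    intercept, slope = fit_line([t for t, _ in pre], [y for _, y in pre])  # 2. fit
    # 3. predict the counterfactual, 4. subtract it from the observed values
    return [y - (intercept + slope * t) for t, y in post]


# Hypothetical series: linear trend with a level shift of 5 from t = 6 onward
ts = list(range(10))
ys = [1 + 2 * t for t in ts[:6]] + [1 + 2 * t + 5 for t in ts[6:]]
impact = its_impact(ts, ys, treatment_time=6)
print([round(v, 6) for v in impact])
```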

### Example

```python
import causalpy as cp
import causalpy.pymc_models as cp_pymc
import pandas as pd

# Load the bundled ITS example dataset and index it by date
df = cp.load_data("its")
df["date"] = pd.to_datetime(df["date"])
df.set_index("date", inplace=True)

# Date of the intervention; the model is fitted only on data before it
treatment_time = pd.to_datetime("2017-01-01")

result = cp.InterruptedTimeSeries(
    df,
    treatment_time,
    formula="y ~ 1 + t + C(month)",
    model=cp_pymc.LinearRegression(),
)

result.summary()
result.plot()
```
@@ -0,0 +1,49 @@
# Causal Synthetic Control (SC)

Synthetic Control constructs a "synthetic" counterfactual unit using a weighted combination of untreated control units.

## Class: `SyntheticControl`

```python
causalpy.experiments.SyntheticControl(
    data,
    treatment_time,
    control_units,
    treated_units,
    model=None,
    **kwargs
)
```

### Parameters
* **`data`** (`pd.DataFrame`): Input dataframe containing panel data.
* **`treatment_time`** (`Union[int, float, pd.Timestamp]`): The time of intervention.
* **`control_units`** (`List[str]`): List of column names representing the control units.
* **`treated_units`** (`List[str]`): List of column names representing the treated unit(s).
* **`model`**: A PyMC model (typically `cp.pymc_models.WeightedSumFitter`) or a Scikit-Learn Regressor.

### How it Works
1. **Fit**: Model learns weights for `control_units` to approximate `treated_units` using **only pre-intervention data**.
2. **Predict**: Weights are applied to `control_units` in post-intervention period.
3. **Impact**: Difference between observed treated unit and synthetic counterfactual.
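
With only two control units the weight fit has a closed form, which makes the three steps easy to trace by hand. The sketch below is an illustration, not how `WeightedSumFitter` works internally: `fit_two_donor_weight` is a hypothetical helper, the data are made up, and the weights are constrained to be non-negative and sum to one.

```python
def fit_two_donor_weight(treated_pre, a_pre, b_pre):
    """Least-squares weight w on donor a (and 1 - w on donor b), clipped to [0, 1]."""
    num = sum((y - b) * (a - b) for y, a, b in zip(treated_pre, a_pre, b_pre))
    den = sum((a - b) ** 2 for a, b in zip(a_pre, b_pre))
    if den == 0:
        raise ValueError("donors are identical in the pre-intervention period")
    # The objective is a convex quadratic in w, so clipping gives the constrained optimum
    return min(1.0, max(0.0, num / den))


# Hypothetical data: six periods, intervention at index 4
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = [5.0, 5.0, 6.0, 6.0, 7.0, 7.0]
treated = [4.0, 4.25, 5.25, 5.5, 9.5, 9.75]
treatment_time = 4

# 1. Fit: learn the weight from pre-intervention data only
w = fit_two_donor_weight(treated[:treatment_time], a[:treatment_time], b[:treatment_time])

# 2. Predict: apply the weights to the controls in the post-intervention period
synthetic_post = [w * x + (1 - w) * z for x, z in zip(a[treatment_time:], b[treatment_time:])]

# 3. Impact: observed treated unit minus the synthetic counterfactual
impact = [y - s for y, s in zip(treated[treatment_time:], synthetic_post)]
print(round(w, 6), [round(v, 6) for v in impact])
```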

### Example

```python
import causalpy as cp
import causalpy.pymc_models as cp_pymc

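# Load the bundled generic synthetic control example dataset
df = cp.load_data("sc")

# Intervention time; weights are learned only from data before it
treatment_time = 70

result = cp.SyntheticControl(
    df,
    treatment_time,
    control_units=["a", "b", "c", "d", "e"],
    treated_units=["actual"],
    model=cp_pymc.WeightedSumFitter(),
)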

result.summary()
result.plot()
```
25 changes: 25 additions & 0 deletions .claude/skills/running-placebo-analysis/SKILL.md
@@ -0,0 +1,25 @@
---
name: running-placebo-analysis
description: Performs placebo-in-time sensitivity analysis to validate causal claims. Use when checking model robustness, verifying lack of pre-intervention effects, or ensuring observed effects are not spurious.
---

# Running Placebo Analysis

Executes placebo-in-time sensitivity analysis to validate causal experiments.

## Workflow

1. **Define Experiment Factory**: Create a function that returns a fitted CausalPy experiment (e.g., ITS, DiD, SC) given a dataset and time boundaries.
2. **Configure Analysis**: Initialize `PlaceboAnalysis` with the factory, dataset, intervention dates, and number of folds (cuts).
3. **Run Analysis**: Execute `.run()` to fit models on pre-intervention data folds.
4. **Evaluate Results**: Compare placebo effects (which should be null) to the actual intervention effect. Use histograms and hierarchical models to quantify the "status quo" distribution.

## Key Concepts

* **Placebo-in-time**: Simulating an intervention at a time when none occurred to check if the model falsely detects an effect.
* **Fold**: A slice of pre-intervention data used to test a placebo period.
* **Factory Pattern**: Decouples the placebo logic from the specific CausalPy experiment type.
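
The fold and factory ideas can be sketched in plain Python. This is not the `PlaceboAnalysis` class from the reference file: `placebo_folds`, `mean_shift_effect`, and `run_placebos` are hypothetical names, and the toy factory just compares means where a real one would fit a CausalPy experiment.

```python
from statistics import mean


def placebo_folds(n_pre, n_folds, window):
    """Return (placebo_time, end) index pairs that lie entirely in the pre-period."""
    folds = []
    for k in range(n_folds):
        end = n_pre - k * window
        start = end - window
        if start <= 0:  # no data left to train on before the placebo time
            break
        folds.append((start, end))
    return folds


def mean_shift_effect(series, treatment_time):
    """Toy experiment factory: effect = mean after minus mean before."""
    return mean(series[treatment_time:]) - mean(series[:treatment_time])


def run_placebos(series, actual_treatment_time, n_folds, window, factory):
    """Apply the factory to each pre-intervention fold with a fake treatment time."""
    pre = series[:actual_treatment_time]
    return [
        factory(pre[:end], start)
        for start, end in placebo_folds(len(pre), n_folds, window)
    ]


# Hypothetical series: flat at 10, then a jump to 15 at index 12
series = [10.0] * 12 + [15.0] * 4
placebo_effects = run_placebos(series, 12, n_folds=3, window=3, factory=mean_shift_effect)
actual_effect = mean_shift_effect(series, 12)
print(placebo_effects, actual_effect)
```

Because the factory is just a callable taking data and a treatment time, the same loop could drive an ITS, DiD, or SC experiment; placebo effects near zero alongside a clearly non-zero actual effect are what a credible result looks like.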

## References

* [Placebo-in-time Implementation](reference/placebo_in_time.md): Full code for the `PlaceboAnalysis` class, usage examples, and hierarchical status-quo modeling.