Summary
Analysis of the Minnesota (MN) state dataset reveals two problems: population and household counts fall well below actual Minnesota statistics, and multi-tax-unit households are concentrated in the top income deciles.
Issues Found
1. Population and Household Undercount
| Metric | MN Dataset | Real/Target | Difference |
|---|---|---|---|
| Population | ~4.1M | 5.74M | -29% |
| Households | 1,254,857 | ~2,344,432 | -46% |
Sources:
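For reference, the Difference column in the table above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the helper name `pct_difference` is hypothetical, and the inputs are the rounded figures from the table (the population value is only given as ~4.1M).

```python
def pct_difference(dataset_value, target_value):
    """Percentage gap of the dataset value relative to the target."""
    return 100 * (dataset_value - target_value) / target_value

# Figures copied from the table; the population value is approximate (~4.1M).
population_gap = pct_difference(4.1e6, 5.74e6)
household_gap = pct_difference(1_254_857, 2_344_432)

print(f"Population gap: {population_gap:.1f}%")
print(f"Household gap:  {household_gap:.1f}%")
```

Rounded to whole percentages, these give the -29% and -46% reported above.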
2. Multi-Tax-Unit Households Concentrated in Top Income Deciles
When analyzing a CTC reform that should only affect low-income households (it phases out by roughly $40k of income), we found unexpected impacts in the 8th, 9th, and 10th income deciles.
Investigation revealed that affected high-income households have dramatically more tax units than unaffected ones:
| Metric | Affected Top-Decile HH | Unaffected Top-Decile HH |
|---|---|---|
| Avg tax units per household | 6.41 | 1.55 |
| Household count | 50,907 | 325,752 |
Distribution of tax units among affected top-decile households:
- 53% have 8 tax units
- 40% have 5 tax units
- Only 6% have 2 tax units
This is likely a bug: multi-tax-unit households should be distributed across the income spectrum, not concentrated in the top deciles.
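As a quick consistency check, the listed shares can be compared against the reported average of 6.41 tax units, and the household counts give the share of top-decile households that are affected. This is a sketch using only the rounded numbers quoted above; the variable names are illustrative, and the listed shares cover about 99% of affected households, so the result is approximate.

```python
# Shares of affected top-decile households by number of tax units (rounded, from the list above).
listed_shares = {8: 0.53, 5: 0.40, 2: 0.06}

covered = sum(listed_shares.values())
# Contribution of the listed groups to the average tax units per household.
partial_mean = sum(n_units * share for n_units, share in listed_shares.items())

# Household counts from the table above.
affected, unaffected = 50_907, 325_752
affected_share = affected / (affected + unaffected)

print(f"Listed shares cover {covered:.0%} of affected households")
print(f"Their contribution to the average: {partial_mean:.2f} (reported average: 6.41)")
print(f"Affected share of top-decile households: {affected_share:.1%}")
```

The listed groups alone account for about 6.36 of the reported 6.41, so the distribution and the average are consistent up to rounding and the unlisted remainder.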
Impact
This causes misleading results when analyzing policies that target low-income populations (such as the CTC and EITC), because impacts appear in high-income deciles where none should occur.
Investigation Details
Full analysis documented in: PolicyEngine/analysis-notebooks#108
Suggested Fix
Review the household/tax-unit mapping and weighting in the state dataset calibration to ensure:
- Population and household counts match targets
- Multi-tax-unit households are distributed realistically across income deciles
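One way to check both points after recalibration is a weighted summary per income decile. The sketch below is an illustration only: it assumes household records are available as dictionaries with hypothetical keys `income`, `weight`, and `n_tax_units`, and it does not use or represent the dataset's actual API or calibration code.

```python
from collections import defaultdict

def weighted_household_count(households):
    """Sum of household weights, to compare against the household target."""
    return sum(h["weight"] for h in households)

def tax_units_by_decile(households):
    """Weighted average number of tax units per household in each income decile.

    `households` is a list of dicts with hypothetical keys
    'income', 'weight', and 'n_tax_units'.
    """
    ordered = sorted(households, key=lambda h: h["income"])
    total_weight = weighted_household_count(ordered)
    weighted_units = defaultdict(float)
    decile_weight = defaultdict(float)
    cumulative = 0.0
    for h in ordered:
        # Place each household by the midpoint of its span in the weighted ranking.
        midpoint = cumulative + h["weight"] / 2
        decile = min(10, int(10 * midpoint / total_weight) + 1)
        weighted_units[decile] += h["weight"] * h["n_tax_units"]
        decile_weight[decile] += h["weight"]
        cumulative += h["weight"]
    return {d: weighted_units[d] / decile_weight[d] for d in sorted(decile_weight)}

# Toy data: ten equally weighted households, with the richest holding 8 tax units.
toy = [{"income": i, "weight": 1.0, "n_tax_units": 1} for i in range(1, 10)]
toy.append({"income": 10, "weight": 1.0, "n_tax_units": 8})

for decile, avg_units in tax_units_by_decile(toy).items():
    print(decile, avg_units)
```

In a realistic dataset, the per-decile averages should vary smoothly; a sharp jump in the top deciles like the one in the toy data would reproduce the pattern described in this issue.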