@anth-volk commented Nov 27, 2025

Fixes #190

Summary

This PR adds support for US state-level and Congressional district-level simulations by:

  • Adding utilities to construct GCS paths for state and district datasets (get_us_state_dataset_path, get_us_congressional_district_dataset_path)
  • Integrating parse_gs_url() from policyengine-core for consistent GCS URL parsing with version support
  • Creating a centralized .datasets/ directory for all downloaded datasets, supporting nested paths (e.g., .datasets/states/RI.h5, .datasets/districts/CA-01.h5)
  • Refactoring _set_data() and _apply_region_to_simulation() for cleaner US region handling
  • Replacing the deprecated SimplifiedGoogleStorageClient with VersionAwareStorageClient, which supports both generation-based and metadata-based versioning
  • Adding CI/CD workflow support for the 0.x maintenance branch
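The state and district path utilities can be sketched as follows. This is a minimal illustration, not the code merged in this PR. The helper names `state_dataset_url` and `district_dataset_url` are hypothetical. The URL layout (`gs://policyengine-us-data/states/<STATE>.h5` and `gs://policyengine-us-data/districts/<STATE>-<NN>.h5`) is inferred from the example paths and bucket locations in this description. The input validation rules are also assumptions.

```python
import re

# Bucket name taken from the GCS locations mentioned in this PR description.
BUCKET = "policyengine-us-data"

# Assumed validation rule: a state code is exactly two letters.
_STATE_RE = re.compile(r"^[A-Z]{2}$")


def _normalize_state(state_code: str) -> str:
    """Upper-case and validate a two-letter state code (hypothetical helper)."""
    code = state_code.strip().upper()
    if not _STATE_RE.match(code):
        raise ValueError(f"Invalid state code: {state_code!r}")
    return code


def state_dataset_url(state_code: str) -> str:
    """Build the assumed GCS URL for a state dataset, e.g. states/RI.h5."""
    return f"gs://{BUCKET}/states/{_normalize_state(state_code)}.h5"


def district_dataset_url(state_code: str, district: int) -> str:
    """Build the assumed GCS URL for a district dataset, e.g. districts/CA-01.h5."""
    code = _normalize_state(state_code)
    if district < 1:
        raise ValueError(f"District number must be positive, got {district}")
    # Zero-pad to two digits to match the CA-01 naming used in the description.
    return f"gs://{BUCKET}/districts/{code}-{district:02d}.h5"


print(state_dataset_url("ri"))        # gs://policyengine-us-data/states/RI.h5
print(district_dataset_url("CA", 1))  # gs://policyengine-us-data/districts/CA-01.h5
```

Centralizing URL construction in small helpers like these means a simulation only needs a region identifier. The bucket layout is defined in one place.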

Important Note

Only the Rhode Island state dataset (states/RI.h5) currently exists at its expected GCS location. The code implements the feature for every state and district. However, in production, simulations for other states and for any Congressional district will not run until their datasets are uploaded to gs://policyengine-us-data/states/ and gs://policyengine-us-data/districts/.

Changes

New Files

  • policyengine/utils/data/version_aware_storage_client.py - Storage client with dual versioning support
  • tests/utils/data/test_version_aware_storage_client.py - Tests for the new client
  • tests/utils/data/test_datasets.py - Tests for dataset path utilities
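The dual versioning in VersionAwareStorageClient can be illustrated with an in-memory model. This is a hypothetical sketch, not the client's actual code. `BlobRevision` and `resolve_revision` are illustrative names. The resolution order is also an assumption about how the two schemes might coexist: an explicit metadata `version` tag is checked first, then a numeric GCS generation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BlobRevision:
    """One stored revision of an object (stand-in for a GCS blob generation)."""
    generation: int
    metadata: Dict[str, str] = field(default_factory=dict)


def resolve_revision(revisions: List[BlobRevision], version: Optional[str]) -> BlobRevision:
    """Pick the revision matching `version` (assumed strategy, hypothetical).

    1. No version requested: return the newest generation.
    2. A revision whose metadata "version" equals the request wins.
    3. Otherwise, an all-digit request is treated as a GCS generation number.
    """
    if not revisions:
        raise FileNotFoundError("No revisions stored for this object")
    if version is None:
        return max(revisions, key=lambda r: r.generation)
    # Metadata-based versioning: look for an explicit version tag.
    tagged = [r for r in revisions if r.metadata.get("version") == version]
    if tagged:
        # If several revisions carry the same tag, prefer the newest upload.
        return max(tagged, key=lambda r: r.generation)
    # Generation-based versioning: match the numeric generation id.
    if version.isdigit():
        for r in revisions:
            if r.generation == int(version):
                return r
    raise FileNotFoundError(f"No revision matches version {version!r}")


revisions = [
    BlobRevision(1700000000000001, {"version": "1.40.0"}),
    BlobRevision(1700000000000002, {"version": "1.41.0"}),
]
print(resolve_revision(revisions, "1.40.0").generation)  # 1700000000000001
```

With the google-cloud-storage library, a comparable revision list could come from `bucket.list_blobs(prefix=key, versions=True)`, because each returned blob exposes `generation` and `metadata`. Noncurrent generations are only retained when object versioning is enabled on the bucket. Checking metadata before generation avoids misreading a numeric-looking version tag as a generation id.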

Modified Files

  • policyengine/simulation.py - Refactored _set_data(), _set_data_from_gs(), and _apply_region_to_simulation() for US regions
  • policyengine/utils/data/datasets.py - Added state/district path utilities, integrated parse_gs_url()
  • policyengine/utils/data/caching_google_storage_client.py - Updated to use VersionAwareStorageClient
  • policyengine/utils/google_cloud_bucket.py - Added .datasets/ directory support, renamed parameters for clarity (filepath → gcs_key, returns local_path)
  • policyengine/utils/data_download.py - Updated parameter naming for clarity
  • policyengine/utils/maps.py - Fixed download calls that passed an invalid repo= parameter
  • policyengine/outputs/macro/comparison/calculate_economy_comparison.py - Updated download calls
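Parsing a versioned GCS URL and mirroring its key under .datasets/ might look like the sketch below. It is illustrative only. `split_gs_url` and `local_dataset_path` are hypothetical stand-ins and do not reproduce the signatures of parse_gs_url() in policyengine-core or of the helpers in google_cloud_bucket.py. The trailing `@version` URL syntax is also an assumption.

```python
from pathlib import Path
from typing import Optional, Tuple


def split_gs_url(url: str) -> Tuple[str, str, Optional[str]]:
    """Split gs://bucket/key[@version] into (bucket, key, version).

    The "@version" suffix is an assumed convention for pinning a version.
    """
    prefix = "gs://"
    if not url.startswith(prefix):
        raise ValueError(f"Not a gs:// URL: {url!r}")
    bucket, sep, key = url[len(prefix):].partition("/")
    if not bucket or not sep or not key:
        raise ValueError(f"URL must include a bucket and an object key: {url!r}")
    version = None
    if "@" in key:
        # Split on the last "@" so only the trailing suffix is treated as a version.
        key, _, version = key.rpartition("@")
        if not key or not version:
            raise ValueError(f"Malformed version suffix in {url!r}")
    return bucket, key, version


def local_dataset_path(gcs_key: str, root: Path = Path(".datasets")) -> Path:
    """Mirror a GCS key under the local .datasets/ directory, keeping subfolders."""
    key_path = Path(gcs_key)
    # Refuse keys that would escape the datasets root.
    if key_path.is_absolute() or ".." in key_path.parts:
        raise ValueError(f"Unsafe dataset key: {gcs_key!r}")
    local_path = root / key_path
    # Create nested folders such as .datasets/states/ before downloading.
    local_path.parent.mkdir(parents=True, exist_ok=True)
    return local_path


print(split_gs_url("gs://policyengine-us-data/districts/CA-01.h5@1.2.3"))
# ('policyengine-us-data', 'districts/CA-01.h5', '1.2.3')
```

Keeping the GCS key's folder structure in the local path lets states/RI.h5 and districts/CA-01.h5 coexist under one .datasets/ root. The traversal check stops a malformed key from writing outside that directory.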

Deleted Files

  • policyengine/utils/data/simplified_google_storage_client.py - Replaced by VersionAwareStorageClient
  • tests/utils/data/test_simplified_google_storage_client.py - Removed with deprecated client

Test plan

  • Verify Rhode Island state simulation runs correctly with existing dataset
  • Verify downloaded files are stored in .datasets/ directory
  • Verify unit tests pass for new storage client and dataset utilities
  • Verify other states/districts fail gracefully until datasets are uploaded

🤖 Generated with Claude Code

@anth-volk linked an issue on Nov 27, 2025 that may be closed by this pull request
@anth-volk merged commit 6a31426 into 0.x on Nov 27, 2025
3 checks passed
@anth-volk deleted the feat/enable-us-districts branch on November 27, 2025 23:35

Development

Successfully merging this pull request may close these issues.

Enable US districts on maintenance branch