feat: Add US state and Congressional district dataset support #191
Fixes #190
Summary
This PR adds support for US state-level and Congressional district-level simulations by:
- Adding dataset path utilities (`get_us_state_dataset_path`, `get_us_congressional_district_dataset_path`)
- Integrating `parse_gs_url()` from `policyengine-core` for consistent GCS URL parsing with version support
- Adding a `.datasets/` directory for all downloaded datasets, supporting nested paths (e.g., `.datasets/states/RI.h5`, `.datasets/districts/CA-01.h5`)
- Refactoring `_set_data()` and `_apply_region_to_simulation()` for cleaner US region handling
- Replacing `SimplifiedGoogleStorageClient` with `VersionAwareStorageClient`, supporting both generation-based and metadata-based versioning
- `0.x` maintenance branch
Important Note
Only the Rhode Island state-level dataset (`states/RI.h5`) currently exists in the proper GCS location. While this code correctly implements the feature, it won't run correctly in production for other states or Congressional districts until those datasets are uploaded to `gs://policyengine-us-data/states/` and `gs://policyengine-us-data/districts/`.
Changes
New Files
- `policyengine/utils/data/version_aware_storage_client.py` - Storage client with dual versioning support
- `tests/utils/data/test_version_aware_storage_client.py` - Tests for the new client
- `tests/utils/data/test_datasets.py` - Tests for dataset path utilities
Modified Files
- `policyengine/simulation.py` - Refactored `_set_data()`, `_set_data_from_gs()`, and `_apply_region_to_simulation()` for US regions
- `policyengine/utils/data/datasets.py` - Added state/district path utilities, integrated `parse_gs_url()`
- `policyengine/utils/data/caching_google_storage_client.py` - Updated to use `VersionAwareStorageClient`
- `policyengine/utils/google_cloud_bucket.py` - Added `.datasets/` directory support, renamed parameters for clarity (`filepath` → `gcs_key`, returns `local_path`)
- `policyengine/utils/data_download.py` - Updated parameter naming for clarity
- `policyengine/utils/maps.py` - Fixed download calls (was using invalid `repo=` parameter)
- `policyengine/outputs/macro/comparison/calculate_economy_comparison.py` - Updated download calls
Deleted Files
- `policyengine/utils/data/simplified_google_storage_client.py` - Replaced by `VersionAwareStorageClient`
- `tests/utils/data/test_simplified_google_storage_client.py` - Removed with deprecated client
Test plan
- `.datasets/` directory
🤖 Generated with Claude Code
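For reviewers, the path layout described in the summary can be sketched roughly as follows. This is an illustration only, not the code in this PR: the bucket name, the `states/` and `districts/` prefixes, and the `.datasets/` root come from the description above, while the helper names (`state_gcs_key`, `district_gcs_key`, `gcs_url`, `local_path`) are hypothetical. The actual `get_us_state_dataset_path` and `get_us_congressional_district_dataset_path` utilities may differ in signature and behavior.

```python
from pathlib import Path

# Values taken from the PR description.
BUCKET = "policyengine-us-data"
LOCAL_ROOT = Path(".datasets")


def state_gcs_key(state_code: str) -> str:
    """Hypothetical helper: object key for a state dataset, e.g. 'states/RI.h5'."""
    return f"states/{state_code.upper()}.h5"


def district_gcs_key(district_code: str) -> str:
    """Hypothetical helper: object key for a district dataset, e.g. 'districts/CA-01.h5'."""
    return f"districts/{district_code.upper()}.h5"


def gcs_url(gcs_key: str, bucket: str = BUCKET) -> str:
    """Build a gs:// URL from a bucket name and an object key."""
    return f"gs://{bucket}/{gcs_key}"


def local_path(gcs_key: str, root: Path = LOCAL_ROOT) -> Path:
    """Mirror the object key under the local root, keeping nested folders."""
    return root / gcs_key


if __name__ == "__main__":
    key = state_gcs_key("ri")
    print(gcs_url(key))
    print(local_path(key).as_posix())
```

Mirroring the GCS key under `.datasets/` keeps state and district files in separate subfolders, which is the nested layout the summary describes.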