This tool rebuilds shapes.txt for a GTFS feed by map-matching trip stops to OpenStreetMap (OSM) data. It supports both road (bus) and rail (train, tram, subway) networks.
- Automatic Mode Detection: Infers whether a route is "road" or "rail" based on `route_type` and keywords in route names.
- Hybrid Graph Building: Builds separate routing graphs for road and rail networks from a single OSM PBF file.
- Map Matching: Uses a Hidden Markov Model (HMM) approach (via `networkx` shortest paths between candidates) to find the most likely path through the street/rail network. This guarantees that every shape connects stop to stop, but the result can differ from the real route when a trip does not follow the shortest path between stops.
- Shape Simplification: Simplifies the resulting polylines to reduce file size while maintaining accuracy.
- Deduplication: Assigns shared `shape_id`s to trips with identical geometries to keep `shapes.txt` compact.
- Visualizer: Includes a web-based viewer to watch the graph building and shape generation process in real time.
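To illustrate what the simplification tolerance means, here is a minimal sketch. The algorithm the tool actually uses is not stated here; the example assumes Douglas-Peucker on points already projected to planar coordinates in meters, and `simplify` and `point_segment_distance` are hypothetical names.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (planar coordinates, meters)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Douglas-Peucker: drop points that lie within `tolerance` of the kept line."""
    if len(points) < 3:
        return list(points)
    # Find the interior point farthest from the chord between the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and recurse on both halves.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

line = [(0.0, 0.0), (50.0, 2.0), (100.0, 0.0), (100.0, 80.0)]
print(simplify(line, 5.0))  # the 2 m bump at x=50 is dropped, the corner is kept
```

With a 5 m tolerance the small deviation disappears while the sharp turn survives, which is the trade-off between file size and accuracy.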
- Python 3.9+
- A valid GTFS feed (unpacked directory).
- An OpenStreetMap PBF (OSM.PBF) file covering the region of the GTFS feed.
- Clone the repository.
- Create a virtual environment:

      python -m venv .venv
      source .venv/bin/activate   # Linux/Mac
      .venv\Scripts\activate      # Windows

- Install dependencies:

      pip install -r requirements.txt

  Note: You may need to install `osmium` dependencies separately if the pip install fails (e.g., `libosmium` on Linux).

To rebuild shapes for a GTFS feed using an OSM PBF file:

    python main.py --gtfs /path/to/gtfs_dir --osm /path/to/region.osm.pbf

This will:
- Load the GTFS data.
- Compute the bounding box of all stops.
- Build road and/or rail graphs from the PBF file within that bounding box.
- Process every trip in `trips.txt`, generating a shape.
- Write a new `shapes.txt` to the GTFS directory.
- Update `trips.txt` with the new `shape_id`s (backing up the original as `trips.txt.bak`).
- Generate a `shape_id_map.csv` in the GTFS directory, mapping each trip to its assigned shape ID.
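
The deduplication and trip-to-shape mapping in the last steps can be sketched as follows. This is an illustration only, not the tool's code: `geometry_key`, `assign_shape_ids`, the `shape_N` ID format, and rounding coordinates to six decimals are all assumptions.

```python
import hashlib

def geometry_key(points, precision=6):
    """Hash a polyline so that identical geometries produce the same key."""
    text = ";".join(f"{lat:.{precision}f},{lon:.{precision}f}" for lat, lon in points)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def assign_shape_ids(trip_shapes):
    """Map each trip_id to a shape_id, reusing IDs for identical geometries."""
    key_to_shape = {}   # geometry hash -> shape_id
    trip_to_shape = {}  # trip_id -> shape_id
    for trip_id, points in trip_shapes.items():
        key = geometry_key(points)
        if key not in key_to_shape:
            key_to_shape[key] = f"shape_{len(key_to_shape) + 1}"
        trip_to_shape[trip_id] = key_to_shape[key]
    return trip_to_shape

trips = {
    "t1": [(41.9000, 12.5000), (41.9100, 12.5100)],
    "t2": [(41.9000, 12.5000), (41.9100, 12.5100)],  # same geometry as t1
    "t3": [(41.9100, 12.5100), (41.9000, 12.5000)],  # reverse direction
}
print(assign_shape_ids(trips))
# {'t1': 'shape_1', 't2': 'shape_1', 't3': 'shape_2'}
```

Trips `t1` and `t2` share one shape, while the reversed trip gets its own, so `shapes.txt` stores each distinct geometry once.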
| Argument | Description | Default |
|---|---|---|
| `--gtfs` | Path to the unpacked GTFS directory. | `trgtfs` |
| `--osm` | Path to the OSM PBF file. | `lazio.osm.pbf` |
| `--modes` | Which graphs to build: `road`, `rail`, or `both`. | `both` |
| `--dry-run` | Run without writing changes to disk. | `False` |
| `--max-trips` | Limit the number of trips to process (for testing). | `None` |
| `--tolerance-road` | Simplification tolerance (meters) for road shapes. | `5.0` |
| `--tolerance-rail` | Simplification tolerance (meters) for rail shapes. | `3.0` |
| `--with-viewer` | Launch the web visualizer. | `False` |
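
For reference, the options in the table could be declared with `argparse` roughly as below. This sketch is built only from the table; how `main.py` actually defines its parser is an assumption, including the `build_parser` name and the use of `store_true` for the boolean flags.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Rebuild GTFS shapes.txt from OSM data.")
    parser.add_argument("--gtfs", default="trgtfs", help="Path to the unpacked GTFS directory.")
    parser.add_argument("--osm", default="lazio.osm.pbf", help="Path to the OSM PBF file.")
    parser.add_argument("--modes", choices=["road", "rail", "both"], default="both",
                        help="Which graphs to build.")
    parser.add_argument("--dry-run", action="store_true",
                        help="Run without writing changes to disk.")
    parser.add_argument("--max-trips", type=int, default=None,
                        help="Limit the number of trips to process.")
    parser.add_argument("--tolerance-road", type=float, default=5.0,
                        help="Simplification tolerance (meters) for road shapes.")
    parser.add_argument("--tolerance-rail", type=float, default=3.0,
                        help="Simplification tolerance (meters) for rail shapes.")
    parser.add_argument("--with-viewer", action="store_true",
                        help="Launch the web visualizer.")
    return parser

# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["--modes", "rail", "--max-trips", "100"])
print(args.modes, args.max_trips, args.tolerance_rail, args.dry_run)
# rail 100 3.0 False
```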
If you only want to process rail trips (e.g., for a train-only feed or to save time):

    python main.py --gtfs ./my_gtfs --osm ./italy.osm.pbf --modes rail

Any trips identified as "road" (bus) will be skipped.
To watch the process in real-time:

    python main.py --with-viewer

This will open a web browser at http://127.0.0.1:1890. Click "Build Graphs (Live View)" to start.
Note: The visualizer is a basic tool for debugging and watching progress. It is not optimized and may struggle with very large datasets, but it is adequate for monitoring.
- OSM Coverage: You MUST provide an OSM PBF that covers the entire area of your GTFS feed. If the PBF does not cover the whole area, stops outside it will not be matched, and trips may fail or result in straight lines.
  - Tip: Download a larger region (e.g., the whole country or municipality) from Geofabrik. The script automatically filters the graph to the bounding box of your stops, so the graph itself stays limited to your service area.
- Graph Connectivity: The script assumes the OSM network is connected. If stops are far from any road/rail (e.g., bad stop coordinates or missing OSM data), the map matching may fail or produce straight lines between those stops.
- Performance:
  - Building graphs from large PBFs can take considerable time and memory (RAM) and is CPU-intensive.
  - Processing thousands of trips can take a while, especially road trips. Use `--max-trips` to test on a subset first.
- Route Mode Inference: The script uses a heuristic to determine whether a route is Road (Bus) or Rail (Train/Tram/Subway).
  - Priority 1: Keywords: It checks `route_id`, `route_short_name`, and `route_long_name` for keywords.
    - Road keywords: "bus", "autobus", "pullman"
    - Rail keywords: "rail", "train", "metro", "subway", "tram", "ferrovia", "metropolitana"
  - Priority 2: GTFS `route_type`: If no keywords are found, it falls back to the standard GTFS `route_type` field.
    - `3` = Road (Bus)
    - `0` (Tram), `1` (Subway), `2` (Rail) = Rail
  - Default: If neither matches, it defaults to Road.

  You can edit the keywords to your liking in the `route_mode` function.

  If your GTFS uses non-standard `route_type` values (e.g. extended types like 700 for bus) or lacks clear names, you may need to edit the `route_mode` function in `main.py`.
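
The priority order described above can be sketched as follows. The real `route_mode` in `main.py` may differ: the signature (a `routes.txt` row as a dict), plain substring matching, and checking rail keywords before road keywords are all assumptions made for this example.

```python
ROAD_KEYWORDS = ("bus", "autobus", "pullman")
RAIL_KEYWORDS = ("rail", "train", "metro", "subway", "tram", "ferrovia", "metropolitana")
RAIL_ROUTE_TYPES = {"0", "1", "2"}  # Tram, Subway, Rail
ROAD_ROUTE_TYPES = {"3"}            # Bus

def route_mode(route):
    """Return "road" or "rail" for a routes.txt row given as a dict."""
    text = " ".join(
        str(route.get(field) or "")
        for field in ("route_id", "route_short_name", "route_long_name")
    ).lower()
    # Priority 1: keywords (rail is checked first; this ordering is an assumption).
    if any(word in text for word in RAIL_KEYWORDS):
        return "rail"
    if any(word in text for word in ROAD_KEYWORDS):
        return "road"
    # Priority 2: standard GTFS route_type.
    route_type = str(route.get("route_type") or "").strip()
    if route_type in RAIL_ROUTE_TYPES:
        return "rail"
    if route_type in ROAD_ROUTE_TYPES:
        return "road"
    # Default: anything unrecognized, including extended types such as 700.
    return "road"

print(route_mode({"route_id": "R1", "route_long_name": "Metro A", "route_type": "3"}))  # rail
print(route_mode({"route_id": "64", "route_short_name": "64", "route_type": "3"}))      # road
print(route_mode({"route_id": "X", "route_type": "2"}))                                 # rail
print(route_mode({"route_id": "N5", "route_type": "700"}))                              # road
```

Note that the first example is classified as rail even though its `route_type` is `3`, because keywords take priority. Any extended `route_type` value falls through to the Road default in this sketch, which is why feeds using them may need custom handling.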
- "ModuleNotFoundError: No module named 'pandas'": Ensure you have activated your virtual environment and installed requirements.
- Straight lines in output: This usually means the map matching failed for those segments. Check if:
- The OSM PBF covers that area.
- The stops are close enough to roads/rails.
- The correct `--modes` were enabled.
- Script crashes with MemoryError: Try using a smaller PBF (cropped to your region) or a machine with more RAM. For very large areas, such as entire regions or countries, the graph build may still exceed available memory, and there is little that can be done to avoid this.
CC-BY-NC-SA @Ciospettw