feat: Add /catalogs route for Federated STAC API Support #547
Conversation
|
Great feature, @jonhealy1, and one that I really missed. It would probably be great to add what you wrote in the README to the description of the PR. My 2c |
|
I'm not really sure what this adds that is not already possible with the existing specs. Can someone explain this a bit better? |
|
@m-mohr I left a response to your comment on #308. In summary, a STAC API generally represents one root Catalog. Users have asked for this extension to avoid running numerous separate API instances. Discoverability across numerous running API instances, representing multiple catalogs, would be practically impossible. While traversing child links is possible, it treats the API like a static file server. It is inefficient for discovery and technically infeasible for extensions. Applying server-side operations (like Aggregations, Sort, or Filter) to a recursive link crawl is functionally impossible without massive latency. |
|
Then please have a look at the STAC API - Children extension, it solves your issue of latency in a very similar way (just with a different name). |
|
@m-mohr Thanks for pointing us towards the Children extension. I think this is something that we will definitely want to add to this project in the near future. |
|
The PR says it implements federated STAC API support, but then it also says it only works on a single infrastructure. How is this federation working? If I have e.g. three STAC APIs already and I want to offer them via a single API, does this PR solve the issue? |
|
If I understand what you're saying, yes. You would have three routes ie. |
|
So if I have three instances that should remain separate instances on different machines, but we want an additional proxy that can query all three instances at a time, this PR doesn't provide a solution, right? You'd need to change them to a single instance? |
|
Correct. It would be interesting to explore creating something like that, something that could query API instances across multiple machines. It could be done asynchronously, and then the central API would gather the results. Pagination would be difficult. Sorting too, I guess. It could be done though. |
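As an illustration of the asynchronous fan-out idea in the comment above (not part of this PR), here is a minimal Python sketch. It assumes each backend can return its own results already sorted by `datetime`; the names `make_stub_searcher` and `federated_search` are hypothetical, and the stub searchers stand in for real HTTP calls to separate STAC API instances.

```python
import asyncio
import heapq
from typing import Any, Awaitable, Callable, Dict, List

Item = Dict[str, Any]
# A "searcher" stands in for one remote STAC API (hypothetical interface);
# real code would issue an HTTP request here instead.
Searcher = Callable[[int], Awaitable[List[Item]]]


def make_stub_searcher(items: List[Item]) -> Searcher:
    """Build a fake backend that returns its items newest-first."""
    ordered = sorted(items, key=lambda it: it["datetime"], reverse=True)

    async def search(limit: int) -> List[Item]:
        await asyncio.sleep(0)  # yield control, as a network call would
        return ordered[:limit]

    return search


async def federated_search(searchers: List[Searcher], limit: int) -> List[Item]:
    """Query every backend concurrently, then merge the sorted pages."""
    # Each backend must return its own top `limit` items so that the
    # merged top `limit` is correct.
    pages = await asyncio.gather(*(search(limit) for search in searchers))
    merged = heapq.merge(*pages, key=lambda it: it["datetime"], reverse=True)
    return list(merged)[:limit]


backends = [
    make_stub_searcher([
        {"id": "a1", "datetime": "2024-01-03T00:00:00Z"},
        {"id": "a2", "datetime": "2024-01-01T00:00:00Z"},
    ]),
    make_stub_searcher([
        {"id": "b1", "datetime": "2024-01-02T00:00:00Z"},
    ]),
]
results = asyncio.run(federated_search(backends, limit=2))
```

The first page is straightforward. Later pages are where it gets difficult, since the central API would have to track a separate cursor per backend, which matches the concern about pagination and sorting raised above.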
|
There are tools that implement this already. I just wanted to understand the scope of the PR, thanks. |
|
@m-mohr I think the reality for many users is that they do not want to run 3 separate API and database instances just to host 3 logical Catalogs. They would rather maintain and fund one infrastructure. While scaling can be handled via cloud infrastructure, the application architecture needs to support this consolidation. The /catalogs extension provides the necessary routing to host these multiple contexts within that single, cost-effective instance. |
Related Issue(s):
Description
This PR introduces the Catalogs Extension, enabling a federated "Hub and Spoke" architecture within stac-fastapi.
Currently, the API assumes a single Root Catalog containing a flat list of Collections. This works for simple deployments but becomes unwieldy for large-scale implementations aggregating multiple providers, missions, or projects. This change adds a /catalogs endpoint that acts as a Registry, allowing the API to serve multiple distinct sub-catalogs from a single infrastructure.
Key Features
New Endpoints: Implements the full suite of hierarchical endpoints:
- GET /catalogs (List all sub-catalogs)
- POST /catalogs (Create new sub-catalog)
- DELETE /catalogs/{catalog_id} (Delete a catalog; supports ?cascade=true to delete child collections)
- GET /catalogs/{catalog_id} (Sub-catalog Landing Page)
- GET /catalogs/{catalog_id}/collections (Scoped collections)
- POST /catalogs/{catalog_id}/collections (Create a new collection directly linked to a specific catalog)
- GET /catalogs/{catalog_id}/collections/{collection_id} (Get one collection)
- GET /catalogs/{catalog_id}/collections/{collection_id}/items (Scoped item search)
- GET /catalogs/{catalog_id}/collections/{collection_id}/items/{item_id} (Get one item)
Serialization: Updates Pydantic models and serializers to support type: "Catalog" objects within the API tree (previously restricted to Collections).
Configuration: Controlled via the ENABLE_CATALOGS_ROUTE environment variable (default: false).
Storage Strategy (Non-Breaking)
To ensure zero breaking changes and avoid complex database migrations, this implementation stores Catalog objects within the existing collections index. Catalogs and Collections are distinguished by the type field (type: "Catalog" vs. type: "Collection").
Architectural Alignment
This implementation follows the proposed STAC API Catalogs Endpoint Extension (Community Extension).
It addresses the "Data Silo" problem by allowing organizations to host distinct catalogs on a single API instance, rather than deploying separate containers for every project or provider.
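The storage strategy described above, with Catalogs and Collections sharing one index and told apart by the `type` field, can be sketched in a few lines of Python. This is an illustrative sketch and not the PR's code: the in-memory dict stands in for the collections index, the `parent_ids` field linking a collection to its catalog is an assumption (the description does not say how that link is stored), and all function names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Hypothetical in-memory stand-in for the shared "collections" index.
Index = Dict[str, Dict[str, Any]]


def list_catalogs(index: Index) -> List[Dict[str, Any]]:
    """Return only the documents whose type is "Catalog"."""
    return [doc for doc in index.values() if doc.get("type") == "Catalog"]


def list_collections(
    index: Index, catalog_id: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Return Collections, optionally scoped to a single catalog.

    Scoping uses an assumed "parent_ids" field; the real link may differ.
    """
    return [
        doc
        for doc in index.values()
        if doc.get("type") == "Collection"
        and (catalog_id is None or catalog_id in doc.get("parent_ids", []))
    ]


def delete_catalog(index: Index, catalog_id: str, cascade: bool = False) -> None:
    """Delete a catalog; with cascade=True, also delete its collections."""
    if index.get(catalog_id, {}).get("type") != "Catalog":
        raise KeyError(f"no catalog named {catalog_id!r}")
    if cascade:
        for child in list_collections(index, catalog_id):
            del index[child["id"]]
    # Without cascade, this sketch simply leaves child collections in place.
    del index[catalog_id]


index: Index = {
    "mission-a": {"id": "mission-a", "type": "Catalog"},
    "mission-b": {"id": "mission-b", "type": "Catalog"},
    "scenes": {"id": "scenes", "type": "Collection", "parent_ids": ["mission-a"]},
    "mosaics": {"id": "mosaics", "type": "Collection", "parent_ids": ["mission-b"]},
}
```

Because existing Collection documents already carry type: "Collection", a filter like this leaves them untouched, which is what makes the approach non-breaking.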
Changes
- stac_fastapi/core/extensions/catalogs.py: Added the main extension logic and router.
- stac_fastapi/core/models/: Added Catalog Pydantic models.
- stac_fastapi/elasticsearch/database_logic.py: Added CRUD logic filtering by type: "Catalog".
- tests/: Added comprehensive test suite (test_catalogs.py) covering CRUD operations and hierarchical navigation.
PR Checklist:
- pre-commit run --all-files
- make test