@jonhealy1
Collaborator

@jonhealy1 jonhealy1 commented Dec 2, 2025

Related Issue(s):

Description

This PR introduces the Catalogs Extension, enabling a federated "Hub and Spoke" architecture within stac-fastapi.

Currently, the API assumes a single Root Catalog containing a flat list of Collections. This works for simple deployments but becomes unwieldy for large-scale implementations aggregating multiple providers, missions, or projects. This change adds a /catalogs endpoint that acts as a Registry, allowing the API to serve multiple distinct sub-catalogs from a single infrastructure.

Key Features

  • New Endpoints: Implements the full suite of hierarchical endpoints:

    • GET /catalogs (List all sub-catalogs)
    • POST /catalogs (Create new sub-catalog)
    • DELETE /catalogs/{catalog_id} (Delete a catalog; supports ?cascade=true to also delete its child collections)
    • GET /catalogs/{catalog_id} (Sub-catalog Landing Page)
    • GET /catalogs/{catalog_id}/collections (Scoped collections)
    • POST /catalogs/{catalog_id}/collections (Create a new collection directly linked to a specific catalog)
    • GET /catalogs/{catalog_id}/collections/{collection_id} (Get one collection)
    • GET /catalogs/{catalog_id}/collections/{collection_id}/items (Scoped item search)
    • GET /catalogs/{catalog_id}/collections/{collection_id}/items/{item_id} (Get one item)
  • Serialization: Updates Pydantic models and serializers to support type: "Catalog" objects within the API tree (previously restricted to Collections).

  • Configuration: Controlled via ENABLE_CATALOGS_ROUTE environment variable (default: false).
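As a rough illustration of the endpoint hierarchy and the feature flag listed above, here is a minimal sketch. The route templates are copied from the list; the helper names (`catalogs_enabled`, `active_routes`, `item_path`) and the accepted truthy values are assumptions for illustration only, not the extension's actual code.

```python
import os

# Route templates copied from the endpoint list above.
CATALOG_ROUTES = [
    ("GET", "/catalogs"),
    ("POST", "/catalogs"),
    ("DELETE", "/catalogs/{catalog_id}"),
    ("GET", "/catalogs/{catalog_id}"),
    ("GET", "/catalogs/{catalog_id}/collections"),
    ("POST", "/catalogs/{catalog_id}/collections"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}/items"),
    ("GET", "/catalogs/{catalog_id}/collections/{collection_id}/items/{item_id}"),
]


def catalogs_enabled(environ=None):
    """Hypothetical helper: read ENABLE_CATALOGS_ROUTE, defaulting to false."""
    env = os.environ if environ is None else environ
    # Assumption: "true", "1" and "yes" (case-insensitive) count as enabled.
    value = env.get("ENABLE_CATALOGS_ROUTE", "false")
    return value.strip().lower() in {"true", "1", "yes"}


def active_routes(environ=None):
    """Return the catalog routes only when the flag is switched on."""
    return list(CATALOG_ROUTES) if catalogs_enabled(environ) else []


def item_path(catalog_id, collection_id, item_id):
    """Fill in the deepest route template for a single item."""
    template = CATALOG_ROUTES[-1][1]
    return template.format(
        catalog_id=catalog_id, collection_id=collection_id, item_id=item_id
    )
```

With the flag unset, `active_routes({})` returns an empty list, which mirrors the "default: false" behaviour described above.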

Storage Strategy (Non-Breaking)

To ensure zero breaking changes and avoid complex database migrations, this implementation stores Catalog objects within the existing collections index.

  • Differentiation: Objects are distinguished using the type field (type: "Catalog" vs. type: "Collection").
  • Backward Compatibility: Existing queries for Collections remain unaffected as they continue to function on the same index structure.
  • No Overhead: No new Elasticsearch/OpenSearch indices or infrastructure changes are required to enable this feature.
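The shared-index approach can be pictured with a small sketch. It builds an Elasticsearch/OpenSearch-style `term` query on the `type` field and evaluates it against an in-memory list standing in for the collections index. The sample documents, the `type_filter` and `search` helpers, and the assumption that `type` is an exact-match (keyword) field are all illustrative and not taken from database_logic.py.

```python
# In-memory stand-in for the shared "collections" index (sample data is made up).
INDEX = [
    {"id": "landsat", "type": "Collection"},
    {"id": "sentinel-2", "type": "Collection"},
    {"id": "agency-a", "type": "Catalog"},
]


def type_filter(stac_type):
    """Build a query body that selects a single object type.

    Assumption: `type` is mapped as an exact-match (keyword) field.
    """
    return {"query": {"term": {"type": stac_type}}}


def search(index, body):
    """Tiny evaluator for the single `term` query above (not a real client)."""
    field, value = next(iter(body["query"]["term"].items()))
    return [doc for doc in index if doc.get(field) == value]


# Both object kinds live side by side; the filter keeps them apart.
catalogs = search(INDEX, type_filter("Catalog"))
collections = search(INDEX, type_filter("Collection"))
```

Here `catalogs` holds only the `agency-a` document, while `collections` is unchanged by the presence of Catalog objects in the same index.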

Architectural Alignment

This implementation follows the proposed STAC API Catalogs Endpoint Extension (Community Extension).

It addresses the "Data Silo" problem by allowing organizations to host distinct catalogs on a single API instance, rather than deploying separate containers for every project or provider.

Changes

  • stac_fastapi/core/extensions/catalogs.py: Added the main extension logic and router.
  • stac_fastapi/core/models/: Added Catalog Pydantic models.
  • stac_fastapi/elasticsearch/database_logic.py: Added CRUD logic filtering by type: "Catalog".
  • tests/: Added comprehensive test suite (test_catalogs.py) covering CRUD operations and hierarchical navigation.

PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

@luipir

luipir commented Dec 2, 2025

Great feature, @jonhealy1, and one that I really missed. It would probably be good to add what you wrote in the README to the description of the PR. My 2c.

@m-mohr

m-mohr commented Dec 2, 2025

I'm not really sure what this adds that is not already possible with the existing specs. Can someone explain this a bit better?
Adding just /catalogs adds one more level in addition to catalogs, but there are already ways to enable multi-level grouping of collections, as described in #308 (comment).

@jonhealy1
Collaborator Author

jonhealy1 commented Dec 3, 2025

@m-mohr I left a response to your comment on #308. In summary, a STAC API generally represents one root Catalog. Users have asked for this extension to avoid running numerous separate API instances. Discoverability across numerous running API instances, representing multiple catalogs, would be practically impossible.

While traversing child links is possible, it treats the API like a static file server. It is inefficient for discovery and technically infeasible for extensions. Applying server-side operations (like Aggregations, Sort, or Filter) to a recursive link crawl is functionally impossible without massive latency.

@m-mohr

m-mohr commented Dec 3, 2025

Then please have a look at the STAC API - Children extension, it solves your issue of latency in a very similar way (just with a different name).

@jonhealy1 jonhealy1 changed the title from "Add /catalogs route" to "feat: Add /catalogs route for Federated STAC API Support" Dec 3, 2025
@jonhealy1
Collaborator Author

@m-mohr Thanks for pointing us towards the Children extension. I think this is something that we will definitely want to add to this project in the near future.

@m-mohr

m-mohr commented Dec 3, 2025

The PR says it implements federated STAC API support, but then it also says it only works on a single infrastructure. How is this federation working? If I have e.g. three STAC APIs already and I want to offer them via a single API, does this PR solve the issue?

@jonhealy1
Collaborator Author

jonhealy1 commented Dec 3, 2025

If I understand what you're saying, yes. You would have three routes, i.e. catalogs/catalog1/collections, catalogs/catalog2/collections, and catalogs/catalog3/collections, that would effectively replace running 3 separate API instances. Each catalog would be like the root catalog of one of your deployed API instances.

@m-mohr

m-mohr commented Dec 3, 2025

So if I have three instances that should remain separate instances on different machines, but we want an additional proxy that can query all three instances at a time, this PR doesn't provide a solution, right? You'd need to change them to a single instance?

@jonhealy1
Collaborator Author

jonhealy1 commented Dec 3, 2025

Correct. It would be interesting to explore creating something like that: something that could query API instances across multiple machines. It could be done asynchronously, with the central API gathering the results. Pagination would be difficult, and sorting too, I guess. It could be done, though.
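To make the idea concrete, here is a purely hypothetical sketch of such a fan-out, which is not part of this PR. The backends are simulated in memory, and the names `BACKENDS`, `query_backend`, and `federated_search` are invented for illustration. It assumes each backend returns its page already sorted by `datetime`.

```python
import asyncio
import heapq

# Hypothetical remote instances, each returning items sorted by datetime.
BACKENDS = {
    "api-1": [{"id": "a", "datetime": "2025-01-01"}, {"id": "d", "datetime": "2025-04-01"}],
    "api-2": [{"id": "b", "datetime": "2025-02-01"}],
    "api-3": [{"id": "c", "datetime": "2025-03-01"}, {"id": "e", "datetime": "2025-05-01"}],
}


async def query_backend(name, limit):
    """Stand-in for an HTTP search request to one remote instance."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return BACKENDS[name][:limit]


async def federated_search(limit):
    """Query every backend concurrently, then merge the sorted pages."""
    pages = await asyncio.gather(*(query_backend(n, limit) for n in BACKENDS))
    # Each page is already sorted, so a k-way merge preserves global order.
    merged = heapq.merge(*pages, key=lambda item: item["datetime"])
    return list(merged)[:limit]


results = asyncio.run(federated_search(limit=3))
```

The sketch also hints at why pagination is awkward: every backend has to be asked for the full `limit` to guarantee a correct first page, and a second page would need a separate cursor per backend.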

@m-mohr

m-mohr commented Dec 3, 2025

There are tools that implement this already. I just wanted to understand the scope of the PR, thanks.

@m-mohr m-mohr mentioned this pull request Dec 3, 2025
@jonhealy1
Collaborator Author

@m-mohr I think the reality for many users is that they do not want to run 3 separate API and database instances just to host 3 logical Catalogs. They would rather maintain and fund one infrastructure.

While scaling can be handled via cloud infrastructure, the application architecture needs to support this consolidation. The /catalogs extension provides the necessary routing to host these multiple contexts within that single, cost-effective instance.

@jonhealy1 jonhealy1 marked this pull request as ready for review December 3, 2025 16:47
@jonhealy1 jonhealy1 merged commit 4139cb4 into stac-utils:main Dec 8, 2025
8 checks passed
@jonhealy1 jonhealy1 deleted the add-catalog-route branch December 8, 2025 11:50

3 participants