Conversation

@ungarj
Member

@ungarj ungarj commented Oct 27, 2025

fixes #9

  • core

    • driver configuration: replace archive with source
    • Sentinel-2 driver configuration:
      • removed archive, cat_baseurl and max_cloud_cover parameters
      • Sentinel2Source: new configuration element to customize a source
      • EarthSearch sentinel-2-c1-l2a collection is now the default
    • removed archives, known_catalogs, platforms.sentinel2.archives modules
    • io.items.get_item_property(): added default kwarg which behaves similarly to dict.get(); property now also accepts a tuple of strings, which are checked in the given order, and the first matching property is returned
    • io.path.asset_mpath(): asset now accepts a tuple of strings, and the path of the first matching asset is returned
    • platforms.sentinel2: added preconfigured_sources module which holds all custom mapper functions between collections and data/metadata archives
    • removed platforms.sentinel2.path_mappers module
    • removed geometry module
    • make time parameter optional
    • settings: add lazy_load_stac_items option
  • packaging

    • added cql2 to dependencies
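
The lookup behaviour described for io.items.get_item_property() can be illustrated with a minimal sketch on a plain dict. This is not the mapchete-eo implementation: the function name get_property, its signature, the sample property values and the choice to raise KeyError when no default is given are all assumptions made for illustration.

```python
from typing import Any, Mapping, Sequence, Union

# Sentinel so that None remains usable as an explicit default value.
_MISSING = object()


def get_property(
    properties: Mapping[str, Any],
    names: Union[str, Sequence[str]],
    default: Any = _MISSING,
) -> Any:
    """Return the value of the first name that exists in properties.

    Hypothetical helper: names are checked in the given order.
    """
    candidates = (names,) if isinstance(names, str) else tuple(names)
    for name in candidates:
        if name in properties:
            return properties[name]
    if default is _MISSING:
        # Assumption: without a default, a missing property is an error.
        raise KeyError(f"none of {candidates} found in properties")
    return default


# Made-up item properties for demonstration only.
props = {"eo:cloud_cover": 12.5, "s2:mgrs_tile": "33TWN"}

# The first candidate is absent, so the second one is used.
cloud_cover = get_property(props, ("cloud_cover", "eo:cloud_cover"))

# A missing property falls back to the default, much like dict.get().
sun_azimuth = get_property(props, "view:sun_azimuth", default=None)
```

The same "first match wins" rule would apply to a tuple of asset names in io.path.asset_mpath(), with asset names in place of property names.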

@Scartography
Member

  • source_mappers.py in sentinel2 platform: the collections are immutable for now when set here, the mapchete config won't get passed
  • query can't be just a string, it should be a List[str] to be accepted via pystac_client search kwargs :)
  • ...

@Scartography
Member

KNOWN_SOURCES and its ingestion into the Source base class still feel like they have predefined params, and overall it is confusing how they are used, as there are now multiple configs/settings for the Sentinel-2 driver alone. Sentinel2DriverConfig and the source now feel more separated from the base classes, which is good, but the config still feels "sentient", doing more than it should (idk).

@Scartography
Member

  • assets and/or eo_band configs should be validated, fetched and parsed before reading products, so that they are matched against the collection or items before products are read as xarrays or numpy arrays. Currently this is quite nested in the IO/read modules and functions.
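
One way to picture this suggestion is a small resolver that runs before any pixel data is touched. The sketch below is hypothetical: resolve_assets, the AssetSpec alias and the asset names and hrefs are invented for illustration and are not part of mapchete-eo.

```python
from typing import Dict, Mapping, Sequence, Tuple, Union

# A requested asset is either one name or a tuple of alternative names.
AssetSpec = Union[str, Tuple[str, ...]]


def resolve_assets(
    requested: Sequence[AssetSpec],
    available: Mapping[str, str],
) -> Dict[str, str]:
    """Match every requested asset against the available ones up front.

    Returns a mapping of matched asset name to href and raises
    ValueError on the first request that cannot be satisfied, so that
    a bad configuration fails before any product is read.
    """
    resolved: Dict[str, str] = {}
    for spec in requested:
        candidates = (spec,) if isinstance(spec, str) else spec
        for name in candidates:
            if name in available:
                resolved[name] = available[name]
                break
        else:
            raise ValueError(
                f"none of {candidates} available in {sorted(available)}"
            )
    return resolved


# Made-up asset names and hrefs of a single item.
item_assets = {"red": "B04.tif", "nir08": "B8A.tif"}

# "nir" is not available, so the alternative "nir08" is picked.
resolved = resolve_assets(["red", ("nir", "nir08")], item_assets)
```

With a step like this, the read functions would only ever receive asset names that are known to exist.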

@ungarj ungarj changed the title WIP: rework catalog/collections/archives customizations rework catalog/collections/archives customizations Nov 7, 2025

class StacSearchConfig(BaseModel):
    max_cloud_cover: float = 100.0
    query: Optional[str] = None
Member

Query as List[str], for pystac_client from the get-go? If it stays a string, it will need a parser, ideally here as well.
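
A minimal sketch of such a normalization step, assuming the config keeps accepting a single string for convenience. The helper name normalize_query is hypothetical, and so is the idea of calling it from a validator on StacSearchConfig; only the List[str] form of query for pystac_client comes from the discussion.

```python
from typing import List, Optional, Sequence, Union


def normalize_query(
    query: Union[str, Sequence[str], None],
) -> Optional[List[str]]:
    """Turn a single query string or a sequence of them into List[str].

    Hypothetical helper: empty entries are dropped and an empty result
    is reported as None, meaning "no query".
    """
    if query is None:
        return None
    items = [query] if isinstance(query, str) else list(query)
    cleaned = [item.strip() for item in items if item and item.strip()]
    return cleaned or None


single = normalize_query("eo:cloud_cover<20")
several = normalize_query(["eo:cloud_cover<20", " platform=sentinel-2a "])
empty = normalize_query("   ")

# The result could then be handed on as a search kwarg, for example
# client.search(collections=[...], query=single), where client is a
# pystac_client.Client (not executed here).
```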

@Scartography
Member

When passing whole huge polygons to the STAC search as area, the STAC API on the other side rejects the request with an InternalServerError. A fix is in my PR against this one: it simply uses the bounds of the area, area=box(*area.bounds).
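
The idea behind this workaround can be sketched without any geometry library. The fix quoted above uses shapely's box(*area.bounds); the sketch below instead computes the same bounds from a GeoJSON-like dict. The helper names iter_coords and bounds_of and the sample polygon are assumptions for illustration.

```python
from typing import Any, Iterator, Mapping, Tuple

Bounds = Tuple[float, float, float, float]


def iter_coords(coords: Any) -> Iterator[Tuple[float, float]]:
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_coords(part)


def bounds_of(geometry: Mapping[str, Any]) -> Bounds:
    """Return (minx, miny, maxx, maxy) of a GeoJSON-like geometry."""
    xs, ys = zip(*iter_coords(geometry["coordinates"]))
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up polygon standing in for a large, detailed search area.
area = {
    "type": "Polygon",
    "coordinates": [[[0, 0], [4, 1], [3, 5], [-1, 2], [0, 0]]],
}

# Four numbers are sent to the API instead of every vertex.
search_bounds = bounds_of(area)
```

A search by bounds can return items that lie inside the bounding box but outside the original polygon, so results may still need to be filtered against the full geometry afterwards.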

DEV make EOSTAC driver work, some typing for searching and adding basic test for COPERNICUS DEM
@Scartography
Member

With this, the mhub job workers have quite large RAM usage; also, tasks don't seem to be passed to the scheduler properly (the entities might be too large). This would indicate that some metadata objects are potentially being materialized.

Also investigating whether the search returns the proper number of products; seeing quite large numbers.

@ungarj
Member Author

ungarj commented Nov 12, 2025

With this, the mhub job workers have quite large RAM usage; also, tasks don't seem to be passed to the scheduler properly (the entities might be too large). This would indicate that some metadata objects are potentially being materialized.

Also investigating whether the search returns the proper number of products; seeing quite large numbers.

Please investigate, I'll do the same.

@Scartography
Member

It feels like the buffer also applies to the footprint mask, which should not be the case (https://github.com/mapchete/mapchete-eo/blob/main/mapchete_eo/platforms/sentinel2/product.py#L556).

@ungarj
Member Author

ungarj commented Nov 13, 2025

It feels like the buffer also applies to the footprint mask, which should not be the case (https://github.com/mapchete/mapchete-eo/blob/main/mapchete_eo/platforms/sentinel2/product.py#L556).

true. especially as we already provide a mechanism to buffer the footprint separately.

ideally, we should buffer each individual mask before combining them but until we have found a more performant (i.e. raster based) implementation for buffer_array(), i opt against fixing this now.
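
To make the difference concrete, here is a toy sketch in pure Python with masks as nested lists of booleans. It is not the mapchete-eo buffer_array(), whose signature is not shown in this thread: buffer_mask, combine and the tiny sample masks are assumptions, and a square neighbourhood stands in for a real buffer.

```python
from typing import List

Mask = List[List[bool]]


def buffer_mask(mask: Mask, pixels: int) -> Mask:
    """Grow True cells by `pixels` in every direction (square window)."""
    if pixels <= 0:
        return [row[:] for row in mask]
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - pixels), min(rows, r + pixels + 1)):
                    for cc in range(max(0, c - pixels), min(cols, c + pixels + 1)):
                        out[rr][cc] = True
    return out


def combine(*masks: Mask) -> Mask:
    """Cell-wise logical OR of masks that share the same shape."""
    return [[any(cells) for cells in zip(*rows)] for rows in zip(*masks)]


F, T = False, True
footprint = [[T, F, F, F, F] for _ in range(3)]  # masked-out left edge
clouds = [[F, F, F, F, F], [F, F, F, T, F], [F, F, F, F, F]]

# Buffer each mask on its own: only the cloud mask grows.
separate = combine(buffer_mask(footprint, 0), buffer_mask(clouds, 1))

# Buffer after combining: the footprint edge grows as well.
together = buffer_mask(combine(footprint, clouds), 1)
```

In this toy case the column next to the footprint edge stays valid in separate but is masked in together, which is the side effect discussed above.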

@ungarj ungarj merged commit 5079eee into main Nov 18, 2025
9 checks passed

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rethink handling different Sentinel-2 archives

3 participants