@jotamayo97
Contributor

@jotamayo97 jotamayo97 commented Apr 17, 2025

closes #56
The way I see it, since this endpoint is meant to accept all the same parameters as /observations, it’s essentially just an observation search grouped by places.

Instead of returning individual observations, it returns a list of places along with the count of observations that match the given filters. If place_id values are specified, only observations from those places will be considered, so only those places will appear in the results. However, it's still possible to filter by any other parameter.

Places with zero observations are not included.
I've only implemented it in v1 for now. I haven't fully understood the purpose and current state of v2 yet, but I can implement it there as well if you prefer.
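
As a rough sketch of the response shaping described above, assuming an Elasticsearch terms aggregation whose buckets look like `{ key, doc_count }`: the helper name `bucketsToPlaceCounts`, the `placesById` lookup, the `{ count, place }` result shape, and the sample IDs are all hypothetical and not taken from this PR.

```javascript
// Hypothetical helper: turn terms-aggregation buckets into place counts.
// `buckets` mimics Elasticsearch's [{ key, doc_count }] bucket format.
// `placesById` is an assumed Map of place ID -> place object.
function bucketsToPlaceCounts( buckets, placesById ) {
  return buckets
    // Places with zero matching observations are not included.
    .filter( bucket => bucket.doc_count > 0 )
    // Skip buckets whose place could not be looked up.
    .filter( bucket => placesById.has( bucket.key ) )
    .map( bucket => ( {
      count: bucket.doc_count,
      place: placesById.get( bucket.key )
    } ) )
    // Most-observed places first.
    .sort( ( a, b ) => b.count - a.count );
}

// Illustrative data only; the IDs are made up for this example.
const places = new Map( [
  [1, { id: 1, name: "United States", slug: "united-states" }],
  [2, { id: 2, name: "Example Place", slug: "example-place" }]
] );
const exampleCounts = bucketsToPlaceCounts(
  [{ key: 2, doc_count: 3 }, { key: 1, doc_count: 10 }, { key: 9, doc_count: 4 }],
  places
);
console.log( exampleCounts.map( r => `${r.place.name}: ${r.count}` ) );
```

The bucket with key 9 is dropped here because no place object was found for it; whether the real endpoint should drop or report such buckets is a separate design question.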

"name": "United States",
"slug": "united-states",
"display_name_autocomplete": "United States",
"display_name": "United States",
Contributor Author

Since I used "CorePlace" as the place object in my response, I needed to add these fields to the test for it to pass. I'm open to suggestions about this; maybe I should use a different object.

by_place: {
terms: {
field: "place_ids",
size: 1000,
Contributor Author

I don't have enough context to define the appropriate size for a search like this. 1000 seems like a reasonable limit to me, but my reasoning isn't very strong here. I'm open to suggestions.

Contributor

@kueda kueda left a comment

  • let's go with observations/place_counts, not observations/places_counts
  • this still needs a param that allows the user to specify which places are being bucketed, so something like count_place_id

@jotamayo97
Contributor Author

I chose the plural places by imitating /observations/species_counts. I think it's better to aim for consistency as much as possible, but I'm fine with changing it.

As for the count_place_id param, maybe I’m misunderstanding something, but since I’m using the same parameters as /observations, I can already filter by place_id to specify which places I want to query. What would be the difference between that and a separate count_place_id?

In this new endpoint, results are ALWAYS bucketed by place. You can choose to limit which places appear in the output or not, and that's controlled via the place_id parameter.

@jotamayo97 jotamayo97 requested a review from kueda May 11, 2025 19:52
Contributor

@kueda kueda left a comment

> I chose the plural places by imitating /observations/species_counts. I think it's better to aim for consistency as much as possible, but I'm fine with changing it.

One of the many weird and annoying things about English is that "species" is both the singular and plural form, e.g. "1 species, 2 species, 3 species."

> As for the count_place_id param, maybe I’m misunderstanding something, but since I’m using the same parameters as /observations, I can already filter by place_id to specify which places I want to query. What would be the difference between that and a separate count_place_id?

I think one of us is misunderstanding, so maybe @loarie can clarify. My understanding of the request is that it is meant to enable a choropleth map: given the observation query parameters, show the number of matching observations in an arbitrary collection of places, e.g. "show me the number of plant observations in France and Spain and don't show me counts for any other places". You can't achieve that with what you've got here and place_id=france,spain b/c you get a lot of irrelevant places.
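
To illustrate the concern, here is a small in-memory simulation, not the PR's code. It assumes each observation carries a `place_ids` array covering several levels of the place hierarchy, and it uses place names in place of numeric IDs for readability; the data and function names are made up.

```javascript
// Illustrative in-memory data; place names stand in for numeric place IDs.
const observations = [
  { id: 1, place_ids: ["europe", "france", "occitanie"] },
  { id: 2, place_ids: ["europe", "spain", "andalucia"] },
  { id: 3, place_ids: ["europe", "spain", "andalucia", "cadiz"] },
  { id: 4, place_ids: ["europe", "germany"] }
];

// Step 1: the place_id filter keeps observations in any requested place.
function filterByPlace( obs, placeIds ) {
  return obs.filter( o => o.place_ids.some( id => placeIds.includes( id ) ) );
}

// Step 2: an unfiltered terms aggregation buckets on every value in place_ids.
function bucketByPlace( obs ) {
  const counts = {};
  for ( const o of obs ) {
    for ( const id of o.place_ids ) {
      counts[id] = ( counts[id] || 0 ) + 1;
    }
  }
  return counts;
}

const unfilteredBuckets = bucketByPlace(
  filterByPlace( observations, ["france", "spain"] )
);
// Contains europe, occitanie, andalucia and cadiz as well as france and spain.
console.log( unfilteredBuckets );
```

Only France and Spain were requested, but every place attached to a matching observation gets its own bucket, which is not what a choropleth of two countries needs.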

@kfc35

kfc35 commented Oct 3, 2025

I believe the issue is that there is no additional filter on the aggregation itself in Elasticsearch, which is why kueda is saying you will get a lot of irrelevant places returned in the results.

Each observation has an array of place IDs that, from what I can see, can span different granularities (for example, Europe > Spain > Andalucía > Cádiz). This observation, for example, is returned by the API with 16 place IDs, including some beyond the four I described (community-curated ones like the Mediterranean; you can see the full list by clicking “Details” under the map). If I call the observations API with just one of those 16 place IDs, the observation is still returned with its full list of 16 place IDs.

If I’m reading this pull request correctly, the logic will return counts not just for the requested place_ids, but also for the other place IDs in each observation’s place ID array (i.e., instead of just getting a count for my request for observations in Andalucía, I will get counts for Cádiz, for Spain, for the Mediterranean, etc. as separate entries, though these counts will be restricted to observations matched by the original place_id filter).

So, it seems to me that there are two ways forward with the pull request:
1: Add a parameter for “which place_id buckets are kept” called count_place_id (to distinguish it from “which place_ids are used to filter the observations”), which I believe is what kueda is asking for, and then use that new parameter as a filter on the aggregation.
2: OR apply the same place_ids used to filter observations as a filter on the aggregation, which I think is what you were intending to do.
Either way, I think you’d need to hook placeFilterForUser from observation_query_builder.js into the aggregation.
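
One possible way to restrict the buckets, sketched under assumptions: Elasticsearch's terms aggregation accepts an `include` option listing exact values to keep. The builder name `buildPlaceCountsAggregation` and the sample IDs are hypothetical, this sketch does not involve placeFilterForUser at all, and whether `include` behaves as expected on the numeric `place_ids` field should be verified against the Elasticsearch version in use.

```javascript
// Hypothetical builder for the by_place aggregation with a bucket filter.
// `countPlaceIds` is the list of place IDs allowed to appear as buckets.
function buildPlaceCountsAggregation( countPlaceIds, size = 1000 ) {
  const terms = { field: "place_ids", size };
  if ( countPlaceIds && countPlaceIds.length > 0 ) {
    // Assumption: `include` with an array keeps only these exact values.
    terms.include = countPlaceIds;
    // No need to request more buckets than the number of places asked for.
    terms.size = Math.min( size, countPlaceIds.length );
  }
  return { by_place: { terms } };
}

// Made-up place IDs, for illustration only.
console.log( JSON.stringify( buildPlaceCountsAggregation( [10, 20] ), null, 2 ) );
```

With either option above, the same builder could be used: option 1 would pass the count_place_id values, option 2 the place_id values.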

I think the official iNat team can chime in about why it’s important to distinguish between place_id and count_place_id with a separate parameter, because I agree that the API might be less confusing to outside users if they were the same parameter. Perhaps count_place_id could be an optional parameter that defaults to place_id if not specified?
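
A minimal sketch of that defaulting idea, assuming both parameters arrive as comma-separated strings of numeric IDs; `parseIdList` and `resolveCountPlaceIds` are hypothetical names, not functions from the codebase.

```javascript
// Parse a comma-separated ID parameter such as "10,20" into [10, 20].
function parseIdList( value ) {
  if ( value === undefined || value === null || value === "" ) {
    return [];
  }
  return String( value ).split( "," )
    .map( part => Number.parseInt( part.trim(), 10 ) )
    // Drop anything that did not parse as an integer.
    .filter( id => Number.isInteger( id ) );
}

// Use count_place_id when given; otherwise fall back to place_id.
function resolveCountPlaceIds( params ) {
  const explicit = parseIdList( params.count_place_id );
  return explicit.length > 0 ? explicit : parseIdList( params.place_id );
}

console.log( resolveCountPlaceIds( { place_id: "10,20" } ) );
console.log( resolveCountPlaceIds( { place_id: "10", count_place_id: "20, 30" } ) );
```

When neither parameter is present the result is an empty list, which a caller could treat as "bucket by every place", matching the PR's current behavior.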

If this pull request is stale / you’d like to move on, I’d be happy to continue your work and pull your branch into my fork @jotamayo97

@jotamayo97
Contributor Author

Thanks for the explanation, @kfc35.
Of course, you can continue the task if you want 😃

Development

Successfully merging this pull request may close these issues.

observation counts in places API endpoint
