
Conversation

@DomGarguilo
Member

Fixes #5667

This PR adds the following:

multi-range tablet info API:

  • added an overload of TableOperations.getTabletInformation() that accepts a List<Range>; the existing single-range overload now delegates to this new method
  • added BalancerEnvironment.getTabletInformation(TableId, List<Range>, TabletInformation.Field...) so balancer plugins can request only the tablets they need

Created a new TabletInformationCollector class to centralize this logic. It merges overlapping ranges, adds the required Fields, and reads the tablets via Ample.

TableOperationsImpl and BalancerEnvironmentImpl now delegate to TabletInformationCollector for both single- and multi-range calls.
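As a rough illustration of two ideas in this description (merging overlapping ranges before reading, and having the single-range overload delegate to the list overload), here is a self-contained Java sketch. SimpleRange, RangeMerger, and TabletInfoSource are hypothetical names, not Accumulo classes, and ranges are simplified to [startRow, endRow) strings where null means unbounded. For real Range objects, Accumulo's Range class provides a static mergeOverlapping(Collection<Range>) helper; the PR text does not say whether TabletInformationCollector uses it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical simplified row range: rows in [startRow, endRow); null means unbounded. */
record SimpleRange(String startRow, String endRow) {
  /** Orders start rows with null (negative infinity) first. */
  static final Comparator<String> START_ORDER = Comparator.nullsFirst(Comparator.<String>naturalOrder());
  /** Orders end rows with null (positive infinity) last. */
  static final Comparator<String> END_ORDER = Comparator.nullsLast(Comparator.<String>naturalOrder());
}

/** Hypothetical helper: sort by start row, then fold overlapping or adjacent ranges together. */
final class RangeMerger {
  private RangeMerger() {}

  static List<SimpleRange> mergeOverlapping(List<SimpleRange> ranges) {
    List<SimpleRange> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparing(SimpleRange::startRow, SimpleRange.START_ORDER));

    List<SimpleRange> merged = new ArrayList<>();
    SimpleRange current = null;
    for (SimpleRange next : sorted) {
      if (current == null) {
        current = next;
      } else if (startsAtOrBefore(next.startRow(), current.endRow())) {
        // Overlapping or adjacent: extend the current range to the larger end row.
        String end = SimpleRange.END_ORDER.compare(next.endRow(), current.endRow()) > 0
            ? next.endRow()
            : current.endRow();
        current = new SimpleRange(current.startRow(), end);
      } else {
        // Gap between ranges: close out the current merged range.
        merged.add(current);
        current = next;
      }
    }
    if (current != null) {
      merged.add(current);
    }
    return merged;
  }

  /** True if a range beginning at start begins at or before the exclusive end of another. */
  private static boolean startsAtOrBefore(String start, String end) {
    if (start == null || end == null) {
      return true; // an unbounded side always reaches the other range
    }
    return start.compareTo(end) <= 0;
  }
}

/** Hypothetical API showing the overload pattern: the single-range form delegates to the list form. */
interface TabletInfoSource {
  List<String> getTabletInformation(String tableName, List<SimpleRange> ranges);

  default List<String> getTabletInformation(String tableName, SimpleRange range) {
    return getTabletInformation(tableName, List.of(range));
  }
}
```

With this shape, an implementation only has to provide the list-based method, so the merge and read logic lives in one place, similar in spirit to how both TableOperationsImpl and BalancerEnvironmentImpl delegate to TabletInformationCollector.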

@DomGarguilo DomGarguilo self-assigned this Dec 18, 2025
Contributor

@keith-turner keith-turner left a comment


This looks good, was there anything else you were planning to do before taking out of draft?

@DomGarguilo
Member Author

> This looks good, was there anything else you were planning to do before taking out of draft?

The main thing I wanted to look into was how we can improve things in TabletInformationCollector.getTabletInformation(). Right now we are looping over each range returned after merging the given range list and getting a new TabletsMetadata for each of those ranges. I think overall the new way of doing things here is an improvement but it would be really nice if we could get a single TabletsMetadata for that whole list of Ranges. This isn't possible right now and we would need to add a new builder option that accepts multiple Ranges.
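The per-range approach described here can be sketched roughly as follows. This is a self-contained illustration, not Accumulo code: RowSpan, FakeMetadata, and PerRangeCollector are hypothetical stand-ins, and FakeMetadata.readTablets plays the role of building one TabletsMetadata per merged range. The tablet lookup assumes that a tablet with end row e holds rows up to and including e, as Accumulo tablets do.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

/** Hypothetical merged row span; both bounds inclusive, non-null, and startRow <= endRow. */
record RowSpan(String startRow, String endRow) {}

/** Hypothetical in-memory stand-in for the metadata table: tablet end row -> tablet name. */
final class FakeMetadata {
  private final NavigableMap<String, String> tabletsByEndRow = new TreeMap<>();
  /** Counts how many separate reads were issued (one per range in this sketch). */
  final AtomicInteger reads = new AtomicInteger();

  void addTablet(String endRow, String name) {
    tabletsByEndRow.put(endRow, name);
  }

  /** Plays the role of building one TabletsMetadata for a single range. */
  Stream<String> readTablets(RowSpan span) {
    reads.incrementAndGet();
    // The first overlapping tablet is the ceiling of startRow; the last is the
    // ceiling of endRow, or the end of the map if no such tablet exists.
    String lastEnd = tabletsByEndRow.ceilingKey(span.endRow());
    NavigableMap<String, String> view = (lastEnd == null)
        ? tabletsByEndRow.tailMap(span.startRow(), true)
        : tabletsByEndRow.subMap(span.startRow(), true, lastEnd, true);
    return view.values().stream();
  }
}

/** Hypothetical collector: loops over the merged ranges, issuing one read per range. */
final class PerRangeCollector {
  private PerRangeCollector() {}

  static List<String> collect(List<RowSpan> mergedRanges, FakeMetadata metadata) {
    // Results come back grouped in the order the ranges were supplied. A tablet that
    // overlaps two merged ranges would be emitted once per range in this sketch.
    return mergedRanges.stream()
        .flatMap(metadata::readTablets)
        .toList();
  }
}
```

A builder option that accepts multiple ranges, as suggested, would collapse the N calls to readTablets into a single read.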

Maybe this shouldn't block things here, though, since the new code is already an improvement and we can look into adding that feature in a follow-on.

@keith-turner
Contributor

> Right now we are looping over each range returned after merging the given range list and getting a new TabletsMetadata for each of those ranges. I think overall the new way of doing things here is an improvement but it would be really nice if we could get a single TabletsMetadata for that whole list of Ranges.

There is something similar to this here, but it takes extents rather than row ranges and expects those exact extents to exist. Its implementation uses a batch scanner, so we could build something like that; however, the batch scanner will deliver results out of order. As this is done now, maybe it returns things in the order the ranges were given?

I think what is here is a good start. We could open a follow-on issue for optimization, but perhaps only pursue it if it turns out to be needed.

@keith-turner
Contributor

Re optimizing this, for now we could make the javadoc mention there is no expected order for returned tablets. That leaves open future optimizations that would return tablets in arbitrary order.
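If the javadoc leaves the order unspecified, callers that care about table order have to sort the result themselves. Below is a minimal sketch of that caller-side pattern. TabletSummary and TabletOrdering are hypothetical names (not Accumulo's TabletInformation), and a null end row marks the last tablet, mirroring how a null end row denotes the last tablet of a table in Accumulo's KeyExtent.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical tablet summary; endRow == null marks the table's last tablet. */
record TabletSummary(String name, String endRow) {}

/** Hypothetical caller-side helper for results that carry no ordering guarantee. */
final class TabletOrdering {
  private TabletOrdering() {}

  /** Orders tablets by end row; the last tablet (null end row) sorts after all others. */
  static final Comparator<TabletSummary> BY_END_ROW =
      Comparator.comparing(TabletSummary::endRow, Comparator.nullsLast(Comparator.<String>naturalOrder()));

  /**
   * Sorts tablets into table order. Intended for results of a method whose javadoc
   * promises no particular order, such as one backed by a batch scanner.
   */
  static List<TabletSummary> inTableOrder(Stream<TabletSummary> unordered) {
    return unordered.sorted(BY_END_ROW).toList();
  }
}
```

Keeping the sort on the caller side is what leaves the implementation free to switch to an unordered, batch-scanner-style read later.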

@DomGarguilo DomGarguilo marked this pull request as ready for review December 19, 2025 01:53
@DomGarguilo
Member Author

@keith-turner I marked this ready for review. I think things are ready to go as-is. Two potential follow-on tasks that could improve things here are:

  1. use RowRange in place of Range in these new/changed methods
  2. potentially optimize by creating a way to get TabletsMetadata from multiple input Ranges

@DomGarguilo DomGarguilo merged commit c7b9c3d into apache:main Dec 22, 2025
8 checks passed
@DomGarguilo DomGarguilo deleted the tabletInfoSubset branch December 22, 2025 18:53


Development

Successfully merging this pull request may close these issues.

Provide a mechanism to retrieve a subset of tablet information in the balancer SPI

2 participants