-
Notifications
You must be signed in to change notification settings - Fork 3.4k
HBASE-29735 Fix miscalculated timeoutInSeconds when using MoveWithAck #7519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
5bd7327 to
30a50d2
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Hi @jcongithub, Thanks for your contribution. Please also update the PR title to start with the Jira ticket ID. |
30a50d2 to
ae470c1
Compare
|
Thanks @PDavid Updated |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
| int retries = (ack) ? maxRetries : 1; | ||
| long timeoutInSeconds = regionsToMove.size() | ||
| * admin.getConfiguration().getLong(MOVE_WAIT_MAX_KEY, DEFAULT_MOVE_WAIT_MAX); | ||
| * admin.getConfiguration().getLong(MOVE_WAIT_MAX_KEY, DEFAULT_MOVE_WAIT_MAX) * retries; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems better to use this.conf instead of admin.getConfiguration(), conf is an inherited field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @guluo2016 for reviewing the PR.
While this.conf would also work, I chose to continue using admin.getConfiguration() for consistency and correctness:
The admin object is constructed from this.conf (see RegionMover lines 130–131), so admin.getConfiguration() should reflect the same configuration.
The original code already uses admin.getConfiguration() here, and several other places in this class also rely on the admin instance for configuration.
MoveWithAck follows the same pattern and uses admin to retrieve the configuration.
To keep the code consistent across the class and related logic, I think it’s cleaner to continue using admin.getConfiguration().
Please let me what you think. Thanks
|
Is it possible to add a unit test for this? |
|
@guluo2016 I also looked into mocking MoveWithAck to force a delay/timeout, but I haven’t found a reliable approach there either. If you know of an existing mechanism or have suggestions on how to simulate a slow region move in the current test framework, I’m happy to add a unit test. Otherwise, adding one may require non-trivial changes to the testing infrastructure. |
The PR tries to fix an issue in RegionMover when ack mode is enabled. It is related to HBASE-29734
Issues
In RegionMover, timeoutInSeconds was calculated without considering retry attempts MoveWithAck. So waitMoveTasksToFinish may exit even though there are remaining retries in MoveWithAck.
Solution
In RegionMover, use regions * wait_per_region * retries to calculate timeoutInSeconds