-
Notifications
You must be signed in to change notification settings - Fork 160
Increase retry session defaults and add environment variable override #4408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
227ca41 to
97df79a
Compare
LecrisUT
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you run pre-commit. We don't have the bot to do that for you.
tmt/utils/__init__.py
Outdated
| DEFAULT_RETRY_SESSION_RETRIES: int = 10 | ||
| RETRY_SESSION_RETRIES: int = configure_constant( | ||
| DEFAULT_RETRY_SESSION_RETRIES, 'TMT_RETRY_SESSION_RETRIES') | ||
|
|
||
| DEFAULT_RETRY_SESSION_BACKOFF_FACTOR: float = 1 | ||
| RETRY_SESSION_BACKOFF_FACTOR: float = configure_float_constant( | ||
| DEFAULT_RETRY_SESSION_BACKOFF_FACTOR, 'TMT_RETRY_SESSION_BACKOFF_FACTOR') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default maximum is 17 min (of the last retry), which is even bigger than DEFAULT_BACKOFF_MAX. Some adjustments of the numbers is needed. Would have been good to have more control of the power as well since we probably shouldn't be doing the retry 10 times, but instead have a more aggressive power
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to discuss the actual values, for now I have just copied ENVFILE defaults below. The previous value allowed only the delays 0.1 s, 0.2 s, 0.4 s which are IMHO way too small if we are dealing with systems under a load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after a discussion and given the default BACKOFF_MAX value, I have decided to stick with 10 retries which result in a cumulative delay being 10 minutes and 7 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So after a discussion and given the default BACKOFF_MAX value, I have decided to stick with 10 retries which result in a cumulative delay being 10 minutes and 7 seconds.
How did you come up with this number. My maths is 1 * (2^11 - 1) [s] = 34.11 [min] . No bug opinions on what the correct value should be, just make sure you also adjust the backoff_max used in creating the retry. Given that this is user configurable I think it should do a similar mathematical evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I did the math wrong, the cumulated sleep time is 487 seconds, which is 8 minutes and 7 seconds.
The DEFAULT_BACKOFF_MAX is 120 seconds. Therefore, even though 2^7 > 120, the delay would be 120.
Therefore, following the formula backoff_factor×2^(retry_number−1) we are getting
retry 1: 1s
retry 2: 2s
retry 3: 4s
...
retry 7: 64
retry 8: 120s (capped by default BACKOFF_MAX)
retry 9: 120s (capped)
retry 10: 120s (capped)
with cumulative delay being 487 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's (2 ^ (retry_number + 1) - 1) excluding capoff. But if the user alter the values from env-var, I do not expect them to account for the built-in backoff_max, thus the suggestion to eliminate it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not. Check backoff_factor documentation at https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html
It states {backoff factor} * (2 ** ({number of previous retries}))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not. Check
backoff_factordocumentation at https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html It states{backoff factor} * (2 ** ({number of previous retries}))
Aware of that, was off by one due to missing 0-index, should have been (2 ^ (retry_number) - 1) as in 2^0 + 2^1 + ... + 2^n = 2^(n+1) - 1
tmt/utils/__init__.py
Outdated
| # Defaults for HTTP/HTTPS retries for getting environment file | ||
| # Retry with exponential backoff, maximum duration ~511 seconds | ||
| ENVFILE_RETRY_SESSION_RETRIES: int = 10 | ||
| ENVFILE_RETRY_SESSION_BACKOFF_FACTOR: float = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please clarify? The constant is used in the very same file but since it was not concert I didn't touch unrelated settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new values of DEFAULT_RETRY_* match the values for these so these can be replaced with the common one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, unless we want to keep those settings separated on purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most like we do not want separate setting. There is no good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know the motivation why the settings has been previously introduced specifically for the ENVFILE, most likely it seemed to be necessary to be more benevolent in this particular case. Given that the new settings could be overriden through environment variables, it could happen that someone will set the value too low which will reintroduce the issue with obtaining ENVFILE.
I mean, I can drop these lines, but if there is no reason to distinguish between the two, why it has been introduced as a separate settings at the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In June 2022 it seemed like a good idea, but not anymore. One set to rule them all, envvar file isn't special enough.
7e75c7f to
0e5bd6b
Compare
|
Checked with Ondrej that it's not super-urgent but let's fix as soon as possible. Proposing, together with the issue, for the next sprint. |
The default retry session configuration has been updated to address
frequent 503 error responses from ReportPortal:
- Increased DEFAULT_RETRY_SESSION_RETRIES from 3 to 10
- Increased DEFAULT_RETRY_SESSION_BACKOFF_FACTOR from 0.1 to 1
resulting in the following delays:
retry 1: 1s
retry 2: 2s
retry 3: 4s
...
retry 8: 120s (capped by default BACKOFF_MAX)
retry 9: 120s (capped)
retry 10: 120s (capped)
with cumulative delay being 8 minutes and 7 seconds.
These values are now configurable via environment variables using
the configure_constant() pattern:
- TMT_RETRY_SESSION_RETRIES overrides retry count
- TMT_RETRY_SESSION_BACKOFF_FACTOR overrides backoff factor
Added configure_float_constant() function to support float-type
environment variable configuration, following the pattern of
configure_constant() for integers.
This addresses errors like:
HTTPSConnectionPool: Max retries exceeded with url: /api/v1/...
(Caused by ResponseError('too many 503 error responses'))
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
0e5bd6b to
002f210
Compare
1. DEFAULT_RETRY_SESSION_RETRIES from 9 back to 8 in /home/ksrot/devel/github/tmt/tmt/utils/__init__.py 2. DEFAULT_RETRY_SESSION_BACKOFF_FACTOR from 1.0 to 2.0 in /home/ksrot/devel/github/tmt/tmt/utils/__init__.py 3. Documentation in /home/ksrot/devel/github/tmt/docs/overview.rst to reflect 8 retries with backoff factor 2.0 and the correct delay sequence: 2s, 4s, 8s, 16s, 32s, 64s, 128s, 256s With these settings (8 retries, backoff factor 2.0), the cumulative sleep time is 510 seconds (8 minutes 30 seconds). Assisted-by: Claude AI
The default retry session configuration has been updated to address frequent 503 error responses from ReportPortal:
These values are now configurable via environment variables using the configure_constant() pattern:
Added configure_float_constant() function to support float-type environment variable configuration, following the pattern of configure_constant() for integers.
This addresses errors like:
HTTPSConnectionPool: Max retries exceeded with url: /api/v1/... (Caused by ResponseError('too many 503 error responses'))
🤖 Generated with Claude Code
Pull Request Checklist