-
Notifications
You must be signed in to change notification settings - Fork 18
Tweak parallelism from A/B config #562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
What sort of cluster size are you talking about? Do you have example of test runs where Coiled failed to provision the desired number of instances? |
|
I don't know what you mean by AWS DoS, there are lots of limits and I don't see anything in #355 that describes what you were seeing. Maybe let's talk before you start making multiple 1000 worker clusters. That's fine to do occasionally, but we should make sure we aren't doing this way more than necessary, and 20 1000 worker clusters is a lot (even if you're making them one at a time). |
|
My above comment also applies to 100 workers. This PR makes the difference between 100 workers and 2000 workers. |
|
Sure, I'm not saying this PR has a problem, I'm just saying me/platform should be in the loop when you're thinking about doing larger scale testing. |
hendrikmakait
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @crusaderky! I have one nit about naming, feel free to ignore.
AB_environments/config.yaml
Outdated
| # Number of parallel test_*.py modules per A/B test job. | ||
| # Each module typically spawns one Coiled cluster at a time. | ||
| # Set to 1 to disable pytest-xdist. | ||
| modules_per_job: 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I first ran into the variable name without the description and it took me a moment to figure out how the number of modules would impact parallelism. Would it make sense to rename this to something like pytest_workers_per_job to highlight that this is about pytest-xdist parallelism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
ncclementi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just left a small comment
| - runtime | ||
| - benchmarks | ||
| - stability | ||
| max_parallel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same way we explain opther parameters (e.g category, test_null_hypothesis, etc ) in the README, can we explain these two in the README as well?
I know there are some comments in the config.yml but it'll be useful having them explain in the readme too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
When one significantly increases the cluster size (#549), the current hardcoded levels of parallelism can cause Coiled to fail to allocate all the necessary workers.
Additionally, pytest-xdist tends to eat the pytest output, making debugging more difficult.
Expose parallelisation levels as knobs in the A/B config file and optionally disable pytest-xdist completely.