-
Notifications
You must be signed in to change notification settings - Fork 749
[Executorch] make slice_copy parallel #15830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gh/kimishpatel/214/base
Are you sure you want to change the base?
Conversation
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15830
Note: Links to docs will display an error until the docs builds have been completed. ❌ 1 New Failure, 3 Pending, 1 Unrelated FailureAs of commit 117b6e7 with merge base 4aaa7ff ( NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but were present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) ghstack-source-id: 323355686 Pull Request resolved: #15830
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds parallel processing to the slice_copy operation in ExecutorTorch to improve performance during large prefills in LLMs, where slice_copy can take 5-10% of execution time (primarily from rope implementation slicing).
Key Changes:
- Added multithreading support to
compute_slicefunction with workload-based thresholds - Parallel execution distributes work across leading dimensions using
parallel_for - Single-threaded fallback maintained for smaller workloads
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| kernels/portable/cpu/util/targets.bzl | Adds threadpool dependency required for parallel execution support |
| kernels/portable/cpu/util/slice_util.cpp | Implements parallel slice_copy with multithreading when leading_dims ≥ 8 and total_elements ≥ 32768, maintaining single-threaded fallback for smaller workloads |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (use_multithreading) { | ||
| // Use parallel_for to distribute work across leading dimensions | ||
| // Calculate grain size based on number of elements per leading dimension | ||
| const int64_t elements_per_leading_dim = length * trailing_dims; |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable elements_per_leading_dim is calculated but never used. It appears this was intended for grain size calculation but MIN_LEADING_DIMS_FOR_MT is used instead. Consider removing this unused variable or using it to calculate a more dynamic grain size based on workload characteristics.
| const int64_t elements_per_leading_dim = length * trailing_dims; |
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) ghstack-source-id: 324784683
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 324975499 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325110790 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325206643 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325251276 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325262631 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325372208 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325397489 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 325668666 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 326932760 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 326932760 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 326986639 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327095994 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327095994 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327106882 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327110500 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327113306 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327186826 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327614758 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/) [ghstack-poisoned]
Pull Request resolved: #15830 When doing large prefills in LLMs, slice_copy takes about 5-10% time. Mainly coming from slicing in the rope implementation. ghstack-source-id: 327630241 Differential Revision: [D85532081](https://our.internmc.facebook.com/intern/diff/D85532081/)
Stack from ghstack (oldest at bottom):
When doing large prefills in LLMs, slice_copy takes about 5-10% time.
Mainly coming from slicing in the rope implementation.
Differential Revision: D85532081