-
Notifications
You must be signed in to change notification settings - Fork 357
[Pipeline] Refactor buffer allocation in Inject Pipeline Pass #1525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Added BufferUsageCollector class to identify and collect buffers used in pipeline loop bodies, enabling proper multi-versioning for software pipelining. * Updated PipelineRewriter to handle local and outer block buffer allocations more effectively, ensuring that only necessary buffers are included in the pipeline. * Enhanced buffer remapping logic to prevent conflicts when buffers from outer blocks are used in multiple pipeline loops. This update improves the efficiency and correctness of buffer management during software pipelining.
* Adjusted formatting of buffer allocation declarations for improved readability. * Ensured consistent style in the codebase by aligning variable declarations. This change enhances code clarity without altering functionality.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughIntroduces BufferUsageCollector to identify buffers actually used in a statement and extends PipelineRewriter to accept local_allocs, enabling correct remapping/allocation of buffers across pipeline and outer blocks; adds conflict detection and propagation of remappings to outer block alloc_buffers. Changes
Sequence DiagramsequenceDiagram
participant PI as PipelineInjector
participant BUC as BufferUsageCollector
participant PR as PipelineRewriter
participant OB as OuterBlock (alloc_buffers)
PI->>BUC: Collect(stmt) %% gather used buffers
BUC-->>PI: pipeline_allocs
PI->>PI: Build local_allocs (track nested vs outer)
PI->>PI: Check conflicts across pipelines
PI->>PR: Construct(pipeline_allocs, local_allocs, ...)
PR->>PR: Compute versions (skip non-accessed buffers)
PR-->>PI: buffer_remap
PI->>OB: Apply buffer_remap to alloc_buffers
OB->>OB: Update alloc_buffers entries
OB->>OB: Erase old mappings
OB-->>PI: Remappings applied
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/transform/inject_pipeline.cc (2)
79-85: Inconsistent buffer addition for nested block allocations.In
VisitStmt_(BlockNode*), buffers fromop->alloc_buffersare added directly toused_buffers_without checkingallocated_buffers_, unlikeAddBuffer()which filters throughallocated_buffers_. This appears intentional since nested block buffers need multi-versioning regardless, but please verify this asymmetry is correct.If this is intentional, consider adding a brief comment explaining why nested block allocations bypass the
allocated_buffers_check.
1148-1157: Consider extracting nested block iteration into a helper.The pattern of iterating over
pipeline_body_seq->seqto find nested blocks withSeqStmtbodies appears three times in this function (lines 1122-1133, 1148-1157, and 1216-1226). Consider extracting this into a helper function or consolidating the operations into a single pass.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/transform/inject_pipeline.cc
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
Repo: tile-ai/tilelang PR: 794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
src/transform/inject_pipeline.cc
🧬 Code graph analysis (1)
src/transform/inject_pipeline.cc (2)
src/transform/multi_version_buffer_rewriter.cc (2)
buffer(255-265)buffer(255-255)src/transform/inject_tma_barrier.cc (3)
collector(416-416)collector(477-477)rewriter(96-96)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (5)
src/transform/inject_pipeline.cc (5)
293-315: LGTM!The constructor documentation clearly explains the distinction between
pipeline_allocs(all buffers needing multi-versioning) andlocal_allocs(buffers for re-allocation in the rewritten block). This separation properly handles outer-block buffers that should not be re-allocated locally.
410-414: LGTM!Clean accessor method that properly exposes the buffer remapping for updating outer block allocations.
1159-1170: Good addition of conflict detection for multi-versioned buffers.This check correctly prevents undefined behavior when the same external buffer would be multi-versioned across multiple pipeline loops. The error message is clear and actionable.
1277-1296: LGTM!The pending buffer remap logic correctly applies remappings to outer block
alloc_buffersafter visiting children. Erasing frompending_buffer_remap_after applying prevents duplicate remapping. The traversal order (children first viaStmtExprMutator::VisitStmt_, then remap application) ensures pipeline rewriting inForNodevisitor happens before this remap is applied.
1325-1331: LGTM!The new member variables are appropriately typed and serve clear purposes:
allocated_buffers_: enables the collector to filter function I/O bufferspending_buffer_remap_: defers remap application to outer block processingbuffers_used_in_pipeline_: tracks external buffers for conflict detection
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@regression-perf |
Performance Benchmark ReportTriggered by: @LeiWang1999 Results
Artifacts
|
as title.
Summary by CodeRabbit
Refactor
Tests
✏️ Tip: You can customize this high-level summary in your review settings.