Skip to content

Conversation

@Rachmanino
Copy link
Collaborator

@Rachmanino Rachmanino commented Dec 22, 2025

Summary by CodeRabbit

  • Refactor

    • Simplified loop/context nesting to reduce complexity and improve maintainability.
    • Added default template parameters for rasterization helpers so common usage no longer requires an explicit offset.
  • Chores

    • Cache storage path now includes the application version to avoid cross-version cache conflicts.
  • Tests / CI

    • CI test discovery now excludes known problematic tests to stabilize runs.

✏️ Tip: You can customize this high-level summary in your review settings.

…nd streamline parser.py for better readability
@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link

coderabbitai bot commented Dec 22, 2025

Walkthrough

Added default template parameter offset = 0 to two CUDA swizzle functions; combined frame and iterator binding in a parser For-handling fast path; included package __version__ in sparse cache path; and added temporary CI test discovery filters to exclude sink, vs_sparse, tilelibrary_gemm, and jit_gemm_ctypes from certain non-distributed runs.

Changes

Cohort / File(s) Change Summary
CUDA template signature defaults
src/tl_templates/cuda/threadblock_swizzle.h
Added default template argument offset = 0 to rasterization2DRow() and rasterization2DColumn() declarations (changed template <int panel_width, int offset>template <int panel_width, int offset = 0). Minor formatting tweak in Column declaration.
Parser context-manager refactor
tilelang/language/overrides/parser.py
Combined frame context and iterator binding into a single with for non-SerialStepSpec stepped-serial For handling; moved stepped_index binding and related assignments into that combined block; removed a trailing space in an extent calculation.
Sparse cache versioning
tilelang/utils/sparse.py
Imported __version__ and included it in the sparse cache directory path so caches are separated by package version.
CI test filtering (temporary)
.github/workflows/ci.yml
Added grep -vE filters to test discovery in multiple non-distributed blocks to temporarily exclude tests matching `sink

Sequence Diagram(s)

(omitted — changes do not introduce a new multi-component control flow requiring a sequence diagram)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title references adding a default parameter to the swizzle2D template, which directly matches the main change in threadblock_swizzle.h where default offset=0 was added to rasterization2D functions.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch wt/fix-l2-swizzle

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 189ed6f and 75b67b0.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test-nvidia
🔇 Additional comments (2)
.github/workflows/ci.yml (2)

125-125: Clarify whether test exclusions are related to this PR or should be tracked separately.

The "sink" and "vs_sparse" test exclusions are marked as temporary, but no swizzle2D template references exist in the excluded test files (test_example_attention_sink.py, test_vs_sparse_attn.py). This suggests these exclusions may be unrelated to the swizzle2D bug fix.

If these are pre-existing issues, consider tracking them separately rather than including them in this PR. If they are related to the changes, a brief explanation of the connection would clarify the intent.


151-151: Clarify the relationship between GEMM test exclusions and swizzle changes, and ensure tracking exists for re-enabling.

The exclusion pattern targets two tests with different characteristics:

  • tilelibrary_gemm: Uses swizzle functionality (make_swizzled_layout), so swizzle changes could directly impact it
  • jit_gemm_ctypes: Does not use swizzle; marked as problematic for different reasons related to ctypes-based JIT execution

Both are marked "temporarily disable problematic tests" with no tracking issue or timeline visible. Please verify:

  1. The specific failure modes for each test
  2. Whether the swizzle changes in the codebase impact tilelibrary_gemm specifically
  3. If there's a tracking issue for re-enabling these tests and the expected resolution timeline

Likely an incorrect or invalid review comment.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe7bbdd and 7cce21c.

📒 Files selected for processing (2)
  • src/tl_templates/cuda/threadblock_swizzle.h
  • tilelang/language/overrides/parser.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/language/overrides/parser.py (2)
tilelang/language/parser/parser.py (2)
  • bind_for_value (78-111)
  • bind_assign_value (114-160)
tilelang/language/tir/ir.py (1)
  • serial (25-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format-check
🔇 Additional comments (3)
tilelang/language/overrides/parser.py (2)

167-170: LGTM! Clean refactoring that reduces nesting.

Combining the two context managers into a single with statement improves readability while maintaining identical execution semantics.


198-206: LGTM! Consistent refactoring pattern.

The consolidation of context managers follows the same clean pattern used in the fast path above, improving maintainability.

src/tl_templates/cuda/threadblock_swizzle.h (1)

26-27: Default offset parameter is correct and backward compatible.

Both rasterization2DColumn and rasterization2DRow now consistently define template <int panel_width, int offset = 0>. All existing call sites instantiate or invoke these functions with only the panel_width argument, making them fully compatible with the new default parameter.

namespace tl {

template <int panel_width, int offset> TL_DEVICE dim3 rasterization2DRow() {
template <int panel_width, int offset = 0> TL_DEVICE dim3 rasterization2DRow() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

rg -nP --type=cpp --type=cu --type=h -C3 'rasterization2DRow\s*<'

Repository: tile-ai/tilescale

Length of output: 87


🏁 Script executed:

rg -nP --type=cpp --type=h -C3 'rasterization2DRow\s*<'

Repository: tile-ai/tilescale

Length of output: 43


🏁 Script executed:

find . -name "threadblock_swizzle.h" -type f 2>/dev/null | head -20

Repository: tile-ai/tilescale

Length of output: 151


🏁 Script executed:

cat -n ./src/tl_templates/cuda/threadblock_swizzle.h | head -50

Repository: tile-ai/tilescale

Length of output: 2344


🏁 Script executed:

rg -n 'rasterization2DRow' --type cpp --type h -B2 -A2

Repository: tile-ai/tilescale

Length of output: 956


🏁 Script executed:

rg 'rasterization2DRow\s*<' --type cpp --type h -C5

Repository: tile-ai/tilescale

Length of output: 43


🏁 Script executed:

rg 'rasterization2DRow' --type cpp --type h | grep -v 'template' | head -20

Repository: tile-ai/tilescale

Length of output: 43


🏁 Script executed:

rg 'rasterization2D' --type cpp --type h -C3

Repository: tile-ai/tilescale

Length of output: 2416


🏁 Script executed:

rg 'rasterization2DRow' --type cpp --type h --glob '!threadblock_swizzle.h'

Repository: tile-ai/tilescale

Length of output: 43


🏁 Script executed:

cat -n ./src/tl_templates/hip/threadblock_swizzle.h

Repository: tile-ai/tilescale

Length of output: 2357


Address API inconsistency between CUDA and HIP implementations.

The default parameter offset = 0 was added to the CUDA version, but the HIP version in src/tl_templates/hip/threadblock_swizzle.h remains unchanged without the offset parameter. Both rasterization2DRow() and rasterization2DColumn() have diverged signatures. Update the HIP version to match the CUDA signature for API consistency, or document why they intentionally differ.

🤖 Prompt for AI Agents
In src/tl_templates/cuda/threadblock_swizzle.h around line 7 the CUDA template
signature template<int panel_width, int offset = 0> dim3 rasterization2DRow()
introduces a default offset parameter that the HIP implementation does not have;
to restore API consistency, update src/tl_templates/hip/threadblock_swizzle.h to
add the matching default template parameter (offset = 0) to the
rasterization2DRow() declaration (and make the same change for
rasterization2DColumn() if present) so both CUDA and HIP signatures match
exactly, keeping behavior identical and leaving a comment only if divergence is
intentional.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tilelang/utils/sparse.py (1)

15-15: Excellent improvement for cache isolation.

Version-aware cache paths prevent incompatible compiled extensions from different package versions from conflicting. This is the correct approach for managing native extension caches.

The only minor consideration is that old version caches will accumulate over time, but this is an acceptable tradeoff for cache safety.

Optional: Verify version format is filesystem-safe

While semantic versioning (e.g., "1.0.0") is typically safe for filesystem paths, you could optionally verify that __version__ doesn't contain problematic characters:

#!/bin/bash
# Verify that __version__ follows a safe format for filesystem paths
python -c "from tilelang import __version__; import re; assert re.match(r'^[0-9]+\.[0-9]+\.[0-9]+', __version__), f'Version format may not be filesystem-safe: {__version__}'; print(f'Version format is safe: {__version__}')"
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7cce21c and da0eea1.

📒 Files selected for processing (1)
  • tilelang/utils/sparse.py
🔇 Additional comments (1)
tilelang/utils/sparse.py (1)

9-11: LGTM!

The import and comment clearly document the purpose of version-aware caching. This is a good practice for ensuring cache isolation between different package versions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da0eea1 and ed1f4b7.

📒 Files selected for processing (1)
  • .github/workflows/ci.yml

@Rachmanino Rachmanino force-pushed the wt/fix-l2-swizzle branch 4 times, most recently from a4d8df2 to 189ed6f Compare December 23, 2025 07:20
@Rachmanino
Copy link
Collaborator Author

closed as part of #40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants