Skip to content

Conversation

@sgjzfzzf
Copy link
Contributor

@sgjzfzzf sgjzfzzf commented Dec 20, 2025

This PR resolves #1469. It moves most of the if-else checkers for multi-backends in tilelang/cache/kernel_cache.py, and dispatches the different implementations by inheritance in different subclasses.

We retained some specialized implementations for CuteDSL and considered removing them in the future once this backend is ready.

Summary by CodeRabbit

  • New Features

    • Added CuTeDSL backend plus backend-specific cache implementations for TVM FFI, Cython, NVRTC, Torch and CuTeDSL.
    • Centralized dispatch routing for selecting the appropriate backend at runtime.
  • Changes

    • execution_backend options updated (removed "auto"; "tvm_ffi" is the default; "cutedsl" added).
    • Improved on-disk cache handling, path/IO consistency, and adapter-specific save/load flows.
  • Tests / CI

    • Added CUDA JIT test runs including a cache-disabled pass to catch cache-related regressions.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 20, 2025

📝 Walkthrough

Walkthrough

Refactors kernel caching from a singleton to a dispatch map selecting backend-specific KernelCache subclasses; splits disk I/O into helper methods on KernelCache; adds backend adapters (TVM FFI, NVRTC, CuTeDSL, Cython, Torch) implementing backend-specific save/load logic; updates cached(...) routing and CI to run cache/no-cache CUDA JIT tests.

Changes

Cohort / File(s) Summary
Cache dispatcher & public API
tilelang/cache/__init__.py
Replaced singleton _kernel_cache_instance with _dispatch_map; cached(...) now resolves target/execution_backend, applies env/defaults, logs resolution when verbose, and routes to backend-specific cache; updated execution_backend literal to include cutedsl.
Base KernelCache refactor
tilelang/cache/kernel_cache.py
Moved module-level path constants into KernelCache instance attributes; added helpers _save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, _save_so_cubin_to_disk, _get_required_files, _load_kernel_source, _set_adapter_cache_path, _build_kernel; _save_kernel_to_disk/_load_kernel_from_disk now delegate to helpers.
TVM FFI adapter
tilelang/jit/adapter/kernel_cache.py
Added TVMFFIKernelCache subclass with kernel_lib_path = "executable.so" and implementations for _save_wrapper_kernel_code_to_disk and _save_so_cubin_to_disk.
NVRTC adapter
tilelang/jit/adapter/nvrtc/kernel_cache.py
Added NVRTCKernelCache subclass (kernel_lib_path = "kernel.cubin", kernel_py_path = "kernel.py") that writes the Python wrapper prior to saving the binary.
CuTeDSL adapter
tilelang/jit/adapter/cutedsl/kernel_cache.py
Added CuTeDSLKernelCache subclass (paths: host_kernel_path, launcher_lib_path, launcher_cpp_path); no-op device source save; saves launcher lib/cpp; custom _get_required_files, _load_kernel_source (returns empty), _set_adapter_cache_path, and _build_kernel that can create JITKernel from DB.
Cython adapter
tilelang/jit/adapter/cython/kernel_cache.py
Added CythonKernelCache subclass inheriting KernelCache (no overrides).
Torch adapter
tilelang/jit/adapter/torch/kernel_cache.py
Added TorchKernelCache subclass inheriting KernelCache (no overrides).
CI updates
.github/workflows/ci.yml
Added CUDA JIT test steps (cache-enabled and cache-disabled passes) and generalized ignore patterns for CUDA JIT tests.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Caller
  participant Dispatcher as CacheDispatcher (_dispatch_map)
  participant Backend as KernelCache (backend-specific)
  participant Adapter
  participant FS as Filesystem

  Caller->>Dispatcher: cached(func, target, execution_backend, ...)
  alt backend found
    Dispatcher->>Backend: select backend instance
    Dispatcher->>Backend: cached(..., set cache path)
    Backend->>Backend: _set_adapter_cache_path(kernel, cache_path)
    Backend->>Adapter: request sources/lib/params
    Backend->>FS: _get_required_files(cache_path)
    opt files missing
      Backend->>FS: _save_kernel_source_code_to_disk(...)
      Backend->>FS: _save_wrapper_kernel_code_to_disk(...)
      Backend->>FS: _save_so_cubin_to_disk(...)
    end
    Backend->>Backend: _build_kernel(...) -> JITKernel | None
    Backend-->>Caller: return JITKernel or None
  else unknown backend
    Dispatcher-->>Caller: raise ValueError
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

"I hopped through code with whiskers twitching,
dispatch maps laid out, no more singleton hitching.
Each backend a burrow, paths neat and tidy,
kernels tucked safe, build steps not mighty.
— a rabbit, caching carrots 🥕🐇"

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.28% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly summarizes the main change: refactoring KernelCache to use an inheritance-based design.
Linked Issues check ✅ Passed The pull request addresses all objectives from issue #1469: eliminates if-else branching via dispatch-based routing in init.py, introduces inheritance-based subclasses for backend-specific behavior, enables adding new backends without modifying core logic, and preserves correct behavior across existing backends.
Out of Scope Changes check ✅ Passed All changes are scoped to the kernel cache refactoring objectives. Updates to .github/workflows/ci.yml for kernel cache testing are necessary validation of the refactored system and are in scope.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6195770 and 631861c.

📒 Files selected for processing (2)
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • tilelang/cache/__init__.py
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py

439-439: Unused method argument: kernel

(ARG002)


439-439: Unused method argument: cache_path

(ARG002)

tilelang/cache/__init__.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (8)
tilelang/cache/__init__.py (4)

5-18: LGTM: Clean import structure with proper circular import handling.

The use of TYPE_CHECKING for the base KernelCache import avoids circular dependencies while maintaining type hint accuracy. All backend-specific subclasses are imported and will be used in the dispatch map.


20-27: LGTM: Dispatch map correctly implements backend routing.

The _dispatch_map provides clean separation of backend-specific caching logic. Each backend has its own singleton instance, which is appropriate for the cache pattern.


44-72: Well-structured environment variable handling and backend resolution.

The centralized environment variable processing at lines 44-51 correctly establishes a single source of truth. The target normalization and backend resolution flow (lines 53-59) properly handles the "auto" case via resolve_execution_backend. The conditional logging (lines 60-72) provides useful diagnostics without noise when the user explicitly specified a backend.


73-86: Dispatch logic correctly routes to backend-specific cache implementations.

The dispatch check (line 73) and delegation (lines 74-84) properly route kernel caching to the appropriate backend. The error handling (lines 85-86) provides a clear message when an unsupported backend is requested.

tilelang/cache/kernel_cache.py (4)

39-42: LGTM: Instance path attributes enable backend-specific customization.

The introduction of instance-level path attributes (device_kernel_path, host_kernel_path, kernel_lib_path, params_path) provides a clean extension point for backend-specific subclasses to override file naming conventions (e.g., .cubin for NVRTC vs. .so for TVM FFI).


263-310: LGTM: Clean delegation to extensible helper methods.

The refactoring of _save_kernel_to_disk to delegate to _save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, and _save_so_cubin_to_disk provides clear extension points for backend-specific subclasses while maintaining proper error handling.


312-376: LGTM: Well-structured load flow with clear separation of concerns.

The refactored _load_kernel_from_disk cleanly separates:

  1. Required file validation via _get_required_files() (line 346)
  2. Optional source loading via _load_kernel_source() (line 352)
  3. Parameter deserialization (lines 354-362)
  4. Kernel reconstruction via _build_kernel() (lines 364-376)

This provides clear extension points for subclasses to customize each phase.


439-441: Static analysis false positive: unused arguments are intentional.

The static analysis warning about unused kernel and cache_path parameters is a false positive. This is a base implementation designed to be overridden by subclasses (e.g., NVRTCKernelCache may set adapter state using these parameters). The parameters must remain for interface consistency.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@sgjzfzzf
Copy link
Contributor Author

I tested them on CUDA with tvm-ffi, cython, and ctypes, and it works very well(except for one bug that existed in the previous version, which leads to an exception on ctypes) on the environment I have. They may need more tests on other backends.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/cache/__init__.py (1)

39-39: Type hint is inconsistent with dispatch pool keys.

The Literal type is missing "cutedsl" which exists in _dispatch_pool. Also, None is allowed by the type hint but would cause a KeyError at runtime since there's no None key in the pool.

Suggested fix
-    execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = "auto",
+    execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "cutedsl", "torch"] = "auto",
🧹 Nitpick comments (6)
tilelang/cache/torch_kernel_cache.py (1)

4-6: Superfluous *args, **kwargs in __new__ signature.

The base class KernelCache.__new__(cls) does not accept additional arguments. Since these cache classes are instantiated without arguments in the dispatch pool, the *args, **kwargs parameters are unnecessary and could be removed for clarity.

This applies to all similar subclasses in this PR.

Suggested simplification
 class TorchKernelCache(KernelCache):
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
+    def __new__(cls):
+        return super().__new__(cls)
tilelang/cache/__init__.py (1)

47-57: Consider adding a guard for invalid backend keys.

If an unsupported execution_backend value is passed, _dispatch_pool[execution_backend] will raise a raw KeyError. A more descriptive error could improve debuggability.

Optional: Add validation with clearer error
+    if execution_backend not in _dispatch_pool:
+        raise ValueError(
+            f"Unsupported execution_backend '{execution_backend}'. "
+            f"Allowed: {list(_dispatch_pool.keys())}"
+        )
     return _dispatch_pool[execution_backend].cached(
tilelang/cache/tvm_ffi_kernel_cache.py (1)

10-11: Remove redundant __new__ method.

The __new__ method simply delegates to the parent without any custom logic. Since the parent KernelCache already implements singleton behavior, this override is unnecessary.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-
tilelang/cache/nvrtc_kernel_cache.py (1)

11-12: Remove redundant __new__ method.

The __new__ method simply delegates to the parent without any custom logic, making it unnecessary.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-
tilelang/cache/cutedsl_kernel_cache.py (2)

16-17: Remove redundant __new__ method.

The __new__ method simply delegates to the parent without any custom logic, making it unnecessary.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-

42-43: Consider using unpacking for list concatenation.

As suggested by the static analysis tool, unpacking provides cleaner syntax for combining lists.

🔎 Proposed refactor using unpacking
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 168aec7 and 334a13d.

📒 Files selected for processing (9)
  • tilelang/cache/__init__.py (2 hunks)
  • tilelang/cache/auto_kernel_cache.py (1 hunks)
  • tilelang/cache/ctypes_kernel_cache.py (1 hunks)
  • tilelang/cache/cutedsl_kernel_cache.py (1 hunks)
  • tilelang/cache/cython_kernel_cache.py (1 hunks)
  • tilelang/cache/kernel_cache.py (4 hunks)
  • tilelang/cache/nvrtc_kernel_cache.py (1 hunks)
  • tilelang/cache/torch_kernel_cache.py (1 hunks)
  • tilelang/cache/tvm_ffi_kernel_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
tilelang/cache/cython_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/torch_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/tvm_ffi_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (4)
  • KernelCache (25-449)
  • _save_wrapper_kernel_code_to_disk (414-418)
  • _safe_write_file (256-263)
  • _save_so_cubin_to_disk (420-425)
tilelang/cache/cutedsl_kernel_cache.py (1)
  • _save_so_cubin_to_disk (24-39)
tilelang/cache/nvrtc_kernel_cache.py (1)
  • _save_so_cubin_to_disk (23-30)
tilelang/cache/cutedsl_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (6)
  • KernelCache (25-449)
  • _save_kernel_source_code_to_disk (407-412)
  • _save_so_cubin_to_disk (420-425)
  • _safe_write_file (256-263)
  • _load_binary (250-253)
  • _get_required_files (427-430)
tilelang/cache/nvrtc_kernel_cache.py (2)
  • _save_kernel_source_code_to_disk (14-21)
  • _save_so_cubin_to_disk (23-30)
tilelang/cache/tvm_ffi_kernel_cache.py (1)
  • _save_so_cubin_to_disk (19-24)
tilelang/cache/auto_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/ctypes_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/kernel_cache.py (6)
tilelang/cache/cutedsl_kernel_cache.py (3)
  • _save_kernel_source_code_to_disk (20-21)
  • _save_so_cubin_to_disk (24-39)
  • _get_required_files (42-43)
tilelang/cache/tvm_ffi_kernel_cache.py (2)
  • _save_wrapper_kernel_code_to_disk (13-17)
  • _save_so_cubin_to_disk (19-24)
tilelang/jit/kernel.py (3)
  • JITKernel (39-794)
  • kernel_source (644-645)
  • get_kernel_source (465-476)
tilelang/jit/adapter/cutedsl/adapter.py (1)
  • get_kernel_source (206-214)
tilelang/jit/adapter/nvrtc/adapter.py (1)
  • get_kernel_source (179-190)
tilelang/jit/adapter/ctypes/adapter.py (2)
  • get_kernel_source (296-302)
  • libpath (282-284)
tilelang/cache/__init__.py (8)
tilelang/cache/auto_kernel_cache.py (1)
  • AutoKernelCache (4-6)
tilelang/cache/ctypes_kernel_cache.py (1)
  • CTypesKernelCache (4-6)
tilelang/cache/cutedsl_kernel_cache.py (1)
  • CuTeDSLKernelCache (10-49)
tilelang/cache/cython_kernel_cache.py (1)
  • CythonKernelCache (4-6)
tilelang/cache/nvrtc_kernel_cache.py (1)
  • NVRTCKernelCache (7-30)
tilelang/cache/torch_kernel_cache.py (1)
  • TorchKernelCache (4-6)
tilelang/cache/tvm_ffi_kernel_cache.py (1)
  • TVMFFIKernelCache (7-24)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/nvrtc_kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
  • _save_kernel_source_code_to_disk (407-412)
  • _safe_write_file (256-263)
  • _load_binary (250-253)
  • _save_so_cubin_to_disk (420-425)
tilelang/cache/tvm_ffi_kernel_cache.py (1)
  • _save_so_cubin_to_disk (19-24)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (7)
tilelang/cache/auto_kernel_cache.py (1)

4-6: LGTM, same note as TorchKernelCache.

The same optional simplification of removing *args, **kwargs applies here.

tilelang/cache/ctypes_kernel_cache.py (1)

4-6: LGTM, same note as TorchKernelCache.

The same optional simplification of removing *args, **kwargs applies here.

tilelang/cache/cython_kernel_cache.py (1)

4-6: LGTM, same note as TorchKernelCache.

The same optional simplification of removing *args, **kwargs applies here.

tilelang/cache/__init__.py (2)

22-30: Dispatch pool design looks good for the multi-backend refactor.

The dictionary-based dispatch cleanly replaces the if-else pattern as intended by issue #1469. Each backend gets its own singleton cache instance.


75-76: Verify this module-level behavior is intentional.

If TILELANG_CLEAR_CACHE env var is set, this will raise RuntimeError during import, preventing any use of the module. This seems intentional as a safeguard, but confirm this is the desired UX.

tilelang/cache/kernel_cache.py (2)

39-45: Good refactoring: centralized path configuration.

Moving from module-level path constants to per-class attributes enables backend-specific path customization without modifying the base class. This aligns well with the PR's goal of eliminating conditional logic.


407-449: Well-structured I/O abstraction.

The new helper methods (_save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, _save_so_cubin_to_disk, _get_required_files, _load_kernel_source) provide clean extension points for subclasses while centralizing common disk I/O patterns. This significantly improves maintainability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)

47-58: Consider defensive error handling for invalid backend keys.

The direct dictionary lookup on line 48 will raise a KeyError if an invalid backend_key is provided. While the Literal type hint provides compile-time protection, runtime validation would offer better error messages when type checking is bypassed or when backend names are dynamically constructed.

🔎 Proposed enhancement
 backend_key: str = execution_backend if execution_backend is not None else "auto"
+if backend_key not in _dispatch_pool:
+    valid_backends = ", ".join(_dispatch_pool.keys())
+    raise ValueError(
+        f"Invalid execution_backend '{backend_key}'. "
+        f"Valid options are: {valid_backends}"
+    )
 return _dispatch_pool[backend_key].cached(
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 334a13d and 676b4d0.

📒 Files selected for processing (2)
  • examples/gemm/example_gemm.py (1 hunks)
  • tilelang/cache/__init__.py (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • examples/gemm/example_gemm.py
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.

Applied to files:

  • examples/gemm/example_gemm.py
🧬 Code graph analysis (1)
examples/gemm/example_gemm.py (1)
tilelang/jit/kernel.py (1)
  • out_idx (636-637)
🔇 Additional comments (2)
tilelang/cache/__init__.py (1)

5-20: Clean TYPE_CHECKING usage to avoid circular imports.

The use of TYPE_CHECKING to import KernelCache only for type hints is a good pattern that avoids potential circular import issues at runtime while maintaining type safety.

examples/gemm/example_gemm.py (1)

5-5: Good example demonstrating the new backend selection API.

The addition of execution_backend="ctypes" effectively demonstrates the new multi-backend dispatch mechanism. Based on the PR description, this has been tested successfully on CUDA.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tilelang/cache/cutedsl_kernel_cache.py (2)

16-17: Remove unnecessary __new__ override.

The __new__ method simply delegates to the parent without adding any custom logic. Python automatically uses the parent's __new__ if not overridden, making this dead code.

🔎 Proposed fix
-    def __new__(cls, *args, **kwargs):
-        return super().__new__(cls, *args, **kwargs)
-

41-43: Consider using iterable unpacking for list extension.

The list concatenation works correctly, but iterable unpacking is more idiomatic and slightly more efficient.

🔎 Proposed fix
     @override
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 676b4d0 and e28dba1.

📒 Files selected for processing (1)
  • tilelang/cache/cutedsl_kernel_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/cutedsl_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (7)
  • KernelCache (25-449)
  • _save_kernel_source_code_to_disk (407-412)
  • _save_so_cubin_to_disk (420-425)
  • _safe_write_file (256-263)
  • _load_binary (250-253)
  • _get_required_files (427-430)
  • _load_kernel_source (432-449)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (3)
tilelang/cache/cutedsl_kernel_cache.py (3)

19-21: Verify no-op implementation is intentional.

The method intentionally skips saving kernel source code, deviating from the base class behavior. Confirm this is the correct design for CuTeDSL—does the backend not generate saveable kernel source, or is source code unnecessary for cache restoration?


23-39: LGTM! Launcher artifact handling is well-implemented.

The method correctly saves CuTeDSL-specific launcher artifacts (library and optional C++ source) using defensive attribute checks and the parent class's safe write utilities.


45-47: Verify empty string return values are correct.

The method returns empty strings rather than None. The base class can return None to indicate loading failure. Since _save_kernel_source_code_to_disk is a no-op and no source is saved, confirm that returning empty strings (rather than None) is the expected behavior for downstream code.

Note: The previous review issue regarding method naming (_load_kernel_from_cache vs _load_kernel_source) has been correctly resolved.

Copy link
Member

@LeiWang1999 LeiWang1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

@sgjzfzzf
Copy link
Contributor Author

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

For sure I could do that.

@sgjzfzzf
Copy link
Contributor Author

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

@LeiWang1999 Hi Lei! I tried to do that, but a horrible point is that the Adapter construction depends on the KernelCache. It'll lead to a cycle reference if we put KernelCache inside Adapter. Do you want me to refactor the whole cached part on the kernel construction, or just move KernelCache codes into the Adapter directory?

@silentCoder-dev
Copy link
Collaborator

Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?

@LeiWang1999 Hi Lei! I tried to do that, but a horrible point is that the Adapter construction depends on the KernelCache. It'll lead to a cycle reference if we put KernelCache inside Adapter. Do you want me to refactor the whole cached part on the kernel construction, or just move KernelCache codes into the Adapter directory?

It seems sufficient to move this into adapter to better support maintaining multiple backends.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)

14-30: Code duplication between methods.

Lines 15-20 and 24-29 contain identical logic for saving the Python wrapper file. This was previously flagged.

tilelang/jit/adapter/kernel_cache.py (1)

19-24: Fix logging to display file path instead of object.

Line 23 logs the executable object instead of kernel_lib_path. The debug message should show where the file is being saved (the path), not the object being saved.

🔎 Proposed fix
     def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
         kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path)
         executable = kernel.adapter.executable
         if verbose:
-            self.logger.debug(f"Saving kernel executable to file: {executable}")
+            self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
         KernelCache._safe_write_executable(executable, kernel_lib_path)
🧹 Nitpick comments (3)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)

41-43: Consider iterable unpacking for clarity.

Per static analysis (RUF005), prefer unpacking over concatenation for combining lists.

🔎 Suggested refactor
     @override
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]

45-47: Returning empty strings works but is subtly different from None.

The base class's _load_kernel_from_disk condition if ((host_kernel_source and device_kernel_source) or self.execution_backend == "cutedsl") treats empty strings as falsy, so this works due to the cutedsl backend check. However, returning (None, None) would be more explicit and consistent with the base class's error-handling pattern that returns None on failure.

🔎 Suggested alternative
     @override
     def _load_kernel_source(self, device_kernel_path: str, host_kernel_path: str, verbose: bool = False) -> tuple[str | None, str | None]:
-        return "", ""
+        return None, None
tilelang/jit/adapter/kernel_cache.py (1)

10-11: Consider whether new override is necessary.

The __new__ method simply delegates to the superclass without additional logic. While this pattern is consistent across all backend implementations in the codebase (CTypes, Cython, Torch, NVRTC), it appears redundant if no customization is needed.

If this pattern serves a specific purpose (e.g., enforcing singleton behavior or future extensibility), consider adding a comment explaining the rationale. Otherwise, the method could be removed to reduce boilerplate.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e28dba1 and 81dd6eb.

📒 Files selected for processing (7)
  • tilelang/cache/__init__.py (2 hunks)
  • tilelang/jit/adapter/ctypes/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/cutedsl/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/cython/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/nvrtc/kernel_cache.py (1 hunks)
  • tilelang/jit/adapter/torch/kernel_cache.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
  • _save_kernel_source_code_to_disk (14-21)
  • _save_so_cubin_to_disk (23-30)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (19-24)
tilelang/jit/adapter/torch/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-449)
tilelang/cache/__init__.py (5)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
  • CTypesKernelCache (4-6)
tilelang/jit/adapter/cython/kernel_cache.py (1)
  • CythonKernelCache (4-6)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
  • NVRTCKernelCache (7-30)
tilelang/jit/adapter/torch/kernel_cache.py (1)
  • TorchKernelCache (4-6)
tilelang/jit/adapter/kernel_cache.py (1)
  • TVMFFIKernelCache (7-24)
🪛 Ruff (0.14.8)
tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

tilelang/cache/__init__.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (13)
tilelang/jit/adapter/cython/kernel_cache.py (1)

4-6: Minimal adapter follows established pattern.

The subclass correctly delegates to the base KernelCache. Note that the singleton implementation in the base class stores the instance on KernelCache._instance directly, meaning all backend-specific subclasses share the same singleton—this appears intentional for a unified cache but is worth being aware of.

tilelang/jit/adapter/torch/kernel_cache.py (1)

4-6: LGTM!

The TorchKernelCache subclass correctly delegates to the base class and follows the consistent adapter pattern established across backends.

tilelang/jit/adapter/nvrtc/kernel_cache.py (1)

7-12: Class attributes and new are correctly defined.

The kernel_lib_path = "kernel.cubin" appropriately overrides the base class default for the NVRTC backend.

tilelang/jit/adapter/cutedsl/kernel_cache.py (3)

10-17: Class attributes and structure are well-defined.

The CuTeDSL-specific paths (launcher_lib.so, launcher.cpp) are appropriately declared as class attributes.


19-21: No-op override is intentional for CuTeDSL.

The CuTeDSL backend doesn't persist kernel source in the traditional manner, relying on launcher artifacts instead.


23-39: Defensive attribute access is appropriate.

Good use of getattr and hasattr to safely check for optional lib_generator and launcher_cpp_code attributes before attempting to save.

tilelang/jit/adapter/ctypes/kernel_cache.py (1)

4-6: LGTM!

The CTypesKernelCache subclass follows the consistent minimal adapter pattern.

tilelang/cache/__init__.py (4)

5-18: LGTM! Clean TYPE_CHECKING usage prevents circular imports.

The TYPE_CHECKING conditional import pattern properly isolates the KernelCache type reference for type checking while avoiding runtime circular dependencies. All backend implementations are imported and ready for the dispatch pool.


20-28: Well-structured dispatch pool with singleton instances.

The dispatch pool architecture cleanly maps backend names to their singleton KernelCache implementations. This eliminates the conditional branching that was present in the previous design.


45-59: Dispatch logic correctly routes to backend implementations.

The normalization of "auto"/None to "tvm_ffi" provides a sensible default, and all parameters are properly forwarded to the selected backend's cached() method.


60-61: Exception handling addresses previous feedback.

The ValueError for unsupported backends implements the behavior requested in past review comments. The inline error message is clear and actionable.

Note: Ruff suggests moving the message to an exception subclass (TRY003), but this inline approach is acceptable for a simple validation error.

The static analysis hint is optional; if you prefer to follow strict style guidelines, consider defining a custom exception class, but the current implementation is functionally correct.

tilelang/jit/adapter/kernel_cache.py (2)

7-8: LGTM! Backend-specific library path is properly defined.

The TVMFFIKernelCache subclass appropriately sets kernel_lib_path = "executable.so" for the TVM FFI backend, following the pattern used by other backend implementations (e.g., NVRTC uses "kernel.cubin").


13-17: LGTM! Wrapper kernel saving logic is correct.

The method properly constructs the host kernel path, logs the correct path variable when verbose, and safely writes the host source using the parent class helper method.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)

37-37: Missing "cutedsl" in execution_backend type hint.

The _dispatch_pool includes "cutedsl" (line 26), but the type hint for execution_backend on line 37 omits it. This creates a type-checking inconsistency where the backend is available at runtime but not reflected in the type signature.

-    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = None,
+    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,

Note: This issue was already flagged in a previous review comment.

🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)

58-58: Optional: Consider Ruff hint about exception message format.

Ruff suggests avoiding long messages directly in exception raises (TRY003). For this simple case, the current format is clear and maintainable, so this is purely a linting preference.

If you want to strictly follow the linter suggestion:

+_UNSUPPORTED_BACKEND_MSG = 'Cannot find support for execution backend "{}"'
+
 def cached(
     ...
     else:
-        raise ValueError(f'Cannot find support for execution backend "{execution_backend}"')
+        raise ValueError(_UNSUPPORTED_BACKEND_MSG.format(execution_backend))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81dd6eb and d256f9b.

📒 Files selected for processing (1)
  • tilelang/cache/__init__.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (2)
tilelang/cache/__init__.py (2)

20-28: Good refactoring: dispatch pool centralizes backend selection.

The dispatch pool cleanly maps backend keys to singleton instances, eliminating the if-else branching mentioned in the PR objectives. This design makes adding new backends straightforward.


10-15: All imported KernelCache implementations verified and exist. The refactored dispatch system has all six required backend-specific classes properly defined in their respective modules. No issues found.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)

37-37: Critical: Default execution_backend=None will always fail.

The parameter defaults to None, but None is not a key in _dispatch_pool (lines 21-28). Every call to cached() without an explicit execution_backend will raise ValueError at line 58, breaking the public API.

Based on past review comments, the old implementation supported "auto" mode for backend detection. This refactor removed that logic but didn't update the default value.

🔎 Recommended fixes

Option 1: Restore auto-detection for None (preserves backward compatibility)

+    # Auto-detect backend if not specified
+    if execution_backend is None:
+        # Determine the best available backend based on target
+        if isinstance(target, str) and target == "auto":
+            execution_backend = "tvm_ffi"  # or implement detection logic
+        else:
+            # Parse target to determine appropriate backend
+            execution_backend = "tvm_ffi"  # placeholder
+    
     if execution_backend in _dispatch_pool:

Option 2: Use a valid default backend

-    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,
+    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] = "tvm_ffi",

Option 3: Make execution_backend required

-    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,
+    execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"],
🧹 Nitpick comments (2)
tilelang/cache/__init__.py (2)

20-28: Consider lazy initialization of backend instances.

All backend instances are created at module import time, even if only one backend will be used. For better startup performance, consider lazy initialization:

🔎 Optional refactor for lazy initialization
-_dispatch_pool: dict[str, KernelCache] = {
-    "tvm_ffi": TVMFFIKernelCache(),
-    "ctypes": CTypesKernelCache(),
-    "cython": CythonKernelCache(),
-    "nvrtc": NVRTCKernelCache(),
-    "cutedsl": CuTeDSLKernelCache(),
-    "torch": TorchKernelCache(),
-}
+_dispatch_pool: dict[str, KernelCache] = {}
+
+def _get_backend_cache(backend: str) -> KernelCache:
+    """Get or create a backend cache instance."""
+    if backend not in _dispatch_pool:
+        backend_classes = {
+            "tvm_ffi": TVMFFIKernelCache,
+            "ctypes": CTypesKernelCache,
+            "cython": CythonKernelCache,
+            "nvrtc": NVRTCKernelCache,
+            "cutedsl": CuTeDSLKernelCache,
+            "torch": TorchKernelCache,
+        }
+        if backend in backend_classes:
+            _dispatch_pool[backend] = backend_classes[backend]()
+    return _dispatch_pool.get(backend)

Then update line 46 to use _get_backend_cache(execution_backend).


57-58: Consider extracting error message to improve maintainability.

Static analysis flags this as TRY003 (long message outside exception class). While the current message is clear and acceptable, extracting it can improve consistency if you add more validation errors later.

🔎 Optional refactor
+_UNSUPPORTED_BACKEND_ERROR = (
+    'Cannot find support for execution backend "{}". '
+    f'Supported backends: {", ".join(_dispatch_pool.keys())}'
+)
+
 def cached(
     ...
 ) -> JITKernel:
     ...
     if execution_backend in _dispatch_pool:
         ...
     else:
-        raise ValueError(f'Cannot find support for execution backend "{execution_backend}"')
+        raise ValueError(_UNSUPPORTED_BACKEND_ERROR.format(execution_backend))
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d256f9b and 2e3be15.

📒 Files selected for processing (2)
  • tilelang/cache/__init__.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/__init__.py (1)
tilelang/cache/kernel_cache.py (2)
  • KernelCache (25-449)
  • cached (113-227)
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (1)
tilelang/cache/__init__.py (1)

5-18: LGTM! Clean import structure with proper TYPE_CHECKING usage.

The TYPE_CHECKING pattern correctly avoids circular import issues while maintaining type safety. The backend-specific imports are well-organized.

@SiriusNEO SiriusNEO self-requested a review December 22, 2025 04:47
Copy link
Collaborator

@SiriusNEO SiriusNEO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, just some comments to go

@SiriusNEO SiriusNEO changed the title [Enhancement] refactor kernel_cache [Enhancement] Refactor KernelCache to use inheritance-based design Dec 22, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)

1-4: Consider adding a docstring to document the placeholder implementation.

The empty class body is valid but would benefit from a docstring explaining that CTypesKernelCache currently uses all default KernelCache behavior and may be extended in the future with ctypes-specific logic.

🔎 Suggested documentation
+"""CTypes backend kernel cache implementation."""
 from tilelang.cache.kernel_cache import KernelCache


-class CTypesKernelCache(KernelCache): ...
+class CTypesKernelCache(KernelCache):
+    """
+    Kernel cache for the ctypes execution backend.
+    
+    Currently uses default KernelCache behavior. Backend-specific
+    customizations can be added here as needed.
+    """
+    ...
tilelang/jit/adapter/cython/kernel_cache.py (1)

1-4: Consider adding a docstring to document the placeholder implementation.

Similar to CTypesKernelCache, this empty class would benefit from documentation explaining it's a placeholder for Cython-specific cache behavior.

🔎 Suggested documentation
+"""Cython backend kernel cache implementation."""
 from tilelang.cache.kernel_cache import KernelCache


-class CythonKernelCache(KernelCache): ...
+class CythonKernelCache(KernelCache):
+    """
+    Kernel cache for the Cython execution backend.
+    
+    Currently uses default KernelCache behavior. Backend-specific
+    customizations can be added here as needed.
+    """
+    ...
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

41-43: Consider iterable unpacking for concatenation.

The list concatenation works correctly but could use iterable unpacking for slightly better performance.

🔎 Optional optimization
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2e3be15 and 1c0c714.

📒 Files selected for processing (8)
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
  • tilelang/jit/adapter/ctypes/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
  • tilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tilelang/jit/adapter/torch/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • tilelang/cache/__init__.py
🧬 Code graph analysis (4)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-475)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-475)
tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
  • KernelCache (25-475)
  • _save_so_cubin_to_disk (412-417)
  • _safe_write_file (253-260)
  • _load_binary (247-250)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (16-21)
tilelang/cache/kernel_cache.py (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (6)
  • _set_adapter_cache_path (50-52)
  • _save_kernel_source_code_to_disk (20-21)
  • _save_so_cubin_to_disk (24-39)
  • _get_required_files (42-43)
  • _load_kernel_source (46-47)
  • _build_kernel (55-83)
🪛 Ruff (0.14.10)
tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

tilelang/cache/kernel_cache.py

443-443: Unused method argument: kernel

(ARG002)


443-443: Unused method argument: cache_path

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (14)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)

7-18: LGTM! Clean NVRTC-specific implementation.

The class correctly customizes paths for NVRTC's cubin output and implements the necessary logic to persist the Python wrapper alongside the binary. The previous duplicate logic concern has been addressed.

tilelang/cache/__init__.py (3)

21-28: Clean dispatch mechanism successfully replaces singleton pattern.

The backend dispatch map provides a clear, extensible way to route kernel cache operations to backend-specific implementations. This aligns with the PR objective to eliminate if-else branching.


37-37: Good choice of default execution backend.

Setting the default to "tvm_ffi" addresses previous review concerns about None causing errors. This provides a sensible fallback while still allowing explicit backend selection.

Based on past review comments, this follows the suggestion to use "tvm_ffi" as the default for internal APIs.


45-58: Dispatch logic is straightforward and correct.

The if-else routing to backend-specific cache instances is clear. The ValueError provides a helpful message for unsupported backends.

Note: The static analysis hint (TRY003) about the long error message is a stylistic suggestion. The current inline message is acceptable and clear for this use case.

tilelang/jit/adapter/cutedsl/kernel_cache.py (4)

13-21: CuTeDSL-specific paths and no-op kernel source save are appropriate.

The custom paths for launcher artifacts align with CuTeDSL's C++ launcher design. The early return in _save_kernel_source_code_to_disk indicates CuTeDSL handles kernel sources differently than other backends.


23-39: Defensive attribute checking for launcher artifacts is good practice.

The use of getattr() and hasattr() ensures the code handles cases where the adapter or lib_generator may not have the expected launcher attributes. The verbose logging aids debugging.


45-52: Appropriate overrides for CuTeDSL's kernel loading behavior.

Returning empty strings from _load_kernel_source and propagating the cache path to the adapter reflect CuTeDSL's different approach to managing kernel sources and artifacts.


54-83: Correct kernel reconstruction from cached parameters.

The _build_kernel implementation properly delegates to JITKernel.from_database when parameters are available, which is the expected behavior for loading cached kernels.

tilelang/cache/kernel_cache.py (6)

39-43: Excellent use of class attributes for path customization.

Moving path constants from module-level to class attributes allows backend-specific subclasses to override paths as needed (e.g., NVRTC using "kernel.cubin" instead of "kernel_lib.so"). This is a key enabler of the inheritance-based design.


290-303: Well-structured delegation to specialized save methods.

Breaking _save_kernel_to_disk into separate calls to _save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, and _save_so_cubin_to_disk provides clear override points for backend-specific behavior while maintaining error handling at the orchestration level.


351-380: Clean refactor of disk loading logic with extensible helper methods.

The extraction of _get_required_files, _load_kernel_source, and _build_kernel successfully eliminates conditional logic while enabling backends like CuTeDSL to customize file requirements and loading behavior.


399-441: New helper methods provide excellent override points for subclasses.

These methods establish the template method pattern, allowing each backend to customize specific aspects:

  • _save_kernel_source_code_to_disk: Device kernel persistence
  • _save_wrapper_kernel_code_to_disk: Host wrapper persistence
  • _save_so_cubin_to_disk: Library/binary persistence
  • _get_required_files: File existence checks
  • _load_kernel_source: Source code loading

This design successfully achieves the PR objective of eliminating if-else branching in favor of polymorphism.


443-444: Hook method with unused parameters is intentional design.

The _set_adapter_cache_path method serves as an extension point for subclasses (e.g., CuTeDSLKernelCache overrides this at line 50-52 of tilelang/jit/adapter/cutedsl/kernel_cache.py to propagate the cache path). The unused parameters in the base implementation are expected for hook methods.

The static analysis warnings for unused arguments (ARG002) are false positives in this context.


446-475: Flexible kernel reconstruction logic supports diverse backends.

The _build_kernel method provides a sensible default that works for most backends while allowing specialized subclasses (like CuTeDSL) to implement custom reconstruction logic based on available artifacts.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
tilelang/jit/adapter/kernel_cache.py (1)

16-21: Fix incorrect variable in log message.

Line 20 logs the Executable object instead of the destination path kernel_lib_path. The log should indicate where the file is being saved.

🔎 Proposed fix
     executable = kernel.adapter.executable
     if verbose:
-        self.logger.debug(f"Saving kernel executable to file: {executable}")
+        self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
     KernelCache._safe_write_executable(executable, kernel_lib_path)
tilelang/cache/kernel_cache.py (2)

406-410: Add null check for adapter's kernel source.

Some adapters (e.g., CuTeDSL, NVRTC) may return None from get_kernel_source(). Passing None to file.write() will raise a TypeError, which gets caught and logged with a generic error message that doesn't indicate the root cause.

🔎 Proposed fix
 def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
     host_kernel_path = os.path.join(cache_path, self.host_kernel_path)
+    source = kernel.adapter.get_kernel_source()
+    if source is None:
+        if verbose:
+            self.logger.debug("Skipping host kernel source save: adapter returned None")
+        return
     if verbose:
         self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}")
-    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source()))
+    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))

446-476: Use explicit None checks instead of truthiness for source validation.

Line 460 uses truthiness checks that conflate two distinct cases:

  • None: Source file not found (error condition)
  • "": Source file exists but is empty (valid for backends like CuTeDSL)

The CuTeDSL adapter confirms this is problematic—it intentionally returns "", "" from _load_kernel_source but must override _build_kernel to only check kernel_params because the base class truthiness check incorrectly rejects empty strings.

🔎 Recommended fix
-    if host_kernel_source and device_kernel_source and kernel_params:
+    if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None:
         return JITKernel.from_database(

This allows backends that don't use kernel source files to return empty strings while still detecting actual load failures (None).

🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

41-43: Consider using iterable unpacking for cleaner syntax.

The list concatenation works correctly but could be more idiomatic.

🔎 Proposed refactor
 def _get_required_files(self, cache_path: str) -> list[str]:
-    return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+    return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]

Based on static analysis hint.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c7e1fa and aba3d29.

📒 Files selected for processing (8)
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
  • tilelang/jit/adapter/ctypes/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
  • tilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tilelang/jit/adapter/torch/kernel_cache.py
  • tilelang/jit/adapter/ctypes/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • tilelang/cache/__init__.py
🧬 Code graph analysis (4)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
  • KernelCache (25-476)
tilelang/jit/adapter/cutedsl/kernel_cache.py (5)
tilelang/language/ast/ir.py (1)
  • target (1677-1707)
tilelang/cache/kernel_cache.py (7)
  • KernelCache (25-476)
  • _save_kernel_source_code_to_disk (399-404)
  • _save_so_cubin_to_disk (412-417)
  • _safe_write_file (252-259)
  • _load_binary (246-249)
  • _load_kernel_source (424-441)
  • _set_adapter_cache_path (443-444)
tilelang/engine/param.py (1)
  • KernelParam (13-141)
tilelang/jit/adapter/kernel_cache.py (1)
  • _save_so_cubin_to_disk (16-21)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
  • _save_so_cubin_to_disk (11-18)
tilelang/jit/adapter/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (2)
  • _safe_write_file (252-259)
  • _safe_write_executable (262-265)
tilelang/cache/kernel_cache.py (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (6)
  • _set_adapter_cache_path (50-52)
  • _save_kernel_source_code_to_disk (20-21)
  • _save_so_cubin_to_disk (24-39)
  • _get_required_files (42-43)
  • _load_kernel_source (46-47)
  • _build_kernel (55-84)
🪛 Ruff (0.14.10)
tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

tilelang/cache/__init__.py

58-58: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/cache/kernel_cache.py

443-443: Unused method argument: kernel

(ARG002)


443-443: Unused method argument: cache_path

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (15)
tilelang/jit/adapter/cython/kernel_cache.py (1)

1-4: LGTM! Valid placeholder implementation.

The minimal CythonKernelCache subclass correctly inherits from KernelCache and can be extended with Cython-specific behavior as needed.

tilelang/jit/adapter/kernel_cache.py (1)

10-14: LGTM! TVM-specific wrapper save logic is correct.

The method properly saves the host source code using the TVM adapter's get_host_source() method.

tilelang/jit/adapter/cutedsl/kernel_cache.py (4)

13-17: LGTM! CuTeDSL-specific path configuration.

The custom paths for launcher library and C++ source are appropriate for the CuTeDSL backend's architecture.


19-21: LGTM! Intentional no-op for CuTeDSL.

CuTeDSL doesn't require device kernel source persistence, so the no-op implementation is correct for this backend.


23-39: LGTM! Robust launcher artifact persistence.

The method correctly handles optional launcher library and C++ source with defensive existence checks.


45-83: LGTM! CuTeDSL-specific load and build logic.

The implementation correctly:

  • Returns empty strings for kernel source (not needed for CuTeDSL)
  • Propagates cache path to the adapter
  • Builds kernel based solely on kernel_params (since source strings are always empty)

This aligns with CuTeDSL's architecture where the launcher library is the primary artifact.

tilelang/cache/__init__.py (2)

5-28: LGTM! Clean dispatch registry implementation.

The dispatch map architecture successfully replaces the if-else branching mentioned in the PR objectives. The TYPE_CHECKING guard and backend instantiation are well-structured.


31-58: LGTM! Dispatch logic successfully eliminates if-else branching.

The function properly routes to backend-specific implementations via the dispatch map, achieving the refactoring goal stated in the PR objectives. The default backend and error handling are appropriate.

tilelang/cache/kernel_cache.py (7)

39-42: LGTM! Class-level path attributes enable inheritance-based dispatch.

These attributes form the foundation of the inheritance-based refactoring, allowing subclasses to customize paths without modifying base class logic. This successfully addresses the PR objective of eliminating if-else branching.


219-219: LGTM! Extracted cache path setting enables subclass customization.

The helper method allows backends like CuTeDSL to customize cache path propagation while keeping the base implementation simple.


287-314: LGTM! Helper method extraction enables backend-specific behavior.

The refactored save logic successfully moves backend-specific handling to overridable methods, eliminating the need for if-else branching in the base class.


345-380: LGTM! Load logic properly delegates to customizable helpers.

The refactored loading flow uses class-level paths and overridable helper methods, allowing backends to customize required files, source loading, and kernel building without modifying the base flow.


399-404: LGTM! Device kernel source save with proper null check.

The method correctly checks for None before writing, and the override mechanism allows backends like CuTeDSL to skip this step entirely.


412-441: LGTM! I/O helper implementations are robust.

The save/load helper methods properly:

  • Use class-level path attributes
  • Handle exceptions gracefully
  • Return None on load failures
  • Enable subclass customization

This design cleanly separates concerns and eliminates backend-specific branching.


443-444: Intentional no-op base implementation.

The empty method with unused parameters is a valid extension point for subclasses. CuTeDSL overrides this to propagate the cache path to its adapter. The static analysis warning can be safely ignored in this case.

Comment on lines +11 to +18
def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
src_lib_path = kernel.adapter.libpath
kernel_py_path = os.path.join(cache_path, self.kernel_py_path)
src_lib_path = src_lib_path.replace(".cubin", ".py")
if verbose:
self.logger.debug(f"Saving kernel nvrtc python code to file: {kernel_py_path}")
KernelCache._safe_write_file(kernel_py_path, "wb", lambda file: file.write(KernelCache._load_binary(src_lib_path)))
super()._save_so_cubin_to_disk(kernel, cache_path, verbose)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate file extension assumption before string replacement.

Line 14 assumes kernel.adapter.libpath ends with .cubin, but performs a blind replace(".cubin", ".py"). If the libpath doesn't contain .cubin, this silently fails and could lead to incorrect paths.

🔎 Proposed fix to validate the extension
 def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
     src_lib_path = kernel.adapter.libpath
     kernel_py_path = os.path.join(cache_path, self.kernel_py_path)
-    src_lib_path = src_lib_path.replace(".cubin", ".py")
+    if not src_lib_path.endswith(".cubin"):
+        self.logger.warning(f"Expected .cubin extension but got: {src_lib_path}")
+    src_py_path = src_lib_path.replace(".cubin", ".py")
     if verbose:
         self.logger.debug(f"Saving kernel nvrtc python code to file: {kernel_py_path}")
-    KernelCache._safe_write_file(kernel_py_path, "wb", lambda file: file.write(KernelCache._load_binary(src_lib_path)))
+    KernelCache._safe_write_file(kernel_py_path, "wb", lambda file: file.write(KernelCache._load_binary(src_py_path)))
     super()._save_so_cubin_to_disk(kernel, cache_path, verbose)
🤖 Prompt for AI Agents
In tilelang/jit/adapter/nvrtc/kernel_cache.py around lines 11 to 18, the code
blindly replaces ".cubin" with ".py" on kernel.adapter.libpath which can
silently produce incorrect paths if the string doesn't contain that extension;
validate the extension first (e.g., if libpath.endswith(".cubin") then replace,
otherwise either compute the .py path using os.path.splitext(libpath)[0] + ".py"
or log and raise a clear error) and update the call to KernelCache._load_binary
to use the validated/constructed path so you don't silently operate on a wrong
filename.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
tilelang/jit/adapter/kernel_cache.py (1)

16-21: Fix incorrect variable in log message.

Line 20 logs executable (the Executable object) instead of kernel_lib_path (the destination path). The log message should indicate where the file is being saved, not the object being saved.

🔎 Proposed fix
     def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
         kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path)
         executable = kernel.adapter.executable
         if verbose:
-            self.logger.debug(f"Saving kernel executable to file: {executable}")
+            self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
         KernelCache._safe_write_executable(executable, kernel_lib_path)
tilelang/cache/kernel_cache.py (2)

406-410: Add null check for get_kernel_source() return value.

Some adapters (e.g., CuTeDSLKernelAdapter, NVRTCKernelAdapter) can return None from get_kernel_source(). The current implementation passes the result directly to file.write(), which will raise a TypeError if None is returned. The exception gets caught but logged with a misleading generic message.

🔎 Proposed fix
     def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
         host_kernel_path = os.path.join(cache_path, self.host_kernel_path)
         if verbose:
             self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}")
-        KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source()))
+        source = kernel.adapter.get_kernel_source()
+        if source is None:
+            self.logger.debug("Skipping host kernel source save: adapter returned None")
+            return
+        KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))

460-476: Use explicit None checks instead of truthiness for kernel source validation.

The condition if host_kernel_source and device_kernel_source and kernel_params: uses truthiness checks that conflate two distinct cases:

  • None: Source file not found (error)
  • "": Source file exists but is intentionally empty (valid)

The CuTeDSLKernelCache works around this by overriding _build_kernel to only check kernel_params, but other backends that inherit this base implementation could fail if they need to return empty source strings.

🔎 Proposed fix
     def _build_kernel(
         self,
         func: Callable | None,
         host_kernel_source: str,
         device_kernel_source: str,
         kernel_lib_path: str,
         kernel_params: list[KernelParam] | None,
         target: str | Target,
         target_host: str | Target | None,
         out_idx: list[int] | None,
         execution_backend: Literal["tvm_ffi", "cython", "nvrtc", "torch", "cutedsl"],
         pass_configs: dict | None,
         compile_flags: list[str] | str | None,
     ) -> JITKernel | None:
-        if host_kernel_source and device_kernel_source and kernel_params:
+        if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None:
             return JITKernel.from_database(
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

41-43: Consider using iterable unpacking for list concatenation.

Static analysis suggests replacing list concatenation with iterable unpacking for slightly better performance and idiomatic Python.

🔎 Proposed fix
     @override
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aba3d29 and 96bf90c.

📒 Files selected for processing (7)
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
  • tilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/torch/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • tilelang/cache/__init__.py
🧬 Code graph analysis (1)
tilelang/cache/__init__.py (6)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
  • CuTeDSLKernelCache (13-84)
tilelang/jit/adapter/cython/kernel_cache.py (1)
  • CythonKernelCache (4-4)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
  • NVRTCKernelCache (7-18)
tilelang/jit/adapter/torch/kernel_cache.py (1)
  • TorchKernelCache (4-4)
tilelang/jit/adapter/kernel_cache.py (1)
  • TVMFFIKernelCache (7-21)
tilelang/cache/kernel_cache.py (2)
  • KernelCache (25-476)
  • cached (110-223)
🪛 Ruff (0.14.10)
tilelang/cache/__init__.py

56-56: Avoid specifying long messages outside the exception class

(TRY003)

tilelang/cache/kernel_cache.py

443-443: Unused method argument: kernel

(ARG002)


443-443: Unused method argument: cache_path

(ARG002)

tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (12)
tilelang/jit/adapter/kernel_cache.py (1)

7-14: LGTM!

The TVMFFIKernelCache class correctly inherits from KernelCache and overrides the appropriate methods for TVM FFI-specific behavior. The host_kernel_path attribute is properly inherited from the base class.

tilelang/jit/adapter/cutedsl/kernel_cache.py (3)

23-39: LGTM!

The _save_so_cubin_to_disk override correctly handles CuTeDSL-specific launcher library and C++ source saving with proper defensive checks.


54-84: LGTM!

The _build_kernel override correctly checks only kernel_params since CuTeDSL's _load_kernel_source intentionally returns empty strings. This properly handles the CuTeDSL-specific loading behavior.


49-52: LGTM!

The _set_adapter_cache_path override correctly propagates the cache path to the CuTeDSL adapter.

tilelang/cache/__init__.py (3)

19-26: LGTM!

The dispatch map correctly maps execution backends to their respective KernelCache subclass instances, replacing the previous singleton pattern with a clean inheritance-based dispatch mechanism.


29-56: LGTM!

The cached() function correctly dispatches to the appropriate backend cache via the dispatch map. The default execution_backend="tvm_ffi" properly addresses previous review feedback about the None default causing errors.


74-75: Module-level clear_cache() call will crash on import when env var is set.

When TILELANG_CLEAR_CACHE is set to a truthy value, calling clear_cache() at module import time will raise a RuntimeError, crashing the application. This may be the intended behavior to force users to acknowledge the risk, but it could be unexpected for users setting this environment variable.

Consider whether this is the intended UX. If users are expected to manually delete the cache, you might want to just log a warning instead of raising an exception at import time:

if env.TILELANG_CLEAR_CACHE.lower() in ("1", "true", "yes", "on"):
    import warnings
    warnings.warn(
        f"TILELANG_CLEAR_CACHE is set but automatic cache clearing is disabled. "
        f"To clear the cache, manually run: rm -rf '{env.TILELANG_CACHE_DIR}'"
    )
tilelang/cache/kernel_cache.py (5)

39-42: LGTM!

Moving path constants to class-level attributes is a clean refactor that enables backend-specific subclasses to override paths as needed.


443-444: No-op hook method is appropriate for inheritance pattern.

The empty _set_adapter_cache_path method serves as a hook that subclasses (like CuTeDSLKernelCache) can override. The static analysis warnings about unused arguments are false positives for this design pattern.


424-441: LGTM!

The _load_kernel_source method correctly handles file loading with proper exception handling and logging. It returns None for failed loads, allowing callers to handle missing sources appropriately.


419-422: LGTM!

The _get_required_files method correctly returns only the essential files (kernel library and parameters). Source files are optional, and subclasses can extend this list as needed.


399-404: LGTM!

The _save_kernel_source_code_to_disk method correctly checks for None before writing, avoiding potential TypeError exceptions.

Copy link
Collaborator

@silentCoder-dev silentCoder-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lucifer1004
Copy link
Collaborator

Could you also update CI tests to cover kernel cache?

@sgjzfzzf
Copy link
Contributor Author

Could you also update CI tests to cover kernel cache?

Sorry I'm not familiar with this CI😭. Which part should I start with? In .github/workflow.yaml?

@lucifer1004
Copy link
Collaborator

You should modify https://github.com/tile-ai/tilelang/blob/main/.github/workflows/ci.yml

@lucifer1004
Copy link
Collaborator

I think running https://github.com/tile-ai/tilelang/tree/main/testing/python/jit twice should be enough.
There are some caveats, though:

  1. For postproc tests, cache needs to be disabled explicitly since they will fail if run twice.
  2. For cutedsl, a separate run is required since it relies on env var TILELANG_USE_GEMM_V1=1.

SiriusNEO
SiriusNEO previously approved these changes Dec 24, 2025
Copy link
Collaborator

@SiriusNEO SiriusNEO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. After adding the test we are good to go

Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
tilelang/jit/adapter/kernel_cache.py (1)

16-21: Fix incorrect variable in log message.

Line 20 logs executable (the Executable object) instead of kernel_lib_path (the destination path). The log message should indicate where the file is being saved.

🔎 Proposed fix
     def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
         kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path)
         executable = kernel.adapter.executable
         if verbose:
-            self.logger.debug(f"Saving kernel executable to file: {executable}")
+            self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}")
         KernelCache._safe_write_executable(executable, kernel_lib_path)
tilelang/cache/kernel_cache.py (2)

429-433: Add null check for get_kernel_source() return value.

Some adapters (e.g., CuTeDSLKernelAdapter, NVRTCKernelAdapter) can return None from get_kernel_source(). Passing None to file.write() will cause a TypeError that gets caught with a misleading generic message.

🔎 Proposed fix
 def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
     host_kernel_path = os.path.join(cache_path, self.host_kernel_path)
     if verbose:
         self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}")
-    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source()))
+    source = kernel.adapter.get_kernel_source()
+    if source is None:
+        self.logger.debug("Skipping host kernel source save: adapter returned None")
+        return
+    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))

483-499: Use explicit None checks instead of truthiness for kernel source validation.

The condition if host_kernel_source and device_kernel_source and kernel_params: uses truthiness checks that conflate None (source not found) with "" (source exists but is empty). The CuTeDSLKernelCache._load_kernel_source intentionally returns "", "" which would fail this check.

🔎 Proposed fix
-        if host_kernel_source and device_kernel_source and kernel_params:
+        if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None:
             return JITKernel.from_database(
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)

41-43: Consider using iterable unpacking instead of concatenation.

Per Ruff RUF005, iterable unpacking is slightly more idiomatic:

🔎 Proposed fix
     @override
     def _get_required_files(self, cache_path: str) -> list[str]:
-        return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)]
+        return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96bf90c and 6195770.

📒 Files selected for processing (8)
  • .github/workflows/ci.yml
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
  • tilelang/jit/adapter/cutedsl/kernel_cache.py
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
  • tilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • tilelang/jit/adapter/torch/kernel_cache.py
  • tilelang/jit/adapter/cython/kernel_cache.py
  • tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/ci.yml
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • .github/workflows/ci.yml
  • tilelang/cache/__init__.py
🧬 Code graph analysis (2)
tilelang/jit/adapter/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (4)
  • KernelCache (25-499)
  • _safe_write_file (275-282)
  • _save_so_cubin_to_disk (435-440)
  • _safe_write_executable (285-288)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (6)
  • KernelCache (25-499)
  • _save_so_cubin_to_disk (435-440)
  • _safe_write_file (275-282)
  • _load_binary (269-272)
  • _get_required_files (442-445)
  • _load_kernel_source (447-464)
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py

466-466: Unused method argument: kernel

(ARG002)


466-466: Unused method argument: cache_path

(ARG002)

tilelang/jit/adapter/cutedsl/kernel_cache.py

43-43: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

tilelang/cache/__init__.py

56-56: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (10)
.github/workflows/ci.yml (1)

376-404: LGTM! CI steps align with the requested two-pass JIT testing strategy.

The implementation correctly:

  • Runs JIT tests twice to catch cache-related regressions (per lucifer1004's suggestion)
  • Disables cache on pass 2 via TILELANG_DISABLE_CACHE=1 to avoid postproc conflicts
  • Keeps CuTeDSL tests separate (run with TILELANG_USE_GEMM_V1=1 in a dedicated step)
tilelang/jit/adapter/kernel_cache.py (1)

7-14: LGTM!

The TVMFFIKernelCache class correctly overrides the kernel_lib_path class attribute and provides a specialized implementation for saving the wrapper kernel source.

tilelang/cache/__init__.py (2)

19-26: LGTM! Dispatch map correctly routes to backend-specific caches.

The singleton instances for each backend are properly instantiated, and the naming follows the suggestion from previous reviews (_dispatch_map instead of _dispatch_pool).


43-56: LGTM!

The dispatch logic correctly routes to the appropriate backend cache and raises a clear error for unsupported backends. Past review concerns (missing "cutedsl", None default causing errors) have been addressed.

tilelang/jit/adapter/cutedsl/kernel_cache.py (3)

13-21: LGTM!

The class correctly defines CuTeDSL-specific paths and intentionally makes _save_kernel_source_code_to_disk a no-op since CuTeDSL uses a different code generation path.


54-84: LGTM!

The _build_kernel implementation correctly uses JITKernel.from_database when kernel_params exist and returns None otherwise, aligning with the inheritance-based design.


49-52: The _cache_path attribute is already handled defensively in the adapter.

The adapter code uses getattr(self, "_cache_path", None) with a default value, properly handling the case where the attribute may not be set. No additional defensive checks are needed in kernel_cache.py. The interface is implicitly documented via the method's usage context.

tilelang/cache/kernel_cache.py (3)

39-42: LGTM! Class-level path attributes enable proper inheritance.

Moving path constants to class attributes allows subclasses to override them (e.g., TVMFFIKernelCache.kernel_lib_path = "executable.so"), which is the core of the inheritance-based refactoring.


466-467: Intentional no-op hook for subclass override.

The unused arguments (kernel, cache_path) are expected for this hook method pattern. Subclasses like CuTeDSLKernelCache override this to propagate the cache path to adapters. The Ruff ARG002 warnings can be safely ignored.


310-328: LGTM! Clean delegation to helper methods.

The refactored _save_kernel_to_disk now properly delegates to specialized helper methods (_save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, _save_so_cubin_to_disk), enabling subclasses to override individual behaviors.

Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tilelang/cache/kernel_cache.py (1)

125-148: Docstring is outdated after refactoring.

The docstring claims "This is the ONLY place where environment variable processing...should happen" (line 125) and documents environment variable behavior (lines 140-148), but the actual env variable processing now occurs in tilelang/cache/__init__.py:cached() (lines 44-51). This method receives already-resolved parameters.

Update the docstring to reflect that this is an internal method receiving normalized parameters, and remove the environment variable documentation section.

🔎 Proposed docstring update
     def cached(
         self,
         func: PrimFunc = None,
         out_idx: list[int] = None,
         *args,
         target: str | Target,
         target_host: str | Target | None = None,
         execution_backend: Literal["tvm_ffi", "cython", "nvrtc", "torch", "cutedsl"] = "tvm_ffi",
         verbose: bool,
         pass_configs: dict | None = None,
         compile_flags: list[str] | str | None = None,
     ) -> JITKernel:
         """
-        Caches and reuses compiled kernels to avoid redundant compilation.
-
-        This is the ONLY place where environment variable processing, target normalization,
-        and execution backend resolution should happen. All compilation paths go through here.
+        Internal caching method for backend-specific KernelCache subclasses.
+        
+        This method receives already-normalized parameters from the public 
+        tilelang.cache.cached() API. Environment variable defaults and backend 
+        resolution have already been applied.
 
         Args:
             func: Function to be compiled or a prepared PrimFunc
             out_idx: Indices specifying which outputs to return
-            target: Compilation target platform (None = read from TILELANG_TARGET env var)
+            target: Normalized compilation target platform (required)
             target_host: Host target platform
-            execution_backend: Execution backend (None = read from TILELANG_EXECUTION_BACKEND)
-            verbose: Enable verbose output (None = read from TILELANG_VERBOSE)
+            execution_backend: Resolved execution backend (required)
+            verbose: Enable verbose output (required)
             *args: Arguments passed to func
 
         Returns:
             JITKernel: The compiled kernel, either freshly compiled or from cache
-
-        Environment Variables
-        ---------------------
-        TILELANG_TARGET : str
-            Default compilation target (e.g., "cuda", "llvm"). Defaults to "auto".
-        TILELANG_EXECUTION_BACKEND : str
-            Default execution backend. Defaults to "auto".
-        TILELANG_VERBOSE : str
-            Set to "1", "true", "yes", or "on" to enable verbose compilation by default.
         """
♻️ Duplicate comments (2)
tilelang/cache/kernel_cache.py (2)

402-406: Add None guard for get_kernel_source() return value.

According to past review analysis, CuTeDSLKernelAdapter and NVRTCKernelAdapter can return None from get_kernel_source(). Line 406 directly passes this to file.write(), which will raise TypeError: a bytes-like object is required, not 'NoneType' if the adapter returns None. The exception will be caught generically with a misleading message.

Add an explicit check:

🔎 Proposed fix
 def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
     host_kernel_path = os.path.join(cache_path, self.host_kernel_path)
+    source = kernel.adapter.get_kernel_source()
+    if source is None:
+        if verbose:
+            self.logger.debug(f"Skipping host kernel source save: adapter returned None")
+        return
     if verbose:
         self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}")
-    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source()))
+    KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))

Based on past review comments, this issue was flagged but not yet addressed.


442-472: Use explicit None checks instead of truthiness for source validation.

Line 456 uses if host_kernel_source and device_kernel_source and kernel_params: which conflates two distinct cases:

  • None: file not loaded (error condition)
  • "": file loaded but empty (valid, intentional)

According to past review analysis, the CuteDSL adapter intentionally returns "", "" from _load_kernel_source and must override _build_kernel to work around this truthiness check. Empty source strings are valid; only None should indicate a loading failure.

🔎 Proposed fix
-    if host_kernel_source and device_kernel_source and kernel_params:
+    if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None:
         return JITKernel.from_database(

This clearly differentiates "not loaded" (None) from "loaded as empty" ("") for each value.

Based on past review comments, this issue was flagged but not yet addressed.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6195770 and 631861c.

📒 Files selected for processing (2)
  • tilelang/cache/__init__.py
  • tilelang/cache/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • tilelang/cache/__init__.py
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py

439-439: Unused method argument: kernel

(ARG002)


439-439: Unused method argument: cache_path

(ARG002)

tilelang/cache/__init__.py

86-86: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (8)
tilelang/cache/__init__.py (4)

5-18: LGTM: Clean import structure with proper circular import handling.

The use of TYPE_CHECKING for the base KernelCache import avoids circular dependencies while maintaining type hint accuracy. All backend-specific subclasses are imported and will be used in the dispatch map.


20-27: LGTM: Dispatch map correctly implements backend routing.

The _dispatch_map provides clean separation of backend-specific caching logic. Each backend has its own singleton instance, which is appropriate for the cache pattern.


44-72: Well-structured environment variable handling and backend resolution.

The centralized environment variable processing at lines 44-51 correctly establishes a single source of truth. The target normalization and backend resolution flow (lines 53-59) properly handles the "auto" case via resolve_execution_backend. The conditional logging (lines 60-72) provides useful diagnostics without noise when the user explicitly specified a backend.


73-86: Dispatch logic correctly routes to backend-specific cache implementations.

The dispatch check (line 73) and delegation (lines 74-84) properly route kernel caching to the appropriate backend. The error handling (lines 85-86) provides a clear message when an unsupported backend is requested.

tilelang/cache/kernel_cache.py (4)

39-42: LGTM: Instance path attributes enable backend-specific customization.

The introduction of instance-level path attributes (device_kernel_path, host_kernel_path, kernel_lib_path, params_path) provides a clean extension point for backend-specific subclasses to override file naming conventions (e.g., .cubin for NVRTC vs. .so for TVM FFI).


263-310: LGTM: Clean delegation to extensible helper methods.

The refactoring of _save_kernel_to_disk to delegate to _save_kernel_source_code_to_disk, _save_wrapper_kernel_code_to_disk, and _save_so_cubin_to_disk provides clear extension points for backend-specific subclasses while maintaining proper error handling.


312-376: LGTM: Well-structured load flow with clear separation of concerns.

The refactored _load_kernel_from_disk cleanly separates:

  1. Required file validation via _get_required_files() (line 346)
  2. Optional source loading via _load_kernel_source() (line 352)
  3. Parameter deserialization (lines 354-362)
  4. Kernel reconstruction via _build_kernel() (lines 364-376)

This provides clear extension points for subclasses to customize each phase.


439-441: Static analysis false positive: unused arguments are intentional.

The static analysis warning about unused kernel and cache_path parameters is a false positive. This is a base implementation designed to be overridden by subclasses (e.g., NVRTCKernelCache may set adapter state using these parameters). The parameters must remain for interface consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Refactor Kernel Cache System for Multiple Backends

5 participants