-
Notifications
You must be signed in to change notification settings - Fork 357
[Enhancement] Refactor KernelCache to use inheritance-based design #1483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughRefactors kernel caching from a singleton to a dispatch map selecting backend-specific KernelCache subclasses; splits disk I/O into helper methods on KernelCache; adds backend adapters (TVM FFI, NVRTC, CuTeDSL, Cython, Torch) implementing backend-specific save/load logic; updates Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Dispatcher as CacheDispatcher (_dispatch_map)
participant Backend as KernelCache (backend-specific)
participant Adapter
participant FS as Filesystem
Caller->>Dispatcher: cached(func, target, execution_backend, ...)
alt backend found
Dispatcher->>Backend: select backend instance
Dispatcher->>Backend: cached(..., set cache path)
Backend->>Backend: _set_adapter_cache_path(kernel, cache_path)
Backend->>Adapter: request sources/lib/params
Backend->>FS: _get_required_files(cache_path)
opt files missing
Backend->>FS: _save_kernel_source_code_to_disk(...)
Backend->>FS: _save_wrapper_kernel_code_to_disk(...)
Backend->>FS: _save_so_cubin_to_disk(...)
end
Backend->>Backend: _build_kernel(...) -> JITKernel | None
Backend-->>Caller: return JITKernel or None
else unknown backend
Dispatcher-->>Caller: raise ValueError
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🧰 Additional context used🧠 Learnings (1)📚 Learning: 2025-11-14T07:56:11.098ZApplied to files:
🪛 Ruff (0.14.10)tilelang/cache/kernel_cache.py439-439: Unused method argument: (ARG002) 439-439: Unused method argument: (ARG002) tilelang/cache/__init__.py86-86: Avoid specifying long messages outside the exception class (TRY003) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (8)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
I tested them on CUDA with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/cache/__init__.py (1)
39-39: Type hint is inconsistent with dispatch pool keys.The
Literaltype is missing"cutedsl"which exists in_dispatch_pool. Also,Noneis allowed by the type hint but would cause aKeyErrorat runtime since there's noNonekey in the pool.Suggested fix
- execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = "auto", + execution_backend: Literal["auto", "tvm_ffi", "ctypes", "cython", "nvrtc", "cutedsl", "torch"] = "auto",
🧹 Nitpick comments (6)
tilelang/cache/torch_kernel_cache.py (1)
4-6: Superfluous*args, **kwargsin__new__signature.The base class
KernelCache.__new__(cls)does not accept additional arguments. Since these cache classes are instantiated without arguments in the dispatch pool, the*args, **kwargsparameters are unnecessary and could be removed for clarity.This applies to all similar subclasses in this PR.
Suggested simplification
class TorchKernelCache(KernelCache): - def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) + def __new__(cls): + return super().__new__(cls)tilelang/cache/__init__.py (1)
47-57: Consider adding a guard for invalid backend keys.If an unsupported
execution_backendvalue is passed,_dispatch_pool[execution_backend]will raise a rawKeyError. A more descriptive error could improve debuggability.Optional: Add validation with clearer error
+ if execution_backend not in _dispatch_pool: + raise ValueError( + f"Unsupported execution_backend '{execution_backend}'. " + f"Allowed: {list(_dispatch_pool.keys())}" + ) return _dispatch_pool[execution_backend].cached(tilelang/cache/tvm_ffi_kernel_cache.py (1)
10-11: Remove redundant__new__method.The
__new__method simply delegates to the parent without any custom logic. Since the parentKernelCachealready implements singleton behavior, this override is unnecessary.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -tilelang/cache/nvrtc_kernel_cache.py (1)
11-12: Remove redundant__new__method.The
__new__method simply delegates to the parent without any custom logic, making it unnecessary.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -tilelang/cache/cutedsl_kernel_cache.py (2)
16-17: Remove redundant__new__method.The
__new__method simply delegates to the parent without any custom logic, making it unnecessary.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -
42-43: Consider using unpacking for list concatenation.As suggested by the static analysis tool, unpacking provides cleaner syntax for combining lists.
🔎 Proposed refactor using unpacking
def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
tilelang/cache/__init__.py(2 hunks)tilelang/cache/auto_kernel_cache.py(1 hunks)tilelang/cache/ctypes_kernel_cache.py(1 hunks)tilelang/cache/cutedsl_kernel_cache.py(1 hunks)tilelang/cache/cython_kernel_cache.py(1 hunks)tilelang/cache/kernel_cache.py(4 hunks)tilelang/cache/nvrtc_kernel_cache.py(1 hunks)tilelang/cache/torch_kernel_cache.py(1 hunks)tilelang/cache/tvm_ffi_kernel_cache.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
tilelang/cache/cython_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/torch_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/tvm_ffi_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (4)
KernelCache(25-449)_save_wrapper_kernel_code_to_disk(414-418)_safe_write_file(256-263)_save_so_cubin_to_disk(420-425)tilelang/cache/cutedsl_kernel_cache.py (1)
_save_so_cubin_to_disk(24-39)tilelang/cache/nvrtc_kernel_cache.py (1)
_save_so_cubin_to_disk(23-30)
tilelang/cache/cutedsl_kernel_cache.py (3)
tilelang/cache/kernel_cache.py (6)
KernelCache(25-449)_save_kernel_source_code_to_disk(407-412)_save_so_cubin_to_disk(420-425)_safe_write_file(256-263)_load_binary(250-253)_get_required_files(427-430)tilelang/cache/nvrtc_kernel_cache.py (2)
_save_kernel_source_code_to_disk(14-21)_save_so_cubin_to_disk(23-30)tilelang/cache/tvm_ffi_kernel_cache.py (1)
_save_so_cubin_to_disk(19-24)
tilelang/cache/auto_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/ctypes_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/kernel_cache.py (6)
tilelang/cache/cutedsl_kernel_cache.py (3)
_save_kernel_source_code_to_disk(20-21)_save_so_cubin_to_disk(24-39)_get_required_files(42-43)tilelang/cache/tvm_ffi_kernel_cache.py (2)
_save_wrapper_kernel_code_to_disk(13-17)_save_so_cubin_to_disk(19-24)tilelang/jit/kernel.py (3)
JITKernel(39-794)kernel_source(644-645)get_kernel_source(465-476)tilelang/jit/adapter/cutedsl/adapter.py (1)
get_kernel_source(206-214)tilelang/jit/adapter/nvrtc/adapter.py (1)
get_kernel_source(179-190)tilelang/jit/adapter/ctypes/adapter.py (2)
get_kernel_source(296-302)libpath(282-284)
tilelang/cache/__init__.py (8)
tilelang/cache/auto_kernel_cache.py (1)
AutoKernelCache(4-6)tilelang/cache/ctypes_kernel_cache.py (1)
CTypesKernelCache(4-6)tilelang/cache/cutedsl_kernel_cache.py (1)
CuTeDSLKernelCache(10-49)tilelang/cache/cython_kernel_cache.py (1)
CythonKernelCache(4-6)tilelang/cache/nvrtc_kernel_cache.py (1)
NVRTCKernelCache(7-30)tilelang/cache/torch_kernel_cache.py (1)
TorchKernelCache(4-6)tilelang/cache/tvm_ffi_kernel_cache.py (1)
TVMFFIKernelCache(7-24)tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/nvrtc_kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
_save_kernel_source_code_to_disk(407-412)_safe_write_file(256-263)_load_binary(250-253)_save_so_cubin_to_disk(420-425)tilelang/cache/tvm_ffi_kernel_cache.py (1)
_save_so_cubin_to_disk(19-24)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (7)
tilelang/cache/auto_kernel_cache.py (1)
4-6: LGTM, same note as TorchKernelCache.The same optional simplification of removing
*args, **kwargsapplies here.tilelang/cache/ctypes_kernel_cache.py (1)
4-6: LGTM, same note as TorchKernelCache.The same optional simplification of removing
*args, **kwargsapplies here.tilelang/cache/cython_kernel_cache.py (1)
4-6: LGTM, same note as TorchKernelCache.The same optional simplification of removing
*args, **kwargsapplies here.tilelang/cache/__init__.py (2)
22-30: Dispatch pool design looks good for the multi-backend refactor.The dictionary-based dispatch cleanly replaces the if-else pattern as intended by issue #1469. Each backend gets its own singleton cache instance.
75-76: Verify this module-level behavior is intentional.If
TILELANG_CLEAR_CACHEenv var is set, this will raiseRuntimeErrorduring import, preventing any use of the module. This seems intentional as a safeguard, but confirm this is the desired UX.tilelang/cache/kernel_cache.py (2)
39-45: Good refactoring: centralized path configuration.Moving from module-level path constants to per-class attributes enables backend-specific path customization without modifying the base class. This aligns well with the PR's goal of eliminating conditional logic.
407-449: Well-structured I/O abstraction.The new helper methods (
_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk,_save_so_cubin_to_disk,_get_required_files,_load_kernel_source) provide clean extension points for subclasses while centralizing common disk I/O patterns. This significantly improves maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)
47-58: Consider defensive error handling for invalid backend keys.The direct dictionary lookup on line 48 will raise a
KeyErrorif an invalidbackend_keyis provided. While theLiteraltype hint provides compile-time protection, runtime validation would offer better error messages when type checking is bypassed or when backend names are dynamically constructed.🔎 Proposed enhancement
backend_key: str = execution_backend if execution_backend is not None else "auto" +if backend_key not in _dispatch_pool: + valid_backends = ", ".join(_dispatch_pool.keys()) + raise ValueError( + f"Invalid execution_backend '{backend_key}'. " + f"Valid options are: {valid_backends}" + ) return _dispatch_pool[backend_key].cached(
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/gemm/example_gemm.py(1 hunks)tilelang/cache/__init__.py(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
examples/gemm/example_gemm.py
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.
Applied to files:
examples/gemm/example_gemm.py
🧬 Code graph analysis (1)
examples/gemm/example_gemm.py (1)
tilelang/jit/kernel.py (1)
out_idx(636-637)
🔇 Additional comments (2)
tilelang/cache/__init__.py (1)
5-20: Clean TYPE_CHECKING usage to avoid circular imports.The use of
TYPE_CHECKINGto importKernelCacheonly for type hints is a good pattern that avoids potential circular import issues at runtime while maintaining type safety.examples/gemm/example_gemm.py (1)
5-5: Good example demonstrating the new backend selection API.The addition of
execution_backend="ctypes"effectively demonstrates the new multi-backend dispatch mechanism. Based on the PR description, this has been tested successfully on CUDA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tilelang/cache/cutedsl_kernel_cache.py (2)
16-17: Remove unnecessary__new__override.The
__new__method simply delegates to the parent without adding any custom logic. Python automatically uses the parent's__new__if not overridden, making this dead code.🔎 Proposed fix
- def __new__(cls, *args, **kwargs): - return super().__new__(cls, *args, **kwargs) -
41-43: Consider using iterable unpacking for list extension.The list concatenation works correctly, but iterable unpacking is more idiomatic and slightly more efficient.
🔎 Proposed fix
@override def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/cache/cutedsl_kernel_cache.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/cutedsl_kernel_cache.py (1)
tilelang/cache/kernel_cache.py (7)
KernelCache(25-449)_save_kernel_source_code_to_disk(407-412)_save_so_cubin_to_disk(420-425)_safe_write_file(256-263)_load_binary(250-253)_get_required_files(427-430)_load_kernel_source(432-449)
🪛 Ruff (0.14.8)
tilelang/cache/cutedsl_kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (3)
tilelang/cache/cutedsl_kernel_cache.py (3)
19-21: Verify no-op implementation is intentional.The method intentionally skips saving kernel source code, deviating from the base class behavior. Confirm this is the correct design for CuTeDSL—does the backend not generate saveable kernel source, or is source code unnecessary for cache restoration?
23-39: LGTM! Launcher artifact handling is well-implemented.The method correctly saves CuTeDSL-specific launcher artifacts (library and optional C++ source) using defensive attribute checks and the parent class's safe write utilities.
45-47: Verify empty string return values are correct.The method returns empty strings rather than
None. The base class can returnNoneto indicate loading failure. Since_save_kernel_source_code_to_diskis a no-op and no source is saved, confirm that returning empty strings (rather thanNone) is the expected behavior for downstream code.Note: The previous review issue regarding method naming (_load_kernel_from_cache vs _load_kernel_source) has been correctly resolved.
LeiWang1999
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, would it be possible for us to move cache into adapter class itself in https://github.com/tile-ai/tilelang/tree/main/tilelang/jit/adapter ?
For sure I could do that. |
@LeiWang1999 Hi Lei! I tried to do that, but a horrible point is that the |
It seems sufficient to move this into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
14-30: Code duplication between methods.Lines 15-20 and 24-29 contain identical logic for saving the Python wrapper file. This was previously flagged.
tilelang/jit/adapter/kernel_cache.py (1)
19-24: Fix logging to display file path instead of object.Line 23 logs the
executableobject instead ofkernel_lib_path. The debug message should show where the file is being saved (the path), not the object being saved.🔎 Proposed fix
def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path) executable = kernel.adapter.executable if verbose: - self.logger.debug(f"Saving kernel executable to file: {executable}") + self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}") KernelCache._safe_write_executable(executable, kernel_lib_path)
🧹 Nitpick comments (3)
tilelang/jit/adapter/cutedsl/kernel_cache.py (2)
41-43: Consider iterable unpacking for clarity.Per static analysis (RUF005), prefer unpacking over concatenation for combining lists.
🔎 Suggested refactor
@override def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
45-47: Returning empty strings works but is subtly different fromNone.The base class's
_load_kernel_from_diskconditionif ((host_kernel_source and device_kernel_source) or self.execution_backend == "cutedsl")treats empty strings as falsy, so this works due to thecutedslbackend check. However, returning(None, None)would be more explicit and consistent with the base class's error-handling pattern that returnsNoneon failure.🔎 Suggested alternative
@override def _load_kernel_source(self, device_kernel_path: str, host_kernel_path: str, verbose: bool = False) -> tuple[str | None, str | None]: - return "", "" + return None, Nonetilelang/jit/adapter/kernel_cache.py (1)
10-11: Consider whether new override is necessary.The
__new__method simply delegates to the superclass without additional logic. While this pattern is consistent across all backend implementations in the codebase (CTypes, Cython, Torch, NVRTC), it appears redundant if no customization is needed.If this pattern serves a specific purpose (e.g., enforcing singleton behavior or future extensibility), consider adding a comment explaining the rationale. Otherwise, the method could be removed to reduce boilerplate.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tilelang/cache/__init__.py(2 hunks)tilelang/jit/adapter/ctypes/kernel_cache.py(1 hunks)tilelang/jit/adapter/cutedsl/kernel_cache.py(1 hunks)tilelang/jit/adapter/cython/kernel_cache.py(1 hunks)tilelang/jit/adapter/kernel_cache.py(1 hunks)tilelang/jit/adapter/nvrtc/kernel_cache.py(1 hunks)tilelang/jit/adapter/torch/kernel_cache.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
_save_kernel_source_code_to_disk(14-21)_save_so_cubin_to_disk(23-30)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(19-24)
tilelang/jit/adapter/torch/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-449)
tilelang/cache/__init__.py (5)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
CTypesKernelCache(4-6)tilelang/jit/adapter/cython/kernel_cache.py (1)
CythonKernelCache(4-6)tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
NVRTCKernelCache(7-30)tilelang/jit/adapter/torch/kernel_cache.py (1)
TorchKernelCache(4-6)tilelang/jit/adapter/kernel_cache.py (1)
TVMFFIKernelCache(7-24)
🪛 Ruff (0.14.8)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
tilelang/cache/__init__.py
61-61: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (13)
tilelang/jit/adapter/cython/kernel_cache.py (1)
4-6: Minimal adapter follows established pattern.The subclass correctly delegates to the base
KernelCache. Note that the singleton implementation in the base class stores the instance onKernelCache._instancedirectly, meaning all backend-specific subclasses share the same singleton—this appears intentional for a unified cache but is worth being aware of.tilelang/jit/adapter/torch/kernel_cache.py (1)
4-6: LGTM!The
TorchKernelCachesubclass correctly delegates to the base class and follows the consistent adapter pattern established across backends.tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
7-12: Class attributes and new are correctly defined.The
kernel_lib_path = "kernel.cubin"appropriately overrides the base class default for the NVRTC backend.tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
10-17: Class attributes and structure are well-defined.The CuTeDSL-specific paths (
launcher_lib.so,launcher.cpp) are appropriately declared as class attributes.
19-21: No-op override is intentional for CuTeDSL.The CuTeDSL backend doesn't persist kernel source in the traditional manner, relying on launcher artifacts instead.
23-39: Defensive attribute access is appropriate.Good use of
getattrandhasattrto safely check for optionallib_generatorandlauncher_cpp_codeattributes before attempting to save.tilelang/jit/adapter/ctypes/kernel_cache.py (1)
4-6: LGTM!The
CTypesKernelCachesubclass follows the consistent minimal adapter pattern.tilelang/cache/__init__.py (4)
5-18: LGTM! Clean TYPE_CHECKING usage prevents circular imports.The TYPE_CHECKING conditional import pattern properly isolates the KernelCache type reference for type checking while avoiding runtime circular dependencies. All backend implementations are imported and ready for the dispatch pool.
20-28: Well-structured dispatch pool with singleton instances.The dispatch pool architecture cleanly maps backend names to their singleton KernelCache implementations. This eliminates the conditional branching that was present in the previous design.
45-59: Dispatch logic correctly routes to backend implementations.The normalization of "auto"/None to "tvm_ffi" provides a sensible default, and all parameters are properly forwarded to the selected backend's cached() method.
60-61: Exception handling addresses previous feedback.The ValueError for unsupported backends implements the behavior requested in past review comments. The inline error message is clear and actionable.
Note: Ruff suggests moving the message to an exception subclass (TRY003), but this inline approach is acceptable for a simple validation error.
The static analysis hint is optional; if you prefer to follow strict style guidelines, consider defining a custom exception class, but the current implementation is functionally correct.
tilelang/jit/adapter/kernel_cache.py (2)
7-8: LGTM! Backend-specific library path is properly defined.The TVMFFIKernelCache subclass appropriately sets
kernel_lib_path = "executable.so"for the TVM FFI backend, following the pattern used by other backend implementations (e.g., NVRTC uses "kernel.cubin").
13-17: LGTM! Wrapper kernel saving logic is correct.The method properly constructs the host kernel path, logs the correct path variable when verbose, and safely writes the host source using the parent class helper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)
37-37: Missing "cutedsl" in execution_backend type hint.The
_dispatch_poolincludes"cutedsl"(line 26), but the type hint forexecution_backendon line 37 omits it. This creates a type-checking inconsistency where the backend is available at runtime but not reflected in the type signature.- execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch"] | None = None, + execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None,Note: This issue was already flagged in a previous review comment.
🧹 Nitpick comments (1)
tilelang/cache/__init__.py (1)
58-58: Optional: Consider Ruff hint about exception message format.Ruff suggests avoiding long messages directly in exception raises (TRY003). For this simple case, the current format is clear and maintainable, so this is purely a linting preference.
If you want to strictly follow the linter suggestion:
+_UNSUPPORTED_BACKEND_MSG = 'Cannot find support for execution backend "{}"' + def cached( ... else: - raise ValueError(f'Cannot find support for execution backend "{execution_backend}"') + raise ValueError(_UNSUPPORTED_BACKEND_MSG.format(execution_backend))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/cache/__init__.py(2 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (2)
tilelang/cache/__init__.py (2)
20-28: Good refactoring: dispatch pool centralizes backend selection.The dispatch pool cleanly maps backend keys to singleton instances, eliminating the if-else branching mentioned in the PR objectives. This design makes adding new backends straightforward.
10-15: All imported KernelCache implementations verified and exist. The refactored dispatch system has all six required backend-specific classes properly defined in their respective modules. No issues found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tilelang/cache/__init__.py (1)
37-37: Critical: Defaultexecution_backend=Nonewill always fail.The parameter defaults to
None, butNoneis not a key in_dispatch_pool(lines 21-28). Every call tocached()without an explicitexecution_backendwill raiseValueErrorat line 58, breaking the public API.Based on past review comments, the old implementation supported
"auto"mode for backend detection. This refactor removed that logic but didn't update the default value.🔎 Recommended fixes
Option 1: Restore auto-detection for None (preserves backward compatibility)
+ # Auto-detect backend if not specified + if execution_backend is None: + # Determine the best available backend based on target + if isinstance(target, str) and target == "auto": + execution_backend = "tvm_ffi" # or implement detection logic + else: + # Parse target to determine appropriate backend + execution_backend = "tvm_ffi" # placeholder + if execution_backend in _dispatch_pool:Option 2: Use a valid default backend
- execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None, + execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] = "tvm_ffi",Option 3: Make execution_backend required
- execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"] | None = None, + execution_backend: Literal["tvm_ffi", "ctypes", "cython", "nvrtc", "torch", "cutedsl"],
🧹 Nitpick comments (2)
tilelang/cache/__init__.py (2)
20-28: Consider lazy initialization of backend instances.All backend instances are created at module import time, even if only one backend will be used. For better startup performance, consider lazy initialization:
🔎 Optional refactor for lazy initialization
-_dispatch_pool: dict[str, KernelCache] = { - "tvm_ffi": TVMFFIKernelCache(), - "ctypes": CTypesKernelCache(), - "cython": CythonKernelCache(), - "nvrtc": NVRTCKernelCache(), - "cutedsl": CuTeDSLKernelCache(), - "torch": TorchKernelCache(), -} +_dispatch_pool: dict[str, KernelCache] = {} + +def _get_backend_cache(backend: str) -> KernelCache: + """Get or create a backend cache instance.""" + if backend not in _dispatch_pool: + backend_classes = { + "tvm_ffi": TVMFFIKernelCache, + "ctypes": CTypesKernelCache, + "cython": CythonKernelCache, + "nvrtc": NVRTCKernelCache, + "cutedsl": CuTeDSLKernelCache, + "torch": TorchKernelCache, + } + if backend in backend_classes: + _dispatch_pool[backend] = backend_classes[backend]() + return _dispatch_pool.get(backend)Then update line 46 to use
_get_backend_cache(execution_backend).
57-58: Consider extracting error message to improve maintainability.Static analysis flags this as TRY003 (long message outside exception class). While the current message is clear and acceptable, extracting it can improve consistency if you add more validation errors later.
🔎 Optional refactor
+_UNSUPPORTED_BACKEND_ERROR = ( + 'Cannot find support for execution backend "{}". ' + f'Supported backends: {", ".join(_dispatch_pool.keys())}' +) + def cached( ... ) -> JITKernel: ... if execution_backend in _dispatch_pool: ... else: - raise ValueError(f'Cannot find support for execution backend "{execution_backend}"') + raise ValueError(_UNSUPPORTED_BACKEND_ERROR.format(execution_backend))
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tilelang/cache/__init__.pytilelang/jit/adapter/nvrtc/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/cache/__init__.py (1)
tilelang/cache/kernel_cache.py (2)
KernelCache(25-449)cached(113-227)
🪛 Ruff (0.14.8)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
🔇 Additional comments (1)
tilelang/cache/__init__.py (1)
5-18: LGTM! Clean import structure with proper TYPE_CHECKING usage.The TYPE_CHECKING pattern correctly avoids circular import issues while maintaining type safety. The backend-specific imports are well-organized.
SiriusNEO
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just some comments to go
2e3be15 to
1c0c714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
1-4: Consider adding a docstring to document the placeholder implementation.The empty class body is valid but would benefit from a docstring explaining that CTypesKernelCache currently uses all default KernelCache behavior and may be extended in the future with ctypes-specific logic.
🔎 Suggested documentation
+"""CTypes backend kernel cache implementation.""" from tilelang.cache.kernel_cache import KernelCache -class CTypesKernelCache(KernelCache): ... +class CTypesKernelCache(KernelCache): + """ + Kernel cache for the ctypes execution backend. + + Currently uses default KernelCache behavior. Backend-specific + customizations can be added here as needed. + """ + ...tilelang/jit/adapter/cython/kernel_cache.py (1)
1-4: Consider adding a docstring to document the placeholder implementation.Similar to CTypesKernelCache, this empty class would benefit from documentation explaining it's a placeholder for Cython-specific cache behavior.
🔎 Suggested documentation
+"""Cython backend kernel cache implementation.""" from tilelang.cache.kernel_cache import KernelCache -class CythonKernelCache(KernelCache): ... +class CythonKernelCache(KernelCache): + """ + Kernel cache for the Cython execution backend. + + Currently uses default KernelCache behavior. Backend-specific + customizations can be added here as needed. + """ + ...tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
41-43: Consider iterable unpacking for concatenation.The list concatenation works correctly but could use iterable unpacking for slightly better performance.
🔎 Optional optimization
def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
tilelang/cache/__init__.pytilelang/cache/kernel_cache.pytilelang/jit/adapter/ctypes/kernel_cache.pytilelang/jit/adapter/cutedsl/kernel_cache.pytilelang/jit/adapter/cython/kernel_cache.pytilelang/jit/adapter/kernel_cache.pytilelang/jit/adapter/nvrtc/kernel_cache.pytilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tilelang/jit/adapter/torch/kernel_cache.py
- tilelang/jit/adapter/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
tilelang/cache/__init__.py
🧬 Code graph analysis (4)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-475)
tilelang/jit/adapter/ctypes/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-475)
tilelang/jit/adapter/nvrtc/kernel_cache.py (2)
tilelang/cache/kernel_cache.py (4)
KernelCache(25-475)_save_so_cubin_to_disk(412-417)_safe_write_file(253-260)_load_binary(247-250)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(16-21)
tilelang/cache/kernel_cache.py (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (6)
_set_adapter_cache_path(50-52)_save_kernel_source_code_to_disk(20-21)_save_so_cubin_to_disk(24-39)_get_required_files(42-43)_load_kernel_source(46-47)_build_kernel(55-83)
🪛 Ruff (0.14.10)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
tilelang/cache/kernel_cache.py
443-443: Unused method argument: kernel
(ARG002)
443-443: Unused method argument: cache_path
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (14)
tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
7-18: LGTM! Clean NVRTC-specific implementation.The class correctly customizes paths for NVRTC's cubin output and implements the necessary logic to persist the Python wrapper alongside the binary. The previous duplicate logic concern has been addressed.
tilelang/cache/__init__.py (3)
21-28: Clean dispatch mechanism successfully replaces singleton pattern.The backend dispatch map provides a clear, extensible way to route kernel cache operations to backend-specific implementations. This aligns with the PR objective to eliminate if-else branching.
37-37: Good choice of default execution backend.Setting the default to
"tvm_ffi"addresses previous review concerns aboutNonecausing errors. This provides a sensible fallback while still allowing explicit backend selection.Based on past review comments, this follows the suggestion to use "tvm_ffi" as the default for internal APIs.
45-58: Dispatch logic is straightforward and correct.The if-else routing to backend-specific cache instances is clear. The ValueError provides a helpful message for unsupported backends.
Note: The static analysis hint (TRY003) about the long error message is a stylistic suggestion. The current inline message is acceptable and clear for this use case.
tilelang/jit/adapter/cutedsl/kernel_cache.py (4)
13-21: CuTeDSL-specific paths and no-op kernel source save are appropriate.The custom paths for launcher artifacts align with CuTeDSL's C++ launcher design. The early return in
_save_kernel_source_code_to_diskindicates CuTeDSL handles kernel sources differently than other backends.
23-39: Defensive attribute checking for launcher artifacts is good practice.The use of
getattr()andhasattr()ensures the code handles cases where the adapter or lib_generator may not have the expected launcher attributes. The verbose logging aids debugging.
45-52: Appropriate overrides for CuTeDSL's kernel loading behavior.Returning empty strings from
_load_kernel_sourceand propagating the cache path to the adapter reflect CuTeDSL's different approach to managing kernel sources and artifacts.
54-83: Correct kernel reconstruction from cached parameters.The
_build_kernelimplementation properly delegates toJITKernel.from_databasewhen parameters are available, which is the expected behavior for loading cached kernels.tilelang/cache/kernel_cache.py (6)
39-43: Excellent use of class attributes for path customization.Moving path constants from module-level to class attributes allows backend-specific subclasses to override paths as needed (e.g., NVRTC using "kernel.cubin" instead of "kernel_lib.so"). This is a key enabler of the inheritance-based design.
290-303: Well-structured delegation to specialized save methods.Breaking
_save_kernel_to_diskinto separate calls to_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk, and_save_so_cubin_to_diskprovides clear override points for backend-specific behavior while maintaining error handling at the orchestration level.
351-380: Clean refactor of disk loading logic with extensible helper methods.The extraction of
_get_required_files,_load_kernel_source, and_build_kernelsuccessfully eliminates conditional logic while enabling backends like CuTeDSL to customize file requirements and loading behavior.
399-441: New helper methods provide excellent override points for subclasses.These methods establish the template method pattern, allowing each backend to customize specific aspects:
_save_kernel_source_code_to_disk: Device kernel persistence_save_wrapper_kernel_code_to_disk: Host wrapper persistence_save_so_cubin_to_disk: Library/binary persistence_get_required_files: File existence checks_load_kernel_source: Source code loadingThis design successfully achieves the PR objective of eliminating if-else branching in favor of polymorphism.
443-444: Hook method with unused parameters is intentional design.The
_set_adapter_cache_pathmethod serves as an extension point for subclasses (e.g., CuTeDSLKernelCache overrides this at line 50-52 of tilelang/jit/adapter/cutedsl/kernel_cache.py to propagate the cache path). The unused parameters in the base implementation are expected for hook methods.The static analysis warnings for unused arguments (ARG002) are false positives in this context.
446-475: Flexible kernel reconstruction logic supports diverse backends.The
_build_kernelmethod provides a sensible default that works for most backends while allowing specialized subclasses (like CuTeDSL) to implement custom reconstruction logic based on available artifacts.
be091a6 to
aba3d29
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
tilelang/jit/adapter/kernel_cache.py (1)
16-21: Fix incorrect variable in log message.Line 20 logs the
Executableobject instead of the destination pathkernel_lib_path. The log should indicate where the file is being saved.🔎 Proposed fix
executable = kernel.adapter.executable if verbose: - self.logger.debug(f"Saving kernel executable to file: {executable}") + self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}") KernelCache._safe_write_executable(executable, kernel_lib_path)tilelang/cache/kernel_cache.py (2)
406-410: Add null check for adapter's kernel source.Some adapters (e.g., CuTeDSL, NVRTC) may return
Nonefromget_kernel_source(). PassingNonetofile.write()will raise aTypeError, which gets caught and logged with a generic error message that doesn't indicate the root cause.🔎 Proposed fix
def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): host_kernel_path = os.path.join(cache_path, self.host_kernel_path) + source = kernel.adapter.get_kernel_source() + if source is None: + if verbose: + self.logger.debug("Skipping host kernel source save: adapter returned None") + return if verbose: self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}") - KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source())) + KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))
446-476: Use explicit None checks instead of truthiness for source validation.Line 460 uses truthiness checks that conflate two distinct cases:
None: Source file not found (error condition)"": Source file exists but is empty (valid for backends like CuTeDSL)The CuTeDSL adapter confirms this is problematic—it intentionally returns
"", ""from_load_kernel_sourcebut must override_build_kernelto only checkkernel_paramsbecause the base class truthiness check incorrectly rejects empty strings.🔎 Recommended fix
- if host_kernel_source and device_kernel_source and kernel_params: + if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None: return JITKernel.from_database(This allows backends that don't use kernel source files to return empty strings while still detecting actual load failures (None).
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
41-43: Consider using iterable unpacking for cleaner syntax.The list concatenation works correctly but could be more idiomatic.
🔎 Proposed refactor
def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]Based on static analysis hint.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
tilelang/cache/__init__.pytilelang/cache/kernel_cache.pytilelang/jit/adapter/ctypes/kernel_cache.pytilelang/jit/adapter/cutedsl/kernel_cache.pytilelang/jit/adapter/cython/kernel_cache.pytilelang/jit/adapter/kernel_cache.pytilelang/jit/adapter/nvrtc/kernel_cache.pytilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tilelang/jit/adapter/torch/kernel_cache.py
- tilelang/jit/adapter/ctypes/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
tilelang/cache/__init__.py
🧬 Code graph analysis (4)
tilelang/jit/adapter/cython/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (1)
KernelCache(25-476)
tilelang/jit/adapter/cutedsl/kernel_cache.py (5)
tilelang/language/ast/ir.py (1)
target(1677-1707)tilelang/cache/kernel_cache.py (7)
KernelCache(25-476)_save_kernel_source_code_to_disk(399-404)_save_so_cubin_to_disk(412-417)_safe_write_file(252-259)_load_binary(246-249)_load_kernel_source(424-441)_set_adapter_cache_path(443-444)tilelang/engine/param.py (1)
KernelParam(13-141)tilelang/jit/adapter/kernel_cache.py (1)
_save_so_cubin_to_disk(16-21)tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
_save_so_cubin_to_disk(11-18)
tilelang/jit/adapter/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (2)
_safe_write_file(252-259)_safe_write_executable(262-265)
tilelang/cache/kernel_cache.py (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (6)
_set_adapter_cache_path(50-52)_save_kernel_source_code_to_disk(20-21)_save_so_cubin_to_disk(24-39)_get_required_files(42-43)_load_kernel_source(46-47)_build_kernel(55-84)
🪛 Ruff (0.14.10)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
tilelang/cache/__init__.py
58-58: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/cache/kernel_cache.py
443-443: Unused method argument: kernel
(ARG002)
443-443: Unused method argument: cache_path
(ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (15)
tilelang/jit/adapter/cython/kernel_cache.py (1)
1-4: LGTM! Valid placeholder implementation.The minimal CythonKernelCache subclass correctly inherits from KernelCache and can be extended with Cython-specific behavior as needed.
tilelang/jit/adapter/kernel_cache.py (1)
10-14: LGTM! TVM-specific wrapper save logic is correct.The method properly saves the host source code using the TVM adapter's
get_host_source()method.tilelang/jit/adapter/cutedsl/kernel_cache.py (4)
13-17: LGTM! CuTeDSL-specific path configuration.The custom paths for launcher library and C++ source are appropriate for the CuTeDSL backend's architecture.
19-21: LGTM! Intentional no-op for CuTeDSL.CuTeDSL doesn't require device kernel source persistence, so the no-op implementation is correct for this backend.
23-39: LGTM! Robust launcher artifact persistence.The method correctly handles optional launcher library and C++ source with defensive existence checks.
45-83: LGTM! CuTeDSL-specific load and build logic.The implementation correctly:
- Returns empty strings for kernel source (not needed for CuTeDSL)
- Propagates cache path to the adapter
- Builds kernel based solely on kernel_params (since source strings are always empty)
This aligns with CuTeDSL's architecture where the launcher library is the primary artifact.
tilelang/cache/__init__.py (2)
5-28: LGTM! Clean dispatch registry implementation.The dispatch map architecture successfully replaces the if-else branching mentioned in the PR objectives. The TYPE_CHECKING guard and backend instantiation are well-structured.
31-58: LGTM! Dispatch logic successfully eliminates if-else branching.The function properly routes to backend-specific implementations via the dispatch map, achieving the refactoring goal stated in the PR objectives. The default backend and error handling are appropriate.
tilelang/cache/kernel_cache.py (7)
39-42: LGTM! Class-level path attributes enable inheritance-based dispatch.These attributes form the foundation of the inheritance-based refactoring, allowing subclasses to customize paths without modifying base class logic. This successfully addresses the PR objective of eliminating if-else branching.
219-219: LGTM! Extracted cache path setting enables subclass customization.The helper method allows backends like CuTeDSL to customize cache path propagation while keeping the base implementation simple.
287-314: LGTM! Helper method extraction enables backend-specific behavior.The refactored save logic successfully moves backend-specific handling to overridable methods, eliminating the need for if-else branching in the base class.
345-380: LGTM! Load logic properly delegates to customizable helpers.The refactored loading flow uses class-level paths and overridable helper methods, allowing backends to customize required files, source loading, and kernel building without modifying the base flow.
399-404: LGTM! Device kernel source save with proper null check.The method correctly checks for None before writing, and the override mechanism allows backends like CuTeDSL to skip this step entirely.
412-441: LGTM! I/O helper implementations are robust.The save/load helper methods properly:
- Use class-level path attributes
- Handle exceptions gracefully
- Return None on load failures
- Enable subclass customization
This design cleanly separates concerns and eliminates backend-specific branching.
443-444: Intentional no-op base implementation.The empty method with unused parameters is a valid extension point for subclasses. CuTeDSL overrides this to propagate the cache path to its adapter. The static analysis warning can be safely ignored in this case.
| def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): | ||
| src_lib_path = kernel.adapter.libpath | ||
| kernel_py_path = os.path.join(cache_path, self.kernel_py_path) | ||
| src_lib_path = src_lib_path.replace(".cubin", ".py") | ||
| if verbose: | ||
| self.logger.debug(f"Saving kernel nvrtc python code to file: {kernel_py_path}") | ||
| KernelCache._safe_write_file(kernel_py_path, "wb", lambda file: file.write(KernelCache._load_binary(src_lib_path))) | ||
| super()._save_so_cubin_to_disk(kernel, cache_path, verbose) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validate file extension assumption before string replacement.
Line 14 assumes kernel.adapter.libpath ends with .cubin, but performs a blind replace(".cubin", ".py"). If the libpath doesn't contain .cubin, this silently fails and could lead to incorrect paths.
🔎 Proposed fix to validate the extension
def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False):
src_lib_path = kernel.adapter.libpath
kernel_py_path = os.path.join(cache_path, self.kernel_py_path)
- src_lib_path = src_lib_path.replace(".cubin", ".py")
+ if not src_lib_path.endswith(".cubin"):
+ self.logger.warning(f"Expected .cubin extension but got: {src_lib_path}")
+ src_py_path = src_lib_path.replace(".cubin", ".py")
if verbose:
self.logger.debug(f"Saving kernel nvrtc python code to file: {kernel_py_path}")
- KernelCache._safe_write_file(kernel_py_path, "wb", lambda file: file.write(KernelCache._load_binary(src_lib_path)))
+ KernelCache._safe_write_file(kernel_py_path, "wb", lambda file: file.write(KernelCache._load_binary(src_py_path)))
super()._save_so_cubin_to_disk(kernel, cache_path, verbose)🤖 Prompt for AI Agents
In tilelang/jit/adapter/nvrtc/kernel_cache.py around lines 11 to 18, the code
blindly replaces ".cubin" with ".py" on kernel.adapter.libpath which can
silently produce incorrect paths if the string doesn't contain that extension;
validate the extension first (e.g., if libpath.endswith(".cubin") then replace,
otherwise either compute the .py path using os.path.splitext(libpath)[0] + ".py"
or log and raise a clear error) and update the call to KernelCache._load_binary
to use the validated/constructed path so you don't silently operate on a wrong
filename.
aba3d29 to
96bf90c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
tilelang/jit/adapter/kernel_cache.py (1)
16-21: Fix incorrect variable in log message.Line 20 logs
executable(theExecutableobject) instead ofkernel_lib_path(the destination path). The log message should indicate where the file is being saved, not the object being saved.🔎 Proposed fix
def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path) executable = kernel.adapter.executable if verbose: - self.logger.debug(f"Saving kernel executable to file: {executable}") + self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}") KernelCache._safe_write_executable(executable, kernel_lib_path)tilelang/cache/kernel_cache.py (2)
406-410: Add null check forget_kernel_source()return value.Some adapters (e.g., CuTeDSLKernelAdapter, NVRTCKernelAdapter) can return
Nonefromget_kernel_source(). The current implementation passes the result directly tofile.write(), which will raise aTypeErrorifNoneis returned. The exception gets caught but logged with a misleading generic message.🔎 Proposed fix
def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): host_kernel_path = os.path.join(cache_path, self.host_kernel_path) if verbose: self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}") - KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source())) + source = kernel.adapter.get_kernel_source() + if source is None: + self.logger.debug("Skipping host kernel source save: adapter returned None") + return + KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))
460-476: Use explicitNonechecks instead of truthiness for kernel source validation.The condition
if host_kernel_source and device_kernel_source and kernel_params:uses truthiness checks that conflate two distinct cases:
None: Source file not found (error)"": Source file exists but is intentionally empty (valid)The CuTeDSLKernelCache works around this by overriding
_build_kernelto only checkkernel_params, but other backends that inherit this base implementation could fail if they need to return empty source strings.🔎 Proposed fix
def _build_kernel( self, func: Callable | None, host_kernel_source: str, device_kernel_source: str, kernel_lib_path: str, kernel_params: list[KernelParam] | None, target: str | Target, target_host: str | Target | None, out_idx: list[int] | None, execution_backend: Literal["tvm_ffi", "cython", "nvrtc", "torch", "cutedsl"], pass_configs: dict | None, compile_flags: list[str] | str | None, ) -> JITKernel | None: - if host_kernel_source and device_kernel_source and kernel_params: + if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None: return JITKernel.from_database(
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
41-43: Consider using iterable unpacking for list concatenation.Static analysis suggests replacing list concatenation with iterable unpacking for slightly better performance and idiomatic Python.
🔎 Proposed fix
@override def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
tilelang/cache/__init__.pytilelang/cache/kernel_cache.pytilelang/jit/adapter/cutedsl/kernel_cache.pytilelang/jit/adapter/cython/kernel_cache.pytilelang/jit/adapter/kernel_cache.pytilelang/jit/adapter/nvrtc/kernel_cache.pytilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tilelang/jit/adapter/cython/kernel_cache.py
- tilelang/jit/adapter/torch/kernel_cache.py
- tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
tilelang/cache/__init__.py
🧬 Code graph analysis (1)
tilelang/cache/__init__.py (6)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
CuTeDSLKernelCache(13-84)tilelang/jit/adapter/cython/kernel_cache.py (1)
CythonKernelCache(4-4)tilelang/jit/adapter/nvrtc/kernel_cache.py (1)
NVRTCKernelCache(7-18)tilelang/jit/adapter/torch/kernel_cache.py (1)
TorchKernelCache(4-4)tilelang/jit/adapter/kernel_cache.py (1)
TVMFFIKernelCache(7-21)tilelang/cache/kernel_cache.py (2)
KernelCache(25-476)cached(110-223)
🪛 Ruff (0.14.10)
tilelang/cache/__init__.py
56-56: Avoid specifying long messages outside the exception class
(TRY003)
tilelang/cache/kernel_cache.py
443-443: Unused method argument: kernel
(ARG002)
443-443: Unused method argument: cache_path
(ARG002)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (12)
tilelang/jit/adapter/kernel_cache.py (1)
7-14: LGTM!The
TVMFFIKernelCacheclass correctly inherits fromKernelCacheand overrides the appropriate methods for TVM FFI-specific behavior. Thehost_kernel_pathattribute is properly inherited from the base class.tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
23-39: LGTM!The
_save_so_cubin_to_diskoverride correctly handles CuTeDSL-specific launcher library and C++ source saving with proper defensive checks.
54-84: LGTM!The
_build_kerneloverride correctly checks onlykernel_paramssince CuTeDSL's_load_kernel_sourceintentionally returns empty strings. This properly handles the CuTeDSL-specific loading behavior.
49-52: LGTM!The
_set_adapter_cache_pathoverride correctly propagates the cache path to the CuTeDSL adapter.tilelang/cache/__init__.py (3)
19-26: LGTM!The dispatch map correctly maps execution backends to their respective
KernelCachesubclass instances, replacing the previous singleton pattern with a clean inheritance-based dispatch mechanism.
29-56: LGTM!The
cached()function correctly dispatches to the appropriate backend cache via the dispatch map. The defaultexecution_backend="tvm_ffi"properly addresses previous review feedback about theNonedefault causing errors.
74-75: Module-levelclear_cache()call will crash on import when env var is set.When
TILELANG_CLEAR_CACHEis set to a truthy value, callingclear_cache()at module import time will raise aRuntimeError, crashing the application. This may be the intended behavior to force users to acknowledge the risk, but it could be unexpected for users setting this environment variable.Consider whether this is the intended UX. If users are expected to manually delete the cache, you might want to just log a warning instead of raising an exception at import time:
if env.TILELANG_CLEAR_CACHE.lower() in ("1", "true", "yes", "on"): import warnings warnings.warn( f"TILELANG_CLEAR_CACHE is set but automatic cache clearing is disabled. " f"To clear the cache, manually run: rm -rf '{env.TILELANG_CACHE_DIR}'" )tilelang/cache/kernel_cache.py (5)
39-42: LGTM!Moving path constants to class-level attributes is a clean refactor that enables backend-specific subclasses to override paths as needed.
443-444: No-op hook method is appropriate for inheritance pattern.The empty
_set_adapter_cache_pathmethod serves as a hook that subclasses (likeCuTeDSLKernelCache) can override. The static analysis warnings about unused arguments are false positives for this design pattern.
424-441: LGTM!The
_load_kernel_sourcemethod correctly handles file loading with proper exception handling and logging. It returnsNonefor failed loads, allowing callers to handle missing sources appropriately.
419-422: LGTM!The
_get_required_filesmethod correctly returns only the essential files (kernel library and parameters). Source files are optional, and subclasses can extend this list as needed.
399-404: LGTM!The
_save_kernel_source_code_to_diskmethod correctly checks forNonebefore writing, avoiding potentialTypeErrorexceptions.
silentCoder-dev
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
|
Could you also update CI tests to cover kernel cache? |
Sorry I'm not familiar with this CI😭. Which part should I start with? In |
|
You should modify https://github.com/tile-ai/tilelang/blob/main/.github/workflows/ci.yml |
|
I think running https://github.com/tile-ai/tilelang/tree/main/testing/python/jit twice should be enough.
|
SiriusNEO
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. After adding the test we are good to go
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
96bf90c to
6195770
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
tilelang/jit/adapter/kernel_cache.py (1)
16-21: Fix incorrect variable in log message.Line 20 logs
executable(theExecutableobject) instead ofkernel_lib_path(the destination path). The log message should indicate where the file is being saved.🔎 Proposed fix
def _save_so_cubin_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): kernel_lib_path = os.path.join(cache_path, self.kernel_lib_path) executable = kernel.adapter.executable if verbose: - self.logger.debug(f"Saving kernel executable to file: {executable}") + self.logger.debug(f"Saving kernel executable to file: {kernel_lib_path}") KernelCache._safe_write_executable(executable, kernel_lib_path)tilelang/cache/kernel_cache.py (2)
429-433: Add null check forget_kernel_source()return value.Some adapters (e.g.,
CuTeDSLKernelAdapter,NVRTCKernelAdapter) can returnNonefromget_kernel_source(). PassingNonetofile.write()will cause aTypeErrorthat gets caught with a misleading generic message.🔎 Proposed fix
def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): host_kernel_path = os.path.join(cache_path, self.host_kernel_path) if verbose: self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}") - KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source())) + source = kernel.adapter.get_kernel_source() + if source is None: + self.logger.debug("Skipping host kernel source save: adapter returned None") + return + KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))
483-499: Use explicit None checks instead of truthiness for kernel source validation.The condition
if host_kernel_source and device_kernel_source and kernel_params:uses truthiness checks that conflateNone(source not found) with""(source exists but is empty). TheCuTeDSLKernelCache._load_kernel_sourceintentionally returns"", ""which would fail this check.🔎 Proposed fix
- if host_kernel_source and device_kernel_source and kernel_params: + if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None: return JITKernel.from_database(
🧹 Nitpick comments (1)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
41-43: Consider using iterable unpacking instead of concatenation.Per Ruff RUF005, iterable unpacking is slightly more idiomatic:
🔎 Proposed fix
@override def _get_required_files(self, cache_path: str) -> list[str]: - return super()._get_required_files(cache_path) + [os.path.join(cache_path, self.launcher_lib_path)] + return [*super()._get_required_files(cache_path), os.path.join(cache_path, self.launcher_lib_path)]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
.github/workflows/ci.ymltilelang/cache/__init__.pytilelang/cache/kernel_cache.pytilelang/jit/adapter/cutedsl/kernel_cache.pytilelang/jit/adapter/cython/kernel_cache.pytilelang/jit/adapter/kernel_cache.pytilelang/jit/adapter/nvrtc/kernel_cache.pytilelang/jit/adapter/torch/kernel_cache.py
🚧 Files skipped from review as they are similar to previous changes (3)
- tilelang/jit/adapter/torch/kernel_cache.py
- tilelang/jit/adapter/cython/kernel_cache.py
- tilelang/jit/adapter/nvrtc/kernel_cache.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/ci.yml
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
.github/workflows/ci.ymltilelang/cache/__init__.py
🧬 Code graph analysis (2)
tilelang/jit/adapter/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (4)
KernelCache(25-499)_safe_write_file(275-282)_save_so_cubin_to_disk(435-440)_safe_write_executable(285-288)
tilelang/jit/adapter/cutedsl/kernel_cache.py (1)
tilelang/cache/kernel_cache.py (6)
KernelCache(25-499)_save_so_cubin_to_disk(435-440)_safe_write_file(275-282)_load_binary(269-272)_get_required_files(442-445)_load_kernel_source(447-464)
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py
466-466: Unused method argument: kernel
(ARG002)
466-466: Unused method argument: cache_path
(ARG002)
tilelang/jit/adapter/cutedsl/kernel_cache.py
43-43: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
tilelang/cache/__init__.py
56-56: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (10)
.github/workflows/ci.yml (1)
376-404: LGTM! CI steps align with the requested two-pass JIT testing strategy.The implementation correctly:
- Runs JIT tests twice to catch cache-related regressions (per lucifer1004's suggestion)
- Disables cache on pass 2 via
TILELANG_DISABLE_CACHE=1to avoid postproc conflicts- Keeps CuTeDSL tests separate (run with
TILELANG_USE_GEMM_V1=1in a dedicated step)tilelang/jit/adapter/kernel_cache.py (1)
7-14: LGTM!The
TVMFFIKernelCacheclass correctly overrides thekernel_lib_pathclass attribute and provides a specialized implementation for saving the wrapper kernel source.tilelang/cache/__init__.py (2)
19-26: LGTM! Dispatch map correctly routes to backend-specific caches.The singleton instances for each backend are properly instantiated, and the naming follows the suggestion from previous reviews (
_dispatch_mapinstead of_dispatch_pool).
43-56: LGTM!The dispatch logic correctly routes to the appropriate backend cache and raises a clear error for unsupported backends. Past review concerns (missing "cutedsl", None default causing errors) have been addressed.
tilelang/jit/adapter/cutedsl/kernel_cache.py (3)
13-21: LGTM!The class correctly defines CuTeDSL-specific paths and intentionally makes
_save_kernel_source_code_to_diska no-op since CuTeDSL uses a different code generation path.
54-84: LGTM!The
_build_kernelimplementation correctly usesJITKernel.from_databasewhenkernel_paramsexist and returnsNoneotherwise, aligning with the inheritance-based design.
49-52: The_cache_pathattribute is already handled defensively in the adapter.The adapter code uses
getattr(self, "_cache_path", None)with a default value, properly handling the case where the attribute may not be set. No additional defensive checks are needed inkernel_cache.py. The interface is implicitly documented via the method's usage context.tilelang/cache/kernel_cache.py (3)
39-42: LGTM! Class-level path attributes enable proper inheritance.Moving path constants to class attributes allows subclasses to override them (e.g.,
TVMFFIKernelCache.kernel_lib_path = "executable.so"), which is the core of the inheritance-based refactoring.
466-467: Intentional no-op hook for subclass override.The unused arguments (
kernel,cache_path) are expected for this hook method pattern. Subclasses likeCuTeDSLKernelCacheoverride this to propagate the cache path to adapters. The Ruff ARG002 warnings can be safely ignored.
310-328: LGTM! Clean delegation to helper methods.The refactored
_save_kernel_to_disknow properly delegates to specialized helper methods (_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk,_save_so_cubin_to_disk), enabling subclasses to override individual behaviors.
Signed-off-by: Jinjie Liu <jjliu@baai.ac.cn>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tilelang/cache/kernel_cache.py (1)
125-148: Docstring is outdated after refactoring.The docstring claims "This is the ONLY place where environment variable processing...should happen" (line 125) and documents environment variable behavior (lines 140-148), but the actual env variable processing now occurs in
tilelang/cache/__init__.py:cached()(lines 44-51). This method receives already-resolved parameters.Update the docstring to reflect that this is an internal method receiving normalized parameters, and remove the environment variable documentation section.
🔎 Proposed docstring update
def cached( self, func: PrimFunc = None, out_idx: list[int] = None, *args, target: str | Target, target_host: str | Target | None = None, execution_backend: Literal["tvm_ffi", "cython", "nvrtc", "torch", "cutedsl"] = "tvm_ffi", verbose: bool, pass_configs: dict | None = None, compile_flags: list[str] | str | None = None, ) -> JITKernel: """ - Caches and reuses compiled kernels to avoid redundant compilation. - - This is the ONLY place where environment variable processing, target normalization, - and execution backend resolution should happen. All compilation paths go through here. + Internal caching method for backend-specific KernelCache subclasses. + + This method receives already-normalized parameters from the public + tilelang.cache.cached() API. Environment variable defaults and backend + resolution have already been applied. Args: func: Function to be compiled or a prepared PrimFunc out_idx: Indices specifying which outputs to return - target: Compilation target platform (None = read from TILELANG_TARGET env var) + target: Normalized compilation target platform (required) target_host: Host target platform - execution_backend: Execution backend (None = read from TILELANG_EXECUTION_BACKEND) - verbose: Enable verbose output (None = read from TILELANG_VERBOSE) + execution_backend: Resolved execution backend (required) + verbose: Enable verbose output (required) *args: Arguments passed to func Returns: JITKernel: The compiled kernel, either freshly compiled or from cache - - Environment Variables - --------------------- - TILELANG_TARGET : str - Default compilation target (e.g., "cuda", "llvm"). Defaults to "auto". - TILELANG_EXECUTION_BACKEND : str - Default execution backend. Defaults to "auto". - TILELANG_VERBOSE : str - Set to "1", "true", "yes", or "on" to enable verbose compilation by default. """
♻️ Duplicate comments (2)
tilelang/cache/kernel_cache.py (2)
402-406: Add None guard forget_kernel_source()return value.According to past review analysis,
CuTeDSLKernelAdapterandNVRTCKernelAdaptercan returnNonefromget_kernel_source(). Line 406 directly passes this tofile.write(), which will raiseTypeError: a bytes-like object is required, not 'NoneType'if the adapter returnsNone. The exception will be caught generically with a misleading message.Add an explicit check:
🔎 Proposed fix
def _save_wrapper_kernel_code_to_disk(self, kernel: JITKernel, cache_path: str, verbose: bool = False): host_kernel_path = os.path.join(cache_path, self.host_kernel_path) + source = kernel.adapter.get_kernel_source() + if source is None: + if verbose: + self.logger.debug(f"Skipping host kernel source save: adapter returned None") + return if verbose: self.logger.debug(f"Saving wrapped kernel source code to file: {host_kernel_path}") - KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(kernel.adapter.get_kernel_source())) + KernelCache._safe_write_file(host_kernel_path, "w", lambda file: file.write(source))Based on past review comments, this issue was flagged but not yet addressed.
442-472: Use explicit None checks instead of truthiness for source validation.Line 456 uses
if host_kernel_source and device_kernel_source and kernel_params:which conflates two distinct cases:
None: file not loaded (error condition)"": file loaded but empty (valid, intentional)According to past review analysis, the CuteDSL adapter intentionally returns
"", ""from_load_kernel_sourceand must override_build_kernelto work around this truthiness check. Empty source strings are valid; onlyNoneshould indicate a loading failure.🔎 Proposed fix
- if host_kernel_source and device_kernel_source and kernel_params: + if host_kernel_source is not None and device_kernel_source is not None and kernel_params is not None: return JITKernel.from_database(This clearly differentiates "not loaded" (
None) from "loaded as empty" ("") for each value.Based on past review comments, this issue was flagged but not yet addressed.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
tilelang/cache/__init__.pytilelang/cache/kernel_cache.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
tilelang/cache/__init__.py
🪛 Ruff (0.14.10)
tilelang/cache/kernel_cache.py
439-439: Unused method argument: kernel
(ARG002)
439-439: Unused method argument: cache_path
(ARG002)
tilelang/cache/__init__.py
86-86: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (8)
tilelang/cache/__init__.py (4)
5-18: LGTM: Clean import structure with proper circular import handling.The use of
TYPE_CHECKINGfor the baseKernelCacheimport avoids circular dependencies while maintaining type hint accuracy. All backend-specific subclasses are imported and will be used in the dispatch map.
20-27: LGTM: Dispatch map correctly implements backend routing.The
_dispatch_mapprovides clean separation of backend-specific caching logic. Each backend has its own singleton instance, which is appropriate for the cache pattern.
44-72: Well-structured environment variable handling and backend resolution.The centralized environment variable processing at lines 44-51 correctly establishes a single source of truth. The target normalization and backend resolution flow (lines 53-59) properly handles the "auto" case via
resolve_execution_backend. The conditional logging (lines 60-72) provides useful diagnostics without noise when the user explicitly specified a backend.
73-86: Dispatch logic correctly routes to backend-specific cache implementations.The dispatch check (line 73) and delegation (lines 74-84) properly route kernel caching to the appropriate backend. The error handling (lines 85-86) provides a clear message when an unsupported backend is requested.
tilelang/cache/kernel_cache.py (4)
39-42: LGTM: Instance path attributes enable backend-specific customization.The introduction of instance-level path attributes (
device_kernel_path,host_kernel_path,kernel_lib_path,params_path) provides a clean extension point for backend-specific subclasses to override file naming conventions (e.g.,.cubinfor NVRTC vs..sofor TVM FFI).
263-310: LGTM: Clean delegation to extensible helper methods.The refactoring of
_save_kernel_to_diskto delegate to_save_kernel_source_code_to_disk,_save_wrapper_kernel_code_to_disk, and_save_so_cubin_to_diskprovides clear extension points for backend-specific subclasses while maintaining proper error handling.
312-376: LGTM: Well-structured load flow with clear separation of concerns.The refactored
_load_kernel_from_diskcleanly separates:
- Required file validation via
_get_required_files()(line 346)- Optional source loading via
_load_kernel_source()(line 352)- Parameter deserialization (lines 354-362)
- Kernel reconstruction via
_build_kernel()(lines 364-376)This provides clear extension points for subclasses to customize each phase.
439-441: Static analysis false positive: unused arguments are intentional.The static analysis warning about unused
kernelandcache_pathparameters is a false positive. This is a base implementation designed to be overridden by subclasses (e.g.,NVRTCKernelCachemay set adapter state using these parameters). The parameters must remain for interface consistency.
This PR resolves #1469. It moves most of the
if-elsecheckers for multi-backends intilelang/cache/kernel_cache.py, and dispatches the different implementations by inheritance in different subclasses.We retained some specialized implementations for
CuteDSLand considered removing them in the future once this backend is ready.Summary by CodeRabbit
New Features
Changes
Tests / CI
✏️ Tip: You can customize this high-level summary in your review settings.