-
Notifications
You must be signed in to change notification settings - Fork 357
[Enhancement] Allow import tilelang on CPU-only machines without CUDA libraries
#1481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughAdds a context-managed lazy loader to defer heavy shared-library imports in tilelang, exposes set_log_level(level), centralizes per-target patchelf behavior in CMake, and updates CI to use dnf, set uv/PIP/XDG cache envs, install/setup uv, and test built wheels in ephemeral venvs before upload. (≤50 words) Changes
Sequence Diagram(s)(Skipped — changes are primarily initialization, packaging, CI and do not introduce a new multi-component runtime control flow requiring a sequence diagram.) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tilelang/__init__.py (2)
69-79: Consider warning on invalid level strings.If an invalid level string is passed (e.g.,
set_log_level('DEBG')), it silently falls back toINFO. This could mask typos.🔎 Optional enhancement to warn on invalid levels
def set_log_level(level): if isinstance(level, str): - level = getattr(logging, level.upper(), logging.INFO) + upper_level = level.upper() + if not hasattr(logging, upper_level): + warnings.warn(f"Unknown log level '{level}', defaulting to INFO", stacklevel=2) + level = logging.INFO + else: + level = getattr(logging, upper_level) logger = logging.getLogger(__name__) logger.setLevel(level)
113-162: Static analysis:noqa: F401directives may be unnecessary.Ruff reports these
noqa: F401directives as unused because the F401 rule appears to be disabled in your project configuration. You could remove them to reduce noise, or enable F401 in your Ruff config if you want to enforce unused import checks.This is a low-priority cleanup item.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/dist.yml(1 hunks)tilelang/__init__.py(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
🧬 Code graph analysis (1)
tilelang/__init__.py (3)
tilelang/env.py (3)
enable_cache(269-270)disable_cache(272-273)is_cache_enabled(266-267)tilelang/libinfo.py (1)
find_lib_path(7-35)tilelang/cache/__init__.py (1)
clear_cache(43-55)
🪛 Ruff (0.14.8)
tilelang/__init__.py
113-113: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
114-114: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
117-117: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
118-118: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
138-138: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
139-139: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
140-140: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
142-142: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
143-143: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
146-146: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
147-147: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
150-150: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
151-151: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
152-152: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
153-153: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
154-154: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
156-156: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
157-157: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
158-158: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
159-159: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
161-161: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
162-162: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
.github/workflows/dist.yml (2)
168-173: LGTM - Python setup step added for wheel testing.The setup step correctly configures Python 3.12 and uv for the subsequent wheel validation tests.
175-190: Good addition for wheel validation on CPU-only runners.This test step validates that the built wheels can be installed and imported successfully, which directly tests the lazy loading changes. The
cd /ensures the installed package is tested rather than the source directory.One minor observation: the
echo ... | uv run --no-project --script -pattern is less common than usingpython -c. Consider using:uv run --no-project python -c "import tilelang; print(tilelang.__version__)"However, the current approach should work correctly.
tilelang/__init__.py (2)
112-136: Verify thatRTLD_LAZYachieves the goal for TVM imports.The
RTLD_LAZYflag defers symbol resolution for native libraries loaded viadlopen/ctypes.CDLL. However, TVM may load its runtime library duringimport tvmusing its own loading mechanism.The effectiveness depends on whether TVM's internal library loading respects the process-wide
dlopenflags. The workflow test step should validate this works correctly.
49-50: Good namespace hygiene.The
delstatements for internal helpers (_compute_version,_init_logger,_lazy_load_lib) keep the module's public namespace clean. This is a good practice.Also applies to: 96-97, 163-164
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
tilelang/__init__.py (1)
102-117: Critical: Windows compatibility issue remains unresolved.This is the same issue flagged in the previous review.
os.RTLD_LAZYandsys.getdlopenflags()are Unix-only and will raiseAttributeErroron Windows, breaking imports on that platform.🔎 Recommended fix with platform guard
@contextlib.contextmanager def _lazy_load_lib(): + # RTLD_LAZY is Unix-only; Windows doesn't support dlopen flags + if not hasattr(os, 'RTLD_LAZY') or not hasattr(sys, 'getdlopenflags'): + yield + return + old_flags = sys.getdlopenflags() old_init = ctypes.CDLL.__init__ def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs): return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs) sys.setdlopenflags(old_flags | os.RTLD_LAZY) ctypes.CDLL.__init__ = lazy_init try: yield finally: sys.setdlopenflags(old_flags) ctypes.CDLL.__init__ = old_init
🧹 Nitpick comments (1)
tilelang/__init__.py (1)
120-169: Optional cleanup: Remove unnecessarynoqadirectives.Ruff reports that the
# noqa: F401directives on many lines are unnecessary because F401 (unused imports) is not enabled in your linter configuration. If these imports are intentionally re-exported, consider using__all__to make this explicit instead of suppressing warnings.Alternative approach using __all__
You could replace the noqa comments with an explicit
__all__declaration at the end of the file:__all__ = [ 'enable_cache', 'disable_cache', 'is_cache_enabled', 'env', 'DataType', 'jit', 'lazy_jit', 'JITKernel', 'compile', 'par_compile', 'Profiler', 'clear_cache', 'TensorSupplyType', 'deprecated', 'Layout', 'Fragment', 'analysis', 'transform', 'language', 'engine', 'tools', 'dtypes', 'autotune', 'PassConfigKey', 'lower', 'register_cuda_postproc', 'register_hip_postproc', 'ir', 'tileop', 'set_log_level', '__version__', ]This makes the public API explicit and is more maintainable than individual noqa comments.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/__init__.py(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (2)
tilelang/env.py (3)
enable_cache(269-270)disable_cache(272-273)is_cache_enabled(266-267)tilelang/libinfo.py (1)
find_lib_path(7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py
120-120: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
121-121: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
124-124: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
125-125: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
145-145: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
146-146: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
147-147: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
149-149: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
150-150: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
153-153: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
154-154: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
157-157: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
158-158: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
159-159: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
160-160: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
161-161: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
163-163: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
164-164: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
165-165: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
166-166: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
168-168: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
169-169: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (5)
tilelang/__init__.py (5)
1-8: LGTM: Necessary imports for lazy loading.The added standard library imports (
contextlib,os,sys) are required for implementing the lazy loading mechanism.
12-50: LGTM: Version computation with proper cleanup.The version computation logic is sound, and the namespace cleanup on line 50 follows best practices.
53-97: LGTM: Well-structured logging infrastructure.The logging infrastructure is properly implemented:
TqdmLoggingHandlercorrectly integrates with tqdm to prevent output interferenceset_log_levelprovides a user-friendly public API- Logger initialization and cleanup follow best practices
119-171: Heavy imports properly wrapped in lazy loading context.The import structure is correct, and moving all heavy imports inside the
with _lazy_load_lib()context achieves the intended lazy loading behavior.
102-117: RTLD_LAZY implementation is correct but requires runtime validation on CPU-only systems.The
_lazy_load_lib()context manager at lines 102-117 is an intentional enhancement (commit: "SetRTLD_LAZYflag onCDLL") that correctly wraps TVM and tilelang library imports. RTLD_LAZY defers symbol resolution until symbols are actually used, which is the appropriate mechanism for allowing imports without CUDA libraries present.The implementation properly:
- Modifies process-level dlopen flags via
sys.setdlopenflags()- Wraps CDLL initialization to append RTLD_LAZY to the mode
- Restores original state in a finally block
CPU-only support is explicitly intended (CMakeLists.txt mentions "Build CPU-only when we explicitly disable CUDA"), confirming the use case is real.
However, no explicit test validating successful import on a CPU-only machine (with CUDA libraries hidden/unavailable) was found. The existing test file (
test_tilelang_language_lazy_jit.py) covers the @lazy_jit decorator for JIT compilation, not the import-time lazy loading behavior. To definitively confirm this solves theOSError: libcuda.so.1problem, add a test that imports tilelang with CUDA libraries unavailable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tilelang/__init__.py (1)
100-117: Platform compatibility:os.RTLD_LAZYis Unix-only.This issue was flagged in a previous review but remains unaddressed.
os.RTLD_LAZYis not available on Windows, and accessing it will raiseAttributeError. Thelibinfo.pysnippet shows Windows is a supported platform (win32checks exist), so this needs a guard.🔎 Proposed fix for Windows compatibility
@contextlib.contextmanager def _lazy_load_lib(): import torch # noqa: F401 # preload torch to avoid dlopen errors + if not hasattr(os, 'RTLD_LAZY'): + # Windows doesn't support dlopen flags; skip lazy loading adjustments + yield + return + old_flags = sys.getdlopenflags() old_init = ctypes.CDLL.__init__ def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs): return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs) sys.setdlopenflags(old_flags | os.RTLD_LAZY) ctypes.CDLL.__init__ = lazy_init try: yield finally: sys.setdlopenflags(old_flags) ctypes.CDLL.__init__ = old_init
🧹 Nitpick comments (1)
tilelang/__init__.py (1)
119-170: Lazy loading structure achieves the PR objective.The heavy imports (tvm, ctypes library loading, submodules) are correctly wrapped within the
_lazy_load_lib()context, deferring shared library resolution until import time withRTLD_LAZYsemantics.Regarding static analysis hints: The
# noqa: F401directives are intentional for re-exported symbols. If F401 is not enabled in your Ruff configuration, you could remove them for cleaner code, but they're harmless.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/__init__.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (1)
tilelang/libinfo.py (1)
find_lib_path(7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py
82-82: Do not catch blind exception: Exception
(BLE001)
102-102: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
120-120: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
121-121: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
124-124: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
125-125: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
145-145: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
146-146: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
147-147: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
149-149: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
150-150: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
153-153: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
154-154: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
157-157: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
158-158: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
159-159: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
160-160: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
161-161: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
163-163: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
164-164: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
165-165: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
166-166: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
168-168: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
169-169: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Quick Lint
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
🔇 Additional comments (3)
tilelang/__init__.py (3)
1-48: LGTM!The import statements and version computation logic are well-structured with appropriate fallbacks.
66-97: LGTM!The logger initialization with
TqdmLoggingHandleris well-implemented. The broad exception catch inemit()(line 82) follows the standardlogging.Handlerpattern—logging should never raise, andhandleErroris correctly invoked as a fallback.
102-108: RTLD_LAZY defers function symbol resolution correctly, but verify no CUDA data dependencies exist.RTLD_LAZY resolves symbols only as the code that references them is executed, and if the symbol is never referenced, then it is never resolved. This approach should allow
import tilelangto succeed on CPU-only machines withoutlibcuda.so.1.However, lazy binding is performed only for function references; references to variables are always immediately bound when the shared object is loaded. If the tilelang library has direct data dependencies on CUDA symbols (not just function calls), those will fail at dlopen() time regardless of RTLD_LAZY. Confirm that TVM's CUDA integration uses function pointers rather than direct variable references.
bca3f3a to
eadeafd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pyproject.toml(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
a57c8cf to
7ca7298
Compare
8ec5352 to
1929031
Compare
1929031 to
19c3864
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/dist.yml (1)
168-174: Cache may not save due to missingactions: writepermission.Based on learnings, the
astral-sh/setup-uv@v7action's caching feature requiresGITHUB_TOKENwithactions: writepermission to save caches. The current permissions block (line 26) only grantscontents: read, so caches will restore but not save.This is not blocking since caches will still restore from previous runs, but consider adding
actions: writeto enable cache saving:permissions: contents: read actions: write
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/dist.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
919423c to
9a0bec9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
CMakeLists.txt (1)
330-333: Consider usingWARNINGinstead ofSTATUSwhen patchelf is not found.When patchelf is missing, the libcuda.so dependency won't be removed, which means the resulting library may fail to import on CPU-only machines. A
STATUSmessage could be easily missed in build output. UsingWARNINGwould make this more visible to developers.🔎 Suggested change
find_program(PATCHELF_EXECUTABLE patchelf) if (NOT PATCHELF_EXECUTABLE) - message(STATUS "patchelf not found.") + message(WARNING "patchelf not found - libcuda.so dependency will not be removed from built libraries") endif()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CMakeLists.txt(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (4)
CMakeLists.txt (4)
309-316: Good refactoring to centralize target handling.Defining
TILELANG_OUTPUT_TARGETSimproves maintainability by ensuring consistent handling across RPATH settings, patching, and installation. Theunset(PATCHELF_EXECUTABLE)ensures a clean state before the conditionalfind_programcall.
337-339: LGTM!Clean refactoring that applies consistent RPATH settings to all output targets via the loop.
359-361: LGTM!Using
${TILELANG_OUTPUT_TARGETS}maintains consistency with the variable defined at line 309 and ensures any future target additions only need updating in one place.
341-357: Well-structured patching approach that correctly handles targets from different directories.The separation of POST_BUILD for local targets and a custom target for subdirectory targets is the right approach. Using
ALLensurespatch_tvm_libsruns during normal builds. Since the install target depends on the ALL target by default, patching is guaranteed to occur before installation regardless of the build workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
CMakeLists.txt (3)
316-316: Consider adding a comment explaining the unset.Unsetting
PATCHELF_EXECUTABLEbefore detection ensures a clean state, but a brief inline comment would clarify the intent for future maintainers.📝 Suggested comment
+# Ensure clean detection state for patchelf before find_program unset(PATCHELF_EXECUTABLE)
330-333: Enhance the status message to clarify implications.When
patchelfis not found, the built libraries will retain theirlibcuda.sodependencies, preventing imports on CPU-only machines. Consider making the message more informative.📝 Suggested improvement
find_program(PATCHELF_EXECUTABLE patchelf) if (NOT PATCHELF_EXECUTABLE) - message(STATUS "`patchelf` not found.") + message(STATUS "`patchelf` not found. Built libraries will retain libcuda.so dependencies and may fail to import on CPU-only machines.") endif()
341-343: Consider more explicit condition checking for SOURCE_DIR.The current check
if (tgt_src_dir)works becauseNOTFOUNDvalues evaluate to false in CMake. However, a more explicit check would improve code clarity and robustness.🔎 Suggested improvement
get_target_property(tgt_src_dir ${target} SOURCE_DIR) - if (tgt_src_dir) + if (tgt_src_dir AND NOT tgt_src_dir MATCHES "-NOTFOUND$") message(STATUS "Target `${target}` defined in '${tgt_src_dir}' will remove its `libcuda.so` dependency")
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CMakeLists.txt(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (3)
CMakeLists.txt (3)
309-314: LGTM! Good refactoring to centralize target management.Defining the output targets in a single list improves maintainability and reduces duplication in the subsequent foreach loop and install statement.
363-363: LGTM! Consistent use of the centralized target list.The install statement correctly references
TILELANG_OUTPUT_TARGETS, completing the refactoring to centralize target management.
344-359: Patchelf approach is sound and handles all target types correctly.The implementation properly removes CUDA dependencies across target types:
- POST_BUILD command handles targets in the current directory
- Custom targets with explicit DEPENDS clauses handle subdirectory targets (tvm/tvm_runtime) and ensure correct build ordering
- The patchelf syntax (
--remove-needed libcuda.so.1 --remove-needed libcuda.so) is correct and gracefully handles cases where dependencies don't exist- Availability check with
find_programprevents build failures on systems without patchelf installedThe code is ready to merge.
80f6dad to
0620942
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
CMakeLists.txt (1)
330-333: Consider clarifying the impact when patchelf is not found.The current status message could be more informative about what happens when patchelf is unavailable. This would help users understand that libcuda.so dependencies won't be removed, potentially preventing CPU-only imports.
🔎 Proposed enhancement
find_program(PATCHELF_EXECUTABLE patchelf) if (NOT PATCHELF_EXECUTABLE) - message(STATUS "`patchelf` not found.") + message(STATUS "`patchelf` not found. Built libraries will retain libcuda.so dependencies and may fail to import on CPU-only machines.") endif()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CMakeLists.txt(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (3)
CMakeLists.txt (3)
309-314: LGTM! Good consolidation of target names.Centralizing the target list improves maintainability and ensures consistent handling across INSTALL_RPATH and patchelf operations.
337-339: LGTM! Effective refactoring to reduce duplication.The foreach loop cleanly replaces repetitive set_target_properties calls and ensures consistent INSTALL_RPATH configuration across all targets.
342-356: Verify that patchelf correctly removes libcuda.so dependencies and make patchelf failures fatal.The implementation uses
install(CODE ...)to patch build artifacts before theinstall(TARGETS ...)call copies them totilelang/lib. While this approach can work, patchelf failures only emit warnings (line 51) rather than halting the install—meaning the PR objective could be silently compromised if patchelf fails. Consider usingexecute_process(...COMMAND_ERROR_IS_FATAL ANY)or checking the result variable and callingmessage(FATAL_ERROR ...)instead.Additionally, verify that the installed libraries in
tilelang/libactually have libcuda.so.1 and libcuda.so removed from their NEEDED entries, and that they can load on CPU-only systems without CUDA drivers installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
CMakeLists.txt (2)
316-316: Consider whether unconditionally unsetting is necessary.Unconditionally unsetting
PATCHELF_EXECUTABLEclears any previously cached or user-provided value before thefind_programcall at line 329. If this is intentional (e.g., to force a fresh search), consider adding a comment explaining why. Otherwise, you might allow users to override the search by pre-setting this variable.
329-332: Consider clarifying the consequence in the status message.The message "
patchelfnot found." could be more informative. Consider mentioning that without patchelf, libcuda.so dependencies won't be removed, which may prevent importing tilelang on CPU-only machines.🔎 Suggested improvement
if (NOT PATCHELF_EXECUTABLE) - message(STATUS "`patchelf` not found.") + message(STATUS "`patchelf` not found. libcuda.so dependencies will not be removed, which may prevent importing on CPU-only machines.") endif()
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CMakeLists.txt(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build SDist
🔇 Additional comments (3)
CMakeLists.txt (3)
309-314: LGTM: Clean refactoring to centralize target list.Defining
TILELANG_OUTPUT_TARGETSeliminates duplication and improves maintainability.
336-338: LGTM: Cleaner RPATH handling with foreach loop.Replacing per-target code with a foreach loop improves maintainability and reduces duplication.
340-357: The critical issue in this review is based on incorrect information about CMake's behavior.CMake 3.14 and later will evaluate generator expressions for install(CODE) and install(SCRIPT). Since this project requires CMake 3.26, the generator expression
$<TARGET_FILE:${target}>will be properly evaluated at install time. The code is correct and does not have the flaw described in the review.The secondary concern about the success message wording is minor—the message says "removed dependency
libcuda.so" when technically two dependencies (libcuda.so.1 and libcuda.so) are being removed. However, this is not a functional issue, only a slight imprecision in the message text.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
CMakeLists.txt (1)
316-316: Clarify the purpose of unsettingPATCHELF_EXECUTABLE.This unset happens before
find_program(PATCHELF_EXECUTABLE patchelf)at line 329. Is this to avoid using a stale cached value? If so, a brief comment would help future maintainers understand the intent.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CMakeLists.txt(1 hunks)pyproject.toml(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (5)
CMakeLists.txt (5)
309-314: LGTM! Clean centralization of output targets.Defining
TILELANG_OUTPUT_TARGETSin one place makes the subsequent RPATH and patchelf logic more maintainable and less error-prone.
319-333: LGTM! RPATH configuration and patchelf detection look correct.The extended RPATH entries for torch and tvm_ffi will help the runtime linker locate dependencies correctly. The patchelf detection with a status message provides good visibility.
336-338: LGTM! Uniform RPATH application across all targets.The loop ensures all output targets have consistent INSTALL_RPATH settings, which is exactly what's needed.
360-360: LGTM! Using centralized target list for installation.This change aligns with the
TILELANG_OUTPUT_TARGETSdefinition and improves maintainability.
346-346: libcaffe2_nvrtc.so already uses lazy loading (dlopen) for CUDA symbols—no verification needed.PyTorch does not link directly against nvrtc to keep CPU-only builds possible; instead it uses a stub that calls dlopen on libnvrtc.so. The patchelf command correctly removes direct libcuda.so dependencies, allowing the library to load successfully on CPU-only machines where CUDA drivers are absent. This matches PyTorch's documented design pattern for supporting dual CPU/CUDA binaries.
This reverts commit 8c6a278.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
CMakeLists.txt (1)
344-356: Critical:install(CODE ...)uses incorrect file path.The generator expression
$<TARGET_FILE:${target}>expands at CMake generation time to the build directory path, butinstall(CODE ...)executes at install time when the files are already in${CMAKE_INSTALL_PREFIX}/tilelang/lib. This causes patchelf to fail or operate on stale build artifacts instead of the installed files, defeating the PR objective of enabling CPU-only imports.🔎 Proposed fix
Replace the generator expression with the actual installed file path:
install(CODE " + set(installed_lib \"\${CMAKE_INSTALL_PREFIX}/tilelang/lib/${CMAKE_SHARED_LIBRARY_PREFIX}${target}${CMAKE_SHARED_LIBRARY_SUFFIX}\") execute_process( - COMMAND ${PATCHELF_EXECUTABLE} --remove-needed libcuda.so.1 --remove-needed libcuda.so \"$<TARGET_FILE:${target}>\" - WORKING_DIRECTORY \"${CMAKE_INSTALL_PREFIX}\" + COMMAND ${PATCHELF_EXECUTABLE} --remove-needed libcuda.so.1 --remove-needed libcuda.so \"\${installed_lib}\" RESULT_VARIABLE patchelf_result ) if(patchelf_result EQUAL 0) - message(STATUS \"`patchelf` successfully removed dependency `libcuda.so` from $<TARGET_FILE:${target}>\") + message(STATUS \"`patchelf` successfully removed dependencies `libcuda.so.1` and `libcuda.so` from \${installed_lib}\") else() - message(WARNING \"`patchelf` failed to remove dependency `libcuda.so` from $<TARGET_FILE:${target}>\") + message(WARNING \"`patchelf` failed to remove dependencies `libcuda.so.1` and `libcuda.so` from \${installed_lib}\") endif() ")Note: This fix also addresses the message inconsistency.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CMakeLists.txt(1 hunks)pyproject.toml(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Quick Lint
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
CMakeLists.txt (4)
309-314: Good refactoring: centralized target list.Introducing
TILELANG_OUTPUT_TARGETSas a single source of truth for all output targets improves maintainability and consistency throughout the build configuration.
329-332: LGTM!The
find_program()call and status message follow CMake best practices for locating optional build tools.
336-338: LGTM!Using a
foreachloop to apply INSTALL_RPATH to all targets is cleaner and more maintainable than individualset_target_properties()calls.
360-360: LGTM!Using the centralized
${TILELANG_OUTPUT_TARGETS}variable maintains consistency with the refactoring introduced earlier in the file.
Resolves #1478
Summary by CodeRabbit
New Features
Chores
CI Improvements
✏️ Tip: You can customize this high-level summary in your review settings.