
Conversation

@XuehaiPan
Collaborator

@XuehaiPan XuehaiPan commented Dec 19, 2025

Resolves #1478

Summary by CodeRabbit

  • New Features

    • Added set_log_level() for module logging control.
    • Introduced lazy loading of heavy components to reduce startup work and improve import performance.
  • Chores

    • Packaging: extended wheel repair exclusions and adjusted CUDA install steps for cleaner wheels.
    • Build system: consolidated output targets, unified runtime-path handling, and added conditional runtime dependency patching.
  • CI Improvements

    • Workflow: added caching and environment settings, improved CUDA handling, switched to dnf, and added validation steps to test built artifacts before upload.


@XuehaiPan XuehaiPan self-assigned this Dec 19, 2025
@XuehaiPan XuehaiPan added the enhancement (New feature or request) and dependencies (Pull requests that update a dependency file) labels Dec 19, 2025
@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run `pre-commit run --all-files` in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Contributor

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

Walkthrough

Adds a context-managed lazy loader to defer heavy shared-library imports in tilelang, exposes set_log_level(level), centralizes per-target patchelf behavior in CMake, and updates CI to use dnf, set uv/PIP/XDG cache environment variables, set up uv, and test built wheels in ephemeral venvs before upload.
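
For orientation, here is a hedged usage sketch of the new logging API mentioned above; the accepted argument forms (a level name string or a logging-module constant) are assumed from the review diffs further down and may differ from the final implementation.

import logging
import tilelang  # heavy shared-library imports are deferred by the lazy loader described above

tilelang.set_log_level("DEBUG")          # level passed as a string name (assumed form)
tilelang.set_log_level(logging.WARNING)  # level passed as a logging constant (assumed form)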

Changes

Cohort / File(s) Summary
Workflow — dist & wheel validation
.github/workflows/dist.yml
Add workflow-level cache/env vars (UV_INDEX_STRATEGY, UV_HTTP_TIMEOUT, XDG_CACHE_HOME, PIP_CACHE_DIR, UV_CACHE_DIR); extract CUDA major.minor and export CUDA_VERSION/UV_INDEX; switch Linux package install from yum to dnf; add a Setup Python + uv step with caching; add a Test built wheels loop that creates ephemeral venvs, installs each wheelhouse/*.whl, runs a version check, and cleans up; inserted after build steps.
Module initialization & logging
tilelang/__init__.py
Replace eager heavy imports with a _lazy_load_lib context manager that temporarily adjusts dlopen/ctypes flags and performs heavy imports (env, tvm, libinfo, many submodules) inside the context; remove prior TqdmLoggingHandler/init flow, add public set_log_level(level), compute __version__ as before, and delete private helpers (_init_logger, _lazy_load_lib, _compute_version) after use.
Packaging config / wheel repair
pyproject.toml
Replace yum with dnf in Linux build setup and remove several CUDA packages from the install list; extend repair/auditwheel exclusions to omit libcuda.so.1/libcuda.so and /usr/local/cuda*; add a comment about not bundling libtvm_ffi.so/libz3.so; minor formatting tweaks.
Build system — post-build patchelf loop
CMakeLists.txt
Introduce TILELANG_OUTPUT_TARGETS (tilelang, tilelang_module, tvm, tvm_runtime); unset PATCHELF_EXECUTABLE; detect patchelf and, when present with USE_CUDA, iterate targets to set install RPATH and apply per-target patchelf --remove-needed for libcuda.so.1/libcuda.so via install(CODE ...) with warnings on failure; update install(TARGETS ...) to use the new list.

Sequence Diagram(s)

(Skipped — changes are primarily initialization, packaging, CI and do not introduce a new multi-component runtime control flow requiring a sequence diagram.)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

🐇 I tuck my libs in quiet paws,

I open slow when loaders call,
I spin a venv, test each wheel with care,
I peel CUDA gently if it's not there,
Hop-hop — light imports for one and all.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly and specifically summarizes the main change: enabling tilelang imports on CPU-only machines despite a CUDA build.
  • Linked Issues Check (✅ Passed): All coding requirements from #1478 are addressed: the lazy loading mechanism reduces import-time dependencies, CUDA library handling prevents loading on CPU-only systems, and the new set_log_level API supports flexible logging configuration.
  • Out of Scope Changes Check (✅ Passed): All changes directly support the core objective of CPU-only import support. Build system updates (yum to dnf), patchelf integration, cache optimization, and wheel testing are all necessary infrastructure changes.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d6fb1d and d75dbcc.

📒 Files selected for processing (2)
  • CMakeLists.txt
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • CMakeLists.txt
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8


@XuehaiPan XuehaiPan marked this pull request as ready for review December 19, 2025 16:06
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tilelang/__init__.py (2)

69-79: Consider warning on invalid level strings.

If an invalid level string is passed (e.g., set_log_level('DEBG')), it silently falls back to INFO. This could mask typos.

🔎 Optional enhancement to warn on invalid levels
 def set_log_level(level):
     if isinstance(level, str):
-        level = getattr(logging, level.upper(), logging.INFO)
+        upper_level = level.upper()
+        if not hasattr(logging, upper_level):
+            warnings.warn(f"Unknown log level '{level}', defaulting to INFO", stacklevel=2)
+            level = logging.INFO
+        else:
+            level = getattr(logging, upper_level)
     logger = logging.getLogger(__name__)
     logger.setLevel(level)

113-162: Static analysis: noqa: F401 directives may be unnecessary.

Ruff reports these noqa: F401 directives as unused because the F401 rule appears to be disabled in your project configuration. You could remove them to reduce noise, or enable F401 in your Ruff config if you want to enforce unused import checks.

This is a low-priority cleanup item.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2217eb7 and 226d896.

📒 Files selected for processing (2)
  • .github/workflows/dist.yml (1 hunks)
  • tilelang/__init__.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python's cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml
🧬 Code graph analysis (1)
tilelang/__init__.py (3)
tilelang/env.py (3)
  • enable_cache (269-270)
  • disable_cache (272-273)
  • is_cache_enabled (266-267)
tilelang/libinfo.py (1)
  • find_lib_path (7-35)
tilelang/cache/__init__.py (1)
  • clear_cache (43-55)
🪛 Ruff (0.14.8)
tilelang/__init__.py

Lines 113, 114, 117, 118, 138, 139, 140, 142, 143, 146, 147, 150, 151, 152, 153, 154, 156, 157, 158, 159, 161, 162: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
.github/workflows/dist.yml (2)

168-173: LGTM - Python setup step added for wheel testing.

The setup step correctly configures Python 3.12 and uv for the subsequent wheel validation tests.


175-190: Good addition for wheel validation on CPU-only runners.

This test step validates that the built wheels can be installed and imported successfully, which directly tests the lazy loading changes. The `cd /` ensures the installed package is tested rather than the source directory.

One minor observation: the echo ... | uv run --no-project --script - pattern is less common than using python -c. Consider using:

uv run --no-project python -c "import tilelang; print(tilelang.__version__)"

However, the current approach should work correctly.

tilelang/__init__.py (2)

112-136: Verify that RTLD_LAZY achieves the goal for TVM imports.

The RTLD_LAZY flag defers symbol resolution for native libraries loaded via dlopen/ctypes.CDLL. However, TVM may load its runtime library during import tvm using its own loading mechanism.

The effectiveness depends on whether TVM's internal library loading respects the process-wide dlopenflags. The workflow test step should validate this works correctly.
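
A minimal sketch of the process-wide flag adjustment described above, assuming a Unix platform (os.RTLD_LAZY and sys.setdlopenflags do not exist on Windows); the context-manager name here is illustrative, not the actual tilelang helper.

import contextlib
import os
import sys

@contextlib.contextmanager
def lazy_dlopen():
    # Temporarily OR RTLD_LAZY into the interpreter's dlopen flags so that
    # extension modules imported inside the block defer symbol resolution
    # until the symbols are actually used.
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_LAZY)
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)

with lazy_dlopen():
    import array  # stand-in for a heavy extension module such as tvm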


49-50: Good namespace hygiene.

The del statements for internal helpers (_compute_version, _init_logger, _lazy_load_lib) keep the module's public namespace clean. This is a good practice.

Also applies to: 96-97, 163-164
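
A minimal sketch of the cleanup pattern being praised here; the helper name mirrors one mentioned in the review, but the body is a hypothetical stand-in.

def _compute_version() -> str:
    # hypothetical stand-in for the real version-detection logic
    return "0.0.0.dev0"

__version__ = _compute_version()
del _compute_version  # drop the helper so it never shows up in dir(tilelang)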

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tilelang/__init__.py (1)

102-117: Critical: Windows compatibility issue remains unresolved.

This is the same issue flagged in the previous review. os.RTLD_LAZY and sys.getdlopenflags() are Unix-only and will raise AttributeError on Windows, breaking imports on that platform.

🔎 Recommended fix with platform guard
 @contextlib.contextmanager
 def _lazy_load_lib():
+    # RTLD_LAZY is Unix-only; Windows doesn't support dlopen flags
+    if not hasattr(os, 'RTLD_LAZY') or not hasattr(sys, 'getdlopenflags'):
+        yield
+        return
+    
     old_flags = sys.getdlopenflags()
     old_init = ctypes.CDLL.__init__

     def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
         return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

     sys.setdlopenflags(old_flags | os.RTLD_LAZY)
     ctypes.CDLL.__init__ = lazy_init
     try:
         yield
     finally:
         sys.setdlopenflags(old_flags)
         ctypes.CDLL.__init__ = old_init
🧹 Nitpick comments (1)
tilelang/__init__.py (1)

120-169: Optional cleanup: Remove unnecessary noqa directives.

Ruff reports that the # noqa: F401 directives on many lines are unnecessary because F401 (unused imports) is not enabled in your linter configuration. If these imports are intentionally re-exported, consider using __all__ to make this explicit instead of suppressing warnings.

Alternative approach using __all__

You could replace the noqa comments with an explicit __all__ declaration at the end of the file:

__all__ = [
    'enable_cache', 'disable_cache', 'is_cache_enabled', 'env',
    'DataType', 'jit', 'lazy_jit', 'JITKernel', 'compile', 'par_compile',
    'Profiler', 'clear_cache', 'TensorSupplyType', 'deprecated',
    'Layout', 'Fragment', 'analysis', 'transform', 'language', 'engine',
    'tools', 'dtypes', 'autotune', 'PassConfigKey', 'lower',
    'register_cuda_postproc', 'register_hip_postproc', 'ir', 'tileop',
    'set_log_level', '__version__',
]

This makes the public API explicit and is more maintainable than individual noqa comments.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 226d896 and d2b6a9c.

📒 Files selected for processing (1)
  • tilelang/__init__.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (2)
tilelang/env.py (3)
  • enable_cache (269-270)
  • disable_cache (272-273)
  • is_cache_enabled (266-267)
tilelang/libinfo.py (1)
  • find_lib_path (7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py

Lines 120, 121, 124, 125, 145, 146, 147, 149, 150, 153, 154, 157, 158, 159, 160, 161, 163, 164, 165, 166, 168, 169: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (5)
tilelang/__init__.py (5)

1-8: LGTM: Necessary imports for lazy loading.

The added standard library imports (contextlib, os, sys) are required for implementing the lazy loading mechanism.


12-50: LGTM: Version computation with proper cleanup.

The version computation logic is sound, and the namespace cleanup on line 50 follows best practices.


53-97: LGTM: Well-structured logging infrastructure.

The logging infrastructure is properly implemented:

  • TqdmLoggingHandler correctly integrates with tqdm to prevent output interference
  • set_log_level provides a user-friendly public API
  • Logger initialization and cleanup follow best practices

119-171: Heavy imports properly wrapped in lazy loading context.

The import structure is correct, and moving all heavy imports inside the with _lazy_load_lib() context achieves the intended lazy loading behavior.


102-117: RTLD_LAZY implementation is correct but requires runtime validation on CPU-only systems.

The _lazy_load_lib() context manager at lines 102-117 is an intentional enhancement (commit: "Set RTLD_LAZY flag on CDLL") that correctly wraps TVM and tilelang library imports. RTLD_LAZY defers symbol resolution until symbols are actually used, which is the appropriate mechanism for allowing imports without CUDA libraries present.

The implementation properly:

  • Modifies process-level dlopen flags via sys.setdlopenflags()
  • Wraps CDLL initialization to append RTLD_LAZY to the mode
  • Restores original state in a finally block

CPU-only support is explicitly intended (CMakeLists.txt mentions "Build CPU-only when we explicitly disable CUDA"), confirming the use case is real.

However, no explicit test validating successful import on a CPU-only machine (with CUDA libraries hidden/unavailable) was found. The existing test file (test_tilelang_language_lazy_jit.py) covers the @lazy_jit decorator for JIT compilation, not the import-time lazy loading behavior. To definitively confirm this solves the OSError: libcuda.so.1 problem, add a test that imports tilelang with CUDA libraries unavailable.
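
A hedged sketch of the kind of test asked for above; it only checks that import tilelang succeeds in a fresh interpreter and does not actually hide the CUDA libraries, so it is a starting point rather than a full CPU-only simulation.

import subprocess
import sys

def test_import_tilelang_in_fresh_interpreter():
    # Run the import in a subprocess so a partially initialized module in the
    # current process cannot mask an import-time failure.
    result = subprocess.run(
        [sys.executable, "-c", "import tilelang; print(tilelang.__version__)"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr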

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
tilelang/__init__.py (1)

100-117: Platform compatibility: os.RTLD_LAZY is Unix-only.

This issue was flagged in a previous review but remains unaddressed. os.RTLD_LAZY is not available on Windows, and accessing it will raise AttributeError. The libinfo.py snippet shows Windows is a supported platform (win32 checks exist), so this needs a guard.

🔎 Proposed fix for Windows compatibility
 @contextlib.contextmanager
 def _lazy_load_lib():
     import torch  # noqa: F401 # preload torch to avoid dlopen errors

+    if not hasattr(os, 'RTLD_LAZY'):
+        # Windows doesn't support dlopen flags; skip lazy loading adjustments
+        yield
+        return
+
     old_flags = sys.getdlopenflags()
     old_init = ctypes.CDLL.__init__

     def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
         return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

     sys.setdlopenflags(old_flags | os.RTLD_LAZY)
     ctypes.CDLL.__init__ = lazy_init
     try:
         yield
     finally:
         sys.setdlopenflags(old_flags)
         ctypes.CDLL.__init__ = old_init
🧹 Nitpick comments (1)
tilelang/__init__.py (1)

119-170: Lazy loading structure achieves the PR objective.

The heavy imports (tvm, ctypes library loading, submodules) are correctly wrapped within the _lazy_load_lib() context, deferring shared library resolution until import time with RTLD_LAZY semantics.

Regarding static analysis hints: The # noqa: F401 directives are intentional for re-exported symbols. If F401 is not enabled in your Ruff configuration, you could remove them for cleaner code, but they're harmless.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2b6a9c and a9d01bc.

📒 Files selected for processing (1)
  • tilelang/__init__.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (1)
tilelang/libinfo.py (1)
  • find_lib_path (7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py

82-82: Do not catch blind exception: Exception (BLE001)

Lines 102, 120, 121, 124, 125, 145, 146, 147, 149, 150, 153, 154, 157, 158, 159, 160, 161, 163, 164, 165, 166, 168, 169: Unused noqa directive (non-enabled: F401); remove unused noqa directive (RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Quick Lint
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
🔇 Additional comments (3)
tilelang/__init__.py (3)

1-48: LGTM!

The import statements and version computation logic are well-structured with appropriate fallbacks.


66-97: LGTM!

The logger initialization with TqdmLoggingHandler is well-implemented. The broad exception catch in emit() (line 82) follows the standard logging.Handler pattern—logging should never raise, and handleError is correctly invoked as a fallback.


102-108: RTLD_LAZY defers function symbol resolution correctly, but verify no CUDA data dependencies exist.

RTLD_LAZY resolves symbols only as the code that references them is executed, and if the symbol is never referenced, then it is never resolved. This approach should allow import tilelang to succeed on CPU-only machines without libcuda.so.1.

However, lazy binding is performed only for function references; references to variables are always immediately bound when the shared object is loaded. If the tilelang library has direct data dependencies on CUDA symbols (not just function calls), those will fail at dlopen() time regardless of RTLD_LAZY. Confirm that TVM's CUDA integration uses function pointers rather than direct variable references.
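
As a small illustration of the lazy-binding behavior described above (Unix only; libm.so.6 is an assumption about the platform's math library name), a library opened with RTLD_LAZY loads successfully and its function symbols are resolved on first call:

import ctypes
import os

# Open the math library with lazy binding; function symbols such as cos are
# resolved only when first used. Data symbols, by contrast, are always bound
# at load time, which is the caveat raised above.
libm = ctypes.CDLL("libm.so.6", mode=os.RTLD_LAZY)
cos = libm.cos
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]
print(cos(0.0))  # 1.0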

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9d01bc and eadeafd.

📒 Files selected for processing (1)
  • pyproject.toml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8

@XuehaiPan XuehaiPan force-pushed the lazy-dlopen branch 2 times, most recently from 8ec5352 to 1929031 on December 19, 2025 at 18:11
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/dist.yml (1)

168-174: Cache may not save due to missing actions: write permission.

Based on learnings, the astral-sh/setup-uv@v7 action's caching feature requires GITHUB_TOKEN with actions: write permission to save caches. The current permissions block (line 26) only grants contents: read, so caches will restore but not save.

This is not blocking since caches will still restore from previous runs, but consider adding actions: write to enable cache saving:

permissions:
  contents: read
  actions: write
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7c1604 and 19c3864.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python's cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
CMakeLists.txt (1)

330-333: Consider using WARNING instead of STATUS when patchelf is not found.

When patchelf is missing, the libcuda.so dependency won't be removed, which means the resulting library may fail to import on CPU-only machines. A STATUS message could be easily missed in build output. Using WARNING would make this more visible to developers.

🔎 Suggested change
   find_program(PATCHELF_EXECUTABLE patchelf)
   if (NOT PATCHELF_EXECUTABLE)
-    message(STATUS "patchelf not found.")
+    message(WARNING "patchelf not found - libcuda.so dependency will not be removed from built libraries")
   endif()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 414055d and 5bd8c50.

📒 Files selected for processing (1)
  • CMakeLists.txt (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (4)
CMakeLists.txt (4)

309-316: Good refactoring to centralize target handling.

Defining TILELANG_OUTPUT_TARGETS improves maintainability by ensuring consistent handling across RPATH settings, patching, and installation. The unset(PATCHELF_EXECUTABLE) ensures a clean state before the conditional find_program call.


337-339: LGTM!

Clean refactoring that applies consistent RPATH settings to all output targets via the loop.


359-361: LGTM!

Using ${TILELANG_OUTPUT_TARGETS} maintains consistency with the variable defined at line 309 and ensures any future target additions only need updating in one place.


341-357: Well-structured patching approach that correctly handles targets from different directories.

The separation of POST_BUILD for local targets and a custom target for subdirectory targets is the right approach. Using ALL ensures patch_tvm_libs runs during normal builds. Since the install target depends on the ALL target by default, patching is guaranteed to occur before installation regardless of the build workflow.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
CMakeLists.txt (3)

316-316: Consider adding a comment explaining the unset.

Unsetting PATCHELF_EXECUTABLE before detection ensures a clean state, but a brief inline comment would clarify the intent for future maintainers.

📝 Suggested comment
+# Ensure clean detection state for patchelf before find_program
 unset(PATCHELF_EXECUTABLE)

330-333: Enhance the status message to clarify implications.

When patchelf is not found, the built libraries will retain their libcuda.so dependencies, preventing imports on CPU-only machines. Consider making the message more informative.

📝 Suggested improvement
   find_program(PATCHELF_EXECUTABLE patchelf)
   if (NOT PATCHELF_EXECUTABLE)
-    message(STATUS "`patchelf` not found.")
+    message(STATUS "`patchelf` not found. Built libraries will retain libcuda.so dependencies and may fail to import on CPU-only machines.")
   endif()

341-343: Consider more explicit condition checking for SOURCE_DIR.

The current check if (tgt_src_dir) works because NOTFOUND values evaluate to false in CMake. However, a more explicit check would improve code clarity and robustness.

🔎 Suggested improvement
     get_target_property(tgt_src_dir ${target} SOURCE_DIR)
-    if (tgt_src_dir)
+    if (tgt_src_dir AND NOT tgt_src_dir MATCHES "-NOTFOUND$")
       message(STATUS "Target `${target}` defined in '${tgt_src_dir}' will remove its `libcuda.so` dependency")
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd8c50 and 80f6dad.

📒 Files selected for processing (1)
  • CMakeLists.txt (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (3)
CMakeLists.txt (3)

309-314: LGTM! Good refactoring to centralize target management.

Defining the output targets in a single list improves maintainability and reduces duplication in the subsequent foreach loop and install statement.


363-363: LGTM! Consistent use of the centralized target list.

The install statement correctly references TILELANG_OUTPUT_TARGETS, completing the refactoring to centralize target management.


344-359: Patchelf approach is sound and handles all target types correctly.

The implementation properly removes CUDA dependencies across target types:

  • POST_BUILD command handles targets in the current directory
  • Custom targets with explicit DEPENDS clauses handle subdirectory targets (tvm/tvm_runtime) and ensure correct build ordering
  • The patchelf syntax (--remove-needed libcuda.so.1 --remove-needed libcuda.so) is correct and gracefully handles cases where dependencies don't exist
  • Availability check with find_program prevents build failures on systems without patchelf installed

The code is ready to merge.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
CMakeLists.txt (1)

330-333: Consider clarifying the impact when patchelf is not found.

The current status message could be more informative about what happens when patchelf is unavailable. This would help users understand that libcuda.so dependencies won't be removed, potentially preventing CPU-only imports.

🔎 Proposed enhancement
  find_program(PATCHELF_EXECUTABLE patchelf)
  if (NOT PATCHELF_EXECUTABLE)
-    message(STATUS "`patchelf` not found.")
+    message(STATUS "`patchelf` not found. Built libraries will retain libcuda.so dependencies and may fail to import on CPU-only machines.")
  endif()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0620942 and cc8c3b6.

📒 Files selected for processing (1)
  • CMakeLists.txt (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (3)
CMakeLists.txt (3)

309-314: LGTM! Good consolidation of target names.

Centralizing the target list improves maintainability and ensures consistent handling across INSTALL_RPATH and patchelf operations.


337-339: LGTM! Effective refactoring to reduce duplication.

The foreach loop cleanly replaces repetitive set_target_properties calls and ensures consistent INSTALL_RPATH configuration across all targets.


342-356: Verify that patchelf correctly removes libcuda.so dependencies and make patchelf failures fatal.

The implementation uses install(CODE ...) to patch build artifacts before the install(TARGETS ...) call copies them to tilelang/lib. While this approach can work, patchelf failures only emit warnings (line 51) rather than halting the install—meaning the PR objective could be silently compromised if patchelf fails. Consider using execute_process(...COMMAND_ERROR_IS_FATAL ANY) or checking the result variable and calling message(FATAL_ERROR ...) instead.

Additionally, verify that the installed libraries in tilelang/lib actually have libcuda.so.1 and libcuda.so removed from their NEEDED entries, and that they can load on CPU-only systems without CUDA drivers installed.
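
A hedged verification sketch along the lines suggested above, assuming a Linux host with readelf on PATH and installed libraries under tilelang/lib (the path is taken from the review, not verified):

import glob
import subprocess

for lib in glob.glob("tilelang/lib/*.so*"):
    dynamic = subprocess.run(["readelf", "-d", lib],
                             capture_output=True, text=True, check=True).stdout
    # A leftover dependency would appear as a NEEDED entry for libcuda.so.1 or libcuda.so.
    assert "libcuda.so" not in dynamic, f"{lib} still lists libcuda as NEEDED"
print("no libcuda NEEDED entries found")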

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
CMakeLists.txt (2)

316-316: Consider whether unconditionally unsetting is necessary.

Unconditionally unsetting PATCHELF_EXECUTABLE clears any previously cached or user-provided value before the find_program call at line 329. If this is intentional (e.g., to force a fresh search), consider adding a comment explaining why. Otherwise, you might allow users to override the search by pre-setting this variable.


329-332: Consider clarifying the consequence in the status message.

The message "patchelf not found." could be more informative. Consider mentioning that without patchelf, libcuda.so dependencies won't be removed, which may prevent importing tilelang on CPU-only machines.

🔎 Suggested improvement
   if (NOT PATCHELF_EXECUTABLE)
-    message(STATUS "`patchelf` not found.")
+    message(STATUS "`patchelf` not found. libcuda.so dependencies will not be removed, which may prevent importing on CPU-only machines.")
   endif()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc8c3b6 and 54f73e2.

📒 Files selected for processing (1)
  • CMakeLists.txt (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build SDist
🔇 Additional comments (3)
CMakeLists.txt (3)

309-314: LGTM: Clean refactoring to centralize target list.

Defining TILELANG_OUTPUT_TARGETS eliminates duplication and improves maintainability.


336-338: LGTM: Cleaner RPATH handling with foreach loop.

Replacing per-target code with a foreach loop improves maintainability and reduces duplication.


340-357: The critical issue in this review is based on incorrect information about CMake's behavior.

CMake 3.14 and later will evaluate generator expressions for install(CODE) and install(SCRIPT). Since this project requires CMake 3.26, the generator expression $<TARGET_FILE:${target}> will be properly evaluated at install time. The code is correct and does not have the flaw described in the review.

The secondary concern about the success message wording is minor—the message says "removed dependency libcuda.so" when technically two dependencies (libcuda.so.1 and libcuda.so) are being removed. However, this is not a functional issue, only a slight imprecision in the message text.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
CMakeLists.txt (1)

316-316: Clarify the purpose of unsetting PATCHELF_EXECUTABLE.

This unset happens before find_program(PATCHELF_EXECUTABLE patchelf) at line 329. Is this to avoid using a stale cached value? If so, a brief comment would help future maintainers understand the intent.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54f73e2 and 8c6a278.

📒 Files selected for processing (2)
  • CMakeLists.txt (1 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (5)
CMakeLists.txt (5)

309-314: LGTM! Clean centralization of output targets.

Defining TILELANG_OUTPUT_TARGETS in one place makes the subsequent RPATH and patchelf logic more maintainable and less error-prone.


319-333: LGTM! RPATH configuration and patchelf detection look correct.

The extended RPATH entries for torch and tvm_ffi will help the runtime linker locate dependencies correctly. The patchelf detection with a status message provides good visibility.


336-338: LGTM! Uniform RPATH application across all targets.

The loop ensures all output targets have consistent INSTALL_RPATH settings, which is exactly what's needed.


360-360: LGTM! Using centralized target list for installation.

This change aligns with the TILELANG_OUTPUT_TARGETS definition and improves maintainability.


346-346: libcaffe2_nvrtc.so already uses lazy loading (dlopen) for CUDA symbols—no verification needed.

PyTorch does not link directly against nvrtc to keep CPU-only builds possible; instead it uses a stub that calls dlopen on libnvrtc.so. The patchelf command correctly removes direct libcuda.so dependencies, allowing the library to load successfully on CPU-only machines where CUDA drivers are absent. This matches PyTorch's documented design pattern for supporting dual CPU/CUDA binaries.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
CMakeLists.txt (1)

344-356: Critical: install(CODE ...) uses incorrect file path.

The generator expression $<TARGET_FILE:${target}> expands at CMake generation time to the build directory path, but install(CODE ...) executes at install time when the files are already in ${CMAKE_INSTALL_PREFIX}/tilelang/lib. This causes patchelf to fail or operate on stale build artifacts instead of the installed files, defeating the PR objective of enabling CPU-only imports.

🔎 Proposed fix

Replace the generator expression with the actual installed file path:

     install(CODE "
+      set(installed_lib \"\${CMAKE_INSTALL_PREFIX}/tilelang/lib/${CMAKE_SHARED_LIBRARY_PREFIX}${target}${CMAKE_SHARED_LIBRARY_SUFFIX}\")
       execute_process(
-        COMMAND ${PATCHELF_EXECUTABLE} --remove-needed libcuda.so.1 --remove-needed libcuda.so \"$<TARGET_FILE:${target}>\"
-        WORKING_DIRECTORY \"${CMAKE_INSTALL_PREFIX}\"
+        COMMAND ${PATCHELF_EXECUTABLE} --remove-needed libcuda.so.1 --remove-needed libcuda.so \"\${installed_lib}\"
         RESULT_VARIABLE patchelf_result
       )
       if(patchelf_result EQUAL 0)
-        message(STATUS \"`patchelf` successfully removed dependency `libcuda.so` from $<TARGET_FILE:${target}>\")
+        message(STATUS \"`patchelf` successfully removed dependencies `libcuda.so.1` and `libcuda.so` from \${installed_lib}\")
       else()
-        message(WARNING \"`patchelf` failed to remove dependency `libcuda.so` from $<TARGET_FILE:${target}>\")
+        message(WARNING \"`patchelf` failed to remove dependencies `libcuda.so.1` and `libcuda.so` from \${installed_lib}\")
       endif()
     ")

Note: This fix also addresses the message inconsistency.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c6a278 and 8d6fb1d.

📒 Files selected for processing (2)
  • CMakeLists.txt (1 hunks)
  • pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Quick Lint
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
CMakeLists.txt (4)

309-314: Good refactoring: centralized target list.

Introducing TILELANG_OUTPUT_TARGETS as a single source of truth for all output targets improves maintainability and consistency throughout the build configuration.


329-332: LGTM!

The find_program() call and status message follow CMake best practices for locating optional build tools.


336-338: LGTM!

Using a foreach loop to apply INSTALL_RPATH to all targets is cleaner and more maintainable than individual set_target_properties() calls.


360-360: LGTM!

Using the centralized ${TILELANG_OUTPUT_TARGETS} variable maintains consistency with the refactoring introduced earlier in the file.


Labels

  • dependencies: Pull requests that update a dependency file
  • enhancement: New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Cannot import tilelang on pure CPU machine with CUDA build

3 participants