
Conversation

@XuehaiPan (Collaborator) commented Dec 19, 2025

Resolves #1478

Summary by CodeRabbit

  • New Features

    • Added set_log_level() for runtime logging control.
    • Added a CUDA driver stub that provides a safe CPU-only fallback with runtime-loaded CUDA wrappers.
  • Performance / UX

    • Heavy native dependencies now lazy-load to reduce import-time overhead.
    • Header inclusion tightened to enforce stub usage.
  • Chores

    • Unified library output/install layout and removed direct CUDA runtime links at install time.
  • CI Improvements

    • Added CI env vars, caching and setup steps, replaced yum with dnf, adjusted CUDA build matrix, and added Python 3.12 wheel validation/testing.


@XuehaiPan XuehaiPan self-assigned this Dec 19, 2025
@XuehaiPan XuehaiPan added enhancement New feature or request dependencies Pull requests that update a dependency file labels Dec 19, 2025
@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀


coderabbitai bot commented Dec 19, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a lazy-load mechanism to defer heavy/shared-library loads, exposes set_log_level(level), provides a CPU-safe libcuda stub with extern-C wrappers and include guards, updates sources to use the stub, adds a cuda_stub CMake target with install-time patchelf removal of libcuda, and updates CI to use dnf/uv and validate built wheels.
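
For orientation, this is the shape of the lazy-load context manager described above, as quoted in the review diffs further down (Unix-only, since RTLD_LAZY is a POSIX dlopen flag; a sketch, not the exact merged code):

import contextlib
import ctypes
import os
import sys

@contextlib.contextmanager
def _lazy_load_lib():
    # Defer symbol resolution for every shared library opened while the
    # context is active, so unresolved CUDA functions don't fail at import.
    old_flags = sys.getdlopenflags()
    old_init = ctypes.CDLL.__init__

    def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
        return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

    sys.setdlopenflags(old_flags | os.RTLD_LAZY)
    ctypes.CDLL.__init__ = lazy_init
    try:
        yield
    finally:
        sys.setdlopenflags(old_flags)
        ctypes.CDLL.__init__ = old_init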

Changes

Cohort / File(s) Summary
Workflow — dist & wheel validation
.github/workflows/dist.yml
Add CI env/cache vars (UV_INDEX_STRATEGY, UV_HTTP_TIMEOUT, XDG_CACHE_HOME, PIP_CACHE_DIR, UV_CACHE_DIR); replace yum → dnf; add astral-sh/setup-uv@v7 step; parse/export CUDA parts and UV_INDEX; add Python 3.12 wheel-test venv steps; apply across build-wheels and build-sdist.
Module init & logging
tilelang/__init__.py
Introduce _lazy_load_lib context manager to adjust dlopen/ctypes flags and defer heavy imports (env, tvm, libinfo, submodules); remove prior TqdmLoggingHandler/_init_logger pattern; add public set_log_level(level); compute __version__ and delete helper symbols after use.
CUDA runtime stubs — impl & header
src/target/stubs/cuda.h, src/target/stubs/cuda.cc
Add CUDADriverAPI singleton that dlopen()s libcuda.so(.1), resolves required/optional CUDA driver symbols, exposes get()/is_available()/get_handle(), and supplies extern "C" wrappers so driver calls route through the stub for CPU-only machines.
Source includes updated to stubs
src/op/builtin.cc, src/op/copy.cc, src/runtime/runtime.cc
Replace #include "../target/cuda.h" → #include "../target/stubs/cuda.h"; no behavior changes.
Build system — CUDA stub, RPATH & patchelf
CMakeLists.txt
Add cuda_stub shared target, include it in TILELANG_OUTPUT_TARGETS, link it into targets when USE_CUDA; unify per-target output dirs and INSTALL_RPATH; detect patchelf and run install-time patchelf --remove-needed to strip libcuda.so(.1) from installed artifacts.
Packaging config / wheel repair
pyproject.toml
Switch Linux installer steps from yum to dnf; remove some nvidia packages from the install list; extend repair/audit exclusions to omit libcuda.so.1/libcuda.so and /usr/local/cuda*; minor formatting tweaks.
Header include guard tightening
src/target/stubs/vendor/cuda.h
Add an include-time guard that enforces the header is included only by the stub implementation (compile-time error otherwise).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Importer as "Python importer"
  participant Tilelang as "tilelang/__init__"
  participant CUDAStub as "CUDADriverAPI (cuda_stub)"
  participant LibCUDA as "libcuda.so (system)"

  rect rgba(200,230,255,0.18)
  Note over Tilelang,CUDAStub: Lazy-load import-time behavior and CPU-safe stub
  end

  Importer->>Tilelang: import tilelang
  Tilelang->>Tilelang: enter _lazy_load_lib (adjust dlopen/ctypes flags)
  Tilelang->>CUDAStub: request CUDADriverAPI singleton
  CUDAStub->>LibCUDA: dlopen("libcuda.so.1" / "libcuda.so")
  alt libcuda found
    LibCUDA-->>CUDAStub: handle + symbols
    CUDAStub-->>Tilelang: resolved function pointers (is_available=true)
    Tilelang-->>Importer: import completes (CUDA available)
  else libcuda not found
    LibCUDA-->>CUDAStub: open failed
    CUDAStub-->>Tilelang: provide safe stubs (is_available=false)
    Tilelang-->>Importer: import completes (CPU-only safe)
  end
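
The stub itself is C++ (src/target/stubs/cuda.cc); as a Python analogue of the probe order shown in the diagram, for illustration only:

import ctypes

def cuda_driver_available() -> bool:
    # Mirror the stub's fallback: try the driver SONAME first, then the
    # development symlink; report CPU-only if neither can be dlopen()ed.
    for name in ("libcuda.so.1", "libcuda.so"):
        try:
            ctypes.CDLL(name)
            return True
        except OSError:
            continue
    return False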

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • LeiWang1999

Poem

🐇 I tuck my libs in quiet paws,

I open slow when loaders call,
If CUDA sleeps, I shim and stay,
I patch the wheels and test by day,
Hop-hop — light imports for all.

Pre-merge checks

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 25.00%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (4 passed)

  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check ✅ Passed: The title clearly and concisely describes the main objective: enabling tilelang import on CPU-only machines without CUDA libraries, which matches the core changes throughout the PR.
  • Linked Issues check ✅ Passed: The PR addresses issue #1478 by implementing lazy loading of CUDA libraries to allow import on CPU-only machines: it adds CUDA driver stubs with lazy dlopen loading, modifies tilelang/__init__.py to use the lazy-loading context, updates the build configuration to support dynamic CUDA library loading, and ensures import succeeds without a hard CUDA dependency.
  • Out of Scope Changes check ✅ Passed: All changes are directly aligned with the objective of supporting CPU-only imports: CI/build configuration updates for lazy loading, the CUDA stub library implementation, header path adjustments, and module-initialization refactoring all serve the core goal without extraneous modifications.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@XuehaiPan XuehaiPan marked this pull request as ready for review December 19, 2025 16:06

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
tilelang/__init__.py (2)

69-79: Consider warning on invalid level strings.

If an invalid level string is passed (e.g., set_log_level('DEBG')), it silently falls back to INFO. This could mask typos.

🔎 Optional enhancement to warn on invalid levels
 def set_log_level(level):
     if isinstance(level, str):
-        level = getattr(logging, level.upper(), logging.INFO)
+        upper_level = level.upper()
+        if not hasattr(logging, upper_level):
+            warnings.warn(f"Unknown log level '{level}', defaulting to INFO", stacklevel=2)
+            level = logging.INFO
+        else:
+            level = getattr(logging, upper_level)
     logger = logging.getLogger(__name__)
     logger.setLevel(level)
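
Either way, the public API accepts both spellings; a usage sketch:

import logging
import tilelang

tilelang.set_log_level("debug")          # by name (case-insensitive)
tilelang.set_log_level(logging.WARNING)  # or by numeric level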

113-162: Static analysis: noqa: F401 directives may be unnecessary.

Ruff reports these noqa: F401 directives as unused because the F401 rule appears to be disabled in your project configuration. You could remove them to reduce noise, or enable F401 in your Ruff config if you want to enforce unused import checks.

This is a low-priority cleanup item.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2217eb7 and 226d896.

📒 Files selected for processing (2)
  • .github/workflows/dist.yml (1 hunks)
  • tilelang/__init__.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml
🧬 Code graph analysis (1)
tilelang/__init__.py (3)
tilelang/env.py (3)
  • enable_cache (269-270)
  • disable_cache (272-273)
  • is_cache_enabled (266-267)
tilelang/libinfo.py (1)
  • find_lib_path (7-35)
tilelang/cache/__init__.py (1)
  • clear_cache (43-55)
🪛 Ruff (0.14.8)
tilelang/__init__.py

113-113: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


114-114: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


117-117: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


118-118: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


138-138: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


139-139: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


140-140: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


142-142: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


143-143: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


146-146: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


147-147: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


150-150: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


151-151: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


152-152: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


153-153: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


154-154: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


156-156: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


157-157: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


158-158: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


159-159: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


161-161: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


162-162: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
.github/workflows/dist.yml (2)

168-173: LGTM - Python setup step added for wheel testing.

The setup step correctly configures Python 3.12 and uv for the subsequent wheel validation tests.


175-190: Good addition for wheel validation on CPU-only runners.

This test step validates that the built wheels can be installed and imported successfully, which directly tests the lazy loading changes. The cd / ensures the installed package is tested rather than the source directory.

One minor observation: the echo ... | uv run --no-project --script - pattern is less common than using python -c. Consider using:

uv run --no-project python -c "import tilelang; print(tilelang.__version__)"

However, the current approach should work correctly.

tilelang/__init__.py (2)

112-136: Verify that RTLD_LAZY achieves the goal for TVM imports.

The RTLD_LAZY flag defers symbol resolution for native libraries loaded via dlopen/ctypes.CDLL. However, TVM may load its runtime library during import tvm using its own loading mechanism.

The effectiveness depends on whether TVM's internal library loading respects the process-wide dlopenflags. The workflow test step should validate this works correctly.
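
A quick manual probe of what sys.setdlopenflags() actually controls (a Unix-only sketch; TVM's ctypes.CDLL calls only benefit if these process-wide flags, or the patched ctypes.CDLL.__init__, are in effect when libtvm is opened):

import os
import sys

# True while a _lazy_load_lib-style context has set the flag;
# False once the context restores the original flags.
print(bool(sys.getdlopenflags() & os.RTLD_LAZY))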


49-50: Good namespace hygiene.

The del statements for internal helpers (_compute_version, _init_logger, _lazy_load_lib) keep the module's public namespace clean. This is a good practice.

Also applies to: 96-97, 163-164


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
tilelang/__init__.py (1)

102-117: Critical: Windows compatibility issue remains unresolved.

This is the same issue flagged in the previous review. os.RTLD_LAZY and sys.getdlopenflags() are Unix-only and will raise AttributeError on Windows, breaking imports on that platform.

🔎 Recommended fix with platform guard
 @contextlib.contextmanager
 def _lazy_load_lib():
+    # RTLD_LAZY is Unix-only; Windows doesn't support dlopen flags
+    if not hasattr(os, 'RTLD_LAZY') or not hasattr(sys, 'getdlopenflags'):
+        yield
+        return
+    
     old_flags = sys.getdlopenflags()
     old_init = ctypes.CDLL.__init__

     def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
         return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

     sys.setdlopenflags(old_flags | os.RTLD_LAZY)
     ctypes.CDLL.__init__ = lazy_init
     try:
         yield
     finally:
         sys.setdlopenflags(old_flags)
         ctypes.CDLL.__init__ = old_init
🧹 Nitpick comments (1)
tilelang/__init__.py (1)

120-169: Optional cleanup: Remove unnecessary noqa directives.

Ruff reports that the # noqa: F401 directives on many lines are unnecessary because F401 (unused imports) is not enabled in your linter configuration. If these imports are intentionally re-exported, consider using __all__ to make this explicit instead of suppressing warnings.

Alternative approach using __all__

You could replace the noqa comments with an explicit __all__ declaration at the end of the file:

__all__ = [
    'enable_cache', 'disable_cache', 'is_cache_enabled', 'env',
    'DataType', 'jit', 'lazy_jit', 'JITKernel', 'compile', 'par_compile',
    'Profiler', 'clear_cache', 'TensorSupplyType', 'deprecated',
    'Layout', 'Fragment', 'analysis', 'transform', 'language', 'engine',
    'tools', 'dtypes', 'autotune', 'PassConfigKey', 'lower',
    'register_cuda_postproc', 'register_hip_postproc', 'ir', 'tileop',
    'set_log_level', '__version__',
]

This makes the public API explicit and is more maintainable than individual noqa comments.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 226d896 and d2b6a9c.

📒 Files selected for processing (1)
  • tilelang/__init__.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (2)
tilelang/env.py (3)
  • enable_cache (269-270)
  • disable_cache (272-273)
  • is_cache_enabled (266-267)
tilelang/libinfo.py (1)
  • find_lib_path (7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py

120-120: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


121-121: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


124-124: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


125-125: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


145-145: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


146-146: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


147-147: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


149-149: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


150-150: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


153-153: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


154-154: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


157-157: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


158-158: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


159-159: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


160-160: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


161-161: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


163-163: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


164-164: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


165-165: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


166-166: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


168-168: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


169-169: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (5)
tilelang/__init__.py (5)

1-8: LGTM: Necessary imports for lazy loading.

The added standard library imports (contextlib, os, sys) are required for implementing the lazy loading mechanism.


12-50: LGTM: Version computation with proper cleanup.

The version computation logic is sound, and the namespace cleanup on line 50 follows best practices.


53-97: LGTM: Well-structured logging infrastructure.

The logging infrastructure is properly implemented:

  • TqdmLoggingHandler correctly integrates with tqdm to prevent output interference
  • set_log_level provides a user-friendly public API
  • Logger initialization and cleanup follow best practices

119-171: Heavy imports properly wrapped in lazy loading context.

The import structure is correct, and moving all heavy imports inside the with _lazy_load_lib() context achieves the intended lazy loading behavior.


102-117: RTLD_LAZY implementation is correct but requires runtime validation on CPU-only systems.

The _lazy_load_lib() context manager at lines 102-117 is an intentional enhancement (commit: "Set RTLD_LAZY flag on CDLL") that correctly wraps TVM and tilelang library imports. RTLD_LAZY defers symbol resolution until symbols are actually used, which is the appropriate mechanism for allowing imports without CUDA libraries present.

The implementation properly:

  • Modifies process-level dlopen flags via sys.setdlopenflags()
  • Wraps CDLL initialization to append RTLD_LAZY to the mode
  • Restores original state in a finally block

CPU-only support is explicitly intended (CMakeLists.txt mentions "Build CPU-only when we explicitly disable CUDA"), confirming the use case is real.

However, no explicit test validating successful import on a CPU-only machine (with CUDA libraries hidden/unavailable) was found. The existing test file (test_tilelang_language_lazy_jit.py) covers the @lazy_jit decorator for JIT compilation, not the import-time lazy loading behavior. To definitively confirm this solves the OSError: libcuda.so.1 problem, add a test that imports tilelang with CUDA libraries unavailable.
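
A test along the suggested lines could be a subprocess import check (a sketch; it is only a true CPU-only test on a runner where libcuda.so.1 is genuinely absent, e.g. a container without the NVIDIA driver):

import subprocess
import sys

def test_import_without_cuda():
    # Fresh interpreter so no CUDA state from the test session leaks in.
    result = subprocess.run(
        [sys.executable, "-c", "import tilelang; print(tilelang.__version__)"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr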


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
tilelang/__init__.py (1)

100-117: Platform compatibility: os.RTLD_LAZY is Unix-only.

This issue was flagged in a previous review but remains unaddressed. os.RTLD_LAZY is not available on Windows, and accessing it will raise AttributeError. The libinfo.py snippet shows Windows is a supported platform (win32 checks exist), so this needs a guard.

🔎 Proposed fix for Windows compatibility
 @contextlib.contextmanager
 def _lazy_load_lib():
     import torch  # noqa: F401 # preload torch to avoid dlopen errors

+    if not hasattr(os, 'RTLD_LAZY'):
+        # Windows doesn't support dlopen flags; skip lazy loading adjustments
+        yield
+        return
+
     old_flags = sys.getdlopenflags()
     old_init = ctypes.CDLL.__init__

     def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
         return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

     sys.setdlopenflags(old_flags | os.RTLD_LAZY)
     ctypes.CDLL.__init__ = lazy_init
     try:
         yield
     finally:
         sys.setdlopenflags(old_flags)
         ctypes.CDLL.__init__ = old_init
🧹 Nitpick comments (1)
tilelang/__init__.py (1)

119-170: Lazy loading structure achieves the PR objective.

The heavy imports (tvm, ctypes library loading, submodules) are correctly wrapped within the _lazy_load_lib() context, deferring shared library resolution until import time with RTLD_LAZY semantics.

Regarding static analysis hints: The # noqa: F401 directives are intentional for re-exported symbols. If F401 is not enabled in your Ruff configuration, you could remove them for cleaner code, but they're harmless.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2b6a9c and a9d01bc.

📒 Files selected for processing (1)
  • tilelang/__init__.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (1)
tilelang/libinfo.py (1)
  • find_lib_path (7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py

82-82: Do not catch blind exception: Exception

(BLE001)


102-102: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


120-120: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


121-121: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


124-124: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


125-125: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


145-145: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


146-146: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


147-147: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


149-149: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


150-150: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


153-153: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


154-154: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


157-157: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


158-158: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


159-159: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


160-160: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


161-161: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


163-163: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


164-164: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


165-165: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


166-166: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


168-168: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


169-169: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Quick Lint
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
🔇 Additional comments (3)
tilelang/__init__.py (3)

1-48: LGTM!

The import statements and version computation logic are well-structured with appropriate fallbacks.


66-97: LGTM!

The logger initialization with TqdmLoggingHandler is well-implemented. The broad exception catch in emit() (line 82) follows the standard logging.Handler pattern—logging should never raise, and handleError is correctly invoked as a fallback.


102-108: RTLD_LAZY defers function symbol resolution correctly, but verify no CUDA data dependencies exist.

RTLD_LAZY resolves symbols only as the code that references them is executed, and if the symbol is never referenced, then it is never resolved. This approach should allow import tilelang to succeed on CPU-only machines without libcuda.so.1.

However, lazy binding is performed only for function references; references to variables are always immediately bound when the shared object is loaded. If the tilelang library has direct data dependencies on CUDA symbols (not just function calls), those will fail at dlopen() time regardless of RTLD_LAZY. Confirm that TVM's CUDA integration uses function pointers rather than direct variable references.
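
The two binding modes are easy to compare with ctypes (a sketch using libm on Linux/glibc as a stand-in; both calls succeed here because libm has no unresolved references, but the flags show where each mode would fail):

import ctypes
import os

# RTLD_LAZY: function symbols resolve at first call; data symbols still
# bind at load time, so an unresolved *variable* reference to libcuda
# would fail the dlopen() regardless of this flag.
lazy = ctypes.CDLL("libm.so.6", mode=os.RTLD_LAZY)

# RTLD_NOW: every symbol must resolve during dlopen().
eager = ctypes.CDLL("libm.so.6", mode=os.RTLD_NOW)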


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9d01bc and eadeafd.

📒 Files selected for processing (1)
  • pyproject.toml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8

@XuehaiPan XuehaiPan force-pushed the lazy-dlopen branch 2 times, most recently from 8ec5352 to 1929031 on December 19, 2025 18:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/dist.yml (1)

168-174: Cache may not save due to missing actions: write permission.

Based on learnings, the astral-sh/setup-uv@v7 action's caching feature requires GITHUB_TOKEN with actions: write permission to save caches. The current permissions block (line 26) only grants contents: read, so caches will restore but not save.

This is not blocking since caches will still restore from previous runs, but consider adding actions: write to enable cache saving:

permissions:
  contents: read
  actions: write
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7c1604 and 19c3864.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
CMakeLists.txt (1)

330-333: Consider using WARNING instead of STATUS when patchelf is not found.

When patchelf is missing, the libcuda.so dependency won't be removed, which means the resulting library may fail to import on CPU-only machines. A STATUS message could be easily missed in build output. Using WARNING would make this more visible to developers.

🔎 Suggested change
   find_program(PATCHELF_EXECUTABLE patchelf)
   if (NOT PATCHELF_EXECUTABLE)
-    message(STATUS "patchelf not found.")
+    message(WARNING "patchelf not found - libcuda.so dependency will not be removed from built libraries")
   endif()
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 414055d and 5bd8c50.

📒 Files selected for processing (1)
  • CMakeLists.txt (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (4)
CMakeLists.txt (4)

309-316: Good refactoring to centralize target handling.

Defining TILELANG_OUTPUT_TARGETS improves maintainability by ensuring consistent handling across RPATH settings, patching, and installation. The unset(PATCHELF_EXECUTABLE) ensures a clean state before the conditional find_program call.


337-339: LGTM!

Clean refactoring that applies consistent RPATH settings to all output targets via the loop.


359-361: LGTM!

Using ${TILELANG_OUTPUT_TARGETS} maintains consistency with the variable defined at line 309 and ensures any future target additions only need updating in one place.


341-357: Well-structured patching approach that correctly handles targets from different directories.

The separation of POST_BUILD for local targets and a custom target for subdirectory targets is the right approach. Using ALL ensures patch_tvm_libs runs during normal builds. Since the install target depends on the ALL target by default, patching is guaranteed to occur before installation regardless of the build workflow.


@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/target/stubs/cuda.h (1)

69-70: Move cuLaunchKernelEx to optional APIs for CUDA < 12.0 compatibility.

As flagged in a previous review, cuLaunchKernelEx was introduced in CUDA 12.0. Listing it in TILELANG_LIBCUDA_API_REQUIRED means the lazy loader will throw an error on systems with CUDA 11.x drivers, even if this function is never called. This contradicts the PR's goal of supporting varied environments.

🔎 Proposed fix

Move cuLaunchKernelEx to the optional list or conditionally include it based on CUDA version:

 #define TILELANG_LIBCUDA_API_REQUIRED(_)                                       \
   _(cuGetErrorName)                                                            \
   _(cuGetErrorString)                                                          \
   _(cuCtxGetDevice)                                                            \
   _(cuCtxGetLimit)                                                             \
   _(cuCtxSetLimit)                                                             \
   _(cuCtxResetPersistingL2Cache)                                               \
   _(cuDeviceGetName)                                                           \
   _(cuDeviceGetAttribute)                                                      \
   _(cuModuleLoadData)                                                          \
   _(cuModuleLoadDataEx)                                                        \
   _(cuModuleUnload)                                                            \
   _(cuModuleGetFunction)                                                       \
   _(cuModuleGetGlobal)                                                         \
   _(cuFuncSetAttribute)                                                        \
   _(cuLaunchKernel)                                                            \
-  _(cuLaunchKernelEx)                                                          \
   _(cuLaunchCooperativeKernel)                                                 \
   _(cuMemsetD32)                                                               \
   _(cuStreamSetAttribute)

 // Optional APIs (may not exist in older drivers or specific configurations)
 // These are loaded but may be nullptr if not available
 #if defined(CUDA_VERSION) && (CUDA_VERSION >= 12000)
 #define TILELANG_LIBCUDA_API_OPTIONAL(_)                                       \
+  _(cuLaunchKernelEx)                                                          \
   _(cuTensorMapEncodeTiled)                                                    \
   _(cuTensorMapEncodeIm2col)

Also move the corresponding extern "C" declaration inside the #if defined(CUDA_VERSION) && (CUDA_VERSION >= 12000) block at lines 183-186.

🧹 Nitpick comments (2)
src/target/stubs/cuda.h (2)

104-107: Naming inconsistency between X-macro and actual CUDA API names.

The X-macro creates function pointer members using names like cuModuleGetGlobal_ and cuMemsetD32_ (line 104-107), but the actual CUDA driver API versioned names are cuModuleGetGlobal_v2 and cuMemsetD32_v2 (as seen in extern "C" declarations at lines 171 and 192). This mismatch is explained in comments (lines 97-99) but could lead to confusion during maintenance.

💡 Consider using versioned names in the X-macro

Update the X-macro to use the actual versioned API names to avoid confusion:

 #define TILELANG_LIBCUDA_API_REQUIRED(_)                                       \
   _(cuGetErrorName)                                                            \
   _(cuGetErrorString)                                                          \
   _(cuCtxGetDevice)                                                            \
   _(cuCtxGetLimit)                                                             \
   _(cuCtxSetLimit)                                                             \
   _(cuCtxResetPersistingL2Cache)                                               \
   _(cuDeviceGetName)                                                           \
   _(cuDeviceGetAttribute)                                                      \
   _(cuModuleLoadData)                                                          \
   _(cuModuleLoadDataEx)                                                        \
   _(cuModuleUnload)                                                            \
   _(cuModuleGetFunction)                                                       \
-  _(cuModuleGetGlobal)                                                         \
+  _(cuModuleGetGlobal_v2)                                                      \
   _(cuFuncSetAttribute)                                                        \
   _(cuLaunchKernel)                                                            \
   _(cuLaunchKernelEx)                                                          \
   _(cuLaunchCooperativeKernel)                                                 \
-  _(cuMemsetD32)                                                               \
+  _(cuMemsetD32_v2)                                                            \
   _(cuStreamSetAttribute)

Note: This would require updating the member access in cuda.cc to use the versioned names with trailing underscore (e.g., cuModuleGetGlobal_v2_).


145-195: Document error handling behavior for wrapper functions.

The extern "C" wrapper functions are declared but their error handling behavior when CUDA is unavailable is not documented. Based on the implementation snippet from cuda.cc, these wrappers call CUDADriverAPI::get(), which throws std::runtime_error if libcuda.so cannot be loaded.

Add documentation clarifying the error handling behavior:

📝 Suggested documentation enhancement

Add to the section comment at lines 138-144:

 // ============================================================================
 // Global wrapper functions for lazy-loaded CUDA driver API
 // ============================================================================
 // These functions provide drop-in replacements for CUDA driver API calls.
 // They are exported from the stub library and can be linked against directly.
 // The implementations are in cuda.cc.
+//
+// Error handling: All wrapper functions call CUDADriverAPI::get(), which will
+// throw std::runtime_error if libcuda.so cannot be loaded or if required
+// symbols are missing. Use CUDADriverAPI::is_available() to check CUDA
+// availability without throwing.
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac22b7a and 5a7edd0.

📒 Files selected for processing (2)
  • src/target/stubs/cuda.h
  • src/target/stubs/vendor/cuda.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • src/target/stubs/cuda.h
🧬 Code graph analysis (1)
src/target/stubs/cuda.h (1)
src/target/stubs/cuda.cc (8)
  • get (108-118)
  • get (108-108)
  • is_available (106-106)
  • is_available (106-106)
  • get_handle (100-104)
  • get_handle (100-100)
  • cuModuleLoadData (165-167)
  • cuModuleLoadData (165-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (2)
src/target/stubs/cuda.h (2)

1-37: LGTM! Clear documentation and well-designed include guard pattern.

The file-level documentation clearly explains the purpose and provides concrete usage examples. The include guard pattern (_TILELANG_CUDA_STUB_INCLUDE_GUARD) is a good design to enforce that vendor/cuda.h can only be included through this stub header.


39-48: LGTM! Standard cross-platform symbol visibility pattern.

The platform-specific export/import macros follow standard practices for shared library development on Windows and Unix-like systems.


@oraluben oraluben left a comment


Overall LGTM, amazing work!


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)

183-200: Critical: uv run --no-project still bypasses the installed wheel.

While line 191 has been improved to use uv pip install directly, line 195 still uses uv run --no-project -- python, which creates an ephemeral environment instead of testing the wheel installed in test-venv. Even though the venv is activated, uv run --no-project ignores the activated environment.

Additionally, error handling could be improved by placing set -e at the top of the run block (before the for loop) rather than inside the subshell.

🔎 Recommended fix
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-              uv pip install -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv -v "${WHEEL}"
+            # Test from root directory to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done

This fix:

  • Adds set -e at the top for fail-fast behavior
  • Uses uv pip install --python test-venv to install directly into the venv
  • Invokes test-venv/bin/python directly to test the installed wheel
  • Removes unnecessary activation/deactivation and nested subshells
🧹 Nitpick comments (1)
.github/workflows/dist.yml (1)

157-161: Consider using only UV_TORCH_BACKEND for PyTorch backend selection.

Both UV_INDEX and UV_TORCH_BACKEND are currently set, which may lead to ambiguous package resolution behavior when multiple indices provide the same package. Based on the past discussion and UV's PyTorch integration guide, UV_TORCH_BACKEND is the explicit, recommended approach for selecting CUDA backends.

Consider removing line 160 (UV_INDEX) and relying solely on UV_TORCH_BACKEND (line 161) for clearer, more predictable behavior.

Based on learnings from past discussions between maintainers about UV_INDEX vs UV_TORCH_BACKEND.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 20beda8 and 42278c3.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (4)
.github/workflows/dist.yml (4)

38-42: LGTM with minor note.

The UV environment variables are configured appropriately. The cache directories will resolve to absolute paths at runtime, which should work correctly with the caching actions.

Note: The inline comment mentions "to be updated" - ensure these paths are finalized before merging.


66-72: LGTM!

The setup-uv action is properly configured with caching enabled, which will improve workflow performance.


176-181: LGTM!

Consistent setup of UV across both build-sdist and build-wheels jobs enables proper wheel testing.


116-117: The CUDA-13.0 version in these lines is the current standard in the build matrix, alongside CUDA-12.8. No evidence of a CUDA-13.1 version exists in the codebase to support the claim of a downgrade.

Likely an incorrect or invalid review comment.

oraluben previously approved these changes Dec 31, 2025
@XuehaiPan (Collaborator, Author) commented:

cc @LeiWang1999 to stamp this

oraluben previously approved these changes Jan 5, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)

189-207: Critical: uv run --no-project bypasses the installed wheel, invalidating the test.

The test creates and activates a venv (Lines 195-196), installs the wheel (Line 197), but then uses uv run --no-project (Line 201) which spawns an ephemeral environment that doesn't contain the installed wheel. This completely defeats the validation purpose.

Additionally, error handling is inadequate - the outer subshell doesn't have set -e, so failures in venv creation or installation won't fail the CI step.

🔎 Proposed fix to directly invoke the venv's Python interpreter
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-              uv pip install -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv -v "${WHEEL}"
+            # Test from root directory to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done

This approach:

  • Adds set -e at the top to fail fast on any error
  • Removes the activation/deactivation which is unnecessary when directly targeting a venv
  • Uses uv pip install --python test-venv to install directly into the venv
  • Invokes test-venv/bin/python directly to test the installed package
  • Removes unnecessary subshells
🧹 Nitpick comments (2)
.github/workflows/dist.yml (2)

160-161: Consider whether both UV_INDEX and UV_TORCH_BACKEND are necessary.

Based on previous discussion in this PR, UV_TORCH_BACKEND is the more explicit and recommended approach for PyTorch backend selection. Setting both UV_INDEX and relying on UV_INDEX_STRATEGY: "unsafe-best-match" (Line 38) may introduce ambiguity when multiple indexes provide the same package.

If UV_TORCH_BACKEND alone is sufficient for your use case, consider removing the UV_INDEX configuration to simplify the setup.

🔎 Proposed simplification
-            echo "UV_INDEX=https://download.pytorch.org/whl/cu${CUDA_VERSION_MAJMIN_NODOT}" | tee -a "${GITHUB_ENV}"
             echo "UV_TORCH_BACKEND=cu${CUDA_VERSION_MAJMIN_NODOT}" | tee -a "${GITHUB_ENV}"

40-42: Remove misleading "to be updated" comments on cache environment variables.

The manually configured cache paths (XDG_CACHE_HOME, PIP_CACHE_DIR, UV_CACHE_DIR at lines 40-42) align with setup-uv's documented best practices for self-hosted runners, where workspace-relative cache paths prevent unbounded cache growth. However, the "# to be updated" comments are misleading—these paths are correct and don't require adjustment. Remove the comments to clarify intent.

Note: For GitHub-hosted runners where enable-cache: auto applies, the setup-uv@v7 action handles uv caching automatically; the manual UV_CACHE_DIR env var is redundant but harmless. Ensure that PIP_CACHE_DIR remains if pip is used separately outside of uv workflows.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 80a3084 and 27b9543.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with Nightly-CUDA-13.1
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with Nightly-CUDA-13.1
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build SDist
🔇 Additional comments (2)
.github/workflows/dist.yml (2)

172-172: LGTM: Package manager updated to dnf.

The change from yum to dnf is appropriate for modern Fedora/RHEL-based container images used by cibuildwheel. This aligns with the current standard package manager in these distributions.


182-187: LGTM: Python and uv setup for wheel validation.

The addition of this setup step is appropriate for testing built wheels. Using Python 3.12 and enabling environment activation provides a clean environment for validation.

oraluben previously approved these changes Jan 5, 2026
@oraluben oraluben self-requested a review January 5, 2026 08:59

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @.github/workflows/dist.yml:
- Around line 160-167: The workflow sets UV_INDEX to the PyTorch nightly index
when matrix.target.toolkit starts with "Nightly-" which triggers a known uv
infinite-download bug; change the nightly branch that sets UV_INDEX (and avoids
setting UV_TORCH_BACKEND) to instead avoid pointing UV at the nightly
index—either pin a specific torch wheel URL/version into UV_INDEX, skip setting
UV_INDEX and use an editable/local/explicit install for nightly builds, or add
logic to set UV_INDEX to a non-nightly/temporal mirror; update the branch that
checks matrix.target.toolkit, the UV_INDEX assignment, and any use of
CUDA_VERSION_MAJMIN_NODOT so nightly jobs don’t use
https://download.pytorch.org/whl/nightly/... directly.
♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)

195-213: uv run --no-project bypasses the installed wheel in test-venv.

The test creates a venv and installs the wheel (line 203), but then uses uv run --no-project (line 207) to run the import test. The --no-project flag creates an ephemeral isolated environment that doesn't use the packages installed in test-venv, defeating the purpose of the validation.

Additionally, set -e is only inside the inner subshell (line 205), so wheel installation failures won't fail the outer loop.

🔎 Proposed fix using direct venv interpreter
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-              uv pip install --refresh -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv --refresh -v "${WHEEL}"
+            # Test from root directory to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done

This approach:

  • Adds set -e at the top level to fail fast on any error
  • Uses uv pip install --python test-venv to install directly into the venv without activation
  • Invokes test-venv/bin/python directly to test the installed wheel
  • Removes unnecessary nested subshells and activation/deactivation steps
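
To make the set -e point concrete, here is a standalone illustration in plain bash (outside GitHub Actions, which typically injects -e into run steps on its own):

#!/usr/bin/env bash
# Without `set -e` at the top level, a failure inside the subshell only
# terminates the subshell; the surrounding loop keeps iterating.
for wheel in first second; do
  (
    set -e
    false                       # stand-in for a failed wheel install
    echo "never reached"        # skipped: the subshell already exited
  )
  echo "loop continues after ${wheel}"   # prints on both iterations
done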
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a827b69 and dd7b82b.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with Nightly-CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with Nightly-CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
.github/workflows/dist.yml (4)

116-117: LGTM! CUDA version change aligns with PyTorch nightly availability.

Changing from CUDA 13.1 to 13.0 ensures compatibility with available PyTorch nightly wheels. As mentioned in past discussions, the minor CUDA version is not critical when kernels are JIT-compiled.


178-178: LGTM! Updated to modern package manager.

Replacing yum with dnf is appropriate for recent Linux distributions used by cibuildwheel.


188-193: LGTM! Setup step for wheel testing.

The uv setup step is appropriate for the subsequent wheel testing workflow.


40-42: github.workspace is not interpolated in workflow-level env block.

The cache directory paths use ${{ github.workspace }}, but the workflow-level env block is evaluated before job contexts are available. These variables will contain the literal string ${{ github.workspace }} rather than the actual workspace path.

Move these environment variable definitions to job-level env blocks (under build-sdist and build-wheels jobs) where github.workspace is available, or use relative paths like .cache, .cache/pip, and .cache/uv.

🔎 Proposed fix to move cache paths to job-level env

Remove from workflow-level env (lines 40-42):

-  XDG_CACHE_HOME: "${{ github.workspace }}/.cache" # to be updated
-  PIP_CACHE_DIR: "${{ github.workspace }}/.cache/pip" # to be updated
-  UV_CACHE_DIR: "${{ github.workspace }}/.cache/uv" # to be updated

Then add to each job's env block (e.g., under build-sdist.env and build-wheels.env):

  XDG_CACHE_HOME: "${{ github.workspace }}/.cache"
  PIP_CACHE_DIR: "${{ github.workspace }}/.cache/pip"
  UV_CACHE_DIR: "${{ github.workspace }}/.cache/uv"

Or use relative paths in the workflow-level env:

-  XDG_CACHE_HOME: "${{ github.workspace }}/.cache" # to be updated
-  PIP_CACHE_DIR: "${{ github.workspace }}/.cache/pip" # to be updated
-  UV_CACHE_DIR: "${{ github.workspace }}/.cache/uv" # to be updated
+  XDG_CACHE_HOME: ".cache"
+  PIP_CACHE_DIR: ".cache/pip"
+  UV_CACHE_DIR: ".cache/uv"
⛔ Skipped due to learnings
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cache@v4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @.github/workflows/dist.yml:
- Around lines 40-42: Update the inline comments on the cache env vars in the
workflow: either remove the "# to be updated" comment if the XDG_CACHE_HOME,
PIP_CACHE_DIR, and UV_CACHE_DIR values are correct and final, or replace each
comment with a precise note describing what must change (e.g., target path, use
of runner temp, or reason for customization) so the intent is clear. Apply this
to all three entries in dist.yml.
♻️ Duplicate comments (2)
.github/workflows/dist.yml (2)

160-167: Critical: PyTorch nightly index bug remains unresolved.

This segment still sets UV_INDEX to the PyTorch nightly URL (line 162), which was flagged in previous reviews as triggering a known uv bug (issues #9651/#10307) causing infinite download loops. The stable builds (lines 163-166) set both UV_INDEX and UV_TORCH_BACKEND, but previous discussion suggests UV_TORCH_BACKEND alone may be sufficient for stable builds.

Consider:

  1. For nightly builds: Pin a specific torch wheel URL or use an alternative installation method to avoid the nightly index bug
  2. For stable builds: Evaluate whether UV_INDEX is needed when UV_TORCH_BACKEND is set, as the backend selection may be sufficient
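
Whichever option is chosen, a one-line post-install check (placement is a suggestion, not part of the current workflow) makes the resolved build visible in CI logs:

# Print the torch version and its CUDA toolkit (None for CPU-only builds).
python -c "import torch; print(torch.__version__, torch.version.cuda)"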

195-219: Critical: Test still uses uv run --no-project which bypasses the installed wheel.

The test creates and activates test-venv, installs the wheel into it (line 209), but then uses uv run --no-project (line 213) to run the import test. The --no-project flag creates an ephemeral environment that bypasses the venv and the installed wheel, defeating the purpose of the validation.

The fix from previous reviews remains valid: invoke the venv's Python interpreter directly instead of using uv run --no-project.

🔎 Recommended fix using direct venv Python
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-
-              uv pip install --upgrade pip setuptools wheel
-              if [[ "${UV_INDEX}" == *"/nightly/"* ]]; then
-                uv pip install --prerelease=allow -v torch
-              fi
-
-              uv pip install -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv --upgrade pip setuptools wheel
+            if [[ "${UV_INDEX}" == *"/nightly/"* ]]; then
+              uv pip install --python test-venv --prerelease=allow -v torch
+            fi
+            uv pip install --python test-venv -v "${WHEEL}"
+            # Test from root to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done

This eliminates venv activation/deactivation, uses --python test-venv for installs, and invokes the venv's Python directly.
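
For reference, the same activation-free pattern in isolation; rich is an arbitrary stand-in for the built wheel:

# Sketch: install into and run from a venv without ever activating it.
uv venv --python=3.12 demo-venv
uv pip install --python demo-venv rich          # rich stands in for the wheel
demo-venv/bin/python -c "import rich; print(rich.__version__)"
rm -rf demo-venv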

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0abf514 and 6d0d01e.

📒 Files selected for processing (1)
  • .github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.

Applied to files:

  • .github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.

Applied to files:

  • .github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Build SDist
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with Nightly-CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with Nightly-CUDA-13.0
  • GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
🔇 Additional comments (4)
.github/workflows/dist.yml (4)

66-72: LGTM!

The uv setup with caching is configured correctly.


116-117: LGTM!

The CUDA version update to 13.0 for nightly builds is appropriate.


178-178: LGTM!

Updating from yum to dnf is appropriate for modern RHEL/Fedora-based build containers.


188-194: LGTM!

The uv setup with caching is configured correctly for the build-wheels job.

@XuehaiPan XuehaiPan merged commit cfbc49b into tile-ai:main Jan 6, 2026
13 checks passed
@XuehaiPan XuehaiPan deleted the lazy-dlopen branch January 6, 2026 04:16


Development

Successfully merging this pull request may close these issues.

[Feature Request] Cannot import tilelang on pure CPU machine with CUDA build
