[Enhancement] Allow import tilelang on CPU-only machines without CUDA libraries
#1481
Conversation
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a lazy-load mechanism to defer heavy/shared-library loads, exposes set_log_level(level), provides a CPU-safe libcuda stub with extern-C wrappers and include guards, updates sources to use the stub, adds a cuda_stub CMake target with install-time patchelf removal of libcuda, and updates CI to use dnf/uv and validate built wheels.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Importer as "Python importer"
    participant Tilelang as "tilelang/__init__"
    participant CUDAStub as "CUDADriverAPI (cuda_stub)"
    participant LibCUDA as "libcuda.so (system)"
    rect rgba(200,230,255,0.18)
    Note over Tilelang,CUDAStub: Lazy-load import-time behavior and CPU-safe stub
    end
    Importer->>Tilelang: import tilelang
    Tilelang->>Tilelang: enter _lazy_load_lib (adjust dlopen/ctypes flags)
    Tilelang->>CUDAStub: request CUDADriverAPI singleton
    CUDAStub->>LibCUDA: dlopen("libcuda.so.1" / "libcuda.so")
    alt libcuda found
        LibCUDA-->>CUDAStub: handle + symbols
        CUDAStub-->>Tilelang: resolved function pointers (is_available=true)
        Tilelang-->>Importer: import completes (CUDA available)
    else libcuda not found
        LibCUDA-->>CUDAStub: open failed
        CUDAStub-->>Tilelang: provide safe stubs (is_available=false)
        Tilelang-->>Importer: import completes (CPU-only safe)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tilelang/__init__.py (2)
69-79: Consider warning on invalid level strings. If an invalid level string is passed (e.g., set_log_level('DEBG')), it silently falls back to INFO. This could mask typos.

🔎 Optional enhancement to warn on invalid levels

```diff
 def set_log_level(level):
     if isinstance(level, str):
-        level = getattr(logging, level.upper(), logging.INFO)
+        upper_level = level.upper()
+        if not hasattr(logging, upper_level):
+            warnings.warn(f"Unknown log level '{level}', defaulting to INFO", stacklevel=2)
+            level = logging.INFO
+        else:
+            level = getattr(logging, upper_level)
     logger = logging.getLogger(__name__)
     logger.setLevel(level)
```
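As a usage sketch only (assuming set_log_level accepts either a level-name string or a logging constant, as the diff above implies):

```python
import logging

import tilelang

tilelang.set_log_level("DEBUG")          # level name, case-insensitive per the diff above
tilelang.set_log_level(logging.WARNING)  # or a logging module constant
```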
113-162: Static analysis: noqa: F401 directives may be unnecessary. Ruff reports these noqa: F401 directives as unused because the F401 rule appears to be disabled in your project configuration. You could remove them to reduce noise, or enable F401 in your Ruff config if you want to enforce unused import checks. This is a low-priority cleanup item.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/dist.yml (1 hunks)
- tilelang/__init__.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
🧬 Code graph analysis (1)
tilelang/__init__.py (3)
tilelang/env.py (3)
- enable_cache (269-270)
- disable_cache (272-273)
- is_cache_enabled (266-267)
tilelang/libinfo.py (1)
- find_lib_path (7-35)
tilelang/cache/__init__.py (1)
- clear_cache (43-55)
🪛 Ruff (0.14.8)
tilelang/__init__.py
113, 114, 117, 118, 138, 139, 140, 142, 143, 146, 147, 150, 151, 152, 153, 154, 156, 157, 158, 159, 161, 162: Unused noqa directive (non-enabled: F401). Remove unused noqa directive. (RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
.github/workflows/dist.yml (2)
168-173: LGTM - Python setup step added for wheel testing. The setup step correctly configures Python 3.12 and uv for the subsequent wheel validation tests.
175-190: Good addition for wheel validation on CPU-only runners. This test step validates that the built wheels can be installed and imported successfully, which directly tests the lazy loading changes. The cd / ensures the installed package is tested rather than the source directory.

One minor observation: the echo ... | uv run --no-project --script - pattern is less common than using python -c. Consider using:

uv run --no-project python -c "import tilelang; print(tilelang.__version__)"

However, the current approach should work correctly.
tilelang/__init__.py (2)
112-136: Verify that RTLD_LAZY achieves the goal for TVM imports. The RTLD_LAZY flag defers symbol resolution for native libraries loaded via dlopen/ctypes.CDLL. However, TVM may load its runtime library during import tvm using its own loading mechanism. The effectiveness depends on whether TVM's internal library loading respects the process-wide dlopen flags. The workflow test step should validate this works correctly; a rough local check is sketched below.
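This check is a sketch only (assumptions: Unix, a host without libcuda.so.1, and tvm installed); note that sys.setdlopenflags() affects extension modules loaded via import, while ctypes.CDLL takes its flags from the mode argument, which is why the PR's context manager patches both paths.

```python
import os
import sys

# Import TVM with RTLD_LAZY applied to extension-module loading and confirm
# the import succeeds even though no CUDA driver library is present.
old_flags = sys.getdlopenflags()
sys.setdlopenflags(old_flags | os.RTLD_LAZY)
try:
    import tvm  # TVM's C extension is dlopen'ed while RTLD_LAZY is in effect
    print("tvm imported without eagerly resolving CUDA symbols:", tvm.__version__)
finally:
    sys.setdlopenflags(old_flags)
```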
49-50: Good namespace hygiene. The del statements for internal helpers (_compute_version, _init_logger, _lazy_load_lib) keep the module's public namespace clean. This is a good practice. Also applies to: 96-97, 163-164
Actionable comments posted: 1
♻️ Duplicate comments (1)
tilelang/__init__.py (1)
102-117: Critical: Windows compatibility issue remains unresolved. This is the same issue flagged in the previous review. os.RTLD_LAZY and sys.getdlopenflags() are Unix-only and will raise AttributeError on Windows, breaking imports on that platform.

🔎 Recommended fix with platform guard

```diff
 @contextlib.contextmanager
 def _lazy_load_lib():
+    # RTLD_LAZY is Unix-only; Windows doesn't support dlopen flags
+    if not hasattr(os, 'RTLD_LAZY') or not hasattr(sys, 'getdlopenflags'):
+        yield
+        return
+
     old_flags = sys.getdlopenflags()
     old_init = ctypes.CDLL.__init__

     def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
         return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

     sys.setdlopenflags(old_flags | os.RTLD_LAZY)
     ctypes.CDLL.__init__ = lazy_init
     try:
         yield
     finally:
         sys.setdlopenflags(old_flags)
         ctypes.CDLL.__init__ = old_init
```
🧹 Nitpick comments (1)
tilelang/__init__.py (1)
120-169: Optional cleanup: Remove unnecessary noqa directives. Ruff reports that the # noqa: F401 directives on many lines are unnecessary because F401 (unused imports) is not enabled in your linter configuration. If these imports are intentionally re-exported, consider using __all__ to make this explicit instead of suppressing warnings.

Alternative approach using __all__

You could replace the noqa comments with an explicit __all__ declaration at the end of the file:

```python
__all__ = [
    'enable_cache', 'disable_cache', 'is_cache_enabled', 'env', 'DataType',
    'jit', 'lazy_jit', 'JITKernel', 'compile', 'par_compile', 'Profiler',
    'clear_cache', 'TensorSupplyType', 'deprecated', 'Layout', 'Fragment',
    'analysis', 'transform', 'language', 'engine', 'tools', 'dtypes',
    'autotune', 'PassConfigKey', 'lower', 'register_cuda_postproc',
    'register_hip_postproc', 'ir', 'tileop', 'set_log_level', '__version__',
]
```

This makes the public API explicit and is more maintainable than individual noqa comments.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/__init__.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (2)
tilelang/env.py (3)
- enable_cache (269-270)
- disable_cache (272-273)
- is_cache_enabled (266-267)
tilelang/libinfo.py (1)
- find_lib_path (7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py
120, 121, 124, 125, 145, 146, 147, 149, 150, 153, 154, 157, 158, 159, 160, 161, 163, 164, 165, 166, 168, 169: Unused noqa directive (non-enabled: F401). Remove unused noqa directive. (RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (5)
tilelang/__init__.py (5)
1-8: LGTM: Necessary imports for lazy loading. The added standard library imports (contextlib, os, sys) are required for implementing the lazy loading mechanism.
12-50: LGTM: Version computation with proper cleanup. The version computation logic is sound, and the namespace cleanup on line 50 follows best practices.
53-97: LGTM: Well-structured logging infrastructure. The logging infrastructure is properly implemented:
- TqdmLoggingHandler correctly integrates with tqdm to prevent output interference
- set_log_level provides a user-friendly public API
- Logger initialization and cleanup follow best practices
119-171: Heavy imports properly wrapped in lazy loading context. The import structure is correct, and moving all heavy imports inside the with _lazy_load_lib() context achieves the intended lazy loading behavior.
102-117: RTLD_LAZY implementation is correct but requires runtime validation on CPU-only systems. The _lazy_load_lib() context manager at lines 102-117 is an intentional enhancement (commit: "Set RTLD_LAZY flag on CDLL") that correctly wraps TVM and tilelang library imports. RTLD_LAZY defers symbol resolution until symbols are actually used, which is the appropriate mechanism for allowing imports without CUDA libraries present.

The implementation properly:
- Modifies process-level dlopen flags via sys.setdlopenflags()
- Wraps CDLL initialization to append RTLD_LAZY to the mode
- Restores original state in a finally block

CPU-only support is explicitly intended (CMakeLists.txt mentions "Build CPU-only when we explicitly disable CUDA"), confirming the use case is real.

However, no explicit test validating successful import on a CPU-only machine (with CUDA libraries hidden/unavailable) was found. The existing test file (test_tilelang_language_lazy_jit.py) covers the @lazy_jit decorator for JIT compilation, not the import-time lazy loading behavior. To definitively confirm this solves the OSError: libcuda.so.1 problem, add a test that imports tilelang with CUDA libraries unavailable; a possible shape for such a test is sketched below.
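The sketch below assumes pytest as the runner and uses a subprocess so the import is fresh; clearing LD_LIBRARY_PATH only approximates a machine without the CUDA driver, so a genuinely CPU-only CI runner remains the authoritative check.

```python
import subprocess
import sys


def test_import_tilelang_without_cuda(monkeypatch):
    # Approximate a CPU-only environment for the child process; a dedicated
    # CPU-only runner is the reliable way to validate this end to end.
    monkeypatch.delenv("LD_LIBRARY_PATH", raising=False)
    result = subprocess.run(
        [sys.executable, "-c", "import tilelang; print(tilelang.__version__)"],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0, result.stderr
```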
Actionable comments posted: 0
♻️ Duplicate comments (1)
tilelang/__init__.py (1)
100-117: Platform compatibility: os.RTLD_LAZY is Unix-only. This issue was flagged in a previous review but remains unaddressed. os.RTLD_LAZY is not available on Windows, and accessing it will raise AttributeError. The libinfo.py snippet shows Windows is a supported platform (win32 checks exist), so this needs a guard.

🔎 Proposed fix for Windows compatibility

```diff
 @contextlib.contextmanager
 def _lazy_load_lib():
     import torch  # noqa: F401  # preload torch to avoid dlopen errors

+    if not hasattr(os, 'RTLD_LAZY'):
+        # Windows doesn't support dlopen flags; skip lazy loading adjustments
+        yield
+        return
+
     old_flags = sys.getdlopenflags()
     old_init = ctypes.CDLL.__init__

     def lazy_init(self, name, mode=ctypes.DEFAULT_MODE, *args, **kwargs):
         return old_init(self, name, mode | os.RTLD_LAZY, *args, **kwargs)

     sys.setdlopenflags(old_flags | os.RTLD_LAZY)
     ctypes.CDLL.__init__ = lazy_init
     try:
         yield
     finally:
         sys.setdlopenflags(old_flags)
         ctypes.CDLL.__init__ = old_init
```
🧹 Nitpick comments (1)
tilelang/__init__.py (1)
119-170: Lazy loading structure achieves the PR objective. The heavy imports (tvm, ctypes library loading, submodules) are correctly wrapped within the _lazy_load_lib() context, deferring shared library resolution until import time with RTLD_LAZY semantics.

Regarding static analysis hints: The # noqa: F401 directives are intentional for re-exported symbols. If F401 is not enabled in your Ruff configuration, you could remove them for cleaner code, but they're harmless.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tilelang/__init__.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tilelang/__init__.py (1)
tilelang/libinfo.py (1)
- find_lib_path (7-35)
🪛 Ruff (0.14.8)
tilelang/__init__.py
82-82: Do not catch blind exception: Exception (BLE001)
102, 120, 121, 124, 125, 145, 146, 147, 149, 150, 153, 154, 157, 158, 159, 160, 161, 163, 164, 165, 166, 168, 169: Unused noqa directive (non-enabled: F401). Remove unused noqa directive. (RUF100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Quick Lint
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
🔇 Additional comments (3)
tilelang/__init__.py (3)
1-48: LGTM! The import statements and version computation logic are well-structured with appropriate fallbacks.
66-97: LGTM! The logger initialization with TqdmLoggingHandler is well-implemented. The broad exception catch in emit() (line 82) follows the standard logging.Handler pattern: logging should never raise, and handleError is correctly invoked as a fallback.
102-108: RTLD_LAZY defers function symbol resolution correctly, but verify no CUDA data dependencies exist. RTLD_LAZY resolves symbols only as the code that references them is executed, and if the symbol is never referenced, then it is never resolved. This approach should allow import tilelang to succeed on CPU-only machines without libcuda.so.1.

However, lazy binding is performed only for function references; references to variables are always immediately bound when the shared object is loaded. If the tilelang library has direct data dependencies on CUDA symbols (not just function calls), those will fail at dlopen() time regardless of RTLD_LAZY. Confirm that TVM's CUDA integration uses function pointers rather than direct variable references; the sketch below illustrates the distinction.
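A conceptual sketch (Unix-only, not taken from the PR): ctypes looks a function symbol up only when the attribute is first accessed, mirroring lazy binding, whereas a data symbol referenced by a dependent shared object must bind when that object is loaded.

```python
import ctypes
import os


def try_get_cu_init():
    # dlopen libcuda lazily; if the driver library is absent, fall back cleanly.
    try:
        cuda = ctypes.CDLL("libcuda.so.1", mode=ctypes.DEFAULT_MODE | os.RTLD_LAZY)
    except OSError:
        return None  # CPU-only machine: never touch CUDA entry points
    return cuda.cuInit  # function symbol resolved only here, on first access


print("CUDA driver available:", try_get_cu_init() is not None)
```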
bca3f3a to eadeafd
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pyproject.toml (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
a57c8cf to 7ca7298
8ec5352 to 1929031
1929031 to 19c3864
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/dist.yml (1)
168-174: Cache may not save due to missing actions: write permission. Based on learnings, the astral-sh/setup-uv@v7 action's caching feature requires GITHUB_TOKEN with actions: write permission to save caches. The current permissions block (line 26) only grants contents: read, so caches will restore but not save.

This is not blocking since caches will still restore from previous runs, but consider adding actions: write to enable cache saving:

```yaml
permissions:
  contents: read
  actions: write
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/dist.yml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
919423c to 9a0bec9
Actionable comments posted: 0
🧹 Nitpick comments (1)
CMakeLists.txt (1)
330-333: Consider using WARNING instead of STATUS when patchelf is not found. When patchelf is missing, the libcuda.so dependency won't be removed, which means the resulting library may fail to import on CPU-only machines. A STATUS message could be easily missed in build output. Using WARNING would make this more visible to developers.

🔎 Suggested change

```diff
 find_program(PATCHELF_EXECUTABLE patchelf)
 if (NOT PATCHELF_EXECUTABLE)
-  message(STATUS "patchelf not found.")
+  message(WARNING "patchelf not found - libcuda.so dependency will not be removed from built libraries")
 endif()
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
CMakeLists.txt (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (4)
CMakeLists.txt (4)
309-316: Good refactoring to centralize target handling. Defining TILELANG_OUTPUT_TARGETS improves maintainability by ensuring consistent handling across RPATH settings, patching, and installation. The unset(PATCHELF_EXECUTABLE) ensures a clean state before the conditional find_program call.
337-339: LGTM! Clean refactoring that applies consistent RPATH settings to all output targets via the loop.
359-361: LGTM! Using ${TILELANG_OUTPUT_TARGETS} maintains consistency with the variable defined at line 309 and ensures any future target additions only need updating in one place.
341-357: Well-structured patching approach that correctly handles targets from different directories. The separation of POST_BUILD for local targets and a custom target for subdirectory targets is the right approach. Using ALL ensures patch_tvm_libs runs during normal builds. Since the install target depends on the ALL target by default, patching is guaranteed to occur before installation regardless of the build workflow.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/target/stubs/cuda.h (1)
69-70: Move cuLaunchKernelEx to optional APIs for CUDA < 12.0 compatibility. As flagged in a previous review, cuLaunchKernelEx was introduced in CUDA 12.0. Listing it in TILELANG_LIBCUDA_API_REQUIRED means the lazy loader will throw an error on systems with CUDA 11.x drivers, even if this function is never called. This contradicts the PR's goal of supporting varied environments.

🔎 Proposed fix

Move cuLaunchKernelEx to the optional list or conditionally include it based on CUDA version:

```diff
 #define TILELANG_LIBCUDA_API_REQUIRED(_) \
   _(cuGetErrorName) \
   _(cuGetErrorString) \
   _(cuCtxGetDevice) \
   _(cuCtxGetLimit) \
   _(cuCtxSetLimit) \
   _(cuCtxResetPersistingL2Cache) \
   _(cuDeviceGetName) \
   _(cuDeviceGetAttribute) \
   _(cuModuleLoadData) \
   _(cuModuleLoadDataEx) \
   _(cuModuleUnload) \
   _(cuModuleGetFunction) \
   _(cuModuleGetGlobal) \
   _(cuFuncSetAttribute) \
   _(cuLaunchKernel) \
-  _(cuLaunchKernelEx) \
   _(cuLaunchCooperativeKernel) \
   _(cuMemsetD32) \
   _(cuStreamSetAttribute)

 // Optional APIs (may not exist in older drivers or specific configurations)
 // These are loaded but may be nullptr if not available
 #if defined(CUDA_VERSION) && (CUDA_VERSION >= 12000)
 #define TILELANG_LIBCUDA_API_OPTIONAL(_) \
+  _(cuLaunchKernelEx) \
   _(cuTensorMapEncodeTiled) \
   _(cuTensorMapEncodeIm2col)
```

Also move the corresponding extern "C" declaration inside the #if defined(CUDA_VERSION) && (CUDA_VERSION >= 12000) block at lines 183-186.
🧹 Nitpick comments (2)
src/target/stubs/cuda.h (2)
104-107: Naming inconsistency between X-macro and actual CUDA API names. The X-macro creates function pointer members using names like cuModuleGetGlobal_ and cuMemsetD32_ (line 104-107), but the actual CUDA driver API versioned names are cuModuleGetGlobal_v2 and cuMemsetD32_v2 (as seen in extern "C" declarations at lines 171 and 192). This mismatch is explained in comments (lines 97-99) but could lead to confusion during maintenance.

💡 Consider using versioned names in the X-macro

Update the X-macro to use the actual versioned API names to avoid confusion:

```diff
 #define TILELANG_LIBCUDA_API_REQUIRED(_) \
   _(cuGetErrorName) \
   _(cuGetErrorString) \
   _(cuCtxGetDevice) \
   _(cuCtxGetLimit) \
   _(cuCtxSetLimit) \
   _(cuCtxResetPersistingL2Cache) \
   _(cuDeviceGetName) \
   _(cuDeviceGetAttribute) \
   _(cuModuleLoadData) \
   _(cuModuleLoadDataEx) \
   _(cuModuleUnload) \
   _(cuModuleGetFunction) \
-  _(cuModuleGetGlobal) \
+  _(cuModuleGetGlobal_v2) \
   _(cuFuncSetAttribute) \
   _(cuLaunchKernel) \
   _(cuLaunchKernelEx) \
   _(cuLaunchCooperativeKernel) \
-  _(cuMemsetD32) \
+  _(cuMemsetD32_v2) \
   _(cuStreamSetAttribute)
```

Note: This would require updating the member access in cuda.cc to use the versioned names with trailing underscore (e.g., cuModuleGetGlobal_v2_).
145-195: Document error handling behavior for wrapper functions. The extern "C" wrapper functions are declared but their error handling behavior when CUDA is unavailable is not documented. Based on the implementation snippet from cuda.cc, these wrappers call CUDADriverAPI::get(), which throws std::runtime_error if libcuda.so cannot be loaded. Add documentation clarifying the error handling behavior:
📝 Suggested documentation enhancement
Add to the section comment at lines 138-144:
```diff
 // ============================================================================
 // Global wrapper functions for lazy-loaded CUDA driver API
 // ============================================================================
 // These functions provide drop-in replacements for CUDA driver API calls.
 // They are exported from the stub library and can be linked against directly.
 // The implementations are in cuda.cc.
+//
+// Error handling: All wrapper functions call CUDADriverAPI::get(), which will
+// throw std::runtime_error if libcuda.so cannot be loaded or if required
+// symbols are missing. Use CUDADriverAPI::is_available() to check CUDA
+// availability without throwing.
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/target/stubs/cuda.h
- src/target/stubs/vendor/cuda.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
src/target/stubs/cuda.h
🧬 Code graph analysis (1)
src/target/stubs/cuda.h (1)
src/target/stubs/cuda.cc (8)
- get (108-118)
- get (108-108)
- is_available (106-106)
- is_available (106-106)
- get_handle (100-104)
- get_handle (100-100)
- cuModuleLoadData (165-167)
- cuModuleLoadData (165-165)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (2)
src/target/stubs/cuda.h (2)
1-37: LGTM! Clear documentation and well-designed include guard pattern. The file-level documentation clearly explains the purpose and provides concrete usage examples. The include guard pattern (_TILELANG_CUDA_STUB_INCLUDE_GUARD) is a good design to enforce that vendor/cuda.h can only be included through this stub header.
39-48: LGTM! Standard cross-platform symbol visibility pattern. The platform-specific export/import macros follow standard practices for shared library development on Windows and Unix-like systems.
oraluben
left a comment
Overall LGTM, amazing work!
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)
183-200: Critical: uv run --no-project still bypasses the installed wheel. While line 191 has been improved to use uv pip install directly, line 195 still uses uv run --no-project -- python, which creates an ephemeral environment instead of testing the wheel installed in test-venv. Even though the venv is activated, uv run --no-project ignores the activated environment.

Additionally, error handling could be improved by placing set -e at the top of the run block (before the for loop) rather than inside the subshell.

🔎 Recommended fix
```diff
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-              uv pip install -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv -v "${WHEEL}"
+            # Test from root directory to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done
```

This fix:
- Adds set -e at the top for fail-fast behavior
- Uses uv pip install --python test-venv to install directly into the venv
- Invokes test-venv/bin/python directly to test the installed wheel
- Removes unnecessary activation/deactivation and nested subshells
🧹 Nitpick comments (1)
.github/workflows/dist.yml (1)
157-161: Consider using only UV_TORCH_BACKEND for PyTorch backend selection. Both UV_INDEX and UV_TORCH_BACKEND are currently set, which may lead to ambiguous package resolution behavior when multiple indices provide the same package. Based on the past discussion and UV's PyTorch integration guide, UV_TORCH_BACKEND is the explicit, recommended approach for selecting CUDA backends.

Consider removing line 160 (UV_INDEX) and relying solely on UV_TORCH_BACKEND (line 161) for clearer, more predictable behavior.

Based on learnings from past discussions between maintainers about UV_INDEX vs UV_TORCH_BACKEND.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (4)
.github/workflows/dist.yml (4)
38-42: LGTM with minor note. The UV environment variables are configured appropriately. The cache directories will resolve to absolute paths at runtime, which should work correctly with the caching actions.
Note: The inline comment mentions "to be updated" - ensure these paths are finalized before merging.
66-72: LGTM! The setup-uv action is properly configured with caching enabled, which will improve workflow performance.
176-181: LGTM! Consistent setup of UV across both build-sdist and build-wheels jobs enables proper wheel testing.
116-117: The CUDA-13.0 version in these lines is the current standard in the build matrix, alongside CUDA-12.8. No evidence of a CUDA-13.1 version exists in the codebase to support the claim of a downgrade. Likely an incorrect or invalid review comment.
cc @LeiWang1999 to stamp this
Actionable comments posted: 0
♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)
189-207: Critical:uv run --no-projectbypasses the installed wheel, invalidating the test.The test creates and activates a venv (Lines 195-196), installs the wheel (Line 197), but then uses
uv run --no-project(Line 201) which spawns an ephemeral environment that doesn't contain the installed wheel. This completely defeats the validation purpose.Additionally, error handling is inadequate - the outer subshell doesn't have
set -e, so failures in venv creation or installation won't fail the CI step.🔎 Proposed fix to directly invoke the venv's Python interpreter
```diff
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-              uv pip install -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv -v "${WHEEL}"
+            # Test from root directory to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done
```

This approach:
- Adds set -e at the top to fail fast on any error
- Removes the activation/deactivation which is unnecessary when directly targeting a venv
- Uses uv pip install --python test-venv to install directly into the venv
- Invokes test-venv/bin/python directly to test the installed package
- Removes unnecessary subshells
🧹 Nitpick comments (2)
.github/workflows/dist.yml (2)
160-161: Consider whether both UV_INDEX and UV_TORCH_BACKEND are necessary. Both UV_INDEX and UV_TORCH_BACKEND are currently set, which may lead to ambiguous package resolution behavior when multiple indices provide the same package. Based on the past discussion and UV's PyTorch integration guide, UV_TORCH_BACKEND is the explicit, recommended approach for selecting CUDA backends.

Consider removing line 160 (UV_INDEX) and relying solely on UV_TORCH_BACKEND (line 161) for clearer, more predictable behavior.

🔎 Proposed simplification

```diff
-          echo "UV_INDEX=https://download.pytorch.org/whl/cu${CUDA_VERSION_MAJMIN_NODOT}" | tee -a "${GITHUB_ENV}"
           echo "UV_TORCH_BACKEND=cu${CUDA_VERSION_MAJMIN_NODOT}" | tee -a "${GITHUB_ENV}"
```
40-42: Remove misleading "to be updated" comments on cache environment variables. The manually configured cache paths (XDG_CACHE_HOME, PIP_CACHE_DIR, UV_CACHE_DIR at lines 40-42) align with setup-uv's documented best practices for self-hosted runners, where workspace-relative cache paths prevent unbounded cache growth. However, the "# to be updated" comments are misleading: these paths are correct and don't require adjustment. Remove the comments to clarify intent.

Note: For GitHub-hosted runners where enable-cache: auto applies, the setup-uv@v7 action handles uv caching automatically; the manual UV_CACHE_DIR env var is redundant but harmless. Ensure that PIP_CACHE_DIR remains if pip is used separately outside of uv workflows.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with Nightly-CUDA-13.1
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with Nightly-CUDA-13.1
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build SDist
🔇 Additional comments (2)
.github/workflows/dist.yml (2)
172-172: LGTM: Package manager updated to dnf. The change from yum to dnf is appropriate for modern Fedora/RHEL-based container images used by cibuildwheel. This aligns with the current standard package manager in these distributions.
182-187: LGTM: Python and uv setup for wheel validation. The addition of this setup step is appropriate for testing built wheels. Using Python 3.12 and enabling environment activation provides a clean environment for validation.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @.github/workflows/dist.yml:
- Around line 160-167: The workflow sets UV_INDEX to the PyTorch nightly index
when matrix.target.toolkit starts with "Nightly-" which triggers a known uv
infinite-download bug; change the nightly branch that sets UV_INDEX (and avoids
setting UV_TORCH_BACKEND) to instead avoid pointing UV at the nightly
index—either pin a specific torch wheel URL/version into UV_INDEX, skip setting
UV_INDEX and use an editable/local/explicit install for nightly builds, or add
logic to set UV_INDEX to a non-nightly/temporal mirror; update the branch that
checks matrix.target.toolkit, the UV_INDEX assignment, and any use of
CUDA_VERSION_MAJMIN_NODOT so nightly jobs don’t use
https://download.pytorch.org/whl/nightly/... directly.
♻️ Duplicate comments (1)
.github/workflows/dist.yml (1)
195-213: uv run --no-project bypasses the installed wheel in test-venv. The test creates a venv and installs the wheel (line 203), but then uses uv run --no-project (line 207) to run the import test. The --no-project flag creates an ephemeral isolated environment that doesn't use the packages installed in test-venv, defeating the purpose of the validation.

Additionally, set -e is only inside the inner subshell (line 205), so wheel installation failures won't fail the outer loop.

🔎 Proposed fix using direct venv interpreter
```diff
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-              uv pip install --refresh -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv --refresh -v "${WHEEL}"
+            # Test from root directory to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done
```

This approach:
- Adds set -e at the top level to fail fast on any error
- Uses uv pip install --python test-venv to install directly into the venv without activation
- Invokes test-venv/bin/python directly to test the installed wheel
- Removes unnecessary nested subshells and activation/deactivation steps
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with Nightly-CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with Nightly-CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
🔇 Additional comments (4)
.github/workflows/dist.yml (4)
116-117: LGTM! CUDA version change aligns with PyTorch nightly availability. Changing from CUDA 13.1 to 13.0 ensures compatibility with available PyTorch nightly wheels. As mentioned in past discussions, the minor CUDA version is not critical when kernels are JIT-compiled.
178-178: LGTM! Updated to modern package manager. Replacing yum with dnf is appropriate for recent Linux distributions used by cibuildwheel.
188-193: LGTM! Setup step for wheel testing. The uv setup step is appropriate for the subsequent wheel testing workflow.
40-42: github.workspace is not interpolated in workflow-level env block. The cache directory paths use ${{ github.workspace }}, but the workflow-level env block is evaluated before job contexts are available. These variables will contain the literal string ${{ github.workspace }} rather than the actual workspace path.

Move these environment variable definitions to job-level env blocks (under build-sdist and build-wheels jobs) where github.workspace is available, or use relative paths like .cache, .cache/pip, and .cache/uv.

🔎 Proposed fix to move cache paths to job-level env

Remove from workflow-level env (lines 40-42):

```diff
-  XDG_CACHE_HOME: "${{ github.workspace }}/.cache" # to be updated
-  PIP_CACHE_DIR: "${{ github.workspace }}/.cache/pip" # to be updated
-  UV_CACHE_DIR: "${{ github.workspace }}/.cache/uv" # to be updated
```

Then add to each job's env block (e.g., under build-sdist.env and build-wheels.env):

```yaml
XDG_CACHE_HOME: "${{ github.workspace }}/.cache"
PIP_CACHE_DIR: "${{ github.workspace }}/.cache/pip"
UV_CACHE_DIR: "${{ github.workspace }}/.cache/uv"
```

Or use relative paths in the workflow-level env:

```diff
-  XDG_CACHE_HOME: "${{ github.workspace }}/.cache" # to be updated
-  PIP_CACHE_DIR: "${{ github.workspace }}/.cache/pip" # to be updated
-  UV_CACHE_DIR: "${{ github.workspace }}/.cache/uv" # to be updated
+  XDG_CACHE_HOME: ".cache"
+  PIP_CACHE_DIR: ".cache/pip"
+  UV_CACHE_DIR: ".cache/uv"
```

⛔ Skipped due to learnings

Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python's cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @.github/workflows/dist.yml:
- Around line 40-42: Update the inline comments on the cache env vars in the
workflow: either remove the "# to be updated" comment if XDG_CACHE_HOME,
PIP_CACHE_DIR, and UV_CACHE_DIR values are correct and final, or replace each
comment with a precise note describing what must change (e.g., target path, use
of runner temp, or reason for customization) so the intent is clear; apply this
to the XDG_CACHE_HOME, PIP_CACHE_DIR, and UV_CACHE_DIR entries in dist.yml.
♻️ Duplicate comments (2)
.github/workflows/dist.yml (2)
160-167: Critical: PyTorch nightly index bug remains unresolved. This segment still sets UV_INDEX to the PyTorch nightly URL (line 162), which was flagged in previous reviews as triggering a known uv bug (issues #9651/#10307) causing infinite download loops. The stable builds (lines 163-166) set both UV_INDEX and UV_TORCH_BACKEND, but previous discussion suggests UV_TORCH_BACKEND alone may be sufficient for stable builds.

Consider:
- For nightly builds: Pin a specific torch wheel URL or use an alternative installation method to avoid the nightly index bug
- For stable builds: Evaluate whether UV_INDEX is needed when UV_TORCH_BACKEND is set, as the backend selection may be sufficient
195-219: Critical: Test still uses uv run --no-project which bypasses the installed wheel. The test creates and activates test-venv, installs the wheel into it (line 209), but then uses uv run --no-project (line 213) to run the import test. The --no-project flag creates an ephemeral environment that bypasses the venv and the installed wheel, defeating the purpose of the validation.

The fix from previous reviews remains valid: invoke the venv's Python interpreter directly instead of using uv run --no-project.

🔎 Recommended fix using direct venv Python

```diff
       - name: Test built wheels
         run: |
+          set -e
           for WHEEL in wheelhouse/*.whl; do
             echo "Testing wheel: ${WHEEL}"
-            (
-              set -e
-              uv venv --python=3.12 test-venv
-              source test-venv/bin/activate
-
-              uv pip install --upgrade pip setuptools wheel
-              if [[ "${UV_INDEX}" == *"/nightly/"* ]]; then
-                uv pip install --prerelease=allow -v torch
-              fi
-
-              uv pip install -v "${WHEEL}"
-              (
-                set -e
-                cd /
-                uv run --no-project -- python -c "import tilelang; print(tilelang.__version__)"
-              )
-              deactivate
-              rm -rf test-venv
-            )
+            uv venv --python=3.12 test-venv
+            uv pip install --python test-venv --upgrade pip setuptools wheel
+            if [[ "${UV_INDEX}" == *"/nightly/"* ]]; then
+              uv pip install --python test-venv --prerelease=allow -v torch
+            fi
+            uv pip install --python test-venv -v "${WHEEL}"
+            # Test from root to ensure we're not importing from source
+            (cd / && "${GITHUB_WORKSPACE}/test-venv/bin/python" -c "import tilelang; print(tilelang.__version__)")
+            rm -rf test-venv
           done
```

This eliminates venv activation/deactivation, uses --python test-venv for installs, and invokes the venv's Python directly.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/dist.yml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-24T17:20:32.819Z
Learnt from: clouds56
Repo: tile-ai/tilelang PR: 1527
File: tilelang/env.py:0-0
Timestamp: 2025-12-24T17:20:32.819Z
Learning: The nvidia-cuda-nvcc PyPI package installs to `nvidia/cu13/bin/` (for CUDA 13), `nvidia/cu12/bin/` (for CUDA 12), and `nvidia/cu11/bin/` (for CUDA 11) in the site-packages directory, not to `nvidia/cuda_nvcc/bin/`. These paths should be used when detecting CUDA installations from PyPI packages in tilelang/env.py.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-12-15T08:56:23.827Z
Learnt from: cherichy
Repo: tile-ai/tilelang PR: 1421
File: tilelang/contrib/cutedsl/reduce.py:161-184
Timestamp: 2025-12-15T08:56:23.827Z
Learning: In Tilelang's CUDA backend and CuTeDSL backend, barrier IDs 1 and 2 are reserved for internal use (such as in AllReduce operations). User-defined barriers should use IDs starting from 3 to avoid synchronization conflicts.
Applied to files:
.github/workflows/dist.yml
📚 Learning: 2025-10-10T13:29:29.347Z
Learnt from: XuehaiPan
Repo: tile-ai/tilelang PR: 973
File: .github/workflows/ci.yml:13-15
Timestamp: 2025-10-10T13:29:29.347Z
Learning: In .github/workflows/ci.yml for tilelang (GitHub Actions), actions/cachev4 and setup-python’s cache feature require GITHUB_TOKEN with actions: write to save caches; with a permissions block that only sets contents: read, unspecified actions permission becomes none, so caches will restore but not save.
Applied to files:
.github/workflows/dist.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Build SDist
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with CUDA-12.8
- GitHub Check: Build wheels for Python 3.9 on ubuntu-latest with Nightly-CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on ubuntu-24.04-arm with Nightly-CUDA-13.0
- GitHub Check: Build wheels for Python 3.9 on macos-latest with Metal
🔇 Additional comments (4)
.github/workflows/dist.yml (4)
66-72: LGTM! The uv setup with caching is configured correctly.
116-117: LGTM! The CUDA version update to 13.0 for nightly builds is appropriate.
178-178: LGTM! Updating from yum to dnf is appropriate for modern RHEL/Fedora-based build containers.
188-194: LGTM! The uv setup with caching is configured correctly for the build-wheels job.
Resolves #1478
Summary by CodeRabbit
New Features
Performance / UX
Chores
CI Improvements
✏️ Tip: You can customize this high-level summary in your review settings.