-
Notifications
You must be signed in to change notification settings - Fork 383
[BugFix] Fix some bugs in lowering ParallelOp and VectorizeLoop #1607
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
📝 WalkthroughWalkthroughThis PR extends buffer scope handling by introducing an Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @tilelang/engine/phase.py:
- Around line 171-172: The unconditional debug prints in LowerAndLegalize (the
print(mod) and tilelang.analysis.ASTPrinter()(mod) calls) should be removed or
wrapped in the same config check used earlier; guard them with
should_enable_ast_print() so they only run when debugging is enabled, e.g., call
should_enable_ast_print() before invoking print(mod) or ASTPrinter; ensure you
reference the ASTPrinter class (tilelang.analysis.ASTPrinter) and the
LowerAndLegalize function when making the change.
🧹 Nitpick comments (2)
testing/python/issue/test_tilelang_issue_1549.py (1)
26-28: Remove unused kernel creation at line 26.Line 26 creates a kernel that is immediately overwritten by line 28. This is unnecessary and potentially confusing.
🔎 Proposed fix
- kernel = get_wrong_kernel() M = 2048 kernel = get_wrong_kernel(M)src/transform/loop_vectorize.cc (1)
193-193: Minor: Fix grammatical issues in comments.
- Line 193: "Specially" → "Specifically" or "In particular"
- Line 207: "indices is invariant" → "indices are invariant"
🔎 Proposed fixes
- // Specially, if it's a BufferStore, we should not vectorize it. + // Specifically, if it's a BufferStore, we should not vectorize it. if (is_store) { vector_size_ = 1; }} else if (is_store) { - // If the indices is invariant for BufferStore, we should also not + // If the indices are invariant for BufferStore, we should also not // vectorize it. vector_size_ = 1; }Also applies to: 207-208
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/op/utils.hsrc/transform/loop_vectorize.ccsrc/transform/lower_tile_op.cctesting/python/issue/test_tilelang_issue_1549.pytilelang/engine/phase.py
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-18T04:50:00.512Z
Learnt from: silentCoder-dev
Repo: tile-ai/tilelang PR: 1464
File: testing/python/language/test_tilelang_language_rand.py:14-14
Timestamp: 2025-12-18T04:50:00.512Z
Learning: In `testing/python/language/test_tilelang_language_rand.py`, the TileLang kernel uses `blk_M = M` (single block) and calls `rng_rand()` four times per element to align results with the Triton implementation, which uses `blk_M = 128` (multiple blocks) and calls the RNG once per element. These differences compensate for internal RNG behavior differences between TileLang and Triton.
Applied to files:
testing/python/issue/test_tilelang_issue_1549.py
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
testing/python/issue/test_tilelang_issue_1549.py
📚 Learning: 2025-09-12T09:47:46.474Z
Learnt from: kurisu6912
Repo: tile-ai/tilelang PR: 794
File: tilelang/transform/add_bufstore_wrapper.py:30-33
Timestamp: 2025-09-12T09:47:46.474Z
Learning: In TVM's PyStmtExprMutator, visit_block_ methods typically call super().visit_block_(op) to process child nodes and update internal state, but return the original op when the block itself doesn't need transformation. The pattern `return op` is correct for blocks that serve as containers where mutations happen at deeper levels.
Applied to files:
src/transform/loop_vectorize.cc
🧬 Code graph analysis (5)
tilelang/engine/phase.py (2)
tilelang/tools/Analyzer.py (1)
analysis(209-218)tilelang/analysis/ast_printer.py (1)
ASTPrinter(79-102)
src/transform/lower_tile_op.cc (2)
src/op/utils.h (1)
IsFragmentBuffer(33-35)src/transform/common/loop_fusion_utils.h (1)
IsFragmentBuffer(70-75)
testing/python/issue/test_tilelang_issue_1549.py (1)
tilelang/language/kernel.py (2)
num_threads(221-225)threads(214-218)
src/transform/loop_vectorize.cc (2)
src/transform/atomicadd_vectorize.cc (6)
UpdateVectorSize(111-151)UpdateVectorSize(111-112)node(186-211)node(186-186)node(213-291)node(213-213)src/transform/loop_partition.cc (8)
node(44-51)node(44-44)node(52-59)node(52-52)node(174-187)node(174-174)node(237-245)node(237-237)
src/op/utils.h (1)
src/transform/legalize_safe_memory_access.cc (6)
buffer(61-69)buffer(61-61)buffer(72-115)buffer(72-73)buffer(266-271)buffer(266-266)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (4)
src/op/utils.h (1)
50-57: LGTM! Clean extension of local buffer detection.The addition of the
allow_varparameter properly extendsIsLocalBufferto optionally recognizelocal.varscope while maintaining backward compatibility through the default parameter value. This addresses the PR objective of fixing conditions that didn't account forlocal.var.src/transform/lower_tile_op.cc (1)
738-738: LGTM! Consistent update of IsLocalBuffer calls.All call sites are systematically updated to pass
truefor theallow_varparameter, ensuring that bothlocalandlocal.varscopes are recognized in:
- Store-into-local detection (line 738)
- Local-register-only analysis (lines 751, 755)
- Non-local buffer detection (lines 769-770, 774-775)
This properly addresses the PR objective of fixing IsLocal conditions that didn't account for
local.var.Also applies to: 751-751, 755-755, 769-770, 774-775
src/transform/loop_vectorize.cc (2)
173-174: LGTM! Correct logic to prevent invalid store vectorization.The addition of the
is_storeparameter properly addresses the PR objective: "Loops whose BufferStore indices are invariant should not be vectorized." The implementation correctly disables vectorization for stores in two scenarios:
- Lines 192-197: When the element offset is independent of the loop variable (all iterations would write to the same location)
- Lines 206-210: When the offset is invariant within vector boundaries (all vector lanes would write to the same location)
Both conditions correctly identify cases where store vectorization would be semantically incorrect.
Also applies to: 191-210
132-132: LGTM! Correct differentiation between load and store paths.The call sites properly pass
falsefor loads (line 132) andtruefor stores (line 140), enabling the vectorization logic to apply different rules for each case.Also applies to: 140-140
|
@regression-perf |
Performance Regression Test ReportTriggered by: @SiriusNEO Results
Artifacts
|
#1549 exposes some bugs in current lowering ParallelOp and VectorizeLoop implementation, mainly two problems:
IsLocalconditions don't takelocal.varinto account.Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.