@LeiWang1999 (Member) commented Dec 26, 2025

This pull request improves the handling of local variable buffers in the fill operation by introducing a new utility function and updating the logic to support an additional buffer scope. The main focus is to ensure that buffers with the "local.var" scope are processed similarly to those with the "local" scope.

Buffer scope handling enhancements:

  • Added the IsLocalVarBuffer function to utils.h to check if a buffer has the "local.var" scope.
  • Updated the fill operation in fill.cc to treat buffers with the "local.var" scope the same as those with the "local" scope by using the new IsLocalVarBuffer check.
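The sketch below shows why the extra check is needed. It uses a hypothetical stand-in, `MockBuffer`, which is not TVM's `Buffer` and holds only a scope string. It assumes, as the commit message says, that `IsLocalBuffer` matches only the exact scope "local", so without `IsLocalVarBuffer` a "local.var" destination would not reach the local fill path. `TakesLocalFillPath` is a hypothetical name for the condition that `fill.cc` now branches on.

```cpp
#include <string>

// Hypothetical stand-in for TVM's Buffer: only the storage scope matters here.
struct MockBuffer {
  std::string scope;
};

// Mirrors the scope checks in src/op/utils.h (assumed exact string comparison).
inline bool IsLocalBuffer(const MockBuffer &buffer) {
  return buffer.scope == "local";
}

inline bool IsLocalVarBuffer(const MockBuffer &buffer) {
  return buffer.scope == "local.var";
}

// The combined condition used by the fill lowering after this PR.
inline bool TakesLocalFillPath(const MockBuffer &dst) {
  return IsLocalBuffer(dst) || IsLocalVarBuffer(dst);
}
```

With only `IsLocalBuffer`, `MockBuffer{"local.var"}` would return false. The disjunction lets it take the same SIMT-and-vectorize path as "local".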

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of local variable buffers in fill operations so they receive the same optimization path as standard local buffers.
  • API

    • Added recognition for an additional local variable buffer scope to ensure consistent behavior.
  • Examples

    • Updated example implementation to use local array storage for non-zero counting to match runtime expectations.

✏️ Tip: You can customize this high-level summary in your review settings.

…IsLocalBuffer function to check only for "local" scope.
…dling

- Introduced IsLocalVarBuffer function to identify local variable buffers.
- Updated FillNode::Lower to handle both local and local variable buffers in the vectorized thread loop logic.
@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai bot (Contributor) commented Dec 26, 2025

📝 Walkthrough

Treats buffers with scope "local.var" like existing local buffers by adding IsLocalVarBuffer and extending FillNode::Lower to build a SIMT loop and vectorize it when the destination is a local variable buffer.

Changes

  • Local variable buffer check (src/op/utils.h): Added inline bool IsLocalVarBuffer(const Buffer &buffer) that returns true when buffer->scope == "local.var".
  • Fill lowering change (src/op/fill.cc): Broadened the condition from IsLocalBuffer(dst) to IsLocalBuffer(dst) || IsLocalVarBuffer(dst), so "local.var" destinations also build a SIMT loop and vectorize it.
  • Example: local storage adjustment (examples/gemm_sp/example_custom_compress.py): Replaced dynamic/shared scalar allocations with fixed local arrays for non_zero_cnt and non_zero_elt_log_idx, updating usages to index into the arrays (e.g., non_zero_cnt[0]).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through code with tiny paws,
Found "local.var" and fixed the laws,
Now fills loop swift and vector-keen,
Local kin together, neat and clean. ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 14.29%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically describes the main change (fixing the T.Fill operation to handle local.var buffers), which directly aligns with the primary objective of the PR.

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d402fc and 59f02ae.

📒 Files selected for processing (1)
  • examples/gemm_sp/example_custom_compress.py
🧰 Additional context used
🧬 Code graph analysis (1)
examples/gemm_sp/example_custom_compress.py (1)
tilelang/language/allocate.py (1)
  • alloc_local (58-69)
⏰ Context from checks skipped due to a timeout of 90000 ms (3):
  • GitHub Check: Test for Python 3.12 with CUDA-12.8 (on self-hosted-nvidia)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
🔇 Additional comments (3)
examples/gemm_sp/example_custom_compress.py (3)

260-262: LGTM! Addresses the buggy alloc_var issue.

The conversion from scalar variables to local arrays with explicit shapes (1-element for counter, elem-element for indices) addresses the TODO comment about alloc_var being buggy. This change aligns with the PR's goal of fixing local.var buffer handling in T.Fill operations.


267-276: Verify consistent array indexing in compression logic.

All uses of non_zero_cnt and non_zero_elt_log_idx have been correctly updated to use array indexing. The logic properly tracks non-zero elements and their positions.

Note: line 273 can index out of bounds if non_zero_cnt[0] exceeds elem - 1 (i.e., a group has more than 2 non-zero elements, violating 2:4 sparsity). The TODO comment on line 276 notes that a device_assert should be added to guard against this.
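The sketch below shows the counting pattern with the guard that the TODO asks for. It is written in plain host C++, not TileLang. It assumes groups of four values with at most `elem = 2` kept non-zeros (2:4 sparsity). `CompressGroup`, `Compressed`, `kGroup`, and `kElem` are hypothetical names. A thrown `std::out_of_range` stands in for a device-side assert. The sketch does not model the example's special metadata encoding for single non-zero groups.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

constexpr std::size_t kGroup = 4;  // values per group (2:4 sparsity, assumed)
constexpr std::size_t kElem = 2;   // max non-zeros kept per group (assumed)

// Hypothetical result of compressing one group.
struct Compressed {
  std::array<float, kElem> values{};
  std::array<int, kElem> idx{};  // logical positions of the kept values
  std::size_t non_zero_cnt = 0;
};

// Counts non-zeros in one group and records their positions.
// The bounds check runs before each write, so a group with more than kElem
// non-zeros throws instead of writing past the end of values/idx.
inline Compressed CompressGroup(const std::array<float, kGroup> &group) {
  Compressed out;
  for (std::size_t i = 0; i < kGroup; ++i) {
    if (group[i] != 0.0f) {
      if (out.non_zero_cnt >= kElem) {
        throw std::out_of_range("group violates 2:4 sparsity");
      }
      out.values[out.non_zero_cnt] = group[i];
      out.idx[out.non_zero_cnt] = static_cast<int>(i);
      ++out.non_zero_cnt;
    }
  }
  return out;
}
```

Checking the count before the write, not after, is what makes the guard effective. This matches the concern that `non_zero_cnt[0]` may exceed `elem - 1` at the indexing site.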


277-284: LGTM! Conditional logic correctly updated.

The special-case handling for single non-zero elements correctly uses array indexing for both non_zero_cnt[0] and non_zero_elt_log_idx[0]. The logic matches the metadata encoding requirements described in the comments above.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/op/fill.cc (1)

140-157: Update documentation to reflect local.var buffer handling.

The function documentation describes handling for "local" buffers (line 149) but doesn't mention "local.var" buffers, which are now also supported on line 177.

🔎 Proposed documentation update
```diff
  * Lowers the FillNode into a Stmt according to the destination buffer scope:
  * - "local.fragment" and shared ("shared", "shared.dyn"): create a parallel
  *   operation from a SIMT loop, infer its layout, partition the root loop by
  *   the thread variable, vectorize the resulting thread loop, and, if a
  *   per-thread predicate exists, guard the vectorized loop with that
  *   predicate.
- * - "local": build a SIMT loop and return its vectorized form.
+ * - "local" and "local.var": build a SIMT loop and return its vectorized form.
  * - other scopes: fatal error.
```
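The dispatch described in that doc comment can be modeled as a small pure function over the scope string. This is only a sketch. The enum `FillLowering`, the function `ChooseFillLowering`, and the use of `std::invalid_argument` in place of the real fatal error are all hypothetical, chosen so the scope-to-strategy mapping can be checked in isolation.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical labels for the two lowering strategies in the doc comment.
enum class FillLowering {
  kParallelPartitioned,  // parallel op from a SIMT loop, layout inferred, partitioned by thread, vectorized
  kSimtVectorized,       // plain SIMT loop, vectorized directly
};

// Maps a destination scope to its lowering strategy, following the doc comment.
// Throws for any other scope (standing in for the fatal error).
inline FillLowering ChooseFillLowering(const std::string &scope) {
  if (scope == "local.fragment" || scope == "shared" || scope == "shared.dyn") {
    return FillLowering::kParallelPartitioned;
  }
  if (scope == "local" || scope == "local.var") {
    return FillLowering::kSimtVectorized;
  }
  throw std::invalid_argument("unsupported fill destination scope: " + scope);
}
```

Before this PR, "local.var" would have fallen into the final "other scopes" case.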
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b58ed0 and 5d402fc.

📒 Files selected for processing (2)
  • src/op/fill.cc
  • src/op/utils.h
🧰 Additional context used
🧬 Code graph analysis (2)
src/op/fill.cc (1)
src/op/utils.h (2)
  • IsLocalBuffer (50-52)
  • IsLocalVarBuffer (54-56)
src/op/utils.h (1)
src/transform/legalize_safe_memory_access.cc (6)
  • buffer (61-69)
  • buffer (61-61)
  • buffer (72-115)
  • buffer (72-73)
  • buffer (266-271)
  • buffer (266-266)
⏰ Context from checks skipped due to a timeout of 90000 ms (2):
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with Nightly-ROCm-7.1 (on self-hosted-amd)
🔇 Additional comments (1)
src/op/utils.h (1)

54-56: LGTM! Clean utility function following established patterns.

The new IsLocalVarBuffer function correctly mirrors the existing buffer scope checkers and will enable proper handling of "local.var" buffers throughout the codebase.

Comment on lines +177 to 180
```cpp
} else if (IsLocalBuffer(dst) || IsLocalVarBuffer(dst)) {
  auto init_loop = MakeSIMTLoop(analyzer);
  auto vectorized_thread_loop = VectorizeLoop(init_loop, analyzer);
  return vectorized_thread_loop;
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for IsLocalBuffer usage throughout the codebase
rg -nP '\bIsLocalBuffer\s*\(' --type=cpp -C3

Repository: tile-ai/tilelang

Length of output: 3716


🏁 Script executed:

#!/bin/bash
# Search for IsLocalVarBuffer definition and usage
rg -nP '\bIsLocalVarBuffer\s*\(' --type=cpp -C3

Repository: tile-ai/tilelang

Length of output: 796


🏁 Script executed:

#!/bin/bash
# Look at copy.cc around the IsLocalBuffer checks to understand the context
sed -n '770,790p' src/op/copy.cc

Repository: tile-ai/tilelang

Length of output: 944


Update copy operations and layout inference to handle local.var buffers consistently.

The fill.cc change correctly extends local buffer handling to include "local.var" buffers by checking IsLocalBuffer(dst) || IsLocalVarBuffer(dst). However, other operations in the codebase do not follow this same pattern:

  • src/op/copy.cc (lines 779-780) checks only IsLocalBuffer() for both source and destination buffers without considering IsLocalVarBuffer()
  • src/transform/layout_inference.cc has multiple IsLocalBuffer() checks (lines 1268, 1288, 1292, 1313, 1318) that similarly lack IsLocalVarBuffer() checks

These operations should be updated to treat "local.var" buffers the same as "local" buffers where appropriate, consistent with the fill.cc changes.

🤖 Prompt for AI Agents
In src/op/fill.cc lines 177-180 you expanded handling to treat local.var like
local buffers; mirror that change in src/op/copy.cc (around lines 779-780) and
src/transform/layout_inference.cc (around lines 1268, 1288, 1292, 1313, 1318) by
updating all IsLocalBuffer(...) checks to also accept IsLocalVarBuffer(...).
Replace patterns that only call IsLocalBuffer(x) with a combined predicate
(IsLocalBuffer(x) || IsLocalVarBuffer(x)) for both source and destination
checks, ensuring logic and short-circuit behavior remain the same and update any
related comments/tests if present.
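One way to apply that suggestion without repeating the disjunction at every call site is a single helper. The sketch below is hypothetical throughout:

  • `IsLocalOrLocalVarBuffer` does not exist in the codebase.
  • `MockBuffer` stands in for TVM's `Buffer`.
  • `BothLocalLike` only illustrates the kind of source/destination check the review describes in copy.cc.

Whether each call site should actually accept "local.var" still needs case-by-case review, as the comment's "where appropriate" implies.

```cpp
#include <string>

// Hypothetical stand-in for TVM's Buffer.
struct MockBuffer {
  std::string scope;
};

inline bool IsLocalBuffer(const MockBuffer &b) { return b.scope == "local"; }
inline bool IsLocalVarBuffer(const MockBuffer &b) { return b.scope == "local.var"; }

// Hypothetical helper: a single place that lists the thread-local scopes.
inline bool IsLocalOrLocalVarBuffer(const MockBuffer &b) {
  return IsLocalBuffer(b) || IsLocalVarBuffer(b);
}

// Illustrative source/destination check built on the helper.
inline bool BothLocalLike(const MockBuffer &src, const MockBuffer &dst) {
  return IsLocalOrLocalVarBuffer(src) && IsLocalOrLocalVarBuffer(dst);
}
```

Keeping the disjunction in one helper means that if the set of local-like scopes changes, only one function needs editing.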

… local variable handling

- Changed allocation of non_zero_cnt and non_zero_elt_log_idx from shared to local variables.
- Updated logic to correctly reference the first element of non_zero_cnt for counting non-zero elements.
- Adjusted conditions to use the updated local variable references.
@LeiWang1999 LeiWang1999 merged commit 875b42f into tile-ai:main Dec 26, 2025
6 checks passed