skip aggressive validation check on allocation domain for vectorization by jjsjann123 · Pull Request #5622 · NVIDIA/Fuser

jjsjann123 · 2025-12-03T23:58:08Z

Stacked PR:

PR0: #5622 skip aggressive validation check on allocation domain for vectorization <-- this one
PR1: #5184 Support Split between logical domain to allocation domain to represent padding

This PR

Vectorization validation requires that the vectorized ID projected to the innermost ID on the allocation domain, even for cases where allocation domain is ignored during allocation pass (e.g. cache allocated on local tensor). This PR skips the validation check for the case above, in order to allow the behavior change on cacheBefore, where we no longer replay the entire allocation domain on output to cache.

Changes in this PR:

add early return in allocation domain validation, when allocation domain is ignored.
lift canUsePresetAllocationDomain from allocatiion.cpp to utils.cpp, allowing code re-use in validation.cpp.

jjsjann123 · 2025-12-03T23:58:15Z

!test

github-actions · 2025-12-03T23:59:09Z

Review updated until commit 6a05cd0

Description

Skip vectorization validation when allocation domain is ignored during allocation pass
Move canUsePresetAllocationDomain function from allocation.cpp to utils.cpp for code reuse
Add early return in validation to avoid assertion on cache created by cacheBefore
Add test case for cacheBefore scenario with no allocation domain

Changes walkthrough

Relevant files

Refactoring

allocation.cpp `Remove canUsePresetAllocationDomain function` csrc/device_lower/pass/allocation.cpp Removed `canUsePresetAllocationDomain` function (moved to utils.cpp) Function was ~65 lines of logic for determining if preset allocation domain should be used	+0/-62

Enhancement

utils.cpp `Add canUsePresetAllocationDomain utility function` csrc/device_lower/utils.cpp Added `canUsePresetAllocationDomain` function from allocation.cpp Function determines if tensor's allocation domain should be honored based on memory type and usage Handles global tensors, shared tensors with swizzle/bulk, TMA stores, MmaOps, and scatter operations	+61/-0
utils.h `Add function declaration for canUsePresetAllocationDomain` csrc/device_lower/utils.h Added declaration for `canUsePresetAllocationDomain` function Makes the function available for use in validation.cpp	+2/-0

Bug fix

validation.cpp `Skip allocation domain validation when domain is ignored` csrc/device_lower/validation.cpp Added early return in `validateAllocationVectorizedId` when `!canUsePresetAllocationDomain(tv)` Skip validation for TensorViews whose allocation domain is ignored during allocation pass Added comment explaining the rationale for skipping validation	+11/-0

Tests

test_vectorization.cpp `Add test for cacheBefore vectorization validation` tests/cpp/test_vectorization.cpp Added `CacheBeforeNoAllocationDomainVectorizationValidation` test case Tests scenario where cache tensor has no allocation domain set (local tensor) Validates that vectorization works correctly with cacheBefore operation	+56/-0

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

Function Implementation Consistency

The canUsePresetAllocationDomain function was moved from allocation.cpp to utils.cpp. While the implementation appears identical, ensure that all necessary headers and dependencies are properly included in utils.cpp, and that the function signature matches exactly between declaration and definition.

bool canUsePresetAllocationDomain(TensorView* tv) {
  if (!tv->hasAllocation()) {
    return false;
  }
  // Honor the allocation domain if the tensor is global or Hopper MMA's
  // output
  if (tv->getMemoryType() == MemoryType::Global ||
      (tv->definition() != nullptr && tv->definition()->isA<MmaOp>() &&
       isHopper(tv->definition()->as<MmaOp>()->macro()))) {
    return true;
  }
  // If it's a shared memory tensor, the set domain is likely
  // valid if Swizzle or Bulk is used. Also, if the allocation
  // domain is just a permutation of the loop domain, use the
  // set allocation domain. This seems to happen only with
  // AllocationDomainTest.TransposedIntermediate.
  if (tv->getMemoryType() == MemoryType::Shared) {
    if (std::any_of(
            tv->getAllocationDomain().begin(),
            tv->getAllocationDomain().end(),
            [](IterDomain* allocation_domain) {
              return dynamic_cast<Swizzle*>(allocation_domain->definition()) !=
                  nullptr ||
                  allocation_domain->getParallelType() == ParallelType::Bulk;
            }) ||
        std::is_permutation(
            tv->getLoopDomain().begin(),
            tv->getLoopDomain().end(),
            tv->getAllocationDomain().begin(),
            tv->getAllocationDomain().end())) {
      return true;
    }

    // Honor the set allocation domain if the tensor is used by a
    // TMA store or MmaOp
    if (std::ranges::any_of(tv->uses(), [](Expr* expr) {
          return ir_utils::isCpAsyncBulkStore(expr) || expr->isA<MmaOp>();
        })) {
      return true;
    }

    // If a shared memory output produced by scatter has an
    // allocation domain explicitly set, it's likely to be the
    // valid allocation domain.
    if (auto def = tv->definition(); def != nullptr && def->isA<ScatterOp>()) {
      return true;
    }
  }
  return false;
}

Validation Logic Correctness

The early return check if (!canUsePresetAllocationDomain(tv)) return; skips validation for tensors whose allocation domains are ignored. Verify this logic correctly identifies cases where allocation domain validation should be skipped, particularly for cache operations and local tensors.

if (!canUsePresetAllocationDomain(tv)) {
  return;
}

Test failures

(Medium, 1) Tensor numerical mismatches in nvFuser matmul tests (HopperMatmulTest on H100)

Test Name H100 Source

HopperMatmulTest.HSH_NT_UseScheduler_MultipleInstructionsPerWarpTile ❌ Link

jjsjann123 · 2025-12-04T00:07:16Z

!test

jjsjann123 · 2025-12-04T16:03:01Z

!test

greptile-apps · 2025-12-04T16:06:38Z

Greptile Overview

Greptile Summary

Refactored allocation domain validation to skip checks for tensors whose allocation domains are ignored during allocation pass, enabling cacheBefore to work without replaying the entire allocation domain on cache outputs.

Key Changes:

Moved canUsePresetAllocationDomain from allocation.cpp to utils.cpp for code reuse across allocation and validation passes
Added null check for tv->definition() to prevent potential crashes when checking MmaOp conditions
Added early return in validateAllocationVectorizedId to skip validation for local/cache tensors whose allocation domains are ignored
Added test case verifying vectorization works correctly with cache tensors that lack allocation domains

Technical Details:
The validation pass runs before the allocation pass, so it doesn't have access to GpuLower::current()->getAllocationInfo(tv).ids. Previously, validation would check allocation domains even for local tensors where those domains are ignored during codegen. This PR aligns validation behavior with the actual allocation behavior by using canUsePresetAllocationDomain to determine whether allocation domain validation should be performed.

Confidence Score: 4/5

This PR is safe to merge with low risk once the null pointer issue is verified as fixed
The refactoring is well-structured with proper code reuse. The null check addition addresses a potential crash. The logic aligns validation with allocation behavior. Minor concern: the previous thread noted a null pointer issue on line 2240, but the diff shows the null check was added (tv->definition() != nullptr). Test coverage is good. Marked as "Do Not Review" suggests this may be WIP.
No files require special attention - the refactoring is straightforward and the null check appears to be properly added

Important Files Changed

File Analysis

Filename	Score	Overview
csrc/device_lower/pass/allocation.cpp	5/5	Moved `canUsePresetAllocationDomain` function to utils.cpp for reuse in validation
csrc/device_lower/utils.cpp	4/5	Added `canUsePresetAllocationDomain` with null check for `tv->definition()` to prevent potential crashes
csrc/device_lower/validation.cpp	4/5	Added early return in `validateAllocationVectorizedId` to skip validation for tensors whose allocation domain is ignored during allocation pass

Sequence Diagram

sequenceDiagram
    participant V as Vectorization Validator
    participant U as canUsePresetAllocationDomain
    participant A as Allocation Pass
    participant C as Cache Tensor (Local)
    participant O as Output Tensor (Global)

    Note over V: validateAllocationVectorizedId called

    V->>U: Check if allocation domain should be used
    U->>C: tv->hasAllocation()?
    C-->>U: true (but local tensor)
    U->>C: tv->getMemoryType()
    C-->>U: MemoryType::Local
    U-->>V: false (allocation domain ignored)
    V->>V: Early return - skip validation
    
    Note over V: For output tensor
    V->>U: Check if allocation domain should be used
    U->>O: tv->hasAllocation()?
    O-->>U: true
    U->>O: tv->getMemoryType()
    O-->>U: MemoryType::Global
    U-->>V: true (allocation domain used)
    V->>V: Validate vectorized ID on allocation domain
    
    Note over A: During allocation pass
    A->>U: canUsePresetAllocationDomain(cache)
    U-->>A: false
    A->>A: Use loop domain for cache allocation
    
    A->>U: canUsePresetAllocationDomain(output)
    U-->>A: true
    A->>A: Use preset allocation domain

greptile-apps

_{5 files reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

csrc/device_lower/utils.cpp

jjsjann123 · 2025-12-04T16:16:40Z

tests/cpp/test_vectorization.cpp

+  for (auto tv : {tv1, cache}) {
+    // NOTE: split by 2 here would trigger indexing error on codegen, caused by
+    // the reorder on allocation info. See issue comment:
+    // https://github.com/NVIDIA/Fuser/issues/5611#issuecomment-3604515068


The codegen issue referred here is described in #5611 .

We still reorder allocation info based on getMaybeAllocationDomain(), which sets the storage in the wrong order.

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps

_{5 files reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

jjsjann123 · 2025-12-04T16:55:34Z

!test

jjsjann123 added 5 commits December 3, 2025 11:39

duplicating check function

1778819

more changes

570ff2a

adding cpp test

fc16504

fixing test

49fbdb8

fixing test to avoid indexing issue

e6a3628

lintrunner

1d91388

jjsjann123 changed the title ~~[DoNotReview]~~ [DoNotReview] skip validation check on allocation domain for vectorization validation Dec 4, 2025

jjsjann123 marked this pull request as ready for review December 4, 2025 16:02

Merge branch 'main' into jj/skip_vectorization_allocation_validation

777e17e

greptile-apps bot reviewed Dec 4, 2025

View reviewed changes

csrc/device_lower/utils.cpp Show resolved Hide resolved

jjsjann123 commented Dec 4, 2025

View reviewed changes

Update csrc/device_lower/utils.cpp

6a05cd0

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

jjsjann123 changed the title ~~[DoNotReview] skip validation check on allocation domain for vectorization validation~~ skip aggressive validation check on allocation domain for vectorization validation Dec 4, 2025

greptile-apps bot reviewed Dec 4, 2025

View reviewed changes

jjsjann123 requested a review from naoyam December 4, 2025 16:55

jjsjann123 changed the title ~~skip aggressive validation check on allocation domain for vectorization validation~~ skip aggressive validation check on allocation domain for vectorization Dec 4, 2025

jjsjann123 mentioned this pull request Dec 4, 2025

Support Split between logical domain to allocation domain to represent padding #5184

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

skip aggressive validation check on allocation domain for vectorization#5622

skip aggressive validation check on allocation domain for vectorization#5622
jjsjann123 wants to merge 8 commits intomainfrom
jj/skip_vectorization_allocation_validation

jjsjann123 commented Dec 3, 2025 •

edited

Loading

Uh oh!

jjsjann123 commented Dec 3, 2025

Uh oh!

github-actions bot commented Dec 3, 2025 •

edited by xwang233

Loading

Changes walkthrough

PR Reviewer Guide

Test failures

Uh oh!

jjsjann123 commented Dec 4, 2025

Uh oh!

jjsjann123 commented Dec 4, 2025

Uh oh!

greptile-apps bot commented Dec 4, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

jjsjann123 Dec 4, 2025

Uh oh!

greptile-apps bot left a comment

Uh oh!

jjsjann123 commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jjsjann123 commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Stacked PR:

This PR

Uh oh!

jjsjann123 commented Dec 3, 2025

Uh oh!

github-actions bot commented Dec 3, 2025 • edited by xwang233 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes walkthrough

PR Reviewer Guide

Test failures

Uh oh!

jjsjann123 commented Dec 4, 2025

Uh oh!

jjsjann123 commented Dec 4, 2025

Uh oh!

greptile-apps bot commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jjsjann123 Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

jjsjann123 commented Dec 4, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jjsjann123 commented Dec 3, 2025 •

edited

Loading

github-actions bot commented Dec 3, 2025 •

edited by xwang233

Loading

greptile-apps bot commented Dec 4, 2025 •

edited

Loading