Skip to content

skip aggressive validation check on allocation domain for vectorization#5622

Open
jjsjann123 wants to merge 8 commits intomainfrom
jj/skip_vectorization_allocation_validation
Open

skip aggressive validation check on allocation domain for vectorization#5622
jjsjann123 wants to merge 8 commits intomainfrom
jj/skip_vectorization_allocation_validation

Conversation

@jjsjann123
Copy link
Collaborator

@jjsjann123 jjsjann123 commented Dec 3, 2025

Stacked PR:

PR0: #5622 skip aggressive validation check on allocation domain for vectorization <-- this one
PR1: #5184 Support Split between logical domain to allocation domain to represent padding

This PR

Vectorization validation requires that the vectorized ID projected to the innermost ID on the allocation domain, even for cases where allocation domain is ignored during allocation pass (e.g. cache allocated on local tensor). This PR skips the validation check for the case above, in order to allow the behavior change on cacheBefore, where we no longer replay the entire allocation domain on output to cache.

Changes in this PR:

  1. add early return in allocation domain validation, when allocation domain is ignored.
  2. lift canUsePresetAllocationDomain from allocatiion.cpp to utils.cpp, allowing code re-use in validation.cpp.

@jjsjann123
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

Review updated until commit 6a05cd0

Description

  • Skip vectorization validation when allocation domain is ignored during allocation pass

  • Move canUsePresetAllocationDomain function from allocation.cpp to utils.cpp for code reuse

  • Add early return in validation to avoid assertion on cache created by cacheBefore

  • Add test case for cacheBefore scenario with no allocation domain

Changes walkthrough

Relevant files
Refactoring
allocation.cpp
Remove canUsePresetAllocationDomain function                         

csrc/device_lower/pass/allocation.cpp

  • Removed canUsePresetAllocationDomain function (moved to utils.cpp)
  • Function was ~65 lines of logic for determining if preset allocation
    domain should be used
  • +0/-62   
    Enhancement
    utils.cpp
    Add canUsePresetAllocationDomain utility function               

    csrc/device_lower/utils.cpp

  • Added canUsePresetAllocationDomain function from allocation.cpp
  • Function determines if tensor's allocation domain should be honored
    based on memory type and usage
  • Handles global tensors, shared tensors with swizzle/bulk, TMA stores,
    MmaOps, and scatter operations
  • +61/-0   
    utils.h
    Add function declaration for canUsePresetAllocationDomain

    csrc/device_lower/utils.h

  • Added declaration for canUsePresetAllocationDomain function
  • Makes the function available for use in validation.cpp
  • +2/-0     
    Bug fix
    validation.cpp
    Skip allocation domain validation when domain is ignored 

    csrc/device_lower/validation.cpp

  • Added early return in validateAllocationVectorizedId when
    !canUsePresetAllocationDomain(tv)
  • Skip validation for TensorViews whose allocation domain is ignored
    during allocation pass
  • Added comment explaining the rationale for skipping validation
  • +11/-0   
    Tests
    test_vectorization.cpp
    Add test for cacheBefore vectorization validation               

    tests/cpp/test_vectorization.cpp

  • Added CacheBeforeNoAllocationDomainVectorizationValidation test case
  • Tests scenario where cache tensor has no allocation domain set (local
    tensor)
  • Validates that vectorization works correctly with cacheBefore
    operation
  • +56/-0   

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Function Implementation Consistency

    The canUsePresetAllocationDomain function was moved from allocation.cpp to utils.cpp. While the implementation appears identical, ensure that all necessary headers and dependencies are properly included in utils.cpp, and that the function signature matches exactly between declaration and definition.

    bool canUsePresetAllocationDomain(TensorView* tv) {
      if (!tv->hasAllocation()) {
        return false;
      }
      // Honor the allocation domain if the tensor is global or Hopper MMA's
      // output
      if (tv->getMemoryType() == MemoryType::Global ||
          (tv->definition() != nullptr && tv->definition()->isA<MmaOp>() &&
           isHopper(tv->definition()->as<MmaOp>()->macro()))) {
        return true;
      }
      // If it's a shared memory tensor, the set domain is likely
      // valid if Swizzle or Bulk is used. Also, if the allocation
      // domain is just a permutation of the loop domain, use the
      // set allocation domain. This seems to happen only with
      // AllocationDomainTest.TransposedIntermediate.
      if (tv->getMemoryType() == MemoryType::Shared) {
        if (std::any_of(
                tv->getAllocationDomain().begin(),
                tv->getAllocationDomain().end(),
                [](IterDomain* allocation_domain) {
                  return dynamic_cast<Swizzle*>(allocation_domain->definition()) !=
                      nullptr ||
                      allocation_domain->getParallelType() == ParallelType::Bulk;
                }) ||
            std::is_permutation(
                tv->getLoopDomain().begin(),
                tv->getLoopDomain().end(),
                tv->getAllocationDomain().begin(),
                tv->getAllocationDomain().end())) {
          return true;
        }
    
        // Honor the set allocation domain if the tensor is used by a
        // TMA store or MmaOp
        if (std::ranges::any_of(tv->uses(), [](Expr* expr) {
              return ir_utils::isCpAsyncBulkStore(expr) || expr->isA<MmaOp>();
            })) {
          return true;
        }
    
        // If a shared memory output produced by scatter has an
        // allocation domain explicitly set, it's likely to be the
        // valid allocation domain.
        if (auto def = tv->definition(); def != nullptr && def->isA<ScatterOp>()) {
          return true;
        }
      }
      return false;
    }
    Validation Logic Correctness

    The early return check if (!canUsePresetAllocationDomain(tv)) return; skips validation for tensors whose allocation domains are ignored. Verify this logic correctly identifies cases where allocation domain validation should be skipped, particularly for cache operations and local tensors.

    if (!canUsePresetAllocationDomain(tv)) {
      return;
    }

    Test failures

    • (Medium, 1) Tensor numerical mismatches in nvFuser matmul tests (HopperMatmulTest on H100)

      Test Name H100 Source
      HopperMatmulTest.HSH_NT_UseScheduler_MultipleInstructionsPerWarpTile Link

    @jjsjann123
    Copy link
    Collaborator Author

    !test

    @jjsjann123 jjsjann123 changed the title [DoNotReview] [DoNotReview] skip validation check on allocation domain for vectorization validation Dec 4, 2025
    @jjsjann123 jjsjann123 marked this pull request as ready for review December 4, 2025 16:02
    @jjsjann123
    Copy link
    Collaborator Author

    !test

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Dec 4, 2025

    Greptile Overview

    Greptile Summary

    Refactored allocation domain validation to skip checks for tensors whose allocation domains are ignored during allocation pass, enabling cacheBefore to work without replaying the entire allocation domain on cache outputs.

    Key Changes:

    • Moved canUsePresetAllocationDomain from allocation.cpp to utils.cpp for code reuse across allocation and validation passes
    • Added null check for tv->definition() to prevent potential crashes when checking MmaOp conditions
    • Added early return in validateAllocationVectorizedId to skip validation for local/cache tensors whose allocation domains are ignored
    • Added test case verifying vectorization works correctly with cache tensors that lack allocation domains

    Technical Details:
    The validation pass runs before the allocation pass, so it doesn't have access to GpuLower::current()->getAllocationInfo(tv).ids. Previously, validation would check allocation domains even for local tensors where those domains are ignored during codegen. This PR aligns validation behavior with the actual allocation behavior by using canUsePresetAllocationDomain to determine whether allocation domain validation should be performed.

    Confidence Score: 4/5

    • This PR is safe to merge with low risk once the null pointer issue is verified as fixed
    • The refactoring is well-structured with proper code reuse. The null check addition addresses a potential crash. The logic aligns validation with allocation behavior. Minor concern: the previous thread noted a null pointer issue on line 2240, but the diff shows the null check was added (tv->definition() != nullptr). Test coverage is good. Marked as "Do Not Review" suggests this may be WIP.
    • No files require special attention - the refactoring is straightforward and the null check appears to be properly added

    Important Files Changed

    File Analysis

    Filename Score Overview
    csrc/device_lower/pass/allocation.cpp 5/5 Moved canUsePresetAllocationDomain function to utils.cpp for reuse in validation
    csrc/device_lower/utils.cpp 4/5 Added canUsePresetAllocationDomain with null check for tv->definition() to prevent potential crashes
    csrc/device_lower/validation.cpp 4/5 Added early return in validateAllocationVectorizedId to skip validation for tensors whose allocation domain is ignored during allocation pass

    Sequence Diagram

    sequenceDiagram
        participant V as Vectorization Validator
        participant U as canUsePresetAllocationDomain
        participant A as Allocation Pass
        participant C as Cache Tensor (Local)
        participant O as Output Tensor (Global)
    
        Note over V: validateAllocationVectorizedId called
    
        V->>U: Check if allocation domain should be used
        U->>C: tv->hasAllocation()?
        C-->>U: true (but local tensor)
        U->>C: tv->getMemoryType()
        C-->>U: MemoryType::Local
        U-->>V: false (allocation domain ignored)
        V->>V: Early return - skip validation
        
        Note over V: For output tensor
        V->>U: Check if allocation domain should be used
        U->>O: tv->hasAllocation()?
        O-->>U: true
        U->>O: tv->getMemoryType()
        O-->>U: MemoryType::Global
        U-->>V: true (allocation domain used)
        V->>V: Validate vectorized ID on allocation domain
        
        Note over A: During allocation pass
        A->>U: canUsePresetAllocationDomain(cache)
        U-->>A: false
        A->>A: Use loop domain for cache allocation
        
        A->>U: canUsePresetAllocationDomain(output)
        U-->>A: true
        A->>A: Use preset allocation domain
    
    Loading

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    5 files reviewed, 1 comment

    Edit Code Review Agent Settings | Greptile

    for (auto tv : {tv1, cache}) {
    // NOTE: split by 2 here would trigger indexing error on codegen, caused by
    // the reorder on allocation info. See issue comment:
    // https://github.com/NVIDIA/Fuser/issues/5611#issuecomment-3604515068
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    The codegen issue referred here is described in #5611 .

    We still reorder allocation info based on getMaybeAllocationDomain(), which sets the storage in the wrong order.

    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
    @jjsjann123 jjsjann123 changed the title [DoNotReview] skip validation check on allocation domain for vectorization validation skip aggressive validation check on allocation domain for vectorization validation Dec 4, 2025
    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    5 files reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    @jjsjann123
    Copy link
    Collaborator Author

    !test

    @jjsjann123 jjsjann123 requested a review from naoyam December 4, 2025 16:55
    @jjsjann123 jjsjann123 changed the title skip aggressive validation check on allocation domain for vectorization validation skip aggressive validation check on allocation domain for vectorization Dec 4, 2025
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant