refactor canUsePresetAllocationDomain#5590

Closed
jjsjann123 wants to merge 2 commits into main from jj/refactor_allocation_domain_lowering

@jjsjann123
Collaborator

@jjsjann123 jjsjann123 commented Nov 25, 2025

Move the function canUsePresetAllocationDomain out of the allocation pass in lowering and into a util function.

The goal is a consistent way to check whether a given allocation domain is ignored by codegen. This is useful for passes such as alias analysis, which can safely skip validation in that case.

The refactored util function is used in follow-up PR #5184.

@jjsjann123
Collaborator Author

!test

@github-actions

github-actions bot commented Nov 25, 2025

Review updated until commit 41b077c

Description

  • Move canUsePresetAllocationDomain function from allocation lowering pass to utils

  • Make function reusable across different parts of codebase (allocation and validation)

  • Add optional ignore_empty_alloc parameter with default value true

  • Update call sites to use ir_utils::canUsePresetAllocationDomain instead of local method

Changes walkthrough

Relevant files
Enhancement

allocation.cpp: Remove local method and update function call
csrc/device_lower/pass/allocation.cpp (+1/-63)

  • Remove canUsePresetAllocationDomain method from AllocationDomainSetup class
  • Update function call to use ir_utils::canUsePresetAllocationDomain(tv)

validation.cpp: Add allocation domain validation skip logic
csrc/device_lower/validation.cpp (+5/-0)

  • Add validation skip logic using ir_utils::canUsePresetAllocationDomain(tv)
  • Skip allocation domain validation when domain is ignored in codegen

utils.cpp: Implement utility function for allocation domain checking
csrc/ir/utils.cpp (+53/-0)

  • Add canUsePresetAllocationDomain utility function implementation
  • Include logic for global tensors, shared memory tensors, and special cases
  • Add optional ignore_empty_alloc parameter with default true

utils.h: Add function declaration and documentation
csrc/ir/utils.h (+14/-0)

  • Add function declaration for canUsePresetAllocationDomain
  • Include comprehensive documentation explaining function purpose
  • Document when preset allocation domains should be used or ignored

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Function signature compatibility

    The function signature has changed from TensorView* tv to const TensorView* tv, and a new ignore_empty_alloc parameter was added. Because the function also moved from the AllocationDomainSetup class into ir_utils, the existing caller in allocation.cpp has been updated. Verify that all other potential callers throughout the codebase are compatible with this change.

    bool canUsePresetAllocationDomain(
        const TensorView* tv,
        bool ignore_empty_alloc) {
      if (ignore_empty_alloc && !tv->hasAllocation()) {
        return false;
      }
      // Honor the allocation domain if the tensor is global or Hopper MMA's
      // output
      if (tv->getMemoryType() == MemoryType::Global ||
          (tv->definition() != nullptr && tv->definition()->isA<MmaOp>() &&
           isHopper(tv->definition()->as<MmaOp>()->macro()))) {
        return true;
      }
      // If it's a shared memory tensor, the set domain is likely
      // valid if Swizzle or Bulk is used. Also, if the allocation
      // domain is just a permutation of the loop domain, use the
      // set allocation domain. This seems to happen only with
      // AllocationDomainTest.TransposedIntermediate.
      if (tv->getMemoryType() == MemoryType::Shared) {
        if (std::any_of(
                tv->getAllocationDomain().begin(),
                tv->getAllocationDomain().end(),
                [](IterDomain* allocation_domain) {
                  return dynamic_cast<Swizzle*>(allocation_domain->definition()) !=
                      nullptr ||
                      allocation_domain->getParallelType() == ParallelType::Bulk;
                }) ||
            std::is_permutation(
                tv->getLoopDomain().begin(),
                tv->getLoopDomain().end(),
                tv->getAllocationDomain().begin(),
                tv->getAllocationDomain().end())) {
          return true;
        }
    
        // Honor the set allocation domain if the tensor is used by a
        // TMA store or MmaOp
        if (std::ranges::any_of(tv->uses(), [](Expr* expr) {
              return ir_utils::isCpAsyncBulkStore(expr) || expr->isA<MmaOp>();
            })) {
          return true;
        }
    
        // If a shared memory output produced by scatter has an
        // allocation domain explicitly set, it's likely to be the
        // valid allocation domain.
        if (auto def = tv->definition(); def != nullptr && def->isA<ScatterOp>()) {
          return true;
        }
      }
      return false;
    }
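
As a rough illustration of the decision order in the snippet above, and of why the const pointer and the defaulted second argument should not by themselves break a one-argument caller, here is a minimal self-contained sketch. FakeTensor, MemType, canUsePreset, legacyCaller, and sketchExamples are hypothetical stand-ins, not nvFuser types or APIs. The sketch keeps only the empty-allocation, Global, and Shared-permutation checks and omits the Swizzle, Bulk, MmaOp, TMA store, and ScatterOp cases.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT nvFuser types.
enum class MemType { Local, Shared, Global };

struct FakeTensor {
  MemType memory_type = MemType::Local;
  std::vector<int> loop_domain;        // ints stand in for IterDomain*
  std::vector<int> allocation_domain;  // empty means "no allocation domain set"
  bool hasAllocation() const { return !allocation_domain.empty(); }
};

// Simplified decision order: empty-allocation check, then Global,
// then the Shared "permutation of the loop domain" check.
bool canUsePreset(const FakeTensor* tv, bool ignore_empty_alloc = true) {
  if (ignore_empty_alloc && !tv->hasAllocation()) {
    return false;
  }
  if (tv->memory_type == MemType::Global) {
    return true;
  }
  if (tv->memory_type == MemType::Shared) {
    return std::is_permutation(
        tv->loop_domain.begin(), tv->loop_domain.end(),
        tv->allocation_domain.begin(), tv->allocation_domain.end());
  }
  return false;
}

// A caller holding a non-const pointer still compiles: T* converts
// implicitly to const T*, and the default covers the second argument.
bool legacyCaller(FakeTensor* tv) {
  return canUsePreset(tv);
}

// Example usage; each assertion exercises one branch above.
void sketchExamples() {
  FakeTensor global_tv;
  global_tv.memory_type = MemType::Global;
  global_tv.loop_domain = {0, 1};
  global_tv.allocation_domain = {1, 0};
  assert(legacyCaller(&global_tv));

  FakeTensor shared_tv;
  shared_tv.memory_type = MemType::Shared;
  shared_tv.loop_domain = {0, 1, 2};
  shared_tv.allocation_domain = {2, 0, 1};  // a permutation of the loop domain
  assert(canUsePreset(&shared_tv));

  FakeTensor local_tv;
  local_tv.loop_domain = {0, 1};
  local_tv.allocation_domain = {1, 0};
  assert(!canUsePreset(&local_tv));  // Local memory: preset domain not honored

  FakeTensor unset_tv;
  unset_tv.memory_type = MemType::Global;
  assert(!canUsePreset(&unset_tv));        // default: empty allocation => false
  assert(canUsePreset(&unset_tv, false));  // opt out of the empty check
}
```

In this sketch the default argument sits on the definition for brevity. The PR description places the default on the declaration in utils.h, which is why the definition quoted above shows none.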
    Validation logic correctness

    The new validation skip logic checks !ir_utils::canUsePresetAllocationDomain(tv) to skip validation. This seems correct as it skips validation when the allocation domain will be ignored by codegen, but verify this logic matches the intended behavior.

    if (!ir_utils::canUsePresetAllocationDomain(tv)) {
      return;
    }
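
The guard above can be read as "only validate what codegen will actually honor". A minimal sketch of that pattern follows, assuming a validator that takes the predicate as a parameter. FakeTensor, CheckResult, validateAllocation, and guardExamples are hypothetical names for illustration and do not reflect the real code in validation.cpp.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in; not an nvFuser type.
struct FakeTensor {
  bool preset_honored;  // what the codegen-side predicate would report
  bool alloc_valid;     // whether the pretend allocation domain passes checks
};

enum class CheckResult { Skipped, Passed, Failed };

// Early-return guard: if codegen ignores the preset allocation domain,
// validating it would only produce spurious failures, so skip it.
CheckResult validateAllocation(
    const FakeTensor& tv,
    const std::function<bool(const FakeTensor&)>& can_use_preset) {
  if (!can_use_preset(tv)) {
    return CheckResult::Skipped;
  }
  return tv.alloc_valid ? CheckResult::Passed : CheckResult::Failed;
}

// Example usage with a trivial predicate that reads the flag directly.
void guardExamples() {
  auto predicate = [](const FakeTensor& tv) { return tv.preset_honored; };

  // Ignored by codegen: an invalid domain is skipped, not reported.
  assert(validateAllocation({false, false}, predicate) == CheckResult::Skipped);
  // Honored by codegen: the domain is actually checked.
  assert(validateAllocation({true, true}, predicate) == CheckResult::Passed);
  assert(validateAllocation({true, false}, predicate) == CheckResult::Failed);
}
```

Passing the predicate in, rather than re-implementing it, is the point of the refactor as described: the allocation pass and the validator consult the same rule.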

    Test failures

    • (Medium, 1) Tensor numerical mismatches in nvFuser matmul tests

      Test name (H100): HopperMatmulTest.HSH_NT_UseScheduler_MultipleInstructionsPerWarpTile

    @greptile-apps
    Contributor

    greptile-apps bot commented Nov 25, 2025

    Greptile Overview

    Greptile Summary

    Refactored canUsePresetAllocationDomain from a private method in AllocationDomainSetup class to a reusable utility function in ir_utils. The refactored version addresses the null pointer vulnerability from previous review by adding a null check before dereferencing tv->definition(), and introduces an optional ignore_empty_alloc parameter for flexibility. The function is now used in both the allocation pass and validation logic for consistent handling of allocation domains.

    Key changes:

    • Moved function from csrc/device_lower/pass/allocation.cpp to csrc/ir/utils.cpp with public API in csrc/ir/utils.h
    • Fixed potential null pointer dereference by checking tv->definition() != nullptr before calling isA<MmaOp>()
    • Added ignore_empty_alloc parameter (defaults to true) to control behavior when allocation domain is not set
    • Updated call site in allocation pass to use ir_utils::canUsePresetAllocationDomain(tv)
    • Added new usage in validation pass to skip validation when allocation domain will be ignored by codegen

    Confidence Score: 5/5

    • This PR is safe to merge with minimal risk
    • This is a clean refactoring that moves a function to a more appropriate location and fixes a null pointer issue from previous review. The logic is preserved with the addition of proper null checks. Both call sites maintain correct behavior.
    • No files require special attention

    Important Files Changed

    File Analysis

    | Filename | Score | Overview |
    |---|---|---|
    | csrc/ir/utils.cpp | 5/5 | Moved canUsePresetAllocationDomain from allocation.cpp; added null pointer check and ignore_empty_alloc parameter |
    | csrc/ir/utils.h | 5/5 | Added function declaration for canUsePresetAllocationDomain with comprehensive documentation |
    | csrc/device_lower/pass/allocation.cpp | 5/5 | Removed local canUsePresetAllocationDomain implementation, now calls ir_utils::canUsePresetAllocationDomain |
    | csrc/device_lower/validation.cpp | 5/5 | Added early return check using ir_utils::canUsePresetAllocationDomain to skip validation when allocation domain is ignored |

    Sequence Diagram

    sequenceDiagram
        participant AllocationPass as AllocationDomainSetup
        participant ValidationPass as VectorizeValidator
        participant Utils as ir_utils::canUsePresetAllocationDomain
        participant TV as TensorView
        
        Note over AllocationPass,ValidationPass: Before: Function existed only in AllocationDomainSetup
        
        AllocationPass->>Utils: canUsePresetAllocationDomain(tv)
        Utils->>TV: hasAllocation() [if ignore_empty_alloc=true]
        TV-->>Utils: false
        Utils-->>AllocationPass: false (skip preset domain)
        
        AllocationPass->>Utils: canUsePresetAllocationDomain(tv)
        Utils->>TV: hasAllocation()
        TV-->>Utils: true
        Utils->>TV: getMemoryType()
        TV-->>Utils: Global
        Utils-->>AllocationPass: true (use preset domain)
        
        AllocationPass->>Utils: canUsePresetAllocationDomain(tv)
        Utils->>TV: hasAllocation()
        TV-->>Utils: true
        Utils->>TV: getMemoryType()
        TV-->>Utils: Shared
        Utils->>TV: definition()
        TV-->>Utils: nullptr [NULL CHECK ADDED]
        Utils-->>AllocationPass: false (skip preset domain)
        
        Note over ValidationPass,Utils: New: Validation now uses same function
        
        ValidationPass->>Utils: canUsePresetAllocationDomain(tv)
        Utils->>TV: hasAllocation()
        TV-->>Utils: false
        Utils-->>ValidationPass: false
        ValidationPass->>ValidationPass: Early return (skip validation)
    
    Contributor

    @greptile-apps greptile-apps bot left a comment

    4 files reviewed, 2 comments

    // If a shared memory output produced by scatter has an
    // allocation domain explicitly set, it's likely to be the
    // valid allocation domain.
    if (auto def = tv->definition(); def != nullptr && def->isA<ScatterOp>()) {
    Contributor

    logic: same null pointer issue - tv->definition() may be null

    Suggested change
    if (auto def = tv->definition(); def != nullptr && def->isA<ScatterOp>()) {
    if (auto def = tv->definition(); def != nullptr && def->isA<ScatterOp>()) {

    Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
    @jjsjann123
    Collaborator Author

    !test

    Contributor

    @greptile-apps greptile-apps bot left a comment

    4 files reviewed, no comments

    @jjsjann123 jjsjann123 requested a review from wujingyue November 25, 2025 22:20
    Comment on lines -66 to -67
    // AliasTest.NotAllOutputAlias_Reduction has a tensor, tv6, that
    // is a Local tensor with CA position of 4 but has an allocation
    Collaborator

    @wujingyue wujingyue Nov 26, 2025

    This sounds like a bug/limitation of alias analysis. The fix could be as simple as letting okToRelayout always treat non-global tensors as undetermined.

    How do I verify my assumption? This canUsePresetAllocationDomain workaround got merged in months ago -- it's unclear to me how to revert it.

    Collaborator

    #5594 seems to pass existing tests. But I don't know how to trigger the issue that leads to canUsePresetAllocationDomain.

    Collaborator Author

    > I don't know how to trigger the issue that leads to canUsePresetAllocationDomain.

    I wonder if we can adapt an existing test that uses the bulk load and fork the loaded cache to some other output that is a safe alias.

    > always treat non-global tensors as undetermined.

    TL;DR: I think that's probably a safe bet for the codegen status as-is.

    If you look at this function, there are also cases where we want to respect the allocation domain on shared memory buffers, for example when there's a swizzle on the transform path. I'm uncertain how that could affect alias forwarding, i.e. whether alias forwarding would compute the output alias tensor correctly while handling or ignoring the allocation domain set on intermediate TVs. If alias analysis only forwards global tensors, I think it's safe to ignore anything else along the way?

    It always messes with my head, since that pass runs on both unsegmented and segmented groups.

    Collaborator

    > If alias analysis only forwards global tensors, I think it's safe to ignore anything else along the way?

    Yes -- that's what #5594 tries to do. Is it sufficient to avoid this change? If so, is it even sufficient to avoid canUsePresetAllocationDomain entirely?

    Collaborator Author

    That one looks good to me.

    If the analysis only sticks to global tensors as anchor point, I agree that we don't need to stick with the same analysis that lowering uses for ignoring allocation domain.
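
The idea agreed on in this thread, that alias analysis anchors only on global tensors and treats everything else as undetermined, could be sketched roughly as below. This is an assumption-laden illustration, not the contents of #5594 and not the real okToRelayout. FakeTensor, LayoutCheck, checkLayout, okToRelayoutSketch, and relayoutExamples are all hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-ins; not nvFuser types.
enum class MemType { Local, Shared, Global };
enum class LayoutCheck { Undetermined, Compatible, Incompatible };

struct FakeTensor {
  MemType memory_type;
  bool has_allocation;                // an allocation domain was explicitly set
  bool allocation_matches_preferred;  // it agrees with the layout we want
};

// Only global tensors with an explicit allocation domain constrain the
// layout; everything else is reported as undetermined.
LayoutCheck checkLayout(const FakeTensor& tv) {
  if (tv.memory_type != MemType::Global || !tv.has_allocation) {
    return LayoutCheck::Undetermined;
  }
  return tv.allocation_matches_preferred ? LayoutCheck::Compatible
                                         : LayoutCheck::Incompatible;
}

// Relayout is allowed unless a global tensor explicitly conflicts.
bool okToRelayoutSketch(const FakeTensor& tv) {
  return checkLayout(tv) != LayoutCheck::Incompatible;
}

// Example usage.
void relayoutExamples() {
  // Shared or Local tensor with a conflicting allocation domain: ignored.
  assert(okToRelayoutSketch({MemType::Shared, true, false}));
  assert(okToRelayoutSketch({MemType::Local, true, false}));
  // Global tensor: its allocation domain is the anchor and must agree.
  assert(okToRelayoutSketch({MemType::Global, true, true}));
  assert(!okToRelayoutSketch({MemType::Global, true, false}));
}
```

Under this reading, the analysis never needs to ask whether lowering will honor a non-global allocation domain, which is why the shared predicate from this PR becomes unnecessary for it.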

    @jjsjann123
    Collaborator Author

    closing in favor of #5594

    @jjsjann123 jjsjann123 closed this Nov 26, 2025