Skip to content

Visit allIDs in IterVisitor#5384

Merged
jacobhinkle merged 4 commits intomainfrom
jh/iter_visitor_allids
Oct 16, 2025
Merged

Visit allIDs in IterVisitor#5384
jacobhinkle merged 4 commits intomainfrom
jh/iter_visitor_allids

Conversation

@jacobhinkle
Copy link
Collaborator

@jacobhinkle jacobhinkle commented Oct 14, 2025

This is split off from #5345, so I don't have a specific repro for any incorrect behavior that this fixes.

Previously, we only traversed all producers of the loop domain in IterVisitor::traverseBetween. That is a problem in cases where we schedule like producer of a reshape, or in exotic cases like #5345 where the domains are disconnected.

This PR ensures that we traverse every ID in the TensorDomain regardless of the relations between the domains contained within.

Previously, we only traversed all producers of the loop domain in
IterVisitor::traverseBetween. That is a problem in cases where we
schedule like producer of a reshape, or in exotic cases like #5345 where
the domains are disconnected.

This PR ensures that we traverse every ID in the TensorDomain regardless
of the relations between the domains contained within. Note that it calls
TensorDomain::allIDs when getting the "next" statements, which will do a
redundant topological sort.
@github-actions
Copy link

github-actions bot commented Oct 14, 2025

Review updated until commit 3b03346

Description

  • Fix disconnected domain traversal in IterVisitor

  • Traverse all tensor domains via allDomains()

  • Ensure complete ID coverage in scheduling

  • Avoid redundant topological sorts in allIDs


Changes walkthrough 📝

Relevant files
Enhancement
nodes.cpp
Add allDomains() and refactor allIDs()                                     

csrc/ir/nodes.cpp

  • Introduced allDomains() returning pointers to all domain vectors
  • Refactored allIDs() to use allDomains() for consistency
  • Maintained topological uniqueness via VectorOfUniqueEntries
  • Removed redundant sorting by reusing domain lists
  • +6/-1     
    internal_base_nodes.h
    Declare allDomains() in TensorDomain                                         

    csrc/ir/internal_base_nodes.h

  • Added declaration of allDomains() method
  • Kept allIDs() declaration for backward compatibility
  • +2/-0     
    Bug fix
    iter_visitor.cpp
    Traverse all domains in IterVisitor                                           

    csrc/iter_visitor.cpp

  • Modified handle(TensorDomain*) to iterate over all domains
  • Used allDomains() to collect all IterDomain pointers
  • Ensures traversal of reshape and disconnected domains
  • +3/-2     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Duplicate Function Definition

    The function allIDs is defined twice in the same file, which will lead to a compilation error. Only one definition should be retained.

    std::vector<const std::vector<IterDomain*>*> TensorDomain::allDomains() const {
      std::vector<const std::vector<IterDomain*>*> all_domains = {
          &loop_domain_,
          &logical_domain_,
          &root_domain_,
          &initial_loop_domain_,
          &allocation_domain_,
          &additional_ids_};
      if (alternate_loop_domain_.has_value()) {
        all_domains.push_back(&alternate_loop_domain_.value());
      }
      return all_domains;
    }
    
    std::vector<IterDomain*> TensorDomain::allIDs() const {
      const std::vector<const std::vector<IterDomain*>*> all_domains = allDomains();
      VectorOfUniqueEntries<IterDomain*> discovered_ids;
      for (auto domain : all_domains) {
        discovered_ids.pushBack(*domain);
      }
    Potential Performance Overhead

    The use of insert with iterators in a loop may lead to performance overhead. Consider using reserve and push_back for better performance.

    for (const std::vector<IterDomain*>* dom : stmt->allDomains()) {
      next_stmts_.insert(next_stmts_.end(), dom->begin(), dom->end());
    }

    @jacobhinkle
    Copy link
    Collaborator Author

    !test --diff

    @jacobhinkle jacobhinkle requested a review from naoyam October 14, 2025 12:56
    @jacobhinkle jacobhinkle marked this pull request as ready for review October 14, 2025 12:56
    @jacobhinkle
    Copy link
    Collaborator Author

    !test --diff

    }

    std::vector<IterDomain*> TensorDomain::allIDs() const {
    const std::vector<const std::vector<IterDomain*>*> all_domains = allDomains();
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I factored this out into another method so we can avoid running getExprsBetween which can trigger an infinite recursion if we call allIDs from IterVisitor also.

    next_stmts_.insert(
    next_stmts_.end(), stmt->loop().begin(), stmt->loop().end());
    for (const std::vector<IterDomain*>* dom : stmt->allDomains()) {
    next_stmts_.insert(next_stmts_.end(), dom->begin(), dom->end());
    Copy link
    Collaborator

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    👍

    @naoyam
    Copy link
    Collaborator

    naoyam commented Oct 14, 2025

    The change looks good, but why so many test failures?

    @jjsjann123
    Copy link
    Collaborator

    !test

    Copy link
    Collaborator

    @jjsjann123 jjsjann123 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    :shipit:

    @jacobhinkle
    Copy link
    Collaborator Author

    The change looks good, but why so many test failures?

    Looks like all failures were due to CI infra issues past couple days.

    @jacobhinkle jacobhinkle merged commit a5e4aa2 into main Oct 16, 2025
    58 of 60 checks passed
    @jacobhinkle jacobhinkle deleted the jh/iter_visitor_allids branch October 16, 2025 18:22
    jjsjann123 added a commit that referenced this pull request Oct 21, 2025
    ## Stacked PRs
    
    #5230 moe layer with nvfp4 grouped_mm
    #5345 exposing layout op at direct python binding  <-- this PR
    #5198 refactor number of groups in layout op
    #5174 allow layout op in automatic scheduler
    
    ## This PR
    
    Expose layout op at python direct binding.
    Added nvfp4 grouped gemm in python test.
    
    Minor fixes:
    
    1. ~Added support of allocation domain for output of layout op in
    concretization pass to maintain the dependency on padded allocation
    domain to its logical domain.~ No longer needed, handled in
    #5384
    2. Skipped validation for `setAllocationDomain`
    3. updated reference implementation to match the math order in nvfuser
    decomposed nvfp4 quantization.
    
    TODO:
    
    python tests requires IdModel Indexer in order to work. See issue #5200,
    as well as suggested WAR in
    #5200 (comment)
    tbqh pushed a commit that referenced this pull request Nov 12, 2025
    This is split off from #5345, so I don't have a specific repro for any
    incorrect behavior that this fixes.
    
    Previously, we only traversed all producers of the loop domain in
    IterVisitor::traverseBetween. That is a problem in cases where we
    schedule like producer of a reshape, or in exotic cases like #5345 where
    the domains are disconnected.
    
    This PR ensures that we traverse every ID in the TensorDomain regardless
    of the relations between the domains contained within.
    
    ---------
    
    Co-authored-by: jjsjann123 <jiej@nvidia.com>
    tbqh pushed a commit that referenced this pull request Nov 12, 2025
    ## Stacked PRs
    
    #5230 moe layer with nvfp4 grouped_mm
    #5345 exposing layout op at direct python binding  <-- this PR
    #5198 refactor number of groups in layout op
    #5174 allow layout op in automatic scheduler
    
    ## This PR
    
    Expose layout op at python direct binding.
    Added nvfp4 grouped gemm in python test.
    
    Minor fixes:
    
    1. ~Added support of allocation domain for output of layout op in
    concretization pass to maintain the dependency on padded allocation
    domain to its logical domain.~ No longer needed, handled in
    #5384
    2. Skipped validation for `setAllocationDomain`
    3. updated reference implementation to match the math order in nvfuser
    decomposed nvfp4 quantization.
    
    TODO:
    
    python tests requires IdModel Indexer in order to work. See issue #5200,
    as well as suggested WAR in
    #5200 (comment)
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants