Previously, we only traversed the producers of the loop domain in `IterVisitor::traverseBetween`. That is a problem in cases where we schedule, for example, the producer of a reshape, or in exotic cases like #5345 where the domains are disconnected. This PR ensures that we traverse every ID in the TensorDomain regardless of the relations between the domains it contains. Note that it calls `TensorDomain::allIDs` when getting the "next" statements, which performs a redundant topological sort.
!test --diff
```cpp
std::vector<IterDomain*> TensorDomain::allIDs() const {
  const std::vector<const std::vector<IterDomain*>*> all_domains = allDomains();
```
I factored this out into another method so we can avoid running getExprsBetween, which can trigger an infinite recursion if we also call allIDs from IterVisitor.
```diff
-      next_stmts_.insert(
-          next_stmts_.end(), stmt->loop().begin(), stmt->loop().end());
+      for (const std::vector<IterDomain*>* dom : stmt->allDomains()) {
+        next_stmts_.insert(next_stmts_.end(), dom->begin(), dom->end());
+      }
```
The change looks good, but why so many test failures?
!test |
Looks like all the failures were due to CI infra issues over the past couple of days.
## Stacked PRs

- #5230 moe layer with nvfp4 grouped_mm
- #5345 exposing layout op at direct python binding <-- this PR
- #5198 refactor number of groups in layout op
- #5174 allow layout op in automatic scheduler

## This PR

Expose the layout op at the direct python binding. Added an nvfp4 grouped gemm python test.

Minor fixes:
1. ~Added support of allocation domain for the output of the layout op in the concretization pass, to maintain the dependency of the padded allocation domain on its logical domain.~ No longer needed; handled in #5384
2. Skipped validation for `setAllocationDomain`
3. Updated the reference implementation to match the math order in nvFuser's decomposed nvfp4 quantization.

TODO: python tests require the IdModel indexer in order to work. See issue #5200, as well as the suggested WAR in #5200 (comment)
This is split off from #5345, so I don't have a specific repro for any incorrect behavior that this fixes.

Co-authored-by: jjsjann123 <jiej@nvidia.com>