skip aggressive validation check on allocation domain for vectorization#5622
jjsjann123 wants to merge 8 commits into main
Review updated until commit 6a05cd0

Description

| Relevant files |
|---|
| Refactoring |
| Enhancement |
| Bug fix |
| Tests |
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 PR contains tests
⚡ Recommended focus areas for review
Function Implementation Consistency
The canUsePresetAllocationDomain function was moved from allocation.cpp to utils.cpp. The implementation appears identical, but confirm that utils.cpp includes all required headers and dependencies, and that the function signature in the declaration matches the definition exactly.
Test failures
- (Medium, 1) Tensor numerical mismatches in nvFuser matmul tests (HopperMatmulTest on H100)

| Test Name | H100 | Source |
|---|---|---|
| HopperMatmulTest.HSH_NT_UseScheduler_MultipleInstructionsPerWarpTile | ❌ | Link |
Greptile Overview
Greptile Summary
Refactored allocation domain validation to skip checks for tensors whose allocation domains are ignored during allocation pass, enabling
Key Changes:
Technical Details:
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant V as Vectorization Validator
participant U as canUsePresetAllocationDomain
participant A as Allocation Pass
participant C as Cache Tensor (Local)
participant O as Output Tensor (Global)
Note over V: validateAllocationVectorizedId called
V->>U: Check if allocation domain should be used
U->>C: tv->hasAllocation()?
C-->>U: true (but local tensor)
U->>C: tv->getMemoryType()
C-->>U: MemoryType::Local
U-->>V: false (allocation domain ignored)
V->>V: Early return - skip validation
Note over V: For output tensor
V->>U: Check if allocation domain should be used
U->>O: tv->hasAllocation()?
O-->>U: true
U->>O: tv->getMemoryType()
O-->>U: MemoryType::Global
U-->>V: true (allocation domain used)
V->>V: Validate vectorized ID on allocation domain
Note over A: During allocation pass
A->>U: canUsePresetAllocationDomain(cache)
U-->>A: false
A->>A: Use loop domain for cache allocation
A->>U: canUsePresetAllocationDomain(output)
U-->>A: true
A->>A: Use preset allocation domain
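The allocation-pass half of the diagram can be sketched in a few lines of C++. This is a toy model under stated assumptions, not nvFuser code: `ToyTensor`, `ToyMemoryType`, `toyCanUsePresetAllocationDomain`, and `toyDomainForAllocation` are hypothetical names, and the predicate reproduces only the two outcomes the diagram shows (a local cache yields false, a global output yields true). Other memory types and any further conditions in the real `canUsePresetAllocationDomain` are not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not nvFuser types.
enum class ToyMemoryType { Local, Global };

struct ToyTensor {
  bool has_allocation;                         // models tv->hasAllocation()
  ToyMemoryType memory_type;                   // models tv->getMemoryType()
  std::vector<std::string> loop_domain;        // IDs named by string for simplicity
  std::vector<std::string> allocation_domain;  // preset allocation domain, if any
};

// Assumption: covers only the two cases drawn in the diagram.
// A local tensor's preset allocation domain is ignored; a global one is used.
bool toyCanUsePresetAllocationDomain(const ToyTensor& tv) {
  return tv.has_allocation && tv.memory_type == ToyMemoryType::Global;
}

// The allocation pass falls back to the loop domain when the preset
// allocation domain cannot be used.
const std::vector<std::string>& toyDomainForAllocation(const ToyTensor& tv) {
  return toyCanUsePresetAllocationDomain(tv) ? tv.allocation_domain
                                             : tv.loop_domain;
}
```

With a local `cache` and a global `output` that both carry a preset allocation domain, `toyDomainForAllocation(cache)` picks the loop domain while `toyDomainForAllocation(output)` picks the preset one, matching the last four arrows of the diagram.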
for (auto tv : {tv1, cache}) {
  // NOTE: split by 2 here would trigger indexing error on codegen, caused by
  // the reorder on allocation info. See issue comment:
  // https://github.com/NVIDIA/Fuser/issues/5611#issuecomment-3604515068
The codegen issue referred to here is described in #5611.
We still reorder allocation info based on getMaybeAllocationDomain(), which sets the storage in the wrong order.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Stacked PR:
PR0: #5622 skip aggressive validation check on allocation domain for vectorization <-- this one
PR1: #5184 Support Split between logical domain to allocation domain to represent padding
This PR
Vectorization validation requires that the vectorized ID be projected to the innermost ID on the allocation domain, even when the allocation domain is ignored during the allocation pass (e.g. a cache allocated as a local tensor). This PR skips the validation check in that case, to allow the behavior change in cacheBefore, where we no longer replay the entire allocation domain from the output onto the cache.
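A minimal sketch of the skipped check, assuming a much simpler model than the real validator: IDs are plain strings, and "projected to the innermost ID" is reduced to "equals the last entry of the allocation domain". All names here (`ToyTensor`, `toyAllocationDomainIsUsed`, `toyValidateVectorizedId`) are hypothetical, and the predicate covers only the local and global cases mentioned in this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not nvFuser types.
enum class ToyMemoryType { Local, Global };

struct ToyTensor {
  bool has_allocation;
  ToyMemoryType memory_type;
  std::vector<std::string> allocation_domain;  // outermost to innermost
};

// Assumption: only local vs. global is modeled.
bool toyAllocationDomainIsUsed(const ToyTensor& tv) {
  return tv.has_allocation && tv.memory_type == ToyMemoryType::Global;
}

// Returns false if the check was skipped, true if it ran and passed.
// Throws std::invalid_argument if it ran and failed.
bool toyValidateVectorizedId(const ToyTensor& tv,
                             const std::string& vectorized_id) {
  if (!toyAllocationDomainIsUsed(tv)) {
    return false;  // allocation domain is ignored, so there is nothing to check
  }
  if (tv.allocation_domain.empty() ||
      tv.allocation_domain.back() != vectorized_id) {
    throw std::invalid_argument(
        "vectorized ID is not innermost in the allocation domain");
  }
  return true;
}
```

In this model a local cache whose allocation domain does not end in the vectorized ID no longer fails validation, while a global output with the same mismatch still does.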
Changes in this PR:
Move canUsePresetAllocationDomain from allocation.cpp to utils.cpp, allowing code reuse in validation.cpp.