
More precise WAR for resize vectorization#4305

Merged
naoyam merged 6 commits into main from fix_resize_vec
Apr 25, 2025
Conversation

@naoyam
Collaborator

@naoyam naoyam commented Apr 24, 2025

This is a follow-up to #3906, which added a WAR for #3640. While the WAR is safe, it turned out to be too conservative. For example, here's a concat pattern that appears in the backward pass of the Litgpt Llama RoPE module:

Inputs:
  T0_g___bfloat[bS0{1}, iS1{8}, iS2{4}, iS3{8192}, iS4{128}]
  T1_g___bfloat[bS5{1}, iS6{8}, bS7{1}, iS8{8192}, iS9{128}]
  T2_g___bfloat[bS10{1}, iS11{8}, bS12{1}, iS13{8192}, iS14{128}]
Outputs:
  T8_g___bfloat[bS43{1}, iS44{8192}, iS52{6144}rf]

%kernel_math {
T3_l___bfloat[bS15{1}, iS16{8}, iS18{6}rf, iS19{8192}, iS20{128}]
   = pad( T0_g___bfloat[bS0{1}, iS1{8}, iS2{4}, iS3{8192}, iS4{128}], {0, 0, 0, 0, 0, 2, 0, 0, 0, 0} )
i31 = 0 + 4;
T4_l___bfloat[bS21{1}, iS22{8}, iS24{( ( ( 0 + 4 ) + 1 ) + 1 )}rf, iS25{8192}, iS26{128}]
   = pad( T1_g___bfloat[bS5{1}, iS6{8}, bS7{1}, iS8{8192}, iS9{128}], {0, 0, 0, 0, i31, 1, 0, 0, 0, 0} )
i47 = i31 + 1;
T5_l___bfloat[bS27{1}, iS28{8}, iS30{( ( ( 0 + 4 ) + 1 ) + 1 )}rf, iS31{8192}, iS32{128}]
   = pad( T2_g___bfloat[bS10{1}, iS11{8}, bS12{1}, iS13{8192}, iS14{128}], {0, 0, 0, 0, i47, 0, 0, 0, 0, 0} )
T6_l___bfloat[bS33{1}, iS34{8}, iS35{6}, iS36{8192}, iS37{128}]
   = cat( T3_l___bfloat[bS15{1}, iS16{8}, iS18{6}rf, iS19{8192}, iS20{128}], T4_l___bfloat[bS21{1}, iS22{8}, iS24{( ( ( 0 + 4 ) + 1 ) + 1 )}rf, iS25{8192}, iS26{128}], T5_l___bfloat[bS27{1}, iS28{8}, iS30{( ( ( 0 + 4 ) + 1 ) + 1 )}rf, iS31{8192}, iS32{128}], 2 )
T7_l___bfloat[bS38{1}, iS41{8192}, iS39{8}, iS40{6}, iS42{128}]
   = Set.Permute( T6_l___bfloat[bS33{1}, iS34{8}, iS35{6}, iS36{8192}, iS37{128}], cache_op=Streaming )
T8_g___bfloat[bS43{1}, iS44{8192}, iS52{6144}rf] = view( T7_l___bfloat[bS38{1}, iS41{8192}, iS39{8}, iS40{6}, iS42{128}] )
} // %kernel_math

This fusion is currently taken by the pointwise scheduler, which attempts to vectorize the innermost ID of the output (i.e., iS52{6144}). Since the resize ops of the three pad ops are reachable from iS52, the WAR of #3640 simply folds them into the vectorization factor by taking the gcd with the left and right expand factors. In this case, since one of the expand factors is 1, the resulting vectorization factor is also just 1, which is clearly not what we want. Here, while the resized ID itself is not vectorizable due to the expand factor of 1, all of the resized tensors have inner IDs large enough to allow the maximum vectorization.

To make the WAR a little less conservative, this PR also checks whether the constraint imposed by a Resize expr may be missed by the vectorization analysis. In the above case, that should not happen, as there's only one path through each of the resize-based tensor ops.

This change is still not able to eliminate false positives completely. See one of the new tests that is currently disabled.

The code-diff results all seem to make sense: http://nv/eFb. Previously, some of the tests had no vectorization due to the WAR; this PR relaxes it and allows some vectorization.

@github-actions

github-actions bot commented Apr 24, 2025

Review updated until commit c01a3b8

Description

  • Improved WAR for vectorizing through resized iter domains

  • Added precise check for resize expr reachability

  • Enhanced test cases for vectorization with resize


Changes walkthrough 📝

Relevant files

Enhancement
vectorize_helper.cpp: Enhance resize vectorization WAR
csrc/scheduler/vectorize_helper.cpp
  • Added CanSkipResize class for permissive BFS traversal
  • Updated getResizeVectorizationFactors to use CanSkipResize
  • Improved logic to collect resize factors
+85/-64

Tests
test_resize.cpp: Add precise vectorization tests
tests/cpp/test_resize.cpp
  • Renamed VectorizeSliceMultiplePaths to VectorizeInnerSliceMultiplePaths
  • Added DISABLED_VectorizeOuterSliceMultiplePaths test
  • Added VectorizeOuterPad test
+90/-1

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Logic Error

The CanSkipResize::run function is called with resize as an argument, but inside the function, resize is redefined as a Resize* from logical_id->definition(). This could lead to incorrect behavior or segmentation faults if logical_id->definition() does not return a Resize*.

for (auto resize : resize_based_ops) {
  auto resize_out_tv = resize->output(0)->as<TensorView>();
  for (const auto logical_id : resize_out_tv->getLogicalDomain()) {
    auto resize = dynamic_cast<Resize*>(logical_id->definition());
    if (resize == nullptr) {
      continue;
    }

Redundant Code

The resize variable is defined twice in the loop, which is redundant and can be confusing. The outer resize should be used directly without redefining it inside the loop.

for (auto resize : resize_based_ops) {
  auto resize_out_tv = resize->output(0)->as<TensorView>();
  for (const auto logical_id : resize_out_tv->getLogicalDomain()) {
    auto resize = dynamic_cast<Resize*>(logical_id->definition());
    if (resize == nullptr) {
      continue;
    }

Test Naming

The test VectorizeInnerSliceMultiplePaths is named similarly to VectorizeSliceMultiplePaths, which might cause confusion. Consider renaming it to better reflect its purpose.

// one of the paths from tv6 to tv0 is considered.

@naoyam
Collaborator Author

naoyam commented Apr 24, 2025

!test --diff

@naoyam
Collaborator Author

naoyam commented Apr 24, 2025

!test --diff

@naoyam
Collaborator Author

naoyam commented Apr 25, 2025

!test --diff

EXPECT_EQ(tv6->getLoopDomain().back()->extent()->evaluate(), 2);
}

// The current analysis is not precise enough to pass this test
Collaborator Author

In this test, tv0 is resized in two different ways. The spanning-tree-based analysis is not guaranteed to correctly identify the vectorization constraint.

The WAR, when applied to this case, is still too conservative.

@naoyam naoyam marked this pull request as ready for review April 25, 2025 05:11
@naoyam naoyam added the rope label Apr 25, 2025
@naoyam
Collaborator Author

naoyam commented Apr 25, 2025

!build

@naoyam naoyam requested a review from jjsjann123 April 25, 2025 05:25

@jjsjann123 jjsjann123 left a comment
Collaborator

LGTM, glad to see that we are addressing the earlier comment


// Check if vectorization is properly applied even when a resized ID
// is reachable from vectorized IDs. Pattern extracted from Litgpt
// LLama RoPE backward.

Collaborator

nitpick on comment.

This is the case where it's safe to skip the additional resize check. So that means the resized ID is NOT reachable from vectorized IDs.

}
max_vec_size = std::gcd(max_vec_size, inferred_val.as<int64_t>());
auto inferred_val_int = inferred_val.as<int64_t>();
if (inferred_val_int == 0) {

Collaborator

this is for dynamic resize extents that would be 0?

resize_in_out_groups.pushBack(graph.toGroup(resize->out()));
CanSkipResize bfs(graph, ref_groups, resize_in_out_groups, resize);
bfs.traverse();
return bfs.allToNodesVisited();

Collaborator

qq: here we are calling allToNodesVisited()? But the init function below has /*require_all_to_visited=*/false, so we are returning true here as long as a single node is visited in the target, which I think is the right behavior.

But the function name is somewhat confusing.

Collaborator Author

No, the traversal should continue until no further progress is made. The require_all_to_visited flag means it's considered an error if not all of the to nodes were able to be reached.

Here, we just want to check whether all of the to nodes are reachable. It isn't an error if some are not.

Collaborator

> so we are returning true here as long as a single node is visited in the target, which I think is the right behavior.

Sorry, I got confused myself. This returns true indicating it's safe to skip the check, so allToNodesVisited is the proper name for the function.

Thanks for elaborating on this one.

@naoyam naoyam merged commit 07effe8 into main Apr 25, 2025
16 checks passed
@naoyam naoyam deleted the fix_resize_vec branch April 25, 2025 19:36
