Conversation
!build
jjsjann123
left a comment
LGTM.
Let's merge it since it patches the bug with the expanded dimensions.
}
if (ViewOp* view = dynamic_cast<ViewOp*>(expr)) {
  TensorView* in = dynamic_cast<TensorView*>(view->in());
  if (in == nullptr) {
nitpick: are we expecting the view op to have non-tensor I/O? Wondering if we should have an assert or throw a warning here?
Wdyt, @naoyam? If the I/O is always a TensorView, I'd like to see that reflected in ViewOp's API. E.g., its in() and out() should return a TensorView instead of a Val, and its constructor should do the assert.
^^^yes, that's what I was asking as well.
Can ViewOp take a non-TensorView argument?
In the case of ViewOp, I believe we can change the type to TensorView, but generally speaking, TensorView inputs and outputs are replaced with kir::TensorIndex during the lowering to the Kernel IR. For ViewOp, it's replaced with LoadStoreOp as part of fusion simplifications, so there should be no ViewOp with kir::TensorIndex.
Personally, I just don't remember which Expr types may have TensorIndex, so I tend to always do ->as<TensorView>() or use something like ir_utils::getTvOutput(Expr*).
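The API change discussed above can be sketched with a toy class hierarchy. This is a standalone illustration, not nvFuser's actual code: the `Val`/`TensorView`/`ViewOp` names mirror the discussion, and the `as<T>()` helper is a stand-in for nvFuser's asserting downcast.

```cpp
#include <cassert>

// Toy stand-ins for nvFuser's Val and TensorView (hypothetical sketch).
struct Val {
  virtual ~Val() = default;

  // In the spirit of an asserting downcast: fail loudly instead of
  // silently returning nullptr like dynamic_cast does.
  template <typename T>
  T* as() {
    T* downcast = dynamic_cast<T*>(this);
    assert(downcast != nullptr && "unexpected Val subtype");
    return downcast;
  }
};

struct TensorView : Val {};

// A ViewOp whose constructor enforces the TensorView-only invariant, so
// in()/out() can return TensorView* directly instead of Val*.
class ViewOp {
 public:
  ViewOp(Val* in, Val* out)
      : in_(in->as<TensorView>()), out_(out->as<TensorView>()) {}
  TensorView* in() const { return in_; }
  TensorView* out() const { return out_; }

 private:
  TensorView* in_;
  TensorView* out_;
};
```

With this shape, callers no longer need the `dynamic_cast`-then-null-check pattern shown in the quoted diff; the invariant is checked once at construction.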
    {out_root.begin(), out_root.end()},
    {out_rfactor.begin(), out_rfactor.end()});

for (const auto* transform : transforms) {
nitpick: can we put a comment here noting that we are being conservative?
I.e., splitting an expanded dimension should be fine, I guess, as long as we don't merge it with other domains.
Meanwhile, I'm wondering if this analysis could be done more safely through some replay. Tagging @jacobhinkle for comments.
split expanded dimension should be fine I guess, as long as we don't merge it with other domains.
I agree with you, but the following test fails at the moment. I think nvFuser isn't smart enough yet to figure out that the split expanded broadcast can remain stride=0. I'll file an enhancement...
TEST_F(AliasAnalysisTest, View_SplitExpandedBroadcast) {
  Fusion fusion;
  FusionGuard fg(&fusion);

  TensorView* in = makeContigConcreteTensor({4, 5});
  fusion.addInput(in);
  TensorView* out = broadcast(in, {false, false, true});
  out = expand(
      out,
      {IrBuilder::create<Val>(4),
       IrBuilder::create<Val>(5),
       IrBuilder::create<Val>(6)});
  out = reshape(out, {4, 5, 6}, {4, 5, 2, 3});
  fusion.addOutput(out);

  FusionExecutor fe;
  at::Tensor in_tensor =
      at::randn({4, 5}, at::dtype(at::kFloat).device(at::kCUDA, 0));
  fe.compileFusion(&fusion, {in_tensor});
  at::Tensor out_tensor = fe.runFusion({in_tensor})[0];
  EXPECT_THAT(out_tensor.strides(), ElementsAre(5, 1, 0, 0));
}
FWIW, there's no immediate bug. Without this PR, we only collect aliases between fusion inputs and fusion outputs via ViewOp. Things like broadcast and expand that trigger the bug are excluded from consideration.
jacobhinkle
left a comment
Just a comment on the reshape analysis.
if (expanded_broadcast_dims.count(input)) {
  return true;
}
nit: why use an unordered_set here instead of just placing the id->isBroadcast() && id->hasExpandedExtent() check inline?
expanded_broadcast_dims collects IterDomains from out_tv's leaf domain. For some reason, the isBroadcast bit is missing in out_tv's leaf domain and only appears in in_tv's rfactor domain. See bS7 and iS14 in the following example.
Let me know if there's a better way to handle this.
%kernel {
T1_l[ iS2{4}, iS3{5}, bS4{1} ]
= broadcast( T0_g[ iS0{4}, iS1{5} ] )
i10 = (nvfuser_index_t)(6);
T2_l[ iS5{4}, iS6{5}, bS7{( (nvfuser_index_t)(6) )} ] = expand( T1_l[ iS2{4}, iS3{5}, bS4{1} ], {4, 5, i10} )
T3_g[ iS16{40}rf, iS17{( ceilDiv(( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) ), 40) )}rf ] = view( T2_l[ iS5{4}, iS6{5}, bS7{( (nvfuser_index_t)(6) )} ] )
TransformPrinter :
T0_g[ iS0{4}, iS1{5} ]
root domain : (iS0{4}, iS1{5})
contiguity: t t
leaf domain : (iS0{4}, iS1{5})
T1_l[ iS2{4}, iS3{5}, bS4{1} ]
root domain : (iS2{4}, iS3{5}, bS4{1})
contiguity: t t n
leaf domain : (iS2{4}, iS3{5}, bS4{1})
T2_l[ iS5{4}, iS6{5}, bS7{( (nvfuser_index_t)(6) )} ]
root domain : (iS5{4}, iS6{5}, bS7{( (nvfuser_index_t)(6) )})
contiguity: t t n
leaf domain : (iS5{4}, iS6{5}, bS7{( (nvfuser_index_t)(6) )})
T3_g[ iS16{40}rf, iS17{( ceilDiv(( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) ), 40) )}rf ]
root domain : (iS11{4}rf, iS12{5}rf, iS14{( (nvfuser_index_t)(6) )}rf)
Merge: iS11{4}rf and iS12{5}rf -> iS13{( 4 * 5 )}rf
Merge: iS13{( 4 * 5 )}rf and iS14{( (nvfuser_index_t)(6) )}rf -> iS15{( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) )}rf
Outer split: iS15{( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) )}rf by factor 40 -> iS16{40}rf, iS17{( ceilDiv(( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) ), 40) )}rf, start offset: 0, stop offset: 0
rfactor domain : (iS16{40}rf, iS17{( ceilDiv(( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) ), 40) )}rf)
contiguity: t t
leaf domain : (iS16{40}rf, iS17{( ceilDiv(( ( 4 * 5 ) * ( (nvfuser_index_t)(6) ) ), 40) )}rf)
}
Thanks for pointing this out. We should probably fix this by making the root domain of the reshaped tensor expanded in these cases as well. I will check for that when tackling #1126, unless @naoyam there is some reason for root IDs to be Iteration here, as if they are concretizing the expanded input.
This looks strange. I'll take a look.
// Whether a ViewOp transforms any expanded broadcast IterDomain in the input.
// This is a corner case in which we can't always turn `out` into an alias.
//
// For example, given
//
//   t0 = makeContigConcreteTensor({4, 5});
//   t1 = broadcast(t0, {false, false, true});
//   t2 = expand(t1, {4, 5, 6});
//
// `reshape(t2, {40, 3})` and `reshape(t2, {4, 30})` have to copy data.
// `reshape(t2, {40, 3})` splits the expanded broadcast IterDomain (which is 6)
// into 2 and 3, and merges the 2 with preceding IterDomains.
// `reshape(t2, {4, 30})` merges the expanded broadcast IterDomain. However,
// the output of `reshape(t2, {20, 6})` can simply be an alias because the
// expanded broadcast IterDomain is forwarded, not transformed.
//
// As a future improvement, when an expanded broadcast dimension is only split,
// the output of the reshape can be an alias. However, nvFuser currently decides
// to materialize the expansion, making the output not an alias (#1126).
//
// Obviously, this function assumes `in` and `out` are the input and output
// TensorView of the same ViewOp.
bool transformsExpandedBroadcastIterDomain(TensorView* in, TensorView* out) {
Let t and f denote contiguous and non-contiguous Iteration domains, and let b and e denote regular broadcast and expanded broadcast to some non-unit size. The output IDs follow these rules, I believe (please double check), where C denotes a required copy because the output cannot be expressed with a single stride:
Split:
  t -> (t, t)
  f -> (t, f)
  b -> (b, b)  <- split factor must be 1
  e -> (e, e), (b, e), or (e, b) depending on split factor (after #1126) -- last two are BroadcastOp

Merge (outer o, inner i):
  o\i | t f b e
  ----+--------
   t  | t f t C
   f  | C C f C
   b  | t f b e
   e  | C C e e  <-- after #1126
Statuses b and e are represented in the IR with isBroadcast() and hasExpandedExtent(). You'd have to propagate contiguity t and f yourself, since that is not currently tracked at the IterDomain level but is instead a TensorDomain property. When you encounter a C, you can stop propagating since you know you can't alias at that point.
Note that this is not quite symmetric: Merge(t, f) can be no-copy since the output just uses the inner stride, but Merge(f, t) consists of disjoint contiguous segments, so it can't be handled with one stride parameter.
It's possible that transforms generating C outputs could be later inverted, but analyzeView should not create such sequences.
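The merge table above can be transcribed into a tiny lookup for experimentation. This is just a sketch of the proposed rules, not code from the PR; the function name is made up:

```cpp
#include <cassert>
#include <map>
#include <utility>

// 't'/'f' = contiguous/non-contiguous Iteration domains,
// 'b'/'e' = regular/expanded broadcast, 'C' = merge forces a copy.
// Rows are the outer merge input, columns the inner one.
char mergeStatus(char outer, char inner) {
  static const std::map<std::pair<char, char>, char> table = {
      {{'t', 't'}, 't'}, {{'t', 'f'}, 'f'}, {{'t', 'b'}, 't'}, {{'t', 'e'}, 'C'},
      {{'f', 't'}, 'C'}, {{'f', 'f'}, 'C'}, {{'f', 'b'}, 'f'}, {{'f', 'e'}, 'C'},
      {{'b', 't'}, 't'}, {{'b', 'f'}, 'f'}, {{'b', 'b'}, 'b'}, {{'b', 'e'}, 'e'},
      {{'e', 't'}, 'C'}, {{'e', 'f'}, 'C'}, {{'e', 'b'}, 'e'}, {{'e', 'e'}, 'e'},
  };
  return table.at({outer, inner});
}
```

For instance, `reshape(t2, {4, 30})` from the earlier example merges a contiguous Iteration domain of extent 5 with the expanded broadcast of extent 6: `mergeStatus('t', 'e')` yields `'C'`, matching the observation that this reshape has to copy. The asymmetry noted above also falls out: `mergeStatus('t', 'f')` is `'f'` (no copy) while `mergeStatus('f', 't')` is `'C'`.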
@jacobhinkle Not exactly sure what you mean by:
b -> (b, b) <- split factor must be 1
We don't enforce any restriction on splitting broadcast domains. For example, it's possible to split by, say, 4, which would create a broadcast domain of size 4. It's not marked as expanded, IIRC.
We don't enforce any restriction of splitting broadcast domains.
We don't enforce it on the scheduling op split(), but in a reshape wouldn't we always be preserving numel in each SplitTransform and MergeTransform?
Yes, broadcast domains should be squeezed before reshape transformations, except for expanded domains
// t2 = expand(t1, {4, 5, 6});
//
// `reshape(t2, {40, 3})` and `reshape(t2, {4, 30})` have to copy data.
// `reshape(t2, {40, 3})` splits the expanded broadcast IterDomain (which is 6)
nit: our reshape transformation always merges domains until they are large enough to create the first output domain, so it should look like:
merge: 4, 5 -> 20
merge: 20, 6 -> 120
outer-split: 120 -> 40, 3
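The merge-until-large-enough strategy described above can be sketched as a standalone function. This is a simplified illustration (it assumes the total element counts match and extents divide evenly), not nvFuser's actual analyzeView, and the function name is made up:

```cpp
#include <string>
#include <vector>

// Derive the transform sequence for a reshape: keep merging input domains
// until the running extent covers the next output extent, then outer-split.
std::vector<std::string> reshapeTransforms(
    std::vector<long> in, const std::vector<long>& out) {
  std::vector<std::string> log;
  size_t i = 0;          // next input domain to consume
  long cur = in[i++];    // running extent being built up
  for (size_t o = 0; o < out.size(); ++o) {
    // Merge more input domains until `cur` is divisible by the next output
    // extent.
    while (cur % out[o] != 0 && i < in.size()) {
      log.push_back("merge: " + std::to_string(cur) + ", " +
                    std::to_string(in[i]) + " -> " +
                    std::to_string(cur * in[i]));
      cur *= in[i++];
    }
    if (cur == out[o]) {
      // Extent already matches: the domain is forwarded, not transformed.
      if (o + 1 < out.size() && i < in.size()) {
        cur = in[i++];
      }
      continue;
    }
    log.push_back("outer-split: " + std::to_string(cur) + " -> " +
                  std::to_string(out[o]) + ", " +
                  std::to_string(cur / out[o]));
    cur /= out[o];
  }
  return log;
}
```

For `{4, 5, 6} -> {40, 3}` this yields exactly the three steps listed above. Notably, for `{4, 5, 6} -> {20, 6}` it produces only `merge: 4, 5 -> 20`, leaving the expanded extent-6 domain untouched, which is why that reshape's output can be an alias.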
Thanks! A PR is coming your way to fix that comment...
#1124 left it there accidentally.
As @naoyam said in #1124 (comment).
This is a follow-up to #1097. See code comments and unit tests for why it's needed.
This PR also changes the way we traverse Exprs in a fusion. Previously, we traversed from only fusion inputs and collected only TensorViews that alias an input. Now, we traverse each Expr and capture all local aliases even if they are not aliases of any input. This change gives us more test coverage.
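The traversal change can be modeled with a toy example. Everything here is hypothetical (the `ToyExpr` struct and both function names are made up for illustration); it only shows why per-Expr collection finds aliases that input-rooted propagation misses:

```cpp
#include <set>
#include <string>
#include <vector>

// A toy expression: `out` may locally alias `in`.
struct ToyExpr {
  std::string in;
  std::string out;
  bool is_alias;
};

// Old scheme: propagate only from fusion inputs; an output is recorded only
// if it is transitively an alias of a fusion input.
std::set<std::string> aliasesOfInputs(
    const std::vector<ToyExpr>& exprs, const std::set<std::string>& inputs) {
  std::set<std::string> reachable = inputs;
  std::set<std::string> result;
  for (const ToyExpr& e : exprs) {
    if (e.is_alias && reachable.count(e.in)) {
      reachable.insert(e.out);
      result.insert(e.out);
    }
  }
  return result;
}

// New scheme: record every Expr-local alias, even when the chain back to a
// fusion input is broken by a non-aliasing Expr.
std::set<std::string> allLocalAliases(const std::vector<ToyExpr>& exprs) {
  std::set<std::string> result;
  for (const ToyExpr& e : exprs) {
    if (e.is_alias) {
      result.insert(e.out);
    }
  }
  return result;
}
```

Given a chain `in -> a` (alias), `a -> b` (no alias), `b -> c` (alias), the old scheme finds only `a`, while the new scheme also captures the local alias `c`, which is what gives the extra test coverage.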