preserving allocation domain transform on aliased output #3970

jjsjann123 merged 37 commits into main
Conversation
!test
Review updated until commit 0fe131d
csrc/fusion.cpp (outdated):

```cpp
NVF_CHECK(
    output->isA<TensorView>() && input->isA<TensorView>(),
    "aliasing output to input is only supported for TensorView");
auto new_domain = TransformReplay::selfAllocationReplay(
    output->as<TensorView>()->domain(), input->as<TensorView>()->domain());
output->as<TensorView>()->setAllocationDomain(
    new_domain->allocation(), new_domain->contiguity());
```
When we are aliasing an output to an input, the two need to have identical allocation domains and contiguity. (Technically we only need compatible allocation domains, but it's easier to have a fixed target.)
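As a rough sketch of that requirement, here is a toy model (all names here are hypothetical illustrations, not nvFuser's actual TensorDomain API): aliasing an output to an input copies the input's allocation order and contiguity verbatim, so both views describe the same memory.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-in for a tensor layout: an allocation order (a permutation of
// logical axes) plus a per-axis contiguity flag, where nullopt marks a
// broadcast axis. Hypothetical names, for illustration only.
struct ToyLayout {
  std::vector<int> allocation_order;            // e.g. {1, 0} = column-major
  std::vector<std::optional<bool>> contiguity;  // nullopt for broadcast axes
};

// Aliasing an output to an input: the output takes the input's exact
// allocation order and contiguity -- the "fixed target" mentioned above.
ToyLayout alias_layout(const ToyLayout& input) {
  return ToyLayout{input.allocation_order, input.contiguity};
}

bool same_layout(const ToyLayout& a, const ToyLayout& b) {
  return a.allocation_order == b.allocation_order &&
      a.contiguity == b.contiguity;
}
```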
!test
keeping my 🤞
!test --diff

!test --diff

!test --diff

!test --diff
csrc/transform_replay.cpp (outdated):

```cpp
      new_self_root->contiguity());
}

TensorDomain* TransformReplay::selfAllocationReplay(
```
I tagged @wujingyue because I think you asked for something similar to this?
Though I'm not totally sure it does what you want, because I'm addressing a requirement from remove_bcast_squeeze. We could also refactor this one and put some of the logic on the user side if we need a more generic replay later.
wujingyue left a comment:
LGTM overall -- there are some details I want to make sure I understand, so that I have a good mental model.
csrc/transform_replay.cpp (outdated):

```cpp
      new_self_root->contiguity());
}

TensorDomain* TransformReplay::selfAllocationReplay(
```
Note: this can probably be consolidated with the existing replay in csrc/transform_replay.cpp, line 1220 at 7524c9f.
```cpp
  }
}

// If we are replacing an output, we need to preserve the memory layout by
```
Can we do this check in Fusion::replaceOutput?
That check needs to be opt-in. We have places where we explicitly alter the memory layout: mark-alias prepare and sharding propagation both intentionally change it.
But it would be nice to have a check there.
The check would actually be verifying compatible transforms, i.e. the two transforms might not necessarily be the same, but they should be equivalent... This feels like something @wujingyue and @zasdfgbnm were discussing together.
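One plausible notion of "equivalent but not identical" layouts can be sketched as follows (a toy model with hypothetical names, not code from this PR): treat two layouts as compatible if they agree after dropping broadcast axes, since broadcast axes occupy no memory.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy layout: allocation order over logical axes plus per-axis contiguity;
// nullopt marks a broadcast axis. Hypothetical names, for illustration only.
struct Layout {
  std::vector<int> order;
  std::vector<std::optional<bool>> contiguity;
};

// Broadcast axes do not affect the physical memory layout, so drop them
// before comparing.
Layout strip_broadcasts(const Layout& l) {
  Layout out;
  for (size_t i = 0; i < l.order.size(); ++i) {
    if (l.contiguity[i].has_value()) {
      out.order.push_back(l.order[i]);
      out.contiguity.push_back(l.contiguity[i]);
    }
  }
  return out;
}

// "Compatible": not necessarily the same transforms, but equivalent in how
// the real (non-broadcast) axes are laid out in memory.
bool compatible(const Layout& a, const Layout& b) {
  Layout sa = strip_broadcasts(a);
  Layout sb = strip_broadcasts(b);
  return sa.order == sb.order && sa.contiguity == sb.contiguity;
}
```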
!test --diff
I'm surprised to see a segfault from the python_bc_advisory thing. I didn't change any Python API; no idea where that segfault is coming from.
!test --diff
```cpp
NVF_ERROR(
    it != replay.getReplay().end(),
    "Error during replay, didn't replay an axis.");
if (it->second->isBroadcast() == self_contiguity[i].has_value()) {
```
This feels like a bug or a misuse. Is there a good use case for is-broadcast to be inconsistent between self and new_self?
This is for the same reason as the comment above about symbolic IDs. If we are dealing with a symbolic ID, we just need to make sure that the contiguity flag on the given ID is consistent with itself.
Later, during concretization, the consistency will be fixed globally.
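The invariant that check enforces can be restated as a standalone sketch (a hypothetical helper, not code from the PR): a replayed axis should carry a contiguity flag exactly when it is not a broadcast axis, since broadcast axes occupy no memory; the error path above fires when `isBroadcast()` and `has_value()` agree.

```cpp
#include <cassert>
#include <optional>

// Hypothetical restatement of the replay check: an axis is locally
// consistent iff "is broadcast" and "has a contiguity flag" disagree.
// The check in the PR errors out precisely when they agree.
bool consistent(bool is_broadcast, std::optional<bool> contiguity) {
  return is_broadcast != contiguity.has_value();
}
```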
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
!test --diff
The failure doesn't seem to be relevant, and we have all tests clearing at 4e6d81e. I'm merging this PR as-is.
Fixes #3957
The issue was that when we specify an output alias, we need to update the allocation domain and contiguity of the alias to be identical to the source's.