
TransformReplay::selfReplay replays contiguity #5316

Open
wujingyue wants to merge 21 commits into main from wjy/replay

@wujingyue
Collaborator

@wujingyue wujingyue commented Oct 3, 2025

Fixes bugs like #5356

We should probably add some knobs so callers decide what to replay, in a separate PR. So far, this function tries to replay everything (namely, loop, allocation and contiguity), which seems to work fine.

@wujingyue
Collaborator Author

!test

@github-actions

github-actions bot commented Oct 3, 2025

Review updated until commit 129e9bb

Description

  • Fix contiguity replay in TransformReplay::selfReplay

  • Improve error handling and validation in replay logic

  • Refactor selfReplay to handle loop and allocation domains robustly

  • Add test cases for contiguity and aliasing behavior


Changes walkthrough 📝

Relevant files

Bug fix
nodes.cpp: Improve contiguity validation
csrc/ir/nodes.cpp (+3/-6)

  • Replace NVF_CHECK with NVF_CHECK_EQ for clearer error messages
  • Validate contiguity vector size against allocation domain size

Enhancement
transform_replay.cpp: Refactor and fix selfReplay logic
csrc/transform_replay.cpp (+143/-85)

  • Refactor selfReplay to use a lambda for creating IterDomainMap
  • Handle loop and allocation replay with proper reduction ID management
  • Add strict checks for empty transform sequences on non-empty domains
  • Correctly propagate contiguity during symbolic-to-concrete ID mapping

Tests
test_alias.cpp: Add test for slice accumulation aliasing
tests/cpp/test_alias.cpp (+55/-25)

  • Add AccumulateSlices test for aliasing with symbolic dimensions
  • Include ATen ops for tensor creation and comparison
  • Update variable declarations to use auto

test_replay.cpp: Test contiguity replay with empty allocation
tests/cpp/test_replay.cpp (+16/-0)

  • Add ContiguityWithEmptyAllocation test case
  • Use IsFalse matcher for contiguity validation
  • Verify contiguity propagation when allocation is empty

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Contiguity Replay Logic

The new code in selfReplay handles contiguity replay when the allocation domain is empty, but the logic for mapping contiguity flags during symbolic-to-concrete ID transitions may not fully preserve expected invariants, especially around broadcast and symbolic dimensions.

    } else {
      NVF_ERROR(
          !new_self->hasAllocation(),
          "It is unclear what the correct contract should be when replaying an "
          "empty transform sequence on a non-empty allocation domain. "
          "Fortunately, we do not have a use case for this scenario.");
      const std::vector<IterDomain*>& new_logical = new_self->logical();
      const auto new_rank = std::ssize(new_logical);
      std::vector<std::optional<bool>> new_contiguities(new_rank, std::nullopt);
    
      int new_pos = 0;
      for (auto [id, contiguity] : zip(self->logical(), self->contiguity())) {
        IterDomain* new_id = getOrDefault(replay, id);
        if (new_id == nullptr) {
          continue;
        }
    
        // Find the corresponding contiguity in new_logical. Mapped IterDomains
        // in self->logical() and new_logical follow the same order. So it's safe
        // to only increment `new_pos`.
        while (new_pos < new_rank && new_logical.at(new_pos) != new_id) {
          new_pos++;
        }
        NVF_ERROR_LT(
            new_pos,
            new_rank,
            "Failed to find ",
            new_id->toString(),
            " in ",
            new_logical);
        std::optional<bool>& new_contiguity = new_contiguities.at(new_pos);
    
        new_contiguity = contiguity;
        // When used during or before concretization, TransformReplay::selfReplay
        // can be applied to replay transformations from symbolic dimensions to
        // concrete dimensions, or in the reverse direction. Therefore,
        // `new_contiguity` is not always identical to `contiguity`.
        if (new_id->isBroadcast()) {
          new_contiguity = std::nullopt;
        } else if (new_id->isSymbolic()) {
          // See AliasTest.AccumulateSlices for an example. aliasOutputToInput is
          // called before concretization and tries to replay contiguity from a
          // broadcast IterDomain to a symbolic IterDomain. However, a symbolic
          // IterDomain can't have contiguity null.
          if (!new_contiguity.has_value()) {
            new_contiguity = true;
          }
        }
      }
    
      new_self->setContiguity(new_contiguities);
    }
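
The contiguity handling in the snippet above can be modeled outside nvFuser. What follows is a minimal, self-contained sketch and not the PR's implementation: `ToyId`, `IdKind`, and `replayContiguity` are hypothetical stand-ins for `IterDomain` and the `else` branch of `selfReplay`, and the replay map is assumed to be a plain `std::unordered_map`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for nvFuser's IterDomain; only the kind matters here.
enum class IdKind { Iteration, Broadcast, Symbolic };
struct ToyId {
  IdKind kind;
};

using Contiguity = std::vector<std::optional<bool>>;

// Replays contiguity from `self_logical` onto `new_logical`. `replay` maps an
// ID in self to its counterpart in new; unmapped IDs are skipped.
Contiguity replayContiguity(
    const std::vector<const ToyId*>& self_logical,
    const Contiguity& self_contiguity,
    const std::unordered_map<const ToyId*, const ToyId*>& replay,
    const std::vector<const ToyId*>& new_logical) {
  assert(self_logical.size() == self_contiguity.size());
  Contiguity result(new_logical.size(), std::nullopt);

  std::size_t new_pos = 0;
  for (std::size_t i = 0; i < self_logical.size(); ++i) {
    auto it = replay.find(self_logical[i]);
    if (it == replay.end()) {
      continue;
    }
    const ToyId* new_id = it->second;

    // Mapped IDs keep their relative order, so a forward-only scan suffices.
    while (new_pos < new_logical.size() && new_logical[new_pos] != new_id) {
      ++new_pos;
    }
    assert(new_pos < new_logical.size() && "mapped ID not found");

    std::optional<bool> contiguity = self_contiguity[i];
    if (new_id->kind == IdKind::Broadcast) {
      // Broadcast IDs carry no contiguity.
      contiguity = std::nullopt;
    } else if (new_id->kind == IdKind::Symbolic && !contiguity.has_value()) {
      // A symbolic ID cannot have null contiguity; default to true.
      contiguity = true;
    }
    result[new_pos] = contiguity;
  }
  return result;
}
```

In this model, replaying `{nullopt, false}` from a `[broadcast, iteration]` domain onto a `[symbolic, iteration]` domain yields `{true, false}`, which mirrors the broadcast-to-symbolic case that the code comment attributes to AliasTest.AccumulateSlices.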

Reduction Axis Handling

The logic for handling reduction axes in both loop and allocation replay relies on the mapped_new_ids set, but there is a risk of incorrect ordering or omission when new_id->isReduction() axes are added without ensuring alignment with the original domain structure.

    if (self->loop() != self->logical()) {
      std::vector<IterDomain*> new_loop;
      for (auto* new_id : new_self->logical()) {
        if (mapped_new_ids.count(new_id) == 0) {
          NVF_ERROR(
              new_id->isReduction(),
              new_id->toString(),
              " should be a reduction.");
          new_loop.push_back(new_id);
        }
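
The excerpt above is cut off mid-loop, so only the visible part can be illustrated. The sketch below assumes a hypothetical `ToyAxis` type in place of `IterDomain` and a hypothetical helper `unmappedReductions`; it collects the logical IDs that the replay did not map and rejects any that are not reductions. What the real code does with `new_loop` afterwards is not shown in the source and is not modeled here.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IterDomain.
struct ToyAxis {
  std::string name;
  bool is_reduction;
};

// Returns the IDs in `new_logical` that the replay did not map, in logical
// order. Each one must be a reduction; anything else is reported as an error.
std::vector<const ToyAxis*> unmappedReductions(
    const std::vector<const ToyAxis*>& new_logical,
    const std::unordered_set<const ToyAxis*>& mapped_new_ids) {
  std::vector<const ToyAxis*> result;
  for (const ToyAxis* new_id : new_logical) {
    if (mapped_new_ids.count(new_id) != 0) {
      continue;
    }
    if (!new_id->is_reduction) {
      throw std::logic_error(new_id->name + " should be a reduction.");
    }
    result.push_back(new_id);
  }
  return result;
}
```

Because the loop walks `new_logical` front to back, the unmapped reductions come out in logical order in this sketch; whether that ordering matches the original domain structure is exactly the concern the reviewer note raises.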

Contiguity Validation

The updated validateContiguity function now uses NVF_CHECK_EQ with a custom error message, but it skips validation of individual contiguity entries when broadcast or reduction axes are involved, which could allow invalid states to pass silently.

    NVF_CHECK_EQ(
        contiguity.size(),
        allocation_domain.size(),
        "Invalid contiguity information provided, incorrect size.");

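To make the size check and the per-entry rules concrete, here is a hedged sketch of a validator over a toy domain. It is not nvFuser's `validateContiguity`: `AxisKind` and `validateContiguitySketch` are hypothetical names, and exceptions stand in for `NVF_CHECK_EQ`. The two per-entry rules come from the replay code quoted earlier (broadcast entries are null, symbolic entries are never null). Treating ordinary iteration axes like symbolic ones is an assumption of this sketch.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical axis kinds; reductions are left out of this toy model.
enum class AxisKind { Iteration, Broadcast, Symbolic };

void validateContiguitySketch(
    const std::vector<AxisKind>& allocation_domain,
    const std::vector<std::optional<bool>>& contiguity) {
  // Size check, analogous to the NVF_CHECK_EQ call shown above.
  if (contiguity.size() != allocation_domain.size()) {
    throw std::invalid_argument(
        "Invalid contiguity information provided, incorrect size.");
  }
  for (std::size_t i = 0; i < contiguity.size(); ++i) {
    // Assumption: only broadcast axes have null contiguity.
    const bool expect_null = allocation_domain[i] == AxisKind::Broadcast;
    if (expect_null == contiguity[i].has_value()) {
      throw std::invalid_argument(
          "Contiguity must be null for broadcast axes and set otherwise.");
    }
  }
}
```

Under these assumptions, `{Broadcast, Iteration, Symbolic}` with `{nullopt, true, false}` passes, while a null entry on the symbolic axis or a set entry on the broadcast axis is rejected.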
@wujingyue wujingyue changed the title from "Replay loop unconditionally" to "TransformReplay::selfReplay replays contiguity" Oct 6, 2025
@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@wujingyue wujingyue requested a review from Priya2698 October 7, 2025 04:19

    fusion->addInput(in);
    fusion->addInput(i);
    fusion->addOutput(acc_out);
    fusion->aliasOutputToInput(acc_out, acc_in, AllocationType::ReuseBuffer);

Collaborator Author

This calls TransformReplay::selfReplay to replay transformations from concrete dimensions to symbolic dimensions.

@Priya2698 Priya2698 requested a review from naoyam October 7, 2025 17:52
@wujingyue wujingyue requested a review from Priya2698 October 7, 2025 22:35

@wujingyue
Collaborator Author

!test

@wujingyue
Collaborator Author

!test

@naoyam
Collaborator

naoyam commented Oct 8, 2025

It would be really helpful to have some quick PR introduction to remind the context. My mental capacity is not big enough to remember everything currently going on.

@Priya2698
Collaborator

Priya2698 commented Oct 8, 2025

Thanks @wujingyue for the clarifications.
Changes LGTM, I'll defer approval to @naoyam since he had concerns about changes in this function.

@wujingyue
Collaborator Author

> It would be really helpful to have some quick PR introduction to remind the context. My mental capacity is not big enough to remember everything currently going on.

Done

@wujingyue
Collaborator Author

!test

    }

    new_self->setAllocationDomain(new_allocation, new_contiguities);
    } else {

Collaborator

This feels a little unexpected to me because, even though there's nothing to replay for the allocation, the contiguity could be modified.

Collaborator

What would happen if self doesn't have an allocation domain but new_self does? Would it work?

Collaborator Author

Good catch. Added a check and a comment.

@naoyam
Collaborator

naoyam commented Oct 17, 2025

!test --diff

@naoyam
Collaborator

naoyam commented Oct 17, 2025

> Fixes bugs like #5356
>
> We should probably add some knobs so callers decide what to replay, in a separate PR. So far, this function tries to replay everything (namely, loop, allocation and contiguity), which seems to work fine.

Changes of contiguity may not cause test failures but could result in, e.g., different vectorizations. Started the diff check just in case.
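
To illustrate why a contiguity change can alter vectorization without breaking correctness, here is a toy model. It is not nvFuser's vectorization analysis: `innermostContiguousElements` and `maxVectorWidth` are hypothetical helpers, and the model assumes that `contiguity[i] == true` means dimension `i` is densely packed with respect to the dimension to its right (the innermost dimension having stride 1). Alignment and data type size are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of elements in the contiguous run that ends at the innermost
// dimension, under the toy contiguity model described above.
std::int64_t innermostContiguousElements(
    const std::vector<std::int64_t>& extents,
    const std::vector<bool>& contiguity) {
  std::int64_t count = 1;
  for (std::size_t i = extents.size(); i-- > 0;) {
    if (!contiguity[i]) {
      break;
    }
    count *= extents[i];
  }
  return count;
}

// Largest power-of-two width, capped at `max_width`, that evenly divides the
// contiguous run.
std::int64_t maxVectorWidth(
    std::int64_t contiguous_elements,
    std::int64_t max_width) {
  std::int64_t width = 1;
  while (width * 2 <= max_width && contiguous_elements % (width * 2) == 0) {
    width *= 2;
  }
  return width;
}
```

For an 8x4 tensor and a width cap of 8, this model allows width 8 when both dimensions are contiguous, width 4 when only the inner one is, and width 1 when the inner one is not. The computed values stay the same in all three cases, which is why a functional test might pass while a generated-code diff still changes.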

@wujingyue
Collaborator Author

!test

@wujingyue wujingyue requested a review from naoyam October 29, 2025 05:59

@wujingyue
Collaborator Author

!test --diff
