Skip to content

fixing wrong stride retrieved from fusion#4028

Merged
jjsjann123 merged 1 commit intomainfrom
fixing_wrong_stride_size
Mar 7, 2025
Merged

fixing wrong stride retrieved from fusion#4028
jjsjann123 merged 1 commit intomainfrom
fixing_wrong_stride_size

Conversation

@jjsjann123
Copy link
Collaborator

We forgot to consider output with aliases when constructing kernel arguments.
This ended up giving us wrong arguments passed for kernel, producing wrong results.

@jjsjann123
Copy link
Collaborator Author

!test

@github-actions
Copy link

github-actions bot commented Mar 6, 2025

Description

  • Correctly handle stride retrieval for aliased tensors

  • Ensure kernel arguments are correctly constructed


Changes walkthrough 📝

Relevant files
Bug fix
allocations.cpp
Correct stride retrieval for aliased tensors                         

csrc/runtime/allocations.cpp

  • Adjusted condition to handle non-New allocation types
  • Added logic to retrieve aliased tensor for ReuseBuffer type
  • Ensured tensor evaluation and stride retrieval are correct
  • +6/-1     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The condition if (alias_info.type != AllocationType::New) seems redundant and might lead to incorrect behavior if alias_info.type is neither New nor ReuseBuffer.

    if (alias_info.type != AllocationType::New) {
      // For reuse buffer alias, we need to get the aliased_io's size/stride
      if (alias_info.type == AllocationType::ReuseBuffer) {

    @jjsjann123 jjsjann123 requested a review from csarofeen March 6, 2025 19:47
    @jjsjann123 jjsjann123 marked this pull request as ready for review March 6, 2025 19:53
    if (alias_info.type != AllocationType::New) {
    // For reuse buffer alias, we need to get the aliased_io's size/stride
    if (alias_info.type == AllocationType::ReuseBuffer) {
    tv = alias_info.aliased_io->as<TensorView>();
    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Without this patch, we'll ended up with passing the wrong kernel arguments when we have inplace updates on inputs with a non-trivial allocation domain.

    The earlier refactor PR broke this test:

    fd.add_output(T6, T0)
    fd.add_output(T6)

    Unfortunately the two PRs ( the refactor #3952 and the test #3970 ) were merged in parallel and never combined in the CI.

    Copy link
    Collaborator

    @csarofeen csarofeen left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Approving, but we should understand this only works if we guarantee that input and outputs are exactly the same. i.e. they have exact matching of logical domains where every ID is exact matched in the same order and no other extraneous IDs (at least those that are not reduction). Even a broadcast would invalidate this approach. J, please follow up with a PR to make sure this is the case.

    @jjsjann123 jjsjann123 merged commit 22d4f97 into main Mar 7, 2025
    53 checks passed
    @jjsjann123 jjsjann123 deleted the fixing_wrong_stride_size branch March 7, 2025 01:19
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants