
Conversation

@tomjen12

Motivation

When running distributed data parallel (DDP) training with torch.compile enabled, training crashes during the AOTAutograd graph partitioning phase (min_cut_rematerialization_partition).

The error indicates that a tensor view operation in the backward graph is invalid relative to the graph partition boundary:

torch._dynamo.exc.BackendCompilerFailed: backend='compile_fn' raised:
AssertionError: Node view_21 was invalid, but is output

Technical Details

The _flash_attn_backward op mutates its gradient arguments (dq, dk, dv) in-place. These side effects were previously not registered on the op, so AOTAutograd misread the aliasing relationships (e.g., view ops on the gradients) and flagged the corresponding nodes as invalid.

I updated the @torch_compile_guard decorator to explicitly declare mutates_args=["dq", "dk", "dv"], which allows correct schema inference and graph partitioning:

@torch_compile_guard(mutates_args=["dq", "dk", "dv"], gen_fake=_flash_attn_backward_fake)
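
For context, this follows the same pattern as PyTorch's public torch.library.custom_op API (PyTorch 2.4+), where declaring the mutated arguments lets the compiler infer a schema that marks them mutable (e.g., Tensor(a!) dq) instead of assuming the op is functional. Below is a minimal sketch with made-up names; demo::accumulate_grad merely stands in for the flash-attention backward op:

    import torch
    from torch import Tensor

    # Declaring mutates_args tells torch.compile / AOTAutograd that "dq" is
    # written in-place, so views of it are tracked correctly when the joint
    # graph is partitioned.
    @torch.library.custom_op("demo::accumulate_grad", mutates_args=["dq"])
    def accumulate_grad(grad_out: Tensor, dq: Tensor) -> None:
        dq.add_(grad_out)  # in-place write, visible to the caller

    @accumulate_grad.register_fake
    def _(grad_out: Tensor, dq: Tensor) -> None:
        # Fake impl for tracing: shape/dtype only, no real data. Analogous
        # to the gen_fake=_flash_attn_backward_fake hook used in this PR.
        return None

    def demo(x: Tensor) -> Tensor:
        buf = torch.zeros_like(x)
        torch.ops.demo.accumulate_grad(x, buf)
        return buf.view(-1)  # a view of the mutated tensor now partitions cleanly

    print(torch.compile(demo)(torch.ones(4)))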

Test Plan

Run a training loop on a GPT-3 6.7B model using DDP and torch.compile(model).
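
For reference, a minimal stand-in for that loop (the model, sizes, and optimizer here are illustrative placeholders rather than the actual GPT-3 6.7B training script); launch with torchrun --nproc_per_node=<N>:

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main() -> None:
        dist.init_process_group("nccl")
        rank = dist.get_rank()
        torch.cuda.set_device(rank)

        # Tiny stand-in for the GPT-3 6.7B model used in the real test plan.
        model = torch.nn.Sequential(
            torch.nn.Linear(1024, 4096),
            torch.nn.GELU(),
            torch.nn.Linear(4096, 1024),
        ).cuda()
        model = DDP(model, device_ids=[rank])
        compiled = torch.compile(model)
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for step in range(3):
            x = torch.randn(8, 1024, device="cuda")
            loss = compiled(x).pow(2).mean()
            loss.backward()  # crash site before the fix (first backward pass)
            opt.step()
            opt.zero_grad()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()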

Test Result

Before Fix: The script crashes immediately during the first backward pass with AssertionError: Node view_21 was invalid.

After Fix: The graph compiles successfully and the training loop proceeds without errors. Validated on the GPT-3 6.7B model.

smci355-ccs-aus-m01-17[0:00/0][2025-12-22 20:37:48,037] [train.py:129] train_step: 0, loss=0.00033572062966413796, iter_dt=146.44924759864807, fps_gpu=0.013656608229774632, fps_tot=0.10925286583819706
smci355-ccs-aus-m01-17[0:00/0][2025-12-22 20:37:51,171] [train.py:129] train_step: 1, loss=25.630146026611328, iter_dt=0.5292143821716309, fps_gpu=3.7791867858787236, fps_tot=30.23349428702979

tomjen12 requested review from a team and ZhangLirong-amd on December 23, 2025 05:10
Copilot AI review requested due to automatic review settings December 29, 2025 02:10
Copilot AI left a comment

Pull request overview

This PR fixes a crash in distributed training (DDP) with torch.compile enabled by adding mutation annotations to the _flash_attn_backward function. The issue occurred during AOTAutograd graph partitioning when the compiler couldn't properly track that certain gradient tensors were being mutated in-place.

  • Adds mutates_args=["dq", "dk", "dv"] to the @torch_compile_guard decorator for _flash_attn_backward
  • Enables correct schema inference and graph partitioning for AOTAutograd
  • Resolves the "Node view_21 was invalid, but is output" assertion error
