
[graph_trainer] Save CUDA-to-CPU copies in SAC pass to match core behavior #2811

Closed
SherlockNoMad wants to merge 1 commit into gh/SherlockNoMad/16/base from gh/SherlockNoMad/16/head

Conversation

SherlockNoMad (Contributor) commented Apr 3, 2026

[graph_trainer] Save CUDA-to-CPU copies in SAC pass to match core behavior

Core's _apply_op_sac always marks aten._to_copy CUDA->CPU transfers as
MUST_SAVE to avoid wastefully recomputing device transfers during
backward (e.g., MoE D2H sync for all-to-all metadata).

Graph_trainer's apply_sac_pass was missing this check, so these
transfers fell through to PREFER_RECOMPUTE. This change adds the same
logic at the FX graph level by inspecting the source node's fake tensor
device and the target device kwarg.
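The check described above can be sketched as follows. This is a hedged illustration, not the PR's actual diff: the helper names (`is_cuda_to_cpu_copy`, `sac_policy`) are hypothetical, device strings stand in for `torch.device` objects, and in the real FX pass the source device would come from the source node's fake tensor metadata while the destination comes from the `device` kwarg of the `aten._to_copy` call.

```python
def is_cuda_to_cpu_copy(target_name, src_device, dst_device):
    """Return True when an aten._to_copy op moves data CUDA -> CPU.

    These device-to-host transfers are the case that must be saved
    rather than recomputed in backward (hypothetical helper; device
    arguments are simplified to strings for illustration).
    """
    if target_name != "aten._to_copy":
        return False
    return src_device.startswith("cuda") and dst_device == "cpu"


def sac_policy(target_name, src_device, dst_device):
    # Mirror the core rule: force-save D2H copies so they are not
    # wastefully re-executed during backward; everything else keeps
    # the default preference to recompute.
    if is_cuda_to_cpu_copy(target_name, src_device, dst_device):
        return "MUST_SAVE"
    return "PREFER_RECOMPUTE"
```

For example, a `aten._to_copy` node moving a CUDA tensor to CPU would get `"MUST_SAVE"`, while the reverse (host-to-device) copy or any other op would keep `"PREFER_RECOMPUTE"`.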

meta-cla bot added the CLA Signed label Apr 3, 2026
SherlockNoMad added a commit that referenced this pull request Apr 9, 2026
…core behavior

Port the fix from PR #2811 to graph_trainer's `apply_sac_pass`. Core's
`_apply_op_sac` always marks `aten._to_copy` CUDA->CPU transfers as
MUST_SAVE to avoid wastefully recomputing device transfers during
backward (e.g., MoE D2H sync for all-to-all metadata). Graph_trainer's
`apply_sac_pass` was missing this check.

Labels: ciflow/8gpu, CLA Signed
