[GraphTrainer][AutoDev] Add gradient accumulation integration test #2914

Draft
SherlockNoMad wants to merge 2 commits into main from graph_trainer/test-gradient-accumulation

@SherlockNoMad
Contributor

Summary

  • Port of PR #2814 ("[graph_trainer] Add gradient accumulation integration test"): adds a JIT gradient accumulation integration test for GraphTrainer
  • Test uses local_batch_size=8 with global_batch_size=64 on 4 GPUs, resulting in 2 gradient accumulation steps per optimizer step
  • Placed in the JIT mode test section, after jit_optional_checkpoint and before AOT mode tests
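
The accumulation count in the second bullet follows from dividing the global batch size by the number of samples processed per micro-step across all ranks. A minimal sketch of that arithmetic, assuming all 4 GPUs act as data-parallel ranks (the helper name `grad_accum_steps` is hypothetical and not part of GraphTrainer):

```python
def grad_accum_steps(local_batch_size: int, global_batch_size: int, dp_degree: int) -> int:
    """Return the number of micro-steps accumulated per optimizer step.

    Assumes every GPU is a data-parallel rank, so one micro-step
    consumes local_batch_size * dp_degree samples in total.
    """
    samples_per_micro_step = local_batch_size * dp_degree
    if global_batch_size % samples_per_micro_step != 0:
        raise ValueError(
            "global_batch_size must be a multiple of local_batch_size * dp_degree"
        )
    return global_batch_size // samples_per_micro_step


# Values from the PR summary: 8 * 4 = 32 samples per micro-step, 64 / 32 = 2.
steps = grad_accum_steps(local_batch_size=8, global_batch_size=64, dp_degree=4)
print(steps)  # 2
```

If the 4 GPUs were split across other parallelism dimensions, `dp_degree` would be smaller and the accumulation count correspondingly larger.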

Test plan

  • CI passes on the new jit_gradient_accumulation test with 4 GPUs
  • Existing integration tests remain unaffected

Port PR #2814: Add a JIT gradient accumulation integration test that
verifies training with 2 gradient accumulation steps (local_batch_size=8,
global_batch_size=64, 4 GPUs).
@meta-cla (bot) added the "CLA Signed" label on Apr 9, 2026
# multiply by 2: 32 * 2 = 64.
"--module graph_trainer.llama3",
"--config graph_trainer_llama3_debugmodel",
"--compile.mode jit",
Contributor Author

Good catch, but don't do this for jit mode; do it for aot_fx_trace, and test it with GraphTrainer.

We are planning to deprecate jit and aot, remember that.

Now try adding the same gradient accumulation test for aot_fx_trace.

Contributor Author

AutoDev: Changed the test from JIT mode to aot_fx_trace mode and moved it to the aot_fx_trace test section. JIT and AOT are being deprecated in favor of aot_fx_trace.

…race mode

Move the gradient accumulation integration test from the JIT section to
the aot_fx_trace section, as JIT and AOT modes are being deprecated in
favor of aot_fx_trace.
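
For illustration, the relocated test entry could look roughly like the sketch below. The `--module`, `--config`, and `--compile.mode` flags follow the diff excerpt quoted earlier in this conversation, with the mode switched to aot_fx_trace. The `IntegrationTest` dataclass, the test name, and the `--training.local_batch_size` / `--training.global_batch_size` flag names are assumptions made for this example, not the repository's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class IntegrationTest:
    """Hypothetical stand-in for the repository's test-definition type."""

    name: str
    description: str
    ngpu: int
    override_args: list[str] = field(default_factory=list)


grad_accum_test = IntegrationTest(
    name="aot_fx_trace_gradient_accumulation",  # assumed test name
    description="aot_fx_trace with 2 gradient accumulation steps",
    ngpu=4,
    override_args=[
        "--module graph_trainer.llama3",
        "--config graph_trainer_llama3_debugmodel",
        "--compile.mode aot_fx_trace",
        # Assumed flag names; 8 * 4 GPUs = 32, and 32 * 2 = 64.
        "--training.local_batch_size 8",
        "--training.global_batch_size 64",
    ],
)

# Show the command-line overrides this entry would contribute.
print(" ".join(grad_accum_test.override_args))
```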
Labels

ciflow/8gpu, CLA Signed