
[GraphTrainer][AutoDev] Fix missing CP wiring in llama3 and deepseek_v3 parallelize#2910

Draft
SherlockNoMad wants to merge 2 commits into main from graph_trainer/fix-cp-wiring
Conversation

@SherlockNoMad
Contributor

Summary

  • Port the fix from PR #2808 ([graph_trainer] Fix missing CP wiring in llama3 and deepseek_v3 parallelize) to GraphTrainer's experiment parallelize files
  • llama3's parallelize_llama was calling apply_tp() without passing enable_cp and enable_sp, causing context parallelism to silently malfunction
  • Both llama3 and deepseek_v3 were missing the apply_cp_to_attention_module() call that configures attention modules for CP ring attention communication

Changes

  • llama3/parallelize.py: Add enable_cp and enable_sp keyword arguments to the apply_tp() call, and add the apply_cp_to_attention_module() call after maybe_enable_async_tp
  • deepseek_v3/parallelize.py: Add the apply_cp_to_attention_module() call after the TP/EP block (deepseek_v3 already had enable_cp/enable_sp in its apply_non_moe_tp() call)
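The wiring described above can be sketched in miniature. This is a simplified stand-in, not GraphTrainer's actual code: the real apply_tp, apply_cp_to_attention_module, and ParallelDims take meshes and config objects, and the exact coupling of enable_sp is assumed here for illustration. The point is the shape of the bug: apply_tp was called without the keyword arguments, so CP was silently off even when parallel_dims.cp_enabled was true, and apply_cp_to_attention_module was never invoked.

```python
from dataclasses import dataclass

@dataclass
class ParallelDims:
    """Hypothetical stand-in for GraphTrainer's parallel_dims object."""
    cp_enabled: bool
    tp_enabled: bool

calls = []  # record what the parallelize pass actually did

def apply_tp(model, *, enable_cp=False, enable_sp=False):
    # Before the fix, llama3 called apply_tp(model) with no kwargs,
    # so enable_cp/enable_sp stayed at their False defaults.
    calls.append(("apply_tp", enable_cp, enable_sp))

def apply_cp_to_attention_module(model):
    # Configures attention modules for CP ring-attention communication;
    # this call was missing in both llama3 and deepseek_v3.
    calls.append(("apply_cp_to_attention_module",))

def parallelize(model, dims: ParallelDims):
    if dims.tp_enabled:
        # The fix: thread the CP/SP flags through (enable_sp shown as
        # tied to CP here purely for the sketch).
        apply_tp(model, enable_cp=dims.cp_enabled, enable_sp=dims.cp_enabled)
    if dims.cp_enabled:
        apply_cp_to_attention_module(model)  # the added call

parallelize(object(), ParallelDims(cp_enabled=True, tp_enabled=True))
print(calls)
```

With CP enabled, both the flagged apply_tp call and the attention-module configuration are now recorded; dropping either edit reproduces the silent-malfunction mode the PR describes.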

Test plan

  • Run pre-commit run --all-files (passed, no new issues)
  • Run integration tests with CP enabled for llama3 and deepseek_v3

Ports fix from #2808.

…v3 parallelize

Port the fix from PR #2808 to GraphTrainer's experiment parallelize files.

llama3's parallelize_llama was calling apply_tp() without passing enable_cp
and enable_sp, causing context parallelism to silently malfunction. Both
llama3 and deepseek_v3 were missing the apply_cp_to_attention_module() call
that configures attention modules for CP ring attention communication.

Changes:
- Add enable_cp and enable_sp kwargs to apply_tp() call in llama3
- Add apply_cp_to_attention_module() call in both llama3 and deepseek_v3
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Apr 9, 2026
Review comment on the diff context in parallelize.py:

```python
    )
    maybe_enable_async_tp(parallelism, compile_config, tp_mesh)

    if parallel_dims.cp_enabled:
```
Contributor Author
Do we have test coverage for this?

Can you try a config with CP enabled, and see what it takes to support CP?

The same comment applies to dsv3.

Contributor Author
AutoDev: There was already test coverage for CP in JIT mode (lines 100-128 for llama3 HSDP+CP and FSDP+TP+CP, lines 354-366 for deepseek_v3 FSDP+CP), but no aot_fx_trace CP tests existed.

Added two aot_fx_trace integration tests with context parallelism enabled:

  • llama3: aot_fx_trace FSDP+TP+CP with dp_shard=2, tp=2, cp=2 on 8 GPUs (skip_rocm_test=True since aot_fx_trace applies cudagraph by default)
  • deepseek_v3: aot_fx_trace FSDP+CP with dp_shard=4, cp=2 on 8 GPUs (follows the same pattern as the existing JIT FSDP+CP test; uses SDPA config since CP only supports SDPA for dsv3)

Note: dsv3 CP test does not include EP/ETP since the existing JIT CP test also omits it — keeping the test focused on CP wiring.
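The test shapes above have to multiply out to the stated GPU count, since the device mesh is the product of its parallelism dimensions. A quick sanity check (required_gpus is a hypothetical helper for this note, not a GraphTrainer API; the real harness derives world size from the mesh config):

```python
def required_gpus(dp_replicate=1, dp_shard=1, tp=1, cp=1):
    # Total ranks needed = product of all mesh dimensions.
    return dp_replicate * dp_shard * tp * cp

llama3_cp = required_gpus(dp_shard=2, tp=2, cp=2)  # llama3 FSDP+TP+CP
dsv3_cp = required_gpus(dp_shard=4, cp=2)          # deepseek_v3 FSDP+CP
print(llama3_cp, dsv3_cp)  # 8 8
```

Both configurations land exactly on the 8-GPU CI shape, which is why the tests carry the ciflow/8gpu label.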

…xt parallelism

Add integration tests for context parallelism (CP) in aot_fx_trace mode
for both llama3 and deepseek_v3, covering the CP wiring added in the
parent commit.

- llama3: aot_fx_trace FSDP+TP+CP (dp_shard=2, tp=2, cp=2, 8 GPUs)
- deepseek_v3: aot_fx_trace FSDP+CP (dp_shard=4, cp=2, 8 GPUs)
@SherlockNoMad SherlockNoMad marked this pull request as ready for review April 10, 2026 01:09
@SherlockNoMad SherlockNoMad force-pushed the graph_trainer/fix-cp-wiring branch from 9b439fe to 958b539 on April 10, 2026 18:26
@SherlockNoMad SherlockNoMad marked this pull request as draft April 10, 2026 18:27

Labels

ciflow/8gpu, CLA Signed

2 participants