Add support for EP to context parallelism in self-attention #2023

Merged

copybara-service[bot] merged 1 commit into main on Aug 8, 2025
Conversation
RissyRan (Collaborator) reviewed on Jul 29, 2025:

Thanks Shuning! Great work!
gobbleturk reviewed on Aug 1, 2025

RissyRan reviewed on Aug 1, 2025

RissyRan approved these changes on Aug 1, 2025
gobbleturk (Collaborator) reviewed on Aug 1, 2025:

Have you considered an approach like conditionally modifying the rules (instead of creating new ones)? This is the approach used for pipeline parallelism (line 788 in fdf479f). There are pros and cons to both, and both are pretty ugly IMO, but at least when modifying the rules there are:

- fewer rules
- no if statements deciding which rules to use
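For reference, a rough sketch of the rule-modifying alternative (hypothetical rule names and helper; MaxText's actual logical-axis tables differ): the base rules are rewritten in place when the option is enabled, instead of maintaining a second rule set and branching on which one to use.

```python
# Hypothetical base sharding rules: logical axis -> mesh axes.
base_rules = [
    ("activation_length", ["context"]),                # sequence dim
    ("activation_batch", ["data", "fsdp", "expert"]),  # batch dim
]

def apply_expert_as_context(rules, expert_shard_attention_option):
  """Conditionally rewrites the rules rather than defining new ones."""
  if expert_shard_attention_option != "context":
    return rules
  updated = []
  for logical_axis, mesh_axes in rules:
    mesh_axes = list(mesh_axes)
    if logical_axis == "activation_length" and "expert" not in mesh_axes:
      mesh_axes.append("expert")  # expert axis now also shards the sequence
    if logical_axis == "activation_batch" and "expert" in mesh_axes:
      mesh_axes.remove("expert")  # ...and no longer shards the batch
    updated.append((logical_axis, mesh_axes))
  return updated
```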
gobbleturk approved these changes on Aug 4, 2025
richjames0 (Collaborator) reviewed on Aug 5, 2025:

Really impressive that you understood this and got it working!
Author (Collaborator) commented:

Resolved merge conflict with the nnx migration for the attention layer. Re-testing on local v5p-8; diff (before vs. after nnx migration).
Description
Goal
For mixture-of-experts models, we may use expert parallelism (EP). In the attention layer, EP currently acts as FSDP. Building on the previous context parallelism work, this PR introduces the option of using EP as CP for attention. This is a joint effort with @RissyRan.
FIXES: b/418396648
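To illustrate the idea in isolation, here is a minimal JAX sketch (not the PR's code; the mesh shape, axis names, and array shapes are assumptions chosen to mirror the config): depending on the new option, the expert mesh axis either shards the batch dimension of attention activations (EP as FSDP, the current behavior) or joins the context axis in sharding the sequence dimension (EP as CP).

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Assumes 4 local devices, arranged as 2 expert x 2 context.
devices = np.array(jax.devices()[:4]).reshape(2, 2)
mesh = Mesh(devices, axis_names=("expert", "context"))

# Attention input: [batch, seq, heads, head_dim] (illustrative shape).
x = jnp.zeros((8, 1024, 16, 128))

expert_shard_attention_option = "context"  # the new option: "fsdp" or "context"
if expert_shard_attention_option == "context":
  # EP acts as CP: sequence dim sharded over both axes -> 4-way context sharding.
  spec = P(None, ("context", "expert"))
else:
  # EP acts as FSDP: expert axis shards the batch dim instead.
  spec = P("expert", "context")

x = jax.device_put(x, NamedSharding(mesh, spec))
print(x.sharding)  # inspect the resulting layout
```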
Main code changes
- attentions.py
- base.yml: new option expert_shard_attention_option: fsdp or context
- unit tests: tests.attention_test — AttentionTest.test_tpu_flash_attention_cp_and_ep & MLATest.test_tpu_flash_attention_cp_and_ep (extended from the CP test)

Use case
- ici_expert_parallelism=4, ici_context_parallelism=1, expert_shard_attention_option=context: shard context by 4 in attention; shard expert by 4 for MoE
- ici_expert_parallelism=2, ici_context_parallelism=2, expert_shard_attention_option=context: shard context by 4 in attention; shard expert by 2 and context by 2 for MoE
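As a quick sanity check of the shard counts above (plain Python; the variable names just mirror the config keys):

```python
seq_len = 8192  # illustrative sequence length
for ep, cp in [(4, 1), (2, 2)]:
  # With expert_shard_attention_option=context, attention's sequence dim
  # is split over both the expert and context axes.
  context_shards = ep * cp
  print(f"EP={ep}, CP={cp}: context sharded {context_shards}-way "
        f"({seq_len // context_shards} tokens per shard); "
        f"MoE experts sharded {ep}-way")
```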
Tests

Tested on v5p-8.
- Verify sharding shape: context_parallel_load_balance=True; parallelism: ici_expert_parallelism=2, ici_context_parallelism=2, expert_shard_attention_option=context and ici_expert_parallelism=2, ici_context_parallelism=2, expert_shard_attention_option=fsdp
- Verify attention output logits against dot-product attention: context_parallel_load_balance={True, False}; parallelism: {ici_expert_parallelism=4, expert_shard_attention_option=context} and {ici_expert_parallelism=2, expert_shard_attention_option=context, ici_context_parallelism=2}
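For reference, a minimal sketch of what the dot-product comparison amounts to (reference code only; the PR's real checks live in tests.attention_test, and sharded_attention below is a hypothetical stand-in for the CP+EP attention path):

```python
import jax
import jax.numpy as jnp

def reference_attention(q, k, v):
  """Plain dot-product attention: softmax(QK^T / sqrt(d)) V."""
  logits = jnp.einsum("bqhd,bkhd->bhqk", q, k) / jnp.sqrt(q.shape[-1])
  probs = jax.nn.softmax(logits, axis=-1)
  return jnp.einsum("bhqk,bkhd->bqhd", probs, v)

kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
shape = (2, 128, 4, 32)  # [batch, seq, heads, head_dim], illustrative
q, k, v = (jax.random.normal(key, shape) for key in (kq, kk, kv))

expected = reference_attention(q, k, v)
# actual = sharded_attention(q, k, v)  # hypothetical: the CP+EP flash path
# assert jnp.allclose(actual, expected, atol=1e-2)
```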
Checklist

Before submitting this PR, please make sure (put X in square brackets):