fix: fix gemma3 #2185

Merged
terrykong merged 8 commits into main from yukih/fix-fsdp2-move-device on Apr 5, 2026

yuki-97 (Contributor) commented Apr 1, 2026

As the title says, this fixes gemma3 after the transformers v5 bump:

  1. Fix the model layer name and add the missing token_type_ids for gemma3.
  2. Broadcast buffers in a deterministic order in fsdp2 (dtensor v1).
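
Item 2 can be illustrated with a minimal sketch. The PR conversation does not show the actual change, so everything here is an assumption made for illustration: the helper name `broadcast_buffers_in_order`, the `broadcast_fn` callback, and the buffer names are hypothetical. The idea is only that collective calls must be issued in the same sequence on every rank, so the iteration order is fixed by sorting on the buffer name.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def broadcast_buffers_in_order(
    buffers: Dict[str, T],
    broadcast_fn: Callable[[str, T], None],
) -> List[str]:
    """Call broadcast_fn once per buffer, visiting names in sorted order.

    Hypothetical helper, not the code from this PR. Sorting by name means
    every rank walks the buffers in the same sequence, regardless of the
    order in which each rank's container happens to yield them.
    """
    order = sorted(buffers)
    for name in order:
        broadcast_fn(name, buffers[name])
    return order


# Two hypothetical ranks holding the same buffers, registered in different orders.
rank0 = {"rotary_emb.inv_freq": [1.0], "embed_scale": [2.0]}
rank1 = {"embed_scale": [0.0], "rotary_emb.inv_freq": [0.0]}

calls0: List[str] = []
calls1: List[str] = []
broadcast_buffers_in_order(rank0, lambda name, buf: calls0.append(name))
broadcast_buffers_in_order(rank1, lambda name, buf: calls1.append(name))

# Both ranks issue their broadcasts in the same order.
print(calls0 == calls1)
```

In a real distributed job, `broadcast_fn` would presumably wrap a collective such as `torch.distributed.broadcast(buf, src=0)`; the callback is kept abstract here so the sketch runs without a process group.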

This fixes the following release/perf tests:

  • grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long
  • grpo-gemma3-27b-it-8n4g-fsdp2tp4-actckpt-long

yuki-97 requested review from a team as code owners April 1, 2026 15:54
copy-pr-bot Bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

yuki-97 added the CI:Lfast label (runs a fast test suite and re-uses the nightly `main` container, but syncs dependencies to the PR's version) Apr 1, 2026
yuki-97 (Contributor, Author) commented Apr 1, 2026

/ok to test 55b9d86

terrykong previously approved these changes Apr 1, 2026
terrykong enabled auto-merge (squash) April 1, 2026 16:00
yuki-97 marked this pull request as draft April 1, 2026 17:02
auto-merge was automatically disabled April 1, 2026 17:02

Pull request was converted to draft

yuki-97 force-pushed the yukih/fix-fsdp2-move-device branch from 55b9d86 to c4f3002 April 3, 2026 08:48
yuki-97 changed the title from "fix: fix fsdp2 (dtensor v1) move to device" to "fix: fix fsdp2 (dtensor v1) gemma3" Apr 3, 2026
yuki-97 force-pushed the yukih/fix-fsdp2-move-device branch 3 times, most recently from 9d81d75 to b974855 April 3, 2026 09:21
yuki-97 (Contributor, Author) commented Apr 3, 2026

/ok to test b974855

yuki-97 (Contributor, Author) commented Apr 3, 2026

/ok to test 007a51d

yuki-97 added 4 commits April 3, 2026 06:04
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 force-pushed the yukih/fix-fsdp2-move-device branch from 007a51d to e88218a April 3, 2026 13:49
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 changed the title from "fix: fix fsdp2 (dtensor v1) gemma3" to "fix: fix gemma3" Apr 3, 2026
yuki-97 added 2 commits April 4, 2026 10:36
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 (Contributor, Author) commented Apr 4, 2026

/ok to test c9a8066

yuki-97 marked this pull request as ready for review April 5, 2026 04:10
Signed-off-by: Yuki Huang <yukih@nvidia.com>
yuki-97 (Contributor, Author) commented Apr 5, 2026

/ok to test 880a0b8

terrykong enabled auto-merge (squash) April 5, 2026 06:12
terrykong merged commit 986d48a into main Apr 5, 2026
45 of 48 checks passed
terrykong deleted the yukih/fix-fsdp2-move-device branch April 5, 2026 06:18