fix: add embed_vision to MULTIMODAL_SUFFIXES and set lbs=2 for Gemma4 PP2 recipe #1911

Merged
HuiyingLi merged 2 commits into NVIDIA-NeMo:main from khazic:fix/gemma4-pp-followups
Apr 20, 2026

@khazic khazic commented Apr 20, 2026

What does this PR do?

Two small follow-up fixes to #1904, caught in review by @HuiyingLi: correct `embed_vision` stage-0 assignment for Gemma4 PP, and fix `local_batch_size` in the PP2 recipe to avoid a 100% pipeline bubble.
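To illustrate the `embed_vision` part of the fix, here is a minimal sketch of suffix-based stage-0 selection. It is not the code in `hf_utils.py`: the helper names `is_multimodal_module` and `stage0_modules`, the matching rule (last component of the dotted module name), and the example module names are all assumptions made for illustration. Only `vision_tower` and `embed_vision` are taken from this PR; the real tuple may hold more entries.

```python
# Hypothetical stand-in for MULTIMODAL_SUFFIXES in hf_utils.py.
# Only these two entries are named in this PR; the real list may differ.
MULTIMODAL_SUFFIXES = ("vision_tower", "embed_vision")


def is_multimodal_module(fqn: str, suffixes=MULTIMODAL_SUFFIXES) -> bool:
    """Return True if the last component of a dotted module name is a listed suffix."""
    return fqn.split(".")[-1] in suffixes


def stage0_modules(module_names, suffixes=MULTIMODAL_SUFFIXES):
    """Select the modules that this sketch would pin to pipeline stage 0."""
    return [name for name in module_names if is_multimodal_module(name, suffixes)]


# Assumed, Gemma-style module names used only for this example.
names = [
    "model.vision_tower",
    "model.embed_vision",
    "model.language_model.layers.0",
]

# Without the new entry, the vision projection is not selected for stage 0.
before = stage0_modules(names, suffixes=("vision_tower",))
# With the new entry, it is selected alongside the vision tower.
after = stage0_modules(names)

print(before)
print(after)
```

With text-only inputs the vision projection is never exercised, which is consistent with the omission going unnoticed in the original PP testing.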

Changelog

  • Add `embed_vision` to `MULTIMODAL_SUFFIXES` in `hf_utils.py` so Gemma4's vision projection layer is correctly assigned to PP Stage 0 alongside `vision_tower`. It was missing because original PP testing used pure-text inputs only.
  • Set `local_batch_size: 2` in `gemma4_31b_tp4_pp2.yaml`. `lbs=1` technically runs (the per-rank stage count for 1F1B is 1, so no warning triggers), but produces zero pipeline overlap (100% bubble). Local runs were always done with `lbs=2`; the recipe was submitted with `lbs=1` by mistake.
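The bubble figure in the second item can be checked with a short calculation. The sketch below assumes a non-interleaved schedule, equal per-microbatch cost on every stage, and a pipeline microbatch size of 1, so that `n_microbatches == local_batch_size`; that mapping is an assumption about the recipe, not something stated here. Under one common convention, the bubble is the idle time divided by the ideal compute time, `(p - 1) / m` for `p` stages and `m` microbatches. The function names are illustrative.

```python
def bubble_ratio(pp_stages: int, n_microbatches: int) -> float:
    """Idle time relative to ideal compute time: (p - 1) / m."""
    if pp_stages < 1 or n_microbatches < 1:
        raise ValueError("pp_stages and n_microbatches must be >= 1")
    return (pp_stages - 1) / n_microbatches


def idle_fraction(pp_stages: int, n_microbatches: int) -> float:
    """Idle time as a share of total step time: (p - 1) / (m + p - 1)."""
    if pp_stages < 1 or n_microbatches < 1:
        raise ValueError("pp_stages and n_microbatches must be >= 1")
    return (pp_stages - 1) / (n_microbatches + pp_stages - 1)


PP_STAGES = 2  # the PP2 recipe

for lbs in (1, 2):
    # Assumption: microbatch size 1, so n_microbatches equals local_batch_size.
    m = lbs
    print(
        f"lbs={lbs}: bubble/ideal={bubble_ratio(PP_STAGES, m):.0%}, "
        f"idle share={idle_fraction(PP_STAGES, m):.0%}"
    )
```

Under these assumptions, `lbs=1` gives a bubble equal to the ideal compute time (the 100% quoted above, with each stage idle half of the step), while `lbs=2` halves that ratio to 50%.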

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

Additional Information

copy-pr-bot Bot commented Apr 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

khazic added 2 commits April 20, 2026 13:53
…signment

Signed-off-by: khazic <khazzz1c@gmail.com>
…= pp_stages

Signed-off-by: khazic <khazzz1c@gmail.com>
@khazic khazic force-pushed the fix/gemma4-pp-followups branch from 50d6310 to e97b160 Compare April 20, 2026 05:54
@khazic khazic changed the title from "fix: Gemma4 PP follow-ups — embed_vision stage-0 assignment and lbs=2 for PP2 recipe" to "fix: add embed_vision to MULTIMODAL_SUFFIXES and set lbs=2 for Gemma4 PP2 recipe" Apr 20, 2026

akoumpa commented Apr 20, 2026

/ok to test e97b160

@HuiyingLi HuiyingLi left a comment

Thank you so much!

@HuiyingLi HuiyingLi enabled auto-merge (squash) April 20, 2026 08:00
@HuiyingLi HuiyingLi merged commit c2b7f37 into NVIDIA-NeMo:main Apr 20, 2026
57 of 58 checks passed
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
… PP2 recipe (#1911)

* fix: add embed_vision to MULTIMODAL_SUFFIXES for Gemma4 PP stage-0 assignment

Signed-off-by: khazic <khazzz1c@gmail.com>

* fix: set local_batch_size=2 for PP2 recipe to ensure n_microbatches >= pp_stages

Signed-off-by: khazic <khazzz1c@gmail.com>

---------

Signed-off-by: khazic <khazzz1c@gmail.com>