
fix: Nemotron v3 inputs_embeds generation #1583

Merged
akoumpa merged 1 commit into main from fix/nemotron-v3-inputs-embeds-generate
Mar 19, 2026

Conversation

@pzelasko
Contributor

What does this PR do?

This fixes a cached generation bug in Nemotron-v3 when generation is driven by inputs_embeds instead of normal input_ids, which affects multimodal callers that expand placeholders into longer embedding sequences before calling generate().
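To make the length mismatch concrete, here is a minimal, framework-free sketch of placeholder expansion. It is not the SpeechLM implementation: `AUDIO_PLACEHOLDER_ID`, `expand_placeholders`, and the use of plain Python lists in place of tensors are all hypothetical. It shows how a single placeholder id in the token prompt can become several embedding positions, so the embedded prompt ends up longer than the token prompt.

```python
from typing import Callable, List, Sequence

# Hypothetical placeholder id; real multimodal models define their own special token.
AUDIO_PLACEHOLDER_ID = -1


def expand_placeholders(
    token_ids: Sequence[int],
    embed_token: Callable[[int], List[float]],
    audio_frames: Sequence[List[float]],
) -> List[List[float]]:
    """Replace each placeholder id with the full run of audio frame embeddings."""
    embeds: List[List[float]] = []
    for tid in token_ids:
        if tid == AUDIO_PLACEHOLDER_ID:
            # One placeholder token expands into many embedding positions.
            embeds.extend(audio_frames)
        else:
            embeds.append(embed_token(tid))
    return embeds


prompt_ids = [101, AUDIO_PLACEHOLDER_ID, 102, 103]
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # three hypothetical audio frames
embeds = expand_placeholders(prompt_ids, lambda t: [float(t), 0.0], audio)
print(len(prompt_ids), len(embeds))  # 4 6
```

Any position bookkeeping derived from the token count (4) rather than the embedding count (6) will be out of step with what the model actually consumes.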

The failure had two parts in prepare_inputs_for_generation():

  • On prefill, cache_position was derived from input_ids.shape[1], which is wrong when inputs_embeds carries the real prompt and the internal input_ids buffer is empty.
  • On decode, a full cache_position vector could be forwarded even though only one token was being decoded. Nemotron-v3’s Mamba cache update expects a single decode position there.
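A minimal sketch of the corrected position logic, under stated assumptions: shapes are passed as plain tuples standing in for `tensor.shape`, decoding proceeds one token per step, and `cache_position_for_step` is a hypothetical helper, not the actual body of Nemotron-v3's `prepare_inputs_for_generation()`. On prefill it takes the prompt length from `inputs_embeds` when present; on decode it returns only the single position of the new token.

```python
from typing import List, Optional, Tuple

Shape = Tuple[int, ...]  # stand-in for tensor.shape


def cache_position_for_step(
    input_ids_shape: Shape,                # (batch, seq); may be (batch, 0) when starting from embeds
    inputs_embeds_shape: Optional[Shape],  # (batch, seq, hidden) on prefill, else None
    past_length: int,                      # number of positions already in the cache
) -> List[int]:
    if past_length == 0:
        # Prefill: when generation is driven by inputs_embeds, the real prompt
        # length lives in the embeddings, not in the (possibly empty) input_ids.
        if inputs_embeds_shape is not None:
            seq_len = inputs_embeds_shape[1]
        else:
            seq_len = input_ids_shape[1]
        return list(range(seq_len))
    # Decode (assumed single-token): forward only the new token's position.
    return [past_length]


# Prefill from a 6-position embedding sequence with an empty input_ids buffer.
print(cache_position_for_step((1, 0), (1, 6, 16), past_length=0))  # [0, 1, 2, 3, 4, 5]
# First decode step after that prefill.
print(cache_position_for_step((1, 1), None, past_length=6))  # [6]
```

Taking the prefill length from `input_ids_shape[1]` instead would return an empty list in the first call, which is the prefill failure mode described above.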

Together, these two errors corrupt the incremental decoding state; in practice, they can make autoregressive multimodal generation fail.

The fix unblocks correct multimodal autoregressive generation for Nemotron-v3-based models using inputs_embeds, including the SpeechLM flow that injects audio features into the prompt before calling generate().

Changelog

  • Add specific line by line info of high level changes in this PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

@copy-pr-bot

copy-pr-bot Bot commented Mar 19, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa akoumpa changed the title from "Fix Nemotron v3 inputs_embeds generation" to "fix: Nemotron v3 inputs_embeds generation" on Mar 19, 2026
@akoumpa
Contributor

akoumpa commented Mar 19, 2026

/ok to test a952c7a


@akoumpa akoumpa left a comment


Thanks for the fix @pzelasko 🙇

@akoumpa akoumpa merged commit a3987c9 into main Mar 19, 2026
51 of 53 checks passed
@akoumpa akoumpa deleted the fix/nemotron-v3-inputs-embeds-generate branch March 19, 2026 23:52
pthombre pushed a commit that referenced this pull request Mar 20, 2026
Fix Nemotron v3 inputs_embeds generation

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
akoumpa added a commit that referenced this pull request Mar 20, 2026
* feat: Integrate Wan with multi-resolution DL

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Required changes for compatibility with AM container

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Fix overrides

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* feat: Add docs about diffusion support in AM

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Remove older data processing tools

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/dataset.md

Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Apply suggestions from code review

Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Revert non-doc code changes to match main

This PR is docs-only; restore test and tool files to main's state.

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Restructure the diffusion finetuning doc

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* fix: Nemotron v3 inputs_embeds generation (#1583)

Fix Nemotron v3 inputs_embeds generation

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* fix: checkpointing for PEFT. (#1576)

* Fix checkpointing for PEFT.

Previously, the state_dict in the modelstate class had an if/elif/else statement in which
PEFT was handled in two cases and non-PEFT in the third.

The first PEFT case was handled correctly, while the second included
buffers, causing issues in downstream consumers.

This fix simplifies the logic to a plain if/else and avoids the issues with the
buffers.

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Update stateful_wrappers.py

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update test

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* improve error logging in test; pass is_peft to optimizerstate

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix & logging

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add filter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fmt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* u

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* convert may return tuple or dict :S

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/diffusion/finetune.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/dataset-overview.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update docs/guides/overview.md

Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Add cr changes

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* ci: Move source install fla to dev group (#1580)

* Move source install fla to dev group

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>

* Update uv lock

Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update model coverage

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update overview doc

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Update Hunyuan number of params

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

* Fix docs build issue

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>

---------

Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
torsli pushed a commit that referenced this pull request Mar 24, 2026
Fix Nemotron v3 inputs_embeds generation
torsli pushed a commit that referenced this pull request Mar 24, 2026
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
Fix Nemotron v3 inputs_embeds generation
linnanwang pushed a commit that referenced this pull request Apr 24, 2026

Labels

None yet

Projects

None yet


2 participants