fix: Nemotron v3 inputs_embeds generation #1583
Merged
Contributor
/ok to test a952c7a
pthombre pushed a commit that referenced this pull request Mar 20, 2026
Fix Nemotron v3 inputs_embeds generation
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
akoumpa added a commit that referenced this pull request Mar 20, 2026
* feat: Integrate Wan with multi-resolution DL
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Required changes for compatability with AM container
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Fix overrides
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* feat: Add docs about diffusion support in AM
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Remove older data processing tools
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/diffusion/dataset.md
  Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Apply suggestions from code review
  Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Revert non-doc code changes to match main
  This PR is docs-only; restore test and tool files to main's state.
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Restructure the diffusion finetuning doc
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* fix: Nemotron v3 inputs_embeds generation (#1583)
  Fix Nemotron v3 inputs_embeds generation
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* fix: checkpointing for PEFT. (#1576)
  * Fix checkpointing for PEFT.
    Previously, the state_dict in the modelstate class had an if/elseif/else statement where peft was handled in two caces and non-peft on the third one. The first case of peft, was handling correctly, while the second was including buffers causing issues in downstream consumers.
    This fix simplifies the logic (simple if/else) and bypassed the issues with the buffer.
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * Update stateful_wrappers.py
  * fix
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * update test
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * improve error logging in test; pass is_peft to optimizerstate
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * fix
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * fix & logging
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * add filter
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * fmt
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * u
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  * convert may return tuple or dict :S
    Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  ---------
  Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/diffusion/finetune.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/diffusion/finetune.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/diffusion/finetune.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/diffusion/finetune.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/diffusion/finetune.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/dataset-overview.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update docs/guides/overview.md
  Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Add cr changes
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* ci: Move source install fla to dev group (#1580)
  * Move source install fla to dev group
    Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
  * Update uv lock
    Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
  ---------
  Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
  Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
  Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update model coverage
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update overview doc
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Update Hunyuan number of params
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
* Fix docs build issue
  Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
---------
Signed-off-by: Pranav Prashant Thombre <pthombre@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Andrew Chen <chenopis@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <pzelasko@nvidia.com>
Co-authored-by: Dong Hyuk Chang <thomaschang26@tutanota.com>
Co-authored-by: NeMo Bot <nemo-bot@nvidia.com>
torsli pushed a commit that referenced this pull request Mar 24, 2026
Fix Nemotron v3 inputs_embeds generation
torsli pushed a commit that referenced this pull request Mar 24, 2026
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
Fix Nemotron v3 inputs_embeds generation
linnanwang pushed a commit that referenced this pull request Apr 24, 2026
What does this PR do?
This fixes a cached-generation bug in Nemotron-v3 that occurs when generation is driven by `inputs_embeds` instead of the usual `input_ids`. It affects multimodal callers that expand placeholders into longer embedding sequences before calling `generate()`.

The failure had two parts in `prepare_inputs_for_generation()`:

1. `cache_position` was derived from `input_ids.shape[1]`, which is wrong when `inputs_embeds` carries the real prompt and the internal `input_ids` buffer is empty.
2. A `cache_position` vector could be forwarded even though only one token was being decoded. Nemotron-v3’s Mamba cache update expects a single decode position there.

That combination corrupts the incremental decoding state. In practice, it can cause models to fail during autoregressive multimodal generation.
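The two rules above can be illustrated with a small, hypothetical sketch. It is not the actual patch: plain Python lists stand in for tensors, and `naive_cache_position`, `cache_position_for_step`, and their parameters are invented names. It assumes that prefill covers the whole prompt and that each decode step feeds exactly one new token.

```python
from typing import List, Optional


def naive_cache_position(input_ids_len: int) -> List[int]:
    # Positions derived only from the input_ids buffer length. When the prompt
    # was passed as inputs_embeds, this buffer can be empty, so no positions
    # are produced for the real prompt tokens.
    return list(range(input_ids_len))


def cache_position_for_step(
    input_ids_len: int,
    inputs_embeds_len: Optional[int],
    past_seen_tokens: int,
) -> List[int]:
    """Return cache positions for the tokens fed in this forward pass (sketch).

    input_ids_len:      length of the input_ids buffer (may be 0).
    inputs_embeds_len:  prompt length when the prompt is given as embeddings,
                        else None.
    past_seen_tokens:   number of tokens already written to the cache.
    """
    if past_seen_tokens == 0:
        # Prefill: cover the whole prompt, taking its length from the
        # embeddings when they are what actually carries the prompt.
        prompt_len = (
            inputs_embeds_len if inputs_embeds_len is not None else input_ids_len
        )
        return list(range(prompt_len))
    # Decode: one new token per step, so forward exactly one position,
    # continuing right after the tokens already stored in the cache.
    return [past_seen_tokens]


# Prompt of 5 embedding vectors with an empty input_ids buffer.
print(naive_cache_position(0))              # []
print(cache_position_for_step(0, 5, 0))     # [0, 1, 2, 3, 4]
print(cache_position_for_step(1, None, 5))  # [5]
```

Returning a single position during decode mirrors the constraint the Mamba cache update places on Nemotron-v3. A real implementation would work on `torch` tensors instead, for example by keeping only the last element of `cache_position`.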
This fix enables correct multimodal autoregressive generation for Nemotron-v3-based models that use `inputs_embeds`, including the SpeechLM flow that injects audio features into the prompt before calling `generate()`.
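The next sketch is a hypothetical illustration of why the token buffer cannot serve as the length reference in such flows. It mimics a caller that replaces one placeholder id in the prompt with several audio feature vectors, so the embedding sequence ends up longer than the token sequence. The placeholder id `AUDIO_PLACEHOLDER_ID`, the toy embedding function, and all other names are assumptions for illustration, not the SpeechLM API.

```python
from typing import Callable, List, Sequence

Vector = List[float]

# Hypothetical id marking where audio features are spliced into the prompt.
AUDIO_PLACEHOLDER_ID = 999


def expand_placeholders(
    token_ids: Sequence[int],
    audio_features: Sequence[Vector],
    embed_token: Callable[[int], Vector],
) -> List[Vector]:
    """Build an inputs_embeds-style sequence from token ids plus audio frames.

    Each placeholder id is replaced by *all* audio feature vectors, so the
    result is usually longer than token_ids.
    """
    embeds: List[Vector] = []
    for tok in token_ids:
        if tok == AUDIO_PLACEHOLDER_ID:
            # Copy each frame so callers cannot mutate the caller's features.
            embeds.extend(list(frame) for frame in audio_features)
        else:
            embeds.append(embed_token(tok))
    return embeds


def toy_embed(tok: int) -> Vector:
    # Toy 2-d embedding, for illustration only.
    return [float(tok), 0.0]


token_ids = [1, AUDIO_PLACEHOLDER_ID, 2]
frames = [[0.1, 0.2]] * 4
embeds = expand_placeholders(token_ids, frames, toy_embed)
print(len(token_ids), len(embeds))  # 3 6
```

Three token ids become six embedding vectors here, so any position bookkeeping keyed to the token count would drift from the sequence the model actually sees.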