Add flexible virtual pipeline parallel (fVPP) to hybrid model #3377

Open

duncanriach wants to merge 31 commits into NVIDIA:main from duncanriach:add-hybrid-fvpp-to-main-v1

Conversation

duncanriach (Contributor) commented Feb 11, 2026

What does this PR do ?

This PR introduces flexible virtual pipeline parallel (fVPP) for the hybrid model, allowing users to define explicit pipeline stage boundaries directly in the hybrid layer pattern. It also consolidates and simplifies the hybrid model configuration interface by replacing multiple arguments with a single --hybrid-layer-pattern flag and deprecating several legacy arguments. Pipelining for the hybrid model does not support MTP standalone mode, which is supported for the GPT model.

This PR also removes the old way of enabling pipeline parallelism (setting only --pipeline-model-parallel-size > 1). In addition to setting that flag, the user must now insert pipe ("|") symbols into the hybrid layer pattern to specify how the layers are partitioned between pipeline stages.
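As a rough illustration of the new syntax, the segments between "|" symbols correspond to successive pipeline stages. The helper below is a hypothetical sketch, not the Megatron-Core implementation, and the per-layer symbol meanings in the docstring ('M' for Mamba, '*' for attention, '-' for MLP) are assumptions based on common hybrid-pattern conventions:

```python
def split_hybrid_pattern(pattern: str) -> list[str]:
    """Split a hybrid layer pattern into per-pipeline-stage segments.

    Hypothetical helper, not the actual Megatron-Core code. Assumes one
    character per layer, e.g. 'M' (Mamba), '*' (attention), '-' (MLP),
    with '|' marking explicit pipeline-stage boundaries.
    """
    stages = pattern.split("|")
    if any(not stage for stage in stages):
        raise ValueError(f"empty pipeline stage in pattern: {pattern!r}")
    return stages


# Two pipeline stages of four layers each:
print(split_hybrid_pattern("M*-M|M*-M"))  # ['M*-M', 'M*-M']
# No '|' means a single stage (pattern unchanged from the old syntax):
print(split_hybrid_pattern("M-M*"))       # ['M-M*']
```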

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@duncanriach duncanriach requested review from a team as code owners February 11, 2026 23:46
@ko3n1g ko3n1g requested a review from a team February 11, 2026 23:46
@ko3n1g ko3n1g added this to the Core 0.16 milestone Feb 11, 2026
yobibyte (Contributor):

@jalbericiola @jon-barker when this is merged, we will need to update some of the configs in internal launching scripts.

@duncanriach duncanriach marked this pull request as ready for review February 23, 2026 19:22
@duncanriach duncanriach requested a review from a team as a code owner February 23, 2026 19:22
@duncanriach duncanriach self-assigned this Feb 23, 2026
@duncanriach duncanriach added the Expert Review label Feb 23, 2026
ko3n1g (Contributor) commented Feb 23, 2026

/ok to test 415c973

matthieule left a comment

LGTM for multimodal changes

jon-barker (Contributor):

@duncanriach am I correct in understanding that the hybrid patterns that don't use the new | operator are unchanged so it's safe to continue to load them from checkpoints, e.g. nemotron6 nano?

duncanriach (Contributor, Author) commented Feb 23, 2026

> @duncanriach am I correct in understanding that the hybrid patterns that don't use the new | operator are unchanged so it's safe to continue to load them from checkpoints, e.g. nemotron6 nano?

Yes. Old checkpoints, including v3 checkpoints (that use --hybrid-override-pattern) will still load properly. Layer patterns that do not use | are also valid and will still work. This has been verified by the dynamic inference functional tests. The only regression is that if you want to enable PP>1, then you need to specify in the layer pattern how you want the model partitioned.
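The backward-compatibility rule described above (old patterns still load; only PP > 1 requires explicit "|" boundaries) can be sketched as a simple validation. This is a hypothetical illustration, not the actual Megatron-Core check; the flag name --pipeline-model-parallel-size comes from the PR description:

```python
def check_pipeline_partition(pattern: str, pp_size: int) -> None:
    """Validate that a hybrid layer pattern defines exactly one segment
    per pipeline stage. Hypothetical sketch, not Megatron-Core code."""
    num_stages = pattern.count("|") + 1
    if pp_size > 1 and num_stages != pp_size:
        raise ValueError(
            f"pattern defines {num_stages} stage(s) but "
            f"--pipeline-model-parallel-size is {pp_size}; "
            "add '|' boundaries to the pattern"
        )


check_pipeline_partition("M*-M", 1)   # old-style pattern, PP disabled: OK
check_pipeline_partition("M*|M-", 2)  # two stages for PP=2: OK
# check_pipeline_partition("M*-M", 2) would raise ValueError
```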

jon-barker (Contributor):

> @duncanriach am I correct in understanding that the hybrid patterns that don't use the new | operator are unchanged so it's safe to continue to load them from checkpoints, e.g. nemotron6 nano?
>
> Yes. Old checkpoints, including v3 checkpoints (that use --hybrid-override-pattern) will still load properly. Layer patterns that do not use | are also valid and will still work. This has been verified by the dynamic inference functional tests. The only regression is that if you want to enable PP>1, then you need to specify in the layer pattern how you want the model partitioned.

Great - LGTM from the RL side then

duncanriach (Contributor, Author) commented Feb 24, 2026

@jon-barker, after thinking some more about our earlier interaction, I made this change.

rogerwaleffe (Contributor) left a comment

LGTM.

@duncanriach duncanriach added the Final Review label and removed the Expert Review label Feb 24, 2026

Labels

  • Final Review
  • Run functional tests
  • Run MBridge tests
  • Run tests


7 participants