Add flexible virtual pipeline parallel (fVPP) to hybrid model #3377

Open

duncanriach wants to merge 31 commits into NVIDIA:main from duncanriach:add-hybrid-fvpp-to-main-v1

Conversation

duncanriach (Contributor) commented Feb 11, 2026

What does this PR do ?

This PR introduces flexible virtual pipeline parallel (fVPP) for the hybrid model, allowing users to define explicit pipeline stage boundaries directly in the hybrid layer pattern. It also consolidates and simplifies the hybrid model configuration interface by replacing multiple arguments with a single --hybrid-layer-pattern flag and deprecating several legacy arguments. Pipelining for the hybrid model does not support MTP standalone mode, which is supported for the GPT model.

This PR also removes the old way of enabling pipeline parallelism (setting only --pipeline-model-parallel-size > 1). In addition to setting that flag, the user must now insert pipe ("|") symbols into the hybrid layer pattern to specify how the layers are partitioned between pipeline stages.
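As a rough illustration of the new syntax, the segments between "|" symbols correspond to successive pipeline stages. The helper below is a hypothetical sketch, not the Megatron-Core implementation, and the per-layer symbol meanings in the docstring ('M' for Mamba, '*' for attention, '-' for MLP) are assumptions based on common hybrid-pattern conventions:

```python
def split_hybrid_pattern(pattern: str) -> list[str]:
    """Split a hybrid layer pattern into per-pipeline-stage segments.

    Hypothetical helper, not the actual Megatron-Core code. Assumes one
    character per layer, e.g. 'M' (Mamba), '*' (attention), '-' (MLP),
    with '|' marking explicit pipeline-stage boundaries.
    """
    stages = pattern.split("|")
    if any(not stage for stage in stages):
        raise ValueError(f"empty pipeline stage in pattern: {pattern!r}")
    return stages


# Two pipeline stages of four layers each:
print(split_hybrid_pattern("M*-M|M*-M"))  # ['M*-M', 'M*-M']
# No '|' means a single stage (pattern unchanged from the old syntax):
print(split_hybrid_pattern("M-M*"))       # ['M-M*']
```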

⚠️ For major changes (either in lines of code or in its impact), please make sure to first share a design doc with the team. If you're unsure what's the best way to do so, contact the @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review might get declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@duncanriach duncanriach requested review from a team as code owners February 11, 2026 23:46
@ko3n1g ko3n1g requested a review from a team February 11, 2026 23:46
@ko3n1g ko3n1g added this to the Core 0.16 milestone Feb 11, 2026
yobibyte (Contributor):

@jalbericiola @jon-barker when this is merged, we will need to update some of the configs in internal launching scripts.

@duncanriach duncanriach marked this pull request as ready for review February 23, 2026 19:22
@duncanriach duncanriach requested a review from a team as a code owner February 23, 2026 19:22
@duncanriach duncanriach self-assigned this Feb 23, 2026
@duncanriach duncanriach added the Expert Review label Feb 23, 2026
ko3n1g (Contributor) commented Feb 23, 2026

/ok to test 415c973

matthieule left a comment

LGTM for multimodal changes

jon-barker (Contributor):

@duncanriach am I correct in understanding that the hybrid patterns that don't use the new | operator are unchanged so it's safe to continue to load them from checkpoints, e.g. nemotron6 nano?

duncanriach (Contributor, Author) commented Feb 23, 2026

> @duncanriach am I correct in understanding that the hybrid patterns that don't use the new | operator are unchanged so it's safe to continue to load them from checkpoints, e.g. nemotron6 nano?

Yes. Old checkpoints, including v3 checkpoints (that use --hybrid-override-pattern) will still load properly. Layer patterns that do not use | are also valid and will still work. This has been verified by the dynamic inference functional tests. The only regression is that if you want to enable PP>1, then you need to specify in the layer pattern how you want the model partitioned.
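The backward-compatibility rule described above (old patterns still load; only PP > 1 requires explicit "|" boundaries) can be sketched as a simple validation. This is a hypothetical illustration, not the actual Megatron-Core check; the flag name --pipeline-model-parallel-size comes from the PR description:

```python
def check_pipeline_partition(pattern: str, pp_size: int) -> None:
    """Validate that a hybrid layer pattern defines exactly one segment
    per pipeline stage. Hypothetical sketch, not Megatron-Core code."""
    num_stages = pattern.count("|") + 1
    if pp_size > 1 and num_stages != pp_size:
        raise ValueError(
            f"pattern defines {num_stages} stage(s) but "
            f"--pipeline-model-parallel-size is {pp_size}; "
            "add '|' boundaries to the pattern"
        )


check_pipeline_partition("M*-M", 1)   # old-style pattern, PP disabled: OK
check_pipeline_partition("M*|M-", 2)  # two stages for PP=2: OK
# check_pipeline_partition("M*-M", 2) would raise ValueError
```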

jon-barker (Contributor):

> @duncanriach am I correct in understanding that the hybrid patterns that don't use the new | operator are unchanged so it's safe to continue to load them from checkpoints, e.g. nemotron6 nano?
>
> Yes. Old checkpoints, including v3 checkpoints (that use --hybrid-override-pattern) will still load properly. Layer patterns that do not use | are also valid and will still work. This has been verified by the dynamic inference functional tests. The only regression is that if you want to enable PP>1, then you need to specify in the layer pattern how you want the model partitioned.

Great - LGTM from the RL side then

duncanriach (Contributor, Author) commented Feb 24, 2026

@jon-barker, after thinking some more about our earlier interaction, I made this change.

rogerwaleffe (Contributor) left a comment

LGTM.

@duncanriach duncanriach added the Final Review label and removed the Expert Review label Feb 24, 2026

Labels

  • Final Review
  • Run functional tests
  • Run MBridge tests
  • Run tests


7 participants