Add sequence packing support for hybrid model #2913

Merged
duncanriach merged 24 commits into NVIDIA:main from duncanriach:duncan/hybrid-packed-sequence-for-main on Jan 16, 2026

Conversation

@duncanriach (Contributor) commented Jan 12, 2026

What does this PR do?

Adds sequence packing support for the hybrid model, and enhances --sft, SFTDataset, and pretrain_mamba.py to feed packed-sequence data into the model.
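
For context, sequence packing concatenates several variable-length examples into one fixed-length sequence and records the sub-sequence boundaries (cumulative sequence lengths, cu_seqlens) so that attention (and, in the hybrid model, the Mamba layers) does not mix tokens across examples. Below is a minimal, illustrative sketch of that metadata; the pack_sequences helper is hypothetical and is not the code added by this PR:

```python
import torch

def pack_sequences(sequences, pad_id, packed_len):
    """Concatenate variable-length token sequences into one packed
    sequence and compute cumulative sequence lengths (cu_seqlens).
    Illustrative helper only -- not the code added by this PR."""
    tokens = torch.cat(sequences)
    assert tokens.numel() <= packed_len, "sequences do not fit in one pack"
    # cu_seqlens marks where each sub-sequence starts and ends in the
    # pack, e.g. lengths [3, 5, 2] -> cu_seqlens [0, 3, 8, 10].
    lengths = torch.tensor([len(s) for s in sequences])
    cu_seqlens = torch.cat(
        [torch.zeros(1, dtype=torch.int32), lengths.cumsum(0).to(torch.int32)]
    )
    # Pad the remainder so every pack has the same fixed length.
    padding = torch.full(
        (packed_len - tokens.numel(),), pad_id, dtype=tokens.dtype
    )
    return torch.cat([tokens, padding]), cu_seqlens

# Example: three short sequences packed into a length-16 buffer.
packed, cu_seqlens = pack_sequences(
    [torch.arange(3), torch.arange(5), torch.arange(2)],
    pad_id=0,
    packed_len=16,
)
print(cu_seqlens)  # tensor([ 0,  3,  8, 10], dtype=torch.int32)
```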

Causes a regression in the functionality of --sft on the GPT model; this functionality will no longer be supported. To maintain it, packed-sequence-related changes would need to be made to pretrain_gpt.py, following pretrain_mamba.py, and, ideally, GPT-model SFT packed-sequence convergence testing would need to be run. I believe this functionality will be added by PR 2282 from @parthmannan, which will be merged after this PR.
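
To illustrate the shape of the change needed (a hedged sketch, not this PR's actual diff): the cu_seqlens from the packed batch would be wrapped in Megatron-core's PackedSeqParams dataclass and threaded through the model's forward call from the pretrain script's forward step. The batch keys and the exact call signature below are assumptions for illustration:

```python
from megatron.core.packed_seq_params import PackedSeqParams

def forward_step_packed(batch, model):
    """Sketch of a forward step that threads packed-sequence metadata
    into the model; batch keys are assumed for illustration."""
    cu_seqlens = batch['cu_seqlens']  # int32, shape [num_subsequences + 1]
    packed_seq_params = PackedSeqParams(
        qkv_format='thd',  # token-major layout used for packed input
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_kv=cu_seqlens,
        max_seqlen_q=batch['max_seqlen'],
        max_seqlen_kv=batch['max_seqlen'],
    )
    return model(
        batch['tokens'],
        batch['position_ids'],
        None,  # packed attention uses cu_seqlens, not a dense mask
        labels=batch['labels'],
        packed_seq_params=packed_seq_params,
    )
```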

This PR does not include functional tests for packed-sequence support in the hybrid model; I believe there are currently no functional tests for packed-sequence support in the GPT model either.

See internal MR 4403

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or tag @mcore-oncall to help accelerate your merge into `main`. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review may be declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr or core-nemo will be able to merge your PR.

This PR is a cherry-pick of commits cd0a3206, 77e69a6a, 16c5e363, and bfe73c8e from branch duncan/hybrid-packed-sequence-for-training-moe-june2025, which targets the training-moe-june2025 branch.
@Phlip79 (Member) commented Jan 16, 2026

/ok to test a2f0e20

@duncanriach added this pull request to the merge queue Jan 16, 2026
Merged via the queue into NVIDIA:main with commit 03c0727 on Jan 16, 2026
43 of 47 checks passed
@duncanriach deleted the duncan/hybrid-packed-sequence-for-main branch January 16, 2026 03:41
@asolergi-nv mentioned this pull request Feb 5, 2026
daiyaanarfeen pushed a commit to daiyaanarfeen/Megatron-LM that referenced this pull request Feb 23, 2026
Co-authored-by: Eric Harper <eharper@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>

Labels

complexity: high, Final Review