Add sequence packing support for hybrid model #2913

Merged
duncanriach merged 24 commits into NVIDIA:main from duncanriach:duncan/hybrid-packed-sequence-for-main on Jan 16, 2026

Conversation

@duncanriach (Contributor) commented Jan 12, 2026

What does this PR do?

Adds sequence packing support for the hybrid model, and enhances --sft, SFTDataset, and pretrain_mamba.py to feed packed-sequence data into the model.
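
For context, sequence packing concatenates several variable-length examples into one fixed-length sequence and records the sub-sequence boundaries (cumulative sequence lengths, cu_seqlens) so that attention (and, in the hybrid model, the Mamba layers) does not mix tokens across examples. Below is a minimal, illustrative sketch of that metadata; the pack_sequences helper is hypothetical and is not the code added by this PR:

```python
import torch

def pack_sequences(sequences, pad_id, packed_len):
    """Concatenate variable-length token sequences into one packed
    sequence and compute cumulative sequence lengths (cu_seqlens).
    Illustrative helper only -- not the code added by this PR."""
    tokens = torch.cat(sequences)
    assert tokens.numel() <= packed_len, "sequences do not fit in one pack"
    # cu_seqlens marks where each sub-sequence starts and ends in the
    # pack, e.g. lengths [3, 5, 2] -> cu_seqlens [0, 3, 8, 10].
    lengths = torch.tensor([len(s) for s in sequences])
    cu_seqlens = torch.cat(
        [torch.zeros(1, dtype=torch.int32), lengths.cumsum(0).to(torch.int32)]
    )
    # Pad the remainder so every pack has the same fixed length.
    padding = torch.full(
        (packed_len - tokens.numel(),), pad_id, dtype=tokens.dtype
    )
    return torch.cat([tokens, padding]), cu_seqlens

# Example: three short sequences packed into a length-16 buffer.
packed, cu_seqlens = pack_sequences(
    [torch.arange(3), torch.arange(5), torch.arange(2)],
    pad_id=0,
    packed_len=16,
)
print(cu_seqlens)  # tensor([ 0,  3,  8, 10], dtype=torch.int32)
```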

Causes a regression in the functionality of --sft on the GPT model; this functionality will no longer be supported. To maintain it, packed-sequence-related changes would need to be made to pretrain_gpt.py, following pretrain_mamba.py, and, ideally, GPT-model SFT packed-sequence convergence testing would need to be run. I believe this functionality will be added by PR 2282 from @parthmannan, which will be merged after this PR.
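
To illustrate the shape of the change needed (a hedged sketch, not this PR's actual diff): the cu_seqlens from the packed batch would be wrapped in Megatron-core's PackedSeqParams dataclass and threaded through the model's forward call from the pretrain script's forward step. The batch keys and the exact call signature below are assumptions for illustration:

```python
from megatron.core.packed_seq_params import PackedSeqParams

def forward_step_packed(batch, model):
    """Sketch of a forward step that threads packed-sequence metadata
    into the model; batch keys are assumed for illustration."""
    cu_seqlens = batch['cu_seqlens']  # int32, shape [num_subsequences + 1]
    packed_seq_params = PackedSeqParams(
        qkv_format='thd',  # token-major layout used for packed input
        cu_seqlens_q=cu_seqlens,
        cu_seqlens_kv=cu_seqlens,
        max_seqlen_q=batch['max_seqlen'],
        max_seqlen_kv=batch['max_seqlen'],
    )
    return model(
        batch['tokens'],
        batch['position_ids'],
        None,  # packed attention uses cu_seqlens, not a dense mask
        labels=batch['labels'],
        packed_seq_params=packed_seq_params,
    )
```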

This PR does not include functional tests for packed-sequence support in the hybrid model; I believe there are currently no functional tests for packed-sequence support in the GPT model either.

See internal MR 4403

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code (see Typing guidelines)
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or tag @mcore-oncall to help accelerate your merge into `main`. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review may be declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr or core-nemo will be able to merge your PR.

This PR is a cherry-pick of commits cd0a3206, 77e69a6a, 16c5e363, and bfe73c8e from branch duncan/hybrid-packed-sequence-for-training-moe-june2025, which targets the training-moe-june2025 branch.
@Phlip79 (Member) commented Jan 16, 2026

/ok to test a2f0e20

@duncanriach added this pull request to the merge queue Jan 16, 2026
Merged via the queue into NVIDIA:main with commit 03c0727 on Jan 16, 2026
43 of 47 checks passed
@duncanriach deleted the duncan/hybrid-packed-sequence-for-main branch January 16, 2026 03:41
@asolergi-nv mentioned this pull request Feb 5, 2026
daiyaanarfeen pushed a commit to daiyaanarfeen/Megatron-LM that referenced this pull request Feb 23, 2026
Co-authored-by: Eric Harper <eharper@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>

Labels

complexity: high, Final Review