Add MTP support for hybrid models#2363
Conversation
This PR implements three features:
This is a good point. Let's go with the assertion for now re: (3).
I added the assert, thanks!
/ok to test 9cc1668
This reverts commit a0cc8ca.
For posterity, this PR was re-merged as #3207 with some bugfixes. |
Signed-off-by: adithyare <adithyare@nvidia.com>
Co-authored-by: Rabeeh Mahabadi <rkarimimahab@nb-hel-cs-001-vscode-02.cm.cluster>
Co-authored-by: Sanjeev Satheesh <sasatheesh@nvidia.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
This reverts commit 300d1b6.
What does this PR do?
(1) Supports using hybrid Mamba models as the `mtp_model_layer`.
(2) Splits the MTP loss calculation in the GPT model's forward pass into a separate function.
(3) Supports MTP layer repetition.
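Features (2) and (3) can be illustrated with a minimal, self-contained sketch. Note that all of the names below (`mtp_layer`, `mtp_forward`, `compute_mtp_loss`) are hypothetical and do not reflect Megatron-LM's actual API; the point is only the shape of the refactor: the loss lives in a standalone function that any backbone can call, and one MTP layer is reused across repetitions.

```python
# Hypothetical sketch, not Megatron-LM code: toy hidden states are plain
# lists of floats, and the "MTP layer" is a single shared function.

def mtp_layer(hidden, token):
    """Toy MTP layer: fold the next input token into the hidden state."""
    return [h + token for h in hidden]

def mtp_forward(hidden, future_tokens, num_repeats):
    """Feature (3): reuse one MTP layer `num_repeats` times, collecting
    one prediction state per extra depth of lookahead."""
    states = []
    for depth in range(num_repeats):
        hidden = mtp_layer(hidden, future_tokens[depth])
        states.append(hidden)
    return states

def compute_mtp_loss(states, targets):
    """Feature (2): the loss is its own function, so a GPT backbone or a
    hybrid Mamba backbone can both call it after producing MTP states."""
    # Toy squared-error loss averaged over MTP depths.
    loss = 0.0
    for state, target in zip(states, targets):
        loss += sum((s - t) ** 2 for s, t in zip(state, target))
    return loss / len(states)
```

For example, starting from hidden state `[0.0, 1.0]` with future tokens `[1.0, 2.0]` and two repetitions, the shared layer produces states `[1.0, 2.0]` and then `[3.0, 4.0]`, and the loss against those exact targets is zero.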
Contribution process
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks
Core 0.8)

Code review
The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch
(Step 1): Add PR label
(Step 2): Collect the expert reviewers' reviews
Attach the `Expert Review` label when your PR is ready for review. Final Review might get declined if these requirements are not fulfilled.
(Step 3): Final Review
Attach the `Final Review` label.

(Optional Step 4): Cherry-pick into release branch
If this PR also needs to be merged into `core_r*` release branches, then after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR
Any member of core-adlr and core-nemo will be able to merge your PR.