[Megatron-FSDP] Test FP8 activations + parameter sharding with Megatron-FSDP fully-shard. Update README.#2894

Merged
cspades merged 11 commits into NVIDIA:main from cspades:cye/mfsdp-fp8-fully-shard
Jan 20, 2026
Conversation


@cspades cspades commented Jan 10, 2026

What does this PR do ?

  • Add a unit test for Megatron-FSDP + FP8 Parameters for all existing FP8 recipes using fully_shard.
  • Refactor and add documentation in the README for exactly how to use FP8 parameters with fully_shard.
  • Nit: Move the fp8_model_init context manager into mixed_precision.py, and try/except-import quantized_model_init to support newer versions of TransformerEngine.
  • Update inline commentary, which also documents gaps in the FP8 support matrix: FP8 parameters are only supported when fully sharding the compute parameters (optim_grads_params); un-sharded, optimizer-sharded, and optimizer/gradient-sharded models do not currently have universal FP8 parameter support.
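The try/except import mentioned above can be sketched as follows. This is a minimal illustration, assuming the symbols live under `transformer_engine.pytorch` (where `fp8_model_init` resides in current TransformerEngine releases); the extra no-op fallback is an addition here so the snippet degrades gracefully when TransformerEngine is not installed at all.

```python
# Minimal sketch of the version-compatibility shim described above.
# Assumption: newer TransformerEngine exposes quantized_model_init,
# while older releases only provide fp8_model_init.
try:
    from transformer_engine.pytorch import quantized_model_init as model_init_ctx
except ImportError:
    try:
        from transformer_engine.pytorch import fp8_model_init as model_init_ctx
    except ImportError:
        # TransformerEngine is unavailable; fall back to a no-op context
        # manager so non-FP8 code paths keep working.
        from contextlib import nullcontext as model_init_ctx

# model_init_ctx() can now be entered uniformly when constructing the model,
# regardless of which TransformerEngine version (if any) is installed.
```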

Testing

Testing mostly done for #2239 using both Llama-8B in Megatron-LM and a toy TransformerEngine model with fully_shard, which is now tested in CI/CD.

⚠️ For major changes (either in lines of code or in impact), please make sure to first share a design doc with the team. If you're unsure of the best way to do so, contact @mcore-oncall.

Contribution process

```mermaid
flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]
```

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
Final Review may be declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@cspades cspades self-assigned this Jan 10, 2026
@cspades cspades requested review from a team as code owners January 10, 2026 00:49
@cspades cspades added the Expert Review label Jan 10, 2026
@ko3n1g ko3n1g added this to the Core 0.16 milestone Jan 10, 2026
@cspades cspades force-pushed the cye/mfsdp-fp8-fully-shard branch from f5576e1 to 7445fd5 Compare January 10, 2026 00:52
@cspades cspades force-pushed the cye/mfsdp-fp8-fully-shard branch from 7445fd5 to 7d246d1 Compare January 10, 2026 00:53
@cspades cspades force-pushed the cye/mfsdp-fp8-fully-shard branch from 7d246d1 to a5b3546 Compare January 10, 2026 00:57
@cspades cspades force-pushed the cye/mfsdp-fp8-fully-shard branch from a5b3546 to 7652096 Compare January 10, 2026 01:03
@cspades cspades added this pull request to the merge queue Jan 20, 2026
Merged via the queue into NVIDIA:main with commit 517dfd4 Jan 20, 2026
46 of 48 checks passed
@cspades cspades deleted the cye/mfsdp-fp8-fully-shard branch January 20, 2026 05:26
daiyaanarfeen pushed a commit to daiyaanarfeen/Megatron-LM that referenced this pull request Feb 23, 2026
…on-FSDP fully-shard. Update README. (NVIDIA#2894)

Signed-off-by: Cory Ye <cye@nvidia.com>

Labels

Final Review

5 participants