358 commits
f3cbf03
ci: Restore `gpt3_moe_mcore_te_tp4_ep2_etp2_pp2_scoped_cudagraph`
ko3n1g Nov 4, 2025
ecbfe70
chore: Fix autoformatter (#2073) (#2134)
ko3n1g Nov 4, 2025
79e8592
add device and dtype to empty inv_dt init (#2137)
maanug-nv Nov 5, 2025
3c1b98e
Remove DS-V3 doc - draft being updated (#2155)
sbhavani Nov 6, 2025
b2fdd94
[DEV] torch_dist fixes, speed improvements and memory reduction for l…
FDecaYed Nov 6, 2025
6cc224d
[Dev] Fix Qwen3-Next hang on Blackwell, add a flag to control torch.c…
yuzhongw-nvidia Nov 6, 2025
f4bd87e
[dev] Fix cuda graph scope check in `language_model.py` (#2158)
ananthsub Nov 6, 2025
3207c23
[Dev] Remove experimental tags for fused kernels (#2172)
Victarry Nov 10, 2025
ae4cda5
Merge branch 'main' into dev
FDecaYed Nov 11, 2025
3cbe5c6
[DEV] pull changes from main(f150f42e3929f7f2171e3687e67990332a76285b…
ko3n1g Nov 11, 2025
6b59d71
fix(transformer_config): Initialize cuda_graph_scope if not set regar…
cuichenx Nov 11, 2025
442a7f2
[Dev] fix(offloading): Accuracy mismatch when offloading and recomput…
lhb8125 Nov 11, 2025
6b01330
Ko3n1g/chore/update dev release settings (#2099)
ko3n1g Nov 11, 2025
d7d71e0
Merge remote-tracking branch 'github/main' into dev
ko3n1g Nov 11, 2025
56019e6
[DEV] Cherry-pick: M4 + Dist Checkpoint: Replace global parallel stat…
yaoyu-33 Nov 11, 2025
2c2ee22
[Dev] Remove redundant reduce in aux_loss logging (#2094)
BestJuly Nov 12, 2025
b7c1e75
[DEV] Make CUDA graph compatible with FP8 params (tensorwise & blockw…
kunlunl Nov 12, 2025
ca68395
remove workflow
ko3n1g Nov 12, 2025
1d502cd
[Dev] Reduce Overhead in Timers (#2208)
yaox12 Nov 12, 2025
a2048c8
Revert "[DEV] Cherry-pick: M4 + Dist Checkpoint: Replace global paral…
ko3n1g Nov 12, 2025
a2a1c89
[Dev] replay: Cherry-pick: M4 + Dist Checkpoint: Replace global paral…
yaoyu-33 Nov 12, 2025
8427584
[Dev]Revert torch ckpt format change for LayerwiseDistOpt (#2228)
BoxiangW Nov 13, 2025
7020e1f
[Dev] Add more tests for LayerwiseDistOpt with dist_ckpt (#2132)
BoxiangW Nov 13, 2025
693587d
[Dev] Add muon golden value (#2247)
BoxiangW Nov 14, 2025
b55a544
bump deps
ko3n1g Nov 14, 2025
bfbf13f
Merge remote-tracking branch 'github/dev' into ko3n1g/chore/main-to-dev
ko3n1g Nov 14, 2025
658931e
ci: Create weekly dev branch (#2223)
ko3n1g Nov 14, 2025
1211348
[20251111] Ko3n1g/chore/main to dev (#2211)
ko3n1g Nov 14, 2025
71fa2e6
Revert "[20251111] Ko3n1g/chore/main to dev (#2211)" (#2266)
chtruong814 Nov 17, 2025
0bf9ff9
chroe: [Dev]Disable muon test for now (#2275)
BoxiangW Nov 17, 2025
565202f
Fixes of Merge main into dev
ko3n1g Nov 17, 2025
4b78163
Replay: [20251111] Ko3n1g/chore/main to dev (#2267)
ko3n1g Nov 17, 2025
d1a31a3
[Dev] MuonClip support (non-split version) on dev branch (#2194)
BoxiangW Nov 18, 2025
7968d5f
[dev] Add assertion for mxfp8 params without dp overlap (#2270)
kunlunl Nov 18, 2025
d09482c
[DEV] Save memory using main_param for moe in param_l2_norm (#2234)
BestJuly Nov 18, 2025
7da6e5b
[DEV][NVFP4] Fix NVFP4 Selective Activation Recompute (#2036)
zhongbozhu Nov 18, 2025
ca4c03e
[DEV] Fix aux loss scale when cp enabled (#2217)
Victarry Nov 18, 2025
157bec9
[Community][Dev] feat(moe): Adding context parallel support to eager …
nrailg Nov 18, 2025
5c1d294
[HOT FIX] Fix bug of hybrid-ep backend in flex-dispatcher (#2287)
Autumn1998 Nov 18, 2025
2782acf
Ko3n1g/ci/golden values weeklies (#2279)
ko3n1g Nov 18, 2025
dc9a38d
[DEV] Add support of fake distributed process group (#2254)
Victarry Nov 18, 2025
a8fc591
Cherrypick CI changes between 20251111 - 20251118 (#2292)
ko3n1g Nov 18, 2025
d547462
[DEV] Update emerging optimizers (#2261)
skyw Nov 18, 2025
056ebc5
ci(hotfix): Do not run on main/dev
ko3n1g Nov 18, 2025
5880674
[dev] ci(moe): Add a functional test case for Qwen3Next-specific feat…
yuzhongw-nvidia Nov 19, 2025
a4fce1d
[DEV] fix layerwise torch_dist checkpointing fails due to empty rank …
FDecaYed Nov 20, 2025
c6e2b29
[Dev] fix(megatron-fsdp): Resolve hang caused by non-deterministic re…
xuwchen Nov 20, 2025
c6f277a
ci: Disable flaky unit test (#2338)
ko3n1g Nov 20, 2025
716bb4a
feat: check: api backwards compatibility [dev] (#2341)
pablo-garay Nov 20, 2025
7b8e39e
Revert "[Dev] fix(megatron-fsdp): Resolve hang caused by non-determin…
ko3n1g Nov 20, 2025
cb88c6e
ci: Upload to testpypi only on main (#2342) (#2343)
ko3n1g Nov 21, 2025
c241d0c
Reapply "[Dev] fix(megatron-fsdp): Resolve hang caused by non-determi…
ko3n1g Nov 21, 2025
31f5049
feat: required check adjustment (#2349)
pablo-garay Nov 21, 2025
56682f8
[DEV] pull main Nov 25 (#2395)
FDecaYed Nov 28, 2025
b9c48ec
adding action for checking whether PR author is nvidia employee or no…
theothermike Nov 25, 2025
3aa0c4e
fix: exit failure when PR author is external contributor removed (#2410)
theothermike Nov 26, 2025
b750bdb
fix: adding k8s taints for ephermeral jobs (#2420)
theothermike Nov 27, 2025
c12909b
ci: Enable functional tests (#2419)
ko3n1g Nov 27, 2025
44933d7
Reapply "build: Upgrade deps (NVIDIA#2289)" (#2408)
ko3n1g Nov 27, 2025
98c64b2
fix: use a script to do node tainting in the cicd workflow (#2421)
theothermike Nov 27, 2025
c8fb49e
cp: CI changes until 20251128 (#2426)
ko3n1g Nov 28, 2025
03150b4
Revert "[DEV] pull main Nov 25 (#2395)"
ko3n1g Nov 28, 2025
6ca67bc
[Dev] Support packed seq in MTP (#2043)
BestJuly Dec 1, 2025
11caf01
Fix runaway Etpt in straggler detector by resetting FLOPs accumulator…
sbhavani Dec 1, 2025
92c8482
[Dev] feat(MoE): Refactor cuda_graph_scope - part2 (#2353)
buptzyb Dec 1, 2025
b0c96b3
[dev] DeepSeek V3.2 support (#2154)
kunlunl Dec 1, 2025
71357e2
Revert "[Dev] feat(MoE): Refactor cuda_graph_scope - part2 (#2353)"
ko3n1g Dec 1, 2025
fdcb0a4
Replay "[Dev] feat(MoE): Refactor cuda_graph_scope - part2 (#2353)" (…
buptzyb Dec 2, 2025
14b19b1
[Dev] Optimize TE cudagraph input memory (#2391)
buptzyb Dec 2, 2025
b0f5746
Fix HSDP Registering Device Mesh (#2388)
tomlifu Dec 2, 2025
5375ad4
fix: update baseline (#2468)
pablo-garay Dec 2, 2025
79660b7
fix: Add merge_group support with pre-flight pattern (#2469)
pablo-garay Dec 2, 2025
d72b218
DeepSeek V3 FSDP Fix for Precision-Aware Optimizer (#2204)
tomlifu Dec 3, 2025
436065a
[Dev] fix(moe): minor refactor for fine-grained activation offloading…
lhb8125 Dec 3, 2025
a4bee49
[Dev] feat: m4 leftover changes (#2226)
yaoyu-33 Dec 4, 2025
ad5a222
feat: add decorator: experimental_api (#2546)
pablo-garay Dec 4, 2025
7d17116
feat: API compat: ignore AttributeChangedValueBreakage (not a signatu…
pablo-garay Dec 4, 2025
274e04d
[Dev] Hybrid Data x Context Parallelism Feature (#2054)
parthmannan Dec 4, 2025
87ac13d
update API compat check baseline to 274e04d (#2548)
pablo-garay Dec 4, 2025
f0c1b55
feat: mcore trigger mbridge (#2340) (#2552)
pablo-garay Dec 5, 2025
8de5a7f
[Dev] Optimize TE CUDA Graph capturing time (#2483)
buptzyb Dec 5, 2025
1f08ceb
[Dev] Feature: linear cross entropy fusion (#2256)
Jianbing-D Dec 5, 2025
9cf6838
Fix gpt_layer_spec for frequently linear attention (#2481)
yuzhongw-nvidia Dec 5, 2025
89fe895
Skip trainloader when `args.skip_train` is True (#2501)
Niccolo-Ajroldi Dec 5, 2025
a6d86a6
[DEV] fixes for muon(qwen3-next, ep multi-adam) (#2564)
FDecaYed Dec 5, 2025
aee4a74
[Dev] remove fp16 assert in moe_grouped_gemm & EP (#2494)
HaochenYuan Dec 8, 2025
dfe4da2
Update tp support in muon (#2385)
skyw Dec 8, 2025
1d462bd
[DEV] Update GitHub MoE functional test cases (#2449)
Victarry Dec 8, 2025
23e092f
Fix: don't enter branch if mtp_num_layers == 0 (#2581)
rj42 Dec 9, 2025
c60d5c2
[Dev] fix(moe): Support HybridEP and reduce memory overhead for 1F1B …
lhb8125 Dec 10, 2025
4db2f11
Merge branch 'main' into dev
FDecaYed Dec 10, 2025
ed804b4
[dev] pull main 1201 (#2448)
ko3n1g Dec 11, 2025
2d398b4
chore: Bump baseline (#2626)
ko3n1g Dec 11, 2025
e8a9275
[Dev] Use the latest Hybrid-EP (#2424)
Autumn1998 Dec 12, 2025
305957a
API compat: ignore ParameterMovedBreakage for __init__ methods (#2649)
pablo-garay Dec 12, 2025
e93814b
[training migration] add training config dataclass and arg generation…
maanug-nv Dec 16, 2025
288b8ea
[Dev] Optimize TE CUDA Graph _get_sample_arguments() Time (#2568)
buptzyb Dec 17, 2025
0eec631
Reopen qwen3next functional test in lightweight mode (#2493)
yuzhongw-nvidia Dec 17, 2025
2ebff67
[Dev] Fix CUDA RNG Tracker (#2640)
buptzyb Dec 17, 2025
368e580
[Dev] Mark API backwards compatibility checks as OPTIONAL (non-blocki…
pablo-garay Dec 17, 2025
3714d81
[Dev] FP8 params support for megatron-fsdp (MXFP8/Blockwise) (#2086)
kunlunl Dec 18, 2025
a935008
[Dev] Feat(moe): Gated delta net context parallel (CP) (#2614)
yuzhongw-nvidia Dec 19, 2025
fd932c9
ci: Gridify test configs (#2707)
ko3n1g Dec 19, 2025
2b1fc70
Revert "[dev] Add assertion for mxfp8 params without dp overlap (#2270)"
ko3n1g Dec 22, 2025
4665be4
Revert "[Dev] Use the latest Hybrid-EP (#2424)" (#2732)
ko3n1g Dec 22, 2025
46b5505
[Dev] Fix ep overlap missing final layernorm (#2691)
Wohox Dec 23, 2025
0b6714e
[Dev] Remove calculation of padding token in moe routing loss (#2121)
HaochenYuan Dec 24, 2025
1068d77
Revert "[Dev] Remove calculation of padding token in moe routing loss…
chtruong814 Dec 24, 2025
9885ddb
[Dev] Disable ep overlap memory optimization (#2750)
Wohox Dec 30, 2025
14c35dc
Merge branch 'main' into dev
FDecaYed Dec 30, 2025
929e77f
feat: Cherry-pick PR of PR!2661 for dev branch (#2757)
youngeunkwon0405 Dec 30, 2025
b361561
Merge branch 'dev' into deyuf/dev_pull_main_1217_test
FDecaYed Dec 31, 2025
922e8e9
cp: Allow disabling external contributors (#2784) (#2786)
chtruong814 Dec 31, 2025
5455f0a
build: Pin down `nvidia-nvshmem-cu13` (#2798)
ko3n1g Jan 3, 2026
71d5c84
[dev] Fix bug of reuse_grad_buf_for_mxfp8_param_ag (#2801)
kunlunl Jan 5, 2026
8b93e0d
[Dev] Partial CUDA Graph support for EP Overlap (#2168)
Wohox Jan 5, 2026
c1045f6
Revert "[Dev] FP8 params support for megatron-fsdp (MXFP8/Blockwise) …
ko3n1g Jan 5, 2026
bd06945
Revert "[Dev] Partial CUDA Graph support for EP Overlap (#2168)"
ko3n1g Jan 5, 2026
29ffe43
Merge branch 'dev' into deyuf/dev_pull_main_1217_test
FDecaYed Jan 5, 2026
d8464fc
PR for testing pull main 1217 (#2716)
ko3n1g Jan 5, 2026
dfa6cc1
[Dev] Remove calculation of padding token in moe routing loss (#2754)
HaochenYuan Jan 6, 2026
5823534
[dev] Reapply fsdp mxfp8 (#2828)
kunlunl Jan 6, 2026
1ec0beb
[Dev] Partial CUDA Graph support for EP Overlap (#2810)
Wohox Jan 6, 2026
0bc4114
[Dev] fix EP Overlap Partial Cuda Graph Unit Test hang issue (#2838)
Wohox Jan 7, 2026
28c586e
build: Bump jet-client (#2877)
ko3n1g Jan 8, 2026
46d1f47
FP8 attention knob for nvFP4 recipe (#2818)
vasunvidia Jan 9, 2026
ed6ebff
[DEV][NVFP4][MOE] 128 Zero Padding for Grouped Quantization kernels a…
zhongbozhu Jan 9, 2026
ebe7079
Add check for full_iteration scope before instantiating CudaGraphMana…
vasunvidia Jan 9, 2026
736da3c
Reapply "[Dev] Use the latest Hybrid-EP (#2423)" (#2867)
ko3n1g Jan 9, 2026
9d741cf
build: Main dependency bump for 26.02 (#2682)
ko3n1g Jan 12, 2026
de866fa
ci(fix): Update golden values (#2921)
ko3n1g Jan 13, 2026
ae3dbc0
ci(hotfix): Re-add `gpt3_mcore_te_tp4_pp2_resume_torch_dist_reshard_8…
ko3n1g Jan 13, 2026
583dd58
ci: Skip broken tests after dependency update (#2935)
chtruong814 Jan 13, 2026
b0a702b
Cherry-pick optimizer override refactor from #2723 (#2835)
yaoyu-33 Jan 14, 2026
1964d39
ci(hotfix): Disable gpt_grpo_tp1_pp1_dp8_583m_throughputtest
ko3n1g Jan 14, 2026
383505c
[dev]: ci: Onboard GB200 (#2922)
ko3n1g Jan 14, 2026
ab3ae8a
ci(hotfix): Repair recipe
ko3n1g Jan 14, 2026
dce8e88
Fix clip_qk for virtual pipeline size > 1 (#2776)
juntaowww Jan 15, 2026
748ab80
ci(hotfix): GB200 to nightly
ko3n1g Jan 15, 2026
a32b198
ci(fix): GB200 racecondition (#2962)
ko3n1g Jan 15, 2026
7c6c4e9
Revert "ci(fix): GB200 racecondition (#2962)"
ko3n1g Jan 15, 2026
619115a
ci: Fix GB200 change (#2969) (#2974)
ko3n1g Jan 16, 2026
b395016
[Dev] TE cudagraph recompute (#2694)
buptzyb Jan 16, 2026
b927e1f
[Dev] docs(megatron-fsdp): add Megatron-FSDP user guide (#2397)
xuwchen Jan 16, 2026
6b157e0
[Dev] Optimizer State and Master Weight Offloading (#2760)
hxbai Jan 16, 2026
8ac3a9f
Revert "[Dev] Optimizer State and Master Weight Offloading (#2760)" (…
ko3n1g Jan 16, 2026
bd8411c
Forced load imbalance (#2917)
nanz-nv Jan 19, 2026
0a2e01f
[Dev] [Reapply] Optimizer State and Master Weight Offloading (#2987)
hxbai Jan 19, 2026
8abc086
ci(fix): CI_COMMIT_BRANCH on forks (#2982) (#2989)
ko3n1g Jan 19, 2026
5b17f19
[Dev] Update MoE readme. (#2808)
Victarry Jan 19, 2026
9ea50a9
feat: add routing replay for Mcore (#2693)
litianjian Jan 20, 2026
ac9f665
[dev] feat(moe): Support apply wd to qk layernorm for Qwen3-Next (#2825)
yuzhongw-nvidia Jan 21, 2026
6e2153b
[dev] feat(moe): Cherry-pick #1989 back to dev (#3011)
yuzhongw-nvidia Jan 21, 2026
68e5fec
[Dev]feat(moe): code refactor for fine grained activation offloading …
lhb8125 Jan 22, 2026
6807df4
[Dev] [fix] Bug fix for offloading in evaluate() (#3041)
lhb8125 Jan 22, 2026
b3bba3f
ci: Log node name (#3081) (#3082)
ko3n1g Jan 26, 2026
a4e3fb3
[dev] pull main 260122 (#3045)
FDecaYed Jan 27, 2026
420aa6a
ci: Skip test_precision_aware_optimizer (#3062)
thomasdhc Jan 23, 2026
da56650
Merge branch 'main' into deyuf/dev_pull_main_260122_fix_git
FDecaYed Jan 27, 2026
08357d8
[dev] fix git history for dev pull main 260122 (#3094)
ko3n1g Jan 27, 2026
0f82f05
[dev] fixes for pull main 260122 (#3103)
FDecaYed Jan 28, 2026
0ceb698
ci: Disable broken test (#3121)
ko3n1g Jan 28, 2026
f6f2abe
[Dev] Param offset in _ParamAndGradBucket should be aligned (#3010)
BestJuly Jan 29, 2026
d587dd1
[Dev] fix cg missing wgrad hook (#2999)
Wohox Jan 29, 2026
8f8f735
[Megatron-FSDP] Add fsdp_all_gather_in_start_param_sync option in DDP…
shjwudp Jan 29, 2026
bde9e32
[Dev] Support EP with HSDP (#2800)
wplf Jan 29, 2026
27fcfb2
Cherrypick CI improvements to dev branch (#3118)
ko3n1g Jan 29, 2026
a9fb6c8
Merge branch 'main' into deyuf/dev_pull_main_260130
FDecaYed Jan 30, 2026
55e3a0a
[dev] ci: Add DSv3 proxy (#3144)
ko3n1g Jan 30, 2026
a78ae49
[dev] ci: Fix DSv3 (#3187)
ko3n1g Jan 31, 2026
9375be4
Fix: nccl-ub in ddp path (#3181)
youngeunkwon0405 Feb 1, 2026
0f73a8a
[dev] perf(moe): Refine gated delta net implementation (#3040)
yuzhongw-nvidia Feb 2, 2026
5035cbe
[Dev] Add the missing part to support 1F1B overlap for Qwen3-Next (#2…
BestJuly Feb 2, 2026
4aac3fe
Use the latest hybrid-ep (#3092)
Autumn1998 Feb 2, 2026
bfa1d31
[BUG FIX] Try to enable cuda graph ut (#3192)
Autumn1998 Feb 2, 2026
13ad653
[Dev] Fix Linear-Cross-Entropy Convergence Issue (#2739)
shjwudp Feb 3, 2026
b8b8662
Revert "[Dev] Fix Linear-Cross-Entropy Convergence Issue (#2739)" (#3…
chtruong814 Feb 3, 2026
2ab74ab
Fix missing PackedSeqParams import (#3215)
parthmannan Feb 3, 2026
20e8ac8
fix merge main issues
FDecaYed Jan 30, 2026
77b5a3d
[dev] pull main 260130 (#3166)
ko3n1g Feb 3, 2026
c5b282b
ci(hotfix): Pin uv (#3233) (#3234)
ko3n1g Feb 3, 2026
8a29fd5
[DEV] Reapply fix Linear CE Fusion (#3226)
shjwudp Feb 4, 2026
dd17acc
Missing import fix (#3242)
parthmannan Feb 4, 2026
fa5bcf6
[Dev] Fix EP Overlap Bugs for Full-Iter CG (#3163)
Wohox Feb 4, 2026
a592819
[Refactor] Decouple topk and loss from DSA Indexer (#3013)
laixinn Feb 4, 2026
54f4feb
cp: Fix uv install for GH actions (#3259) (#3261)
chtruong814 Feb 5, 2026
ef336ca
[Dev] Fix EP Overlap missing record stream for shared expert (#3244)
Wohox Feb 5, 2026
ec94d63
Restore missing linear-cross-entropy option accidentally removed from…
shjwudp Feb 6, 2026
500e080
Fix reload_model_params failure when loading MoE models with explicit…
eternally-z Feb 9, 2026
433c169
ci: Disable moe20 tests (#3312)
ko3n1g Feb 9, 2026
fd4801e
ci: Pin down setuptools to lt 82 (#3316)
ko3n1g Feb 9, 2026
52eabf0
[None][Fix] Prevent resource leak warnings (#3216)
IanBoyanZhang Feb 10, 2026
c0030d6
[Dev] Fix backward dw dependency (#3338)
Wohox Feb 10, 2026
2c2e749
ci: Rely exclusively on GitHub CI (#3341)
ko3n1g Feb 10, 2026
98f6f81
[dev] ci: skip queue in merge-gate (#3344)
ko3n1g Feb 10, 2026
28b130f
Revert "[None][Fix] Prevent resource leak warnings (#3216)" (#3366)
ko3n1g Feb 11, 2026
e868e8f
ci: Fix dev branch merge queue (#3397)
chtruong814 Feb 13, 2026
c4b910f
[Dev] Add Qwen3-VL support with Megatron-FSDP (#2842)
xuwchen Feb 13, 2026
6059f36
Add absorbed-mla (#3193)
kunlunl Feb 13, 2026
9f2ca96
cp: Remove gpu sanity check (#3420) into dev (#3421)
chtruong814 Feb 13, 2026
1dcf0da
[dev] ci: Fix merge queue (#3385)
ko3n1g Feb 14, 2026
cd1c215
[dev] `cp: Cherrypick CI changes` (#3543)
ko3n1g Feb 23, 2026
aa86018
[Dev] Fix MoE aux loss tracker hang with MTP enabled (#3400)
Victarry Feb 25, 2026
2b4b9c4
ci: Remove multi-approval action from dev branch (#3576)
chtruong814 Feb 25, 2026
0ab47fa
Merge branch 'main' into dev
FDecaYed Feb 26, 2026
a1a73f8
[dev] pull main 260220 (#3574)
ko3n1g Feb 26, 2026
2e4a5d4
[dev] fix(moe): fix the bug where gate was not sliced when kv_head < …
LiuXTao Feb 27, 2026
d0e0cf0
Add unit test for THD (#3608)
kunlunl Feb 28, 2026
bc9298c
[Dev] feat(checkpoint): zero-copy storage sharing in CheckpointWithou…
Victarry Mar 2, 2026
5c613ab
[Dev] Add E2E support for THD format (#2924)
xiaoyao0115 Mar 3, 2026
5dadaf1
fix: skip FSDP DTensor boundary validation under fake process group (…
Victarry Mar 4, 2026
2176c4a
ci: Remove cudagraph codeowners entry in dev branch (#3712)
chtruong814 Mar 5, 2026
31f5294
[dev] refactor to support emerging optimizers beyond muon (#3618)
FDecaYed Mar 5, 2026
a268231
[Dev] Move some processing into a function so can be compiled (#3220)
BestJuly Mar 5, 2026
f983b21
[Dev] Refactor MoE loss logging (#2569)
yanring Mar 5, 2026
0b0074e
[dev] feat(mHC): Add basic pytorch implementation of manifold hyper c…
jingqiny-99 Mar 6, 2026
597f0d8
[Dev] Cherry-pick: M-FSDP: Cancel erroneous grad accumulation check (…
Victarry Mar 6, 2026
3d097e5
[dev] fix(moe): Fix DSA spec and rope. (#3402)
yuzhongw-nvidia Mar 6, 2026
1edfbd6
Fix split_state_dict function for MoE models (#3667)
eternally-z Mar 10, 2026
28a0aef
Exposing interleave argument for fused_apply_rotary_pos_emb_thd (#3759)
huvunvidia Mar 10, 2026
15fb557
build: Move fast-hadmard-transform (#3786)
ko3n1g Mar 11, 2026
dbf6c4c
fix ddp bug when --overlap-grad-reduce and --num-distributed-optimi f…
wplf Mar 11, 2026
cde56a4
[Dev] Fix for rope when enabling THD + Dynamic-CP; and use the naming…
xiaoyao0115 Mar 11, 2026
9374a4d
Continue emerging optimizer refactoring (#3737)
skyw Mar 12, 2026
f47ad91
Fix emerging optimizer init_group for ckpt loading (#3897)
FDecaYed Mar 17, 2026
74124ba
fix cg acess issue by using dict instead of list to iteratively acces…
ilml Mar 17, 2026
51299c5
Enhance rotary positional embedding version checks (#3887)
huvunvidia Mar 17, 2026
7c3eea6
[DEV] fix(megatron-fsdp): build expt_device_mesh only for MoE models …
xuwchen Mar 17, 2026
a9e5bf9
[Fix][Dev] Missing Assertion for moe layer recomptue in A2A Overlap (…
Wohox Mar 18, 2026
ebf1508
ci: Fix sso users check (#3937)
chtruong814 Mar 19, 2026
8ae70d4
Add more emerging optimizers (#3907)
skyw Mar 19, 2026
c72c459
Support GEMM + Swiglu fused MLP (#3890)
ksivaman Mar 20, 2026
0296101
[Dev] Support EP Overlap's Dynamic Computation Stream For Full-Iter C…
Wohox Mar 25, 2026
4108d68
[dev] mHC kernel fusion (#3828)
jingqiny-99 Mar 25, 2026
79aeecf
Merge remote-tracking branch 'upstream/main' into tolong/sync-main-to…
ilml Mar 25, 2026
0e53b30
fix: correct H2->H4 header skips in router_replay.md
ilml Mar 25, 2026
076d20f
fix: add missing tensor_parallel import in absorbed_mla.py
ilml Mar 25, 2026
0961196
fix: correct import ordering for tensor_parallel in absorbed_mla
ilml Mar 25, 2026
6823637
fix layerwise related merge error due to dev refactor
FDecaYed Mar 30, 2026
0c306dc
[Dev][feat] Support CUDA Graph capture offloading modules (#3219)
lhb8125 Mar 30, 2026
9c0b6ef
update golden value for gpt3_moe_mcore_te_tp4_ep2_etp2_pp2_resume_tor…
FDecaYed Mar 30, 2026
f36257c
Merge branch 'dev' into pull-request/4031
FDecaYed Mar 30, 2026
4ef64eb
Sync main into dev (#4031)
ko3n1g Mar 30, 2026
2bb0d38
[Dev] Fix golden values mismatch and dependency error due to last pul…
Victarry Apr 3, 2026
8d1fd3c
[Dev] Skip routed expert padding for graph-safe MoE (#4071)
zhongbozhu Apr 3, 2026
74751c9
[DEV] Minor update optimizer (#4082)
skyw Apr 7, 2026
ab6c0ff
TE fused grouped mlp with grouped bias and delayed wgrad (#4095)
ksivaman Apr 7, 2026
37a4cee
[Dev][feat] Support overlapping A2A Combine backprop with wgrad GEMM …
Wohox Apr 7, 2026
b165580
support GDN packed sequence
yuzhongw-nvidia Dec 12, 2025
cebd475
Fix several bugs
yuzhongw-nvidia Jan 22, 2026
59 changes: 5 additions & 54 deletions .github/CODEOWNERS
@@ -1,43 +1,4 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/tokenizers/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/tokenizers

megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/distrib_optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/quantization-and-inference

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs
* @NVIDIA/core-nemo @NVIDIA/core-devtech

megatron/training/ @NVIDIA/training-adlr @NVIDIA/training-nemo
megatron/training/arguments.py
@@ -46,19 +7,9 @@ megatron/training/arguments.py
.github/ @NVIDIA/ci
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci
tests/test_utils/python_scripts/
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci
scripts/README_API_COMPAT.md @NVIDIA/ci
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci
docs/api-backwards-compatibility-check.md @NVIDIA/ci
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
test/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
pyproject.toml @NVIDIA/ci
uv.lock @NVIDIA/ci
4 changes: 2 additions & 2 deletions .github/workflows/cicd-main.yml
@@ -78,8 +78,8 @@ jobs:
IS_MERGE_GROUP: ${{ github.event_name == 'merge_group' }}
SCHEDULED_JOB: ${{ github.event_name == 'schedule' }}
run: |
-          # Skip SSO check for scheduled jobs, main branch, or merge groups
-          if [ "${{ env.SCHEDULED_JOB }}" == "true" ] || [ "${IS_MAIN_BRANCH}" == "true" ] || [ "${IS_MERGE_GROUP}" == "true" ]; then
+          # Skip SSO check for scheduled jobs, main branch, dev branch, or merge groups
+          if [ "${{ env.SCHEDULED_JOB }}" == "true" ] || [ "${IS_MAIN_BRANCH}" == "true" ] || [ "${IS_DEV_BRANCH}" == "true" ] || [ "${IS_MERGE_GROUP}" == "true" ]; then
echo "is_maintainer=true" | tee -a $GITHUB_OUTPUT
exit 0
fi
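The dev-branch addition in the hunk above can be exercised in isolation. A minimal bash sketch of the same skip condition — the `is_maintainer_skip` function name and its positional arguments are illustrative, not part of the workflow:

```shell
# Sketch of the SSO-skip condition: any one of scheduled job, main branch,
# dev branch, or merge group bypasses the maintainer check.
is_maintainer_skip() {
  local scheduled="$1" is_main="$2" is_dev="$3" is_merge_group="$4"
  if [ "$scheduled" == "true" ] || [ "$is_main" == "true" ] || [ "$is_dev" == "true" ] || [ "$is_merge_group" == "true" ]; then
    echo "is_maintainer=true"
  else
    echo "needs-sso-check"
  fi
}

is_maintainer_skip false false true false   # dev branch: prints is_maintainer=true
is_maintainer_skip false false false false  # ordinary PR: prints needs-sso-check
```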
129 changes: 129 additions & 0 deletions .github/workflows/mirror-to-main.yml
@@ -0,0 +1,129 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: Mirror Dev to Main

on:
push:
branches:
- "pull-request/[0-9]+"

jobs:
cherry-pick-to-main:
runs-on: ubuntu-latest
permissions:
contents: write
pull-requests: write

steps:
- name: Checkout repository
uses: actions/checkout@v4
with:
fetch-depth: 0
token: ${{ secrets.PAT }}

- name: Get PR info
id: get-pr-info
uses: nv-gha-runners/get-pr-info@main

- name: Configure Git
run: |
git config --global user.email "github-actions[bot]@users.noreply.github.com"
git config --global user.name "GitHub Actions Bot"

- name: Cherry-pick to main
env:
GH_TOKEN: ${{ secrets.PAT }}
run: |
set -x

PR_NUMBER=${{ fromJSON(steps.get-pr-info.outputs.pr-info || '{}').number }}
BASE_REF="${{ fromJSON(steps.get-pr-info.outputs.pr-info).base.ref }}"
HAS_MIRROR_MAIN_LABEL=$(gh pr view $PR_NUMBER --json labels | jq '[.labels[].name] | any(. == "mirror-to-main")' || echo "false")
TARGET_BRANCH="cherry-pick-$PR_NUMBER-into-main"

# Skip if not labeled with mirror-to-main
if [ "$HAS_MIRROR_MAIN_LABEL" != "true" ]; then
echo "PR is not labeled with mirror-to-main, will not mirror to main."
exit 0
fi

# Skip if not targeting dev
if [ "$BASE_REF" != "dev" ]; then
echo "PR is not targeting dev, will not mirror to main."
exit 0
fi

# Check if target branch already exists
if git ls-remote --heads origin "refs/heads/$TARGET_BRANCH" | grep -q .; then
echo "Target branch already exists, will not cherry-pick again."
exit 0
fi

# Get PR details
PR_AUTHOR="${{ fromJSON(steps.get-pr-info.outputs.pr-info).user.login }}"
PR_TITLE="${{ fromJSON(steps.get-pr-info.outputs.pr-info).title }}"
SOURCE_BRANCH="${{ fromJSON(steps.get-pr-info.outputs.pr-info).head.ref }}"
SOURCE_REPO="${{ fromJSON(steps.get-pr-info.outputs.pr-info).head.repo.full_name }}"

# Fetch all branches
git fetch origin dev

# Handle forks vs same repo
if [ "$SOURCE_REPO" = "${{ github.repository }}" ]; then
git fetch origin "$SOURCE_BRANCH"
git checkout "$SOURCE_BRANCH"
else
git fetch "https://github.com/$SOURCE_REPO.git" "$SOURCE_BRANCH"
git checkout FETCH_HEAD
fi

# Find commit range to cherry-pick
START_COMMIT=$(git merge-base origin/dev HEAD)
END_COMMIT=$(git rev-parse HEAD)

# Create cherry-pick branch from main
git fetch origin main
git checkout main
git checkout -b "$TARGET_BRANCH"

# Cherry-pick commits
if ! git cherry-pick "$START_COMMIT..$END_COMMIT"; then
# Comment on the original PR about the failure
COMMENT_BODY=$(cat <<'EOF'
❌ **Cherry-pick to main failed**

The cherry-pick encountered conflicts and could not be completed automatically.

**Next steps:**
1. Manually create a PR with these changes to main
2. Resolve any conflicts
EOF
)

gh pr comment $PR_NUMBER --body "$COMMENT_BODY"
exit 1
fi

# Push branch
git push -u origin "$TARGET_BRANCH"

# Create PR to main
gh pr create \
--base main \
--head "$TARGET_BRANCH" \
--title "cp: \`$PR_TITLE ($PR_NUMBER)\` into \`main\`" \
--body "[🤖]: Hi @$PR_AUTHOR 👋<br><br>We've cherry-picked \`$PR_TITLE (#$PR_NUMBER)\` into \`main\` for you! 🚀<br><br>Please review and approve this cherry-pick at your convenience!" \
--label "cherry-pick" \
--reviewer "$PR_AUTHOR"

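The commit-range selection in the workflow above (merge-base against `dev`, then cherry-pick `START_COMMIT..END_COMMIT`) can be reproduced against a throwaway repository. The branch names mirror the workflow; the temp directory, commit messages, and user identity are illustrative:

```shell
# Sketch: reproduce the workflow's START_COMMIT..END_COMMIT range selection
# in a scratch repo (all names here are illustrative).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git checkout -q -b dev
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "dev base"
git checkout -q -b pr-branch
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "pr commit 1"
git -c user.name=t -c user.email=t@t commit -q --allow-empty -m "pr commit 2"

# Same logic as the workflow: the range to cherry-pick is everything on
# the PR branch since it diverged from dev.
START_COMMIT=$(git merge-base dev HEAD)
END_COMMIT=$(git rev-parse HEAD)
git rev-list --count "$START_COMMIT..$END_COMMIT"   # prints 2
```

Because the range is computed from the merge base rather than a fixed commit count, the same logic works whether the PR branch carries one commit or many.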
74 changes: 0 additions & 74 deletions .github/workflows/multi-approval-bot.yml

This file was deleted.

1 change: 1 addition & 0 deletions .gitlab/stages/00.pre.yml
@@ -71,6 +71,7 @@ pre:create_ci_branches_dev:
- branch: ci-dev-rebuild-mcore-nemo-image
- branch: ci-dev-mr
- branch: ci-dev-nightly
- branch: ci-dev-weekly
- branch: ci-dev-upgrade-dependencies
tags:
- arch/amd64
4 changes: 2 additions & 2 deletions .gitlab/stages/04.functional-tests.yml
@@ -255,7 +255,7 @@ functional:x_notify:
- export RO_API_TOKEN=${PROJECT_ACCESS_TOKEN_MCORE}
- export GITLAB_ENDPOINT
- export CONTEXT=$FUNCTIONAL_TEST_SCOPE
-    - export TAG_TEAM=$([[ "$CI_COMMIT_BRANCH" == "main" ]] && echo "1" || "0")
+    - export TAG_TEAM=$([[ "$CI_COMMIT_BRANCH" == "main" || "$CI_COMMIT_BRANCH" == "dev" ]] && echo "1" || "0")
- export TEAM_SLUG=$SLACK_ADMIN
- |
python tests/test_utils/python_scripts/notify.py \
@@ -269,7 +269,7 @@
paths:
- scripts
rules:
-    - if: ($CI_PIPELINE_SOURCE == "schedule" || $CI_COMMIT_BRANCH == "main") && $FUNCTIONAL_TEST == "yes"
+    - if: ($CI_PIPELINE_SOURCE == "schedule" || $CI_COMMIT_BRANCH == "main" || $CI_COMMIT_BRANCH == "dev") && $FUNCTIONAL_TEST == "yes"
when: always
- when: never
