Track Qwen3.5-related issues #2281
Open
zpqiu opened on Apr 17, 2026
- MCore Path
  - Add / track CP support: [main] feat(moe): Support packed sequence for gated delta net (GDN) NVIDIA/Megatron-LM#2645
    - Temporary solution: feat: add Qwen3.5 CP support for MCore path #2312
    - Need to switch the MCore dev branch
- AutoModel Path
  - Move the FLA dependency from the dev group ([dependency-groups]) to optional extras ([project.optional-dependencies]) so that NeMo-RL can install it downstream via pkg[extra]. If FLA is not installed: 1) no CP support; 2) worse performance (a fallback sketch follows this list). Related issues/PRs:
    - Automodel side: build: move flash-linear-attention back to optional-dependencies Automodel#1894
    - RL side: bump the Automodel version
  - Fix the default config path where Torch Adam is used without FP32 master weights, as this can slow down convergence (a workaround sketch follows this list).
    - TE FusedAdam can be used as a workaround.
    - AutoModel should correctly support / apply the FP32 master weight setting: fix: fp32 master weights for custom MoE models under FSDP2 Automodel#1896
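
For the FLA item above, here is a minimal sketch of how the install-time fallback could look in code. The `fla` import name matches the flash-linear-attention package; the `HAVE_FLA` flag, the `resolve_gdn_kernel` helper, and the extra name are hypothetical illustrations, not NeMo-RL's actual API.

```python
# Hypothetical sketch (not NeMo-RL's actual API): gate FLA-dependent
# features on whether flash-linear-attention is importable, mirroring
# the fallback above: without FLA there is no CP support and only the
# slower reference kernels are available.
try:
    import fla  # noqa: F401  # pulled in via an optional extra, e.g. pkg[fla]
    HAVE_FLA = True
except ImportError:
    HAVE_FLA = False


def resolve_gdn_kernel(context_parallel_size: int) -> str:
    """Pick a kernel path for the gated delta net (GDN) layers."""
    if HAVE_FLA:
        return "fla"  # fused linear-attention kernels, CP-capable
    if context_parallel_size > 1:
        raise RuntimeError(
            "Qwen3.5 context parallelism on the AutoModel path requires "
            "the optional flash-linear-attention dependency; install the "
            "extra and retry."
        )
    return "naive"  # slower reference implementation, no CP support
```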
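For the optimizer item, here is a minimal sketch of the TE FusedAdam workaround, assuming TransformerEngine exposes `FusedAdam` under `transformer_engine.pytorch.optimizers` with a `master_weights` flag that keeps FP32 master copies alongside low-precision parameters; the toy model and hyperparameters below are illustrative only.

```python
# Sketch of the TE FusedAdam workaround (assumed import path and flag;
# toy model and hyperparameters are illustrative, not NeMo-RL defaults).
import torch
from transformer_engine.pytorch.optimizers import FusedAdam

model = torch.nn.Linear(1024, 1024, device="cuda", dtype=torch.bfloat16)

optimizer = FusedAdam(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.95),
    weight_decay=0.1,
    master_weights=True,  # FP32 master copies; BF16-only Adam can slow convergence
)

x = torch.randn(8, 1024, device="cuda", dtype=torch.bfloat16)
loss = model(x).float().pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The longer-term fix is for Automodel itself to apply the FP32 master weight setting (Automodel#1896).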