docs(llm): add DeepSeek V4 Flash fine-tuning guide#2053

Merged
HuiyingLi merged 1 commit into NVIDIA-NeMo:main from khazic:docs/dsv4-flash-guide
Apr 25, 2026

Conversation

@khazic (Contributor) commented Apr 25, 2026

Summary

Adds a Qwen3.5-style fine-tuning guide for DeepSeek V4 Flash, mirroring the format of docs/guides/vlm/qwen3-5.md. Companion to #2039 (the model + recipe PR).

What's added

  • docs/guides/llm/dsv4-flash.md (new) — covers:

    • Architecture: SWA / CSA / HCA hybrid attention via compress_ratios, Hash gate (DeepseekV4HashGate + tid2eid) on the first num_hash_layers layers, Hyper-Connections with col-norm-first Sinkhorn, dual-base RoPE (θ=10000 / θ=160000 + YaRN), and GQA + Q-LoRA + grouped O-LoRA (see the Sinkhorn and dual-base RoPE sketches after this list).
    • Checkpoint format: FP4 e2m1fn packed routed experts with FP8 e8m0fnu scales, FP8 e4m3fn with 128×128 block scaling for the rest, hash-bias drop, Indexer / Compressor key flattening, and an F8_E8M0 / F8_E5M2 storage-reader backport (see the FP4 packing sketch after this list).
    • Both shipped recipes — deepseek_v4_flash_validate.yaml (4-layer infra harness) and deepseek_v4_flash_hellaswag.yaml (HellaSwag finetune).
    • Standalone Slurm launch script.
    • Layer-parity result vs the DeepSeek inference reference: final-logits cosine similarity 0.998, top-1 token match, and every block at cosine ≥ 0.987 on the 4-layer parity harness (see the parity-check sketch after this list).
    • 43-layer full-finetune loss curve.
  • docs/index.md — adds a "Fine-tune DeepSeek V4 Flash" row to the feature table and a toctree entry under "Recipes & E2E Examples".

  • docs/model-coverage/latest-models.md — adds DeepSeek V4 Flash entry at the top with date 2026-04-25.
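
For the Hyper-Connections item above, here is a minimal sketch of what a col-norm-first Sinkhorn normalization looks like. Only the column-before-row ordering is taken from the guide; the iteration itself is the generic Sinkhorn-Knopp projection, and the function name is made up for illustration.

```python
import torch

def sinkhorn_col_first(logits: torch.Tensor, n_iters: int = 3) -> torch.Tensor:
    """Generic Sinkhorn-Knopp sketch, normalizing columns before rows
    (the 'col-norm-first' order named in the guide). Hypothetical helper,
    not the PR's actual kernel."""
    m = torch.exp(logits)
    for _ in range(n_iters):
        m = m / m.sum(dim=0, keepdim=True)  # columns first
        m = m / m.sum(dim=1, keepdim=True)  # then rows
    return m
```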
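The dual-base RoPE can be pictured as two ordinary rotary caches built from the two θ values named above. The sketch below assumes a standard interleaved-pair rotary layout; which attention path consumes which base, and the YaRN rescaling on top, are not shown, and every helper name here is hypothetical.

```python
import torch

def rope_cache(head_dim: int, max_pos: int, theta: float):
    """Precompute cos/sin tables for one RoPE base (hypothetical helper)."""
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(max_pos).float(), inv_freq)
    return torch.cos(angles), torch.sin(angles)      # each [max_pos, head_dim/2]

def apply_rope(x, cos, sin):
    """Rotate interleaved channel pairs; x is [batch, seq, heads, head_dim]."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    c = cos[: x.shape[1]].unsqueeze(0).unsqueeze(2)  # [1, seq, 1, head_dim/2]
    s = sin[: x.shape[1]].unsqueeze(0).unsqueeze(2)
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * c - x2 * s
    out[..., 1::2] = x1 * s + x2 * c
    return out

# Two caches, one per base; pairing them with the SWA vs. long-range paths
# below is an assumption on my part, not something this PR states.
cos_lo, sin_lo = rope_cache(128, 4096, theta=10_000.0)
cos_hi, sin_hi = rope_cache(128, 4096, theta=160_000.0)
q = torch.randn(1, 16, 8, 128)
q_local = apply_rope(q, cos_lo, sin_lo)    # e.g. sliding-window layers
q_global = apply_rope(q, cos_hi, sin_hi)   # e.g. compressed / long-range layers
```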
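To make the FP4 e2m1fn + e8m0 scale layout concrete, here is a rough pack/dequantize sketch. The 16-entry E2M1 codebook is the standard one; the block size, rounding rule, and packing order are assumptions (real e8m0fnu is a biased 8-bit exponent byte, which this sketch keeps as a plain integer exponent), and the storage reader in the PR is authoritative.

```python
import torch

# The 16 values representable in FP4 E2M1 (sign bit + 2 exponent + 1 mantissa).
E2M1_VALUES = torch.tensor(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0]
)

def quantize_e2m1_blocks(w: torch.Tensor, block: int = 32):
    """Quantize a 1-D tensor (numel divisible by `block`) to E2M1 codes with
    one power-of-two scale per block, packing two 4-bit codes per byte.
    Block size and layout are assumptions, not the checkpoint's spec."""
    w = w.reshape(-1, block)
    # Pick the power-of-two scale that maps each block's max onto E2M1's
    # largest magnitude (6.0).
    amax = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    exp = torch.ceil(torch.log2(amax / 6.0))
    scale = torch.pow(2.0, exp)
    # Nearest-value rounding into the 16-entry codebook.
    dist = (w / scale).unsqueeze(-1) - E2M1_VALUES     # [blocks, block, 16]
    codes = dist.abs().argmin(dim=-1).to(torch.uint8)  # 4-bit code per weight
    packed = codes[:, 0::2] | (codes[:, 1::2] << 4)    # two codes per byte
    return packed, exp.squeeze(1)

def dequantize_e2m1_blocks(packed: torch.Tensor, exp: torch.Tensor, block: int = 32):
    lo = packed & 0x0F
    hi = (packed >> 4) & 0x0F
    codes = torch.stack([lo, hi], dim=-1).reshape(packed.shape[0], block)
    return E2M1_VALUES[codes.long()] * torch.pow(2.0, exp).unsqueeze(1)
```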
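For the parity numbers above, a generic final-logits check between two implementations can be written as below; the function name, vocab size, and the perturbation in the usage lines are illustrative only (the real harness lives in the guide).

```python
import torch
import torch.nn.functional as F

def parity_report(ref_logits: torch.Tensor, test_logits: torch.Tensor):
    """Compare last-position logits from two implementations of one model.
    ref_logits / test_logits: [batch, vocab]. Hypothetical helper."""
    cos = F.cosine_similarity(ref_logits, test_logits, dim=-1)
    top1 = ref_logits.argmax(dim=-1) == test_logits.argmax(dim=-1)
    return cos.min().item(), bool(top1.all())

# Usage: a near-identical pair should report cosine ~1.0 and full top-1 match,
# mirroring the shape of the numbers reported above (0.998 / top-1 match).
ref = torch.randn(4, 32000)
cos_min, all_match = parity_report(ref, ref + 1e-3 * torch.randn_like(ref))
print(f"min final-logits cosine: {cos_min:.4f}, top-1 match: {all_match}")
```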

Test plan

  • Markdown renders locally.
  • All cross-links resolve to existing files / sections.
  • No CLAUDE co-author tag; commit is signed off.
  • CI (docs build + linting).

Adds a Qwen3.5-style fine-tuning guide for DeepSeek V4 Flash, covering
the architecture (SWA / CSA / HCA hybrid attention via compress_ratios,
Hash gate, Hyper-Connections, dual-base RoPE, Q-LoRA + grouped O-LoRA),
checkpoint format support (FP4 e2m1fn + FP8 e8m0fnu / e4m3fn / e5m2),
both shipped recipes (validate harness + HellaSwag), and the 4-layer
parity result vs the DeepSeek inference reference.

- docs/guides/llm/dsv4-flash.md  (new)
- docs/index.md                  (feature table + toctree entry)
- docs/model-coverage/latest-models.md  (entry under 2026-04-25)

Signed-off-by: khazic <khazzz1c@gmail.com>
@copy-pr-bot (Bot) commented Apr 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@HuiyingLi (Contributor) commented

/ok to test 983d29e

@HuiyingLi enabled auto-merge (squash) April 25, 2026 08:36
@HuiyingLi merged commit 52fd3bb into NVIDIA-NeMo:main Apr 25, 2026
30 of 31 checks passed
@khazic (Contributor, Author) commented Apr 25, 2026

Pushed 0ca1d03 to drop the deepseek_v4_flash_validate.yaml reference from the guide (the bullet under "Launch Training" plus the "Quick infrastructure validation" subsection). The validate harness is an internal smoke-test config — the user-facing guide should advertise only the HellaSwag recipe. Ready for re-review.

HuiyingLi added a commit that referenced this pull request Apr 26, 2026
…2054)

* docs(llm): drop validate-yaml reference from DeepSeek V4 Flash guide

Removes the validate-yaml bullet under "Launch Training" and the
"Quick infrastructure validation" subsection.  The validate harness
is an internal smoke-test config, not a user-facing finetune recipe;
the guide should advertise only the HellaSwag recipe.

Follow-up to #2053 (the original change was force-pushed after the
PR had already merged, so the deletion did not land on main).

Signed-off-by: khazic <khazzz1c@gmail.com>

* docs(llm): add DeepSeek V4 Flash to README + model-coverage index

Mirrors the per-model rollout pattern used for MiniMax-M2.7 (#1785):
news entry at the top of the README, a dedicated model-coverage page
under deepseek-ai/, and registration of the new page in the LLM index
(architecture table + toctree).

- README.md                                            (news entry)
- docs/model-coverage/llm/deepseek-ai/dsv4-flash.md    (new)
- docs/model-coverage/llm/index.md                     (table + toctree)

Signed-off-by: Huiying Li <willwin.lee@gmail.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(llm): use plain link for hellaswag yaml until model PR lands

The {download} directive on the recipe yaml fails the Sphinx build
with `download.not_readable` because
examples/llm_finetune/deepseek_v4/deepseek_v4_flash_hellaswag.yaml
is added by the model PR (#2039), which has not yet landed on main.
Use a plain GitHub link until #2039 merges; a follow-up can switch
back to {download} once the file is on main.

Signed-off-by: khazic <khazzz1c@gmail.com>

---------

Signed-off-by: khazic <khazzz1c@gmail.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>