docs(llm): add DeepSeek V4 Flash fine-tuning guide (#2053)
Merged
HuiyingLi merged 1 commit into NVIDIA-NeMo:main on Apr 25, 2026
Conversation
Adds a Qwen3.5-style fine-tuning guide for DeepSeek V4 Flash, covering the architecture (SWA / CSA / HCA hybrid attention via compress_ratios, Hash gate, Hyper-Connections, dual-base RoPE, Q-LoRA + grouped O-LoRA), checkpoint format support (FP4 e2m1fn + FP8 e8m0fnu / e4m3fn / e5m2), both shipped recipes (validate harness + HellaSwag), and the 4-layer parity result vs the DeepSeek inference reference.

- docs/guides/llm/dsv4-flash.md (new)
- docs/index.md (feature table + toctree entry)
- docs/model-coverage/latest-models.md (entry under 2026-04-25)

Signed-off-by: khazic <khazzz1c@gmail.com>
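The dual-base RoPE mentioned in the description can be sketched as follows. This is a minimal illustration, not the actual DeepSeek V4 Flash code: the function names and the local/global assignment of the two bases are assumptions; only the θ=10000 / θ=160000 pair comes from the PR.

```python
import math

def rope_inv_freq(head_dim: int, base: float) -> list[float]:
    """Standard RoPE inverse frequencies for half the head dimension."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rotate_pair(x0: float, x1: float, pos: int, inv: float) -> tuple[float, float]:
    """Rotate one (even, odd) feature pair by pos * inv radians."""
    c, s = math.cos(pos * inv), math.sin(pos * inv)
    return x0 * c - x1 * s, x0 * s + x1 * c

# Hypothetical assignment: short-range layers use the small base,
# long-range layers the large one.
local_freqs = rope_inv_freq(64, base=10_000.0)
global_freqs = rope_inv_freq(64, base=160_000.0)

# A larger base gives lower frequencies at the same dim index, i.e.
# slower rotation, which stretches the usable context window.
assert all(g <= l for g, l in zip(global_freqs, local_freqs))
```

The YaRN scaling mentioned alongside the dual base would be applied on top of these frequencies; it is omitted here.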
HuiyingLi approved these changes on Apr 25, 2026
Contributor: /ok to test 983d29e
Contributor (Author): Pushed 0ca1d03 to drop the […]
HuiyingLi added a commit that referenced this pull request on Apr 26, 2026
…2054)

* docs(llm): drop validate-yaml reference from DeepSeek V4 Flash guide

  Removes the validate-yaml bullet under "Launch Training" and the "Quick infrastructure validation" subsection. The validate harness is an internal smoke-test config, not a user-facing finetune recipe; the guide should advertise only the HellaSwag recipe. Follow-up to #2053 (the original change was force-pushed after the PR had already merged, so the deletion did not land on main).

  Signed-off-by: khazic <khazzz1c@gmail.com>

* docs(llm): add DeepSeek V4 Flash to README + model-coverage index

  Mirrors the per-model rollout pattern used for MiniMax-M2.7 (#1785): a news entry at the top of the README, a dedicated model-coverage page under deepseek-ai/, and registration of the new page in the LLM index (architecture table + toctree).

  - README.md (news entry)
  - docs/model-coverage/llm/deepseek-ai/dsv4-flash.md (new)
  - docs/model-coverage/llm/index.md (table + toctree)

  Signed-off-by: Huiying Li <willwin.lee@gmail.com>
  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(llm): use plain link for hellaswag yaml until model PR lands

  The {download} directive on the recipe yaml fails the Sphinx build with `download.not_readable` because examples/llm_finetune/deepseek_v4/deepseek_v4_flash_hellaswag.yaml is added by the model PR (#2039), which has not yet landed on main. Use a plain GitHub link until #2039 merges; a follow-up can switch back to {download} once the file is on main.

  Signed-off-by: khazic <khazzz1c@gmail.com>

---

Signed-off-by: khazic <khazzz1c@gmail.com>
Co-authored-by: Huiying Li <willwin.lee@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
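The {download}-to-plain-link swap in the last commit can be illustrated with a hypothetical MyST snippet. The relative path, link text, and repository URL placeholder are assumptions; only the yaml path and the `download.not_readable` failure come from the commit message.

```markdown
<!-- Fails the Sphinx build while the file only exists on the #2039 branch: -->
{download}`deepseek_v4_flash_hellaswag.yaml <../../examples/llm_finetune/deepseek_v4/deepseek_v4_flash_hellaswag.yaml>`

<!-- Interim plain link (repository URL is a placeholder): -->
[deepseek_v4_flash_hellaswag.yaml](https://github.com/NVIDIA-NeMo/<repo>/blob/main/examples/llm_finetune/deepseek_v4/deepseek_v4_flash_hellaswag.yaml)
```

The trade-off: {download} bundles the file into the built docs and verifies it exists at build time, while a plain link defers resolution to GitHub and so tolerates the file landing in a later PR.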
Summary
Adds a Qwen3.5-style fine-tuning guide for DeepSeek V4 Flash, mirroring the format of docs/guides/vlm/qwen3-5.md. Companion to #2039 (the model + recipe PR).

What's added

- docs/guides/llm/dsv4-flash.md (new) — covers:
  - Architecture: SWA / CSA / HCA hybrid attention via compress_ratios, Hash gate (DeepseekV4HashGate + tid2eid) on the first num_hash_layers, Hyper-Connections with col-norm-first Sinkhorn, dual-base RoPE (θ=10000 / θ=160000 + YaRN), GQA + Q-LoRA + grouped O-LoRA.
  - Checkpoint format: e2m1fn packed routed experts with FP8 e8m0fnu scales, FP8 e4m3fn 128×128 for the rest, hash-bias drop, Indexer / Compressor key flattening, F8_E8M0 / F8_E5M2 storage-reader backport.
  - Recipes: deepseek_v4_flash_validate.yaml (4-layer infra harness) and deepseek_v4_flash_hellaswag.yaml (HellaSwag finetune).
- docs/index.md — adds a "Fine-tune DeepSeek V4 Flash" row to the feature table and a toctree entry under "Recipes & E2E Examples".
- docs/model-coverage/latest-models.md — adds a DeepSeek V4 Flash entry at the top, dated 2026-04-25.

Test plan
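As an aside on the checkpoint format listed under "What's added": dequantizing FP4 e2m1fn weights packed two per byte under a shared FP8 e8m0fnu (power-of-two) scale can be sketched as below. This is a generic illustration of those encodings, not the repository's loader; the nibble packing order and per-block scale layout are assumptions.

```python
# The 8 non-negative e2m1 magnitudes, indexed by the 3 low bits
# (2 exponent bits << 1 | 1 mantissa bit).
_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_e2m1(nibble: int) -> float:
    """Decode one 4-bit e2m1fn code: 1 sign bit, then the magnitude index."""
    sign = -1.0 if nibble & 0b1000 else 1.0
    return sign * _E2M1[nibble & 0b0111]

def decode_e8m0(code: int) -> float:
    """e8m0fnu: an unsigned pure-exponent scale, 2 ** (code - 127)."""
    return 2.0 ** (code - 127)

def dequant_block(packed: bytes, scale_code: int) -> list[float]:
    """Unpack two e2m1 values per byte (low nibble first, assumed)
    and apply the block's shared e8m0 scale."""
    scale = decode_e8m0(scale_code)
    out = []
    for b in packed:
        out.append(decode_e2m1(b & 0x0F) * scale)
        out.append(decode_e2m1((b >> 4) & 0x0F) * scale)
    return out
```

Because e8m0fnu carries only an exponent, applying the scale is an exact power-of-two shift, which is why it pairs naturally with the coarse e2m1 value grid.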