
docs: add Mistral Medium 3.5 VLM coverage and fine-tuning guide#2091

Merged
HuiyingLi merged 1 commit into main from huiyingl/docs-mistral-medium-3-5
Apr 29, 2026

Conversation

@HuiyingLi
Contributor

@HuiyingLi HuiyingLi commented Apr 29, 2026

Summary

  • Adds the public-facing documentation for Mistral AI's Mistral Medium 3.5 (a 128B FP8-native dense VLM with Pixtral vision tower + Ministral-3 text backbone, same architecture lineage as Devstral-2-123B).
  • Sibling implementation PR: feat: add mistral medium 3.5 #2090 — that PR ships the loader (Mistral3FP8VLMForConditionalGeneration), TP plan, PP forward, and 8-node MedPix recipe. This PR is documentation-only.
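The TP plan itself ships in the sibling implementation PR, not here, but the basic idea of tensor-parallel weight sharding can be sketched. This is an illustrative simulation only: the function name and the column-parallel split are assumptions for the sketch, not the actual NeMo AutoModel code path.

```python
import numpy as np

def shard_linear_weight(weight: np.ndarray, tp_rank: int, tp_size: int) -> np.ndarray:
    """Column-parallel shard of a linear layer's weight for one TP rank.

    Hypothetical helper for illustration; the real TP plan lives in the
    sibling implementation PR (#2090).
    """
    out_features = weight.shape[0]
    assert out_features % tp_size == 0, "out_features must divide tp_size evenly"
    chunk = out_features // tp_size
    # Each rank keeps a contiguous slice of output rows.
    return weight[tp_rank * chunk:(tp_rank + 1) * chunk, :]

# With TP=8 (as in the 8-node recipe), each rank holds 1/8 of the rows,
# and concatenating all shards reconstructs the full weight.
w = np.arange(64, dtype=np.float32).reshape(16, 4)
shards = [shard_linear_weight(w, r, 8) for r in range(8)]
reassembled = np.concatenate(shards, axis=0)
```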

🤖 Generated with Claude Code

@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@HuiyingLi HuiyingLi added the docs-only label Apr 29, 2026
@HuiyingLi
Contributor Author

/ok to test c500cc7

Mistral AI's new flagship 128B dense VLM is now supported in NeMo
AutoModel via the Mistral3FP8VLMForConditionalGeneration custom class
(Pixtral vision tower + Ministral-3 dense decoder, FP8 on disk).
Mistral Medium 3.5 merges Mistral Medium 3.1, Magistral Medium, and
Devstral 2 into a single checkpoint with a configurable reasoning mode
and a 256k context window — open-weights under a modified MIT license.
Architecturally it shares the dense Ministral-3 text backbone with
mistralai/Devstral-2-123B-Instruct-2512.

This commit adds the documentation for that model:

- docs/model-coverage/vlm/mistralai/mistral-medium-3-5.md: model
  coverage page covering architecture (dense, 88-layer Ministral-3
  text backbone + Pixtral vision tower), strengths/trade-offs, use
  cases, recipe, install/run snippets. Modeled on mistral-small-4.md.
- docs/guides/vlm/mistral-medium-3-5.md: end-to-end fine-tuning guide
  on MedPix-VQA with the 8-node TP=8 PP=8 recipe, including a
  walkthrough of the FP8 dequantize-on-load path. Modeled on the
  Qwen3.5-VL guide.
- docs/model-coverage/vlm/index.md: add Mistral Medium 3.5 row to
  supported models table and toctree.
- docs/model-coverage/latest-models.md: prepend release row.
- docs/index.md: add Mistral Medium 3.5 VL to the guides toctree.
- README.md: add news bullet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: HuiyingLi <willwin.lee@gmail.com>
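The guide's FP8 dequantize-on-load walkthrough is not reproduced in this PR thread, but the core arithmetic of per-tensor scaled dequantization can be sketched. This is a minimal dependency-free simulation under stated assumptions: the E4M3 scale convention and both function names are illustrative, not the shipped NeMo AutoModel loader.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # max representable magnitude in FP8 E4M3

def quantize_per_tensor(weights: np.ndarray):
    """Simulate per-tensor FP8 quantization: scale values into the FP8
    range and keep one float scale per tensor. (The 'FP8' values stay
    float32 here to avoid dependencies; real checkpoints store true FP8.)"""
    scale = np.abs(weights).max() / FP8_E4M3_MAX
    return weights / scale, scale

def dequantize_on_load(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover higher-precision weights at load time: q * scale."""
    return (q * scale).astype(np.float32)

w = np.array([-1.5, 0.25, 3.0], dtype=np.float32)
q, s = quantize_per_tensor(w)
restored = dequantize_on_load(q, s)
```

Because this simulation skips the actual FP8 rounding step, the roundtrip is exact; a real FP8 cast would introduce small quantization error.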
@HuiyingLi HuiyingLi force-pushed the huiyingl/docs-mistral-medium-3-5 branch from c500cc7 to e5aeebb on April 29, 2026 15:23
@HuiyingLi
Contributor Author

/ok to test e5aeebb

@HuiyingLi HuiyingLi enabled auto-merge (squash) April 29, 2026 15:27
@HuiyingLi HuiyingLi disabled auto-merge April 29, 2026 15:31
@HuiyingLi HuiyingLi merged commit fab3f81 into main Apr 29, 2026
33 of 34 checks passed
@HuiyingLi HuiyingLi deleted the huiyingl/docs-mistral-medium-3-5 branch April 29, 2026 15:31

2 participants