Add GB200 DSV4 Dynamo vLLM MTP2 recipes #1242

Merged
Oseltamivir merged 3 commits into SemiAnalysisAI:main from alec-flowers:codex/dsv4-gb200-mtp2-pareto on May 1, 2026

@alec-flowers alec-flowers commented Apr 30, 2026

Summary

  • Add a new dsv4-fp4-gb200-dynamo-vllm-mtp2 matrix key using vllm/vllm-openai:nightly-a749a33d8d05acdd3ab346bd3f0c6b5c9c80474f.
  • Add four 8k/1k GB200 Dynamo vLLM MTP2 Pareto recipes: low-latency TP8, low-middle DEP8/TP8 offload, mid DEP8 MegaMOE, and high-throughput 2P/1D DEP8 MegaMOE.
  • Keep InferenceX metadata aligned with existing GB200 recipes: deepseek-v4-pro model alias, GB200 runner config, dedicated infra node, FP4 indexer cache, and MTP2 speculative config.

Validation

  • Parsed all four new recipe YAML files and .github/configs/nvidia-master.yaml with PyYAML.
  • Asserted nightly image, deepseek-v4-pro model alias, MTP2, FP4 indexer cache, expected topology, and expected concurrency per recipe.
  • Ran python3 utils/matrix_logic/generate_sweep_configs.py test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-gb200-dynamo-vllm-mtp2 --seq-lens 8k1k --no-evals.
  • Ran filtered GB200 full sweep generation and verified 4 MTP rows using the nightly image.
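
The checks listed above could be scripted along these lines. This is a minimal sketch, not the script used for this PR: the dictionary keys `image` and `model` are hypothetical stand-ins for whatever fields the recipe files and generated sweep rows actually use, and only the image tag and model alias are taken from the PR description. The `load_yaml` helper assumes PyYAML is installed.

```python
from typing import Any

# Values quoted from the PR description. The keys "image" and "model" are
# hypothetical and do not necessarily match the repository's schema.
NIGHTLY_IMAGE = "vllm/vllm-openai:nightly-a749a33d8d05acdd3ab346bd3f0c6b5c9c80474f"
EXPECTED = {
    "image": NIGHTLY_IMAGE,
    "model": "deepseek-v4-pro",
}


def check_entry(entry: dict[str, Any], expected: dict[str, Any]) -> list[str]:
    """Return one readable mismatch message per expected key that differs."""
    problems = []
    for key, want in expected.items():
        got = entry.get(key)
        if got != want:
            problems.append(f"{key}: expected {want!r}, got {got!r}")
    return problems


def count_rows_with_image(rows: list[dict[str, Any]], image: str) -> int:
    """Count generated sweep rows that use the given container image."""
    return sum(1 for row in rows if row.get("image") == image)


def load_yaml(path: str) -> Any:
    """Parse one YAML file with PyYAML (imported lazily so the checks above
    can run without it)."""
    import yaml

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

For example, `check_entry({"image": NIGHTLY_IMAGE, "model": "deepseek-v4-pro"}, EXPECTED)` returns an empty list, and `count_rows_with_image(rows, NIGHTLY_IMAGE)` can be compared against the expected row count (4 in this PR) once the generated rows have been mapped into that shape.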

Comment thread .github/configs/nvidia-master.yaml Outdated
@alec-flowers alec-flowers force-pushed the codex/dsv4-gb200-mtp2-pareto branch from df3a6f7 to 5e7699f Compare April 30, 2026 22:13
@alec-flowers alec-flowers marked this pull request as ready for review May 1, 2026 01:41
@alec-flowers alec-flowers requested a review from a team May 1, 2026 01:41
@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@alec-flowers alec-flowers (Collaborator, Author) commented

[Image: Screenshot 2026-04-30 at 6.56.08 PM]

Stats for reproducibility of the low-latency point.

@alec-flowers alec-flowers force-pushed the codex/dsv4-gb200-mtp2-pareto branch from e35649b to dc7bdf4 Compare May 1, 2026 01:58
@alec-flowers alec-flowers requested a review from cquil11 May 1, 2026 02:05
@Oseltamivir Oseltamivir left a comment

lgtm

@Oseltamivir Oseltamivir merged commit ef5dee4 into SemiAnalysisAI:main May 1, 2026
20 of 29 checks passed