Support Mimo-v2.5-Pro #654

Draft
wufann wants to merge 2 commits into ROCm:main from wufann:mimov2pro

Conversation

wufann (Contributor) commented Apr 28, 2026

Motivation

Support Mimo-v2.5-Pro

Technical Details

  1. Fused QKV Loading Hook (Core Technical Change)

Mimo-V2.5-Pro checkpoints store a single fused qkv_proj weight in TP-interleaved layout, while Flash checkpoints use separate q_proj/k_proj/v_proj. To avoid modifying the QKVParallelLinear.weight_loader interface, a model-level hook mechanism
was introduced:

  • loader.py: Before the packed_modules_mapping loop in the weight loading iteration, the loader calls model.load_fused_qkv_hook(). If the hook returns True, the weight is considered handled and the rest of the loading logic is skipped
    via continue.
  • mimo_v2.py / mimo_v2_mtp.py: Both model classes implement load_fused_qkv_hook — when the weight name contains qkv_proj and exists in params_dict, it chunks the weight tensor by TP rank and writes it directly, bypassing the
    shard-based split logic.

Auto-adaptation: Flash checkpoint weight names are q_proj/k_proj/v_proj → the hook never fires → normal packed_modules_mapping path. Mimo-V2.5-Pro checkpoint weight names are qkv_proj → the hook intercepts and handles them directly. No model-type branching is needed. ref: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/mimo_v2.py#L1106
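The hook mechanism above can be sketched as follows. This is a minimal illustration, not the actual implementation: the method name `load_fused_qkv_hook` and the `params_dict` lookup follow the PR description, but the class name, TP coordinates, layer name, and list-based "tensors" are hypothetical stand-ins.

```python
# Hypothetical tensor-parallel coordinates for the sketch.
TP_RANK, TP_SIZE = 0, 4

class MiMoV2ModelSketch:
    def __init__(self):
        # One fused qkv_proj parameter; the real model holds torch tensors.
        self.params_dict = {"layers.0.self_attn.qkv_proj.weight": None}

    def load_fused_qkv_hook(self, name, loaded_weight):
        """Return True if the weight was handled; the loader then continues."""
        if "qkv_proj" not in name or name not in self.params_dict:
            return False  # fall through to the normal packed_modules_mapping path
        # Chunk the TP-interleaved fused weight by rank and write it directly,
        # bypassing the shard-based q/k/v split logic.
        chunk = len(loaded_weight) // TP_SIZE
        self.params_dict[name] = loaded_weight[TP_RANK * chunk:(TP_RANK + 1) * chunk]
        return True

def load_weights(model, weights):
    """Sketch of the loader.py side: the hook runs before the packed-modules loop."""
    handled_normally = []
    for name, w in weights:
        if model.load_fused_qkv_hook(name, w):
            continue  # handled by the hook; skip the rest of the loading logic
        handled_normally.append(name)  # normal packed_modules_mapping path (elided)
    return handled_normally

model = MiMoV2ModelSketch()
```

Because Flash checkpoints carry q_proj/k_proj/v_proj names, the hook returns False for them and the normal path runs unchanged; only the fused qkv_proj names of Mimo-V2.5-Pro are intercepted.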

  2. File & Class Renaming (Unified Naming)
  • mimo_v2_flash.py → mimo_v2.py
  • mimo_v2_flash_mtp.py → mimo_v2_mtp.py

All class names drop the Flash suffix: MiMoV2FlashForCausalLM → MiMoV2ForCausalLM, MiMoV2FlashMTP → MiMoV2MTP, MiMoV2FlashDecoderLayer → MiMoV2DecoderLayer, etc.

  3. Model Registration (Flash + Pro Compatibility)
  • model_runner.py: Added "MiMoV2ForCausalLM" architecture entry (used by Mimo-V2.5-Pro weights), kept "MiMoV2FlashForCausalLM" for backward compatibility. is_mimo_v2() now recognizes both "mimo_v2" and "mimo_v2_flash" model types.
  • eagle.py: Added "MiMoV2MTPModel" MTP architecture entry alongside the existing "MiMoV2FlashMTPModel".
  • config.py: Added "mimo_v2" → "mimo_v2_mtp" to _MTP_TYPE_MAP. Changed MTP config override check from "mimo_v2_flash_mtp" to "mimo_v2_mtp".
  4. max_position_embeddings Adaptation

Mimo-V2.5-Pro's HF config uses context_len instead of max_position_embeddings. Both model files now use getattr(config, "context_len", None) or getattr(config, "max_position_embeddings", 32768), preferring context_len when present.
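The fallback chain reads as follows; the wrapper function name and the example config values are illustrative, but the `getattr` chain mirrors the one described above.

```python
from types import SimpleNamespace

def get_max_position_embeddings(config):
    # Prefer context_len (Mimo-V2.5-Pro HF config), then fall back to
    # max_position_embeddings (Flash), then the 32768 default.
    return (getattr(config, "context_len", None)
            or getattr(config, "max_position_embeddings", 32768))

# Hypothetical configs standing in for the two checkpoint families.
pro_cfg = SimpleNamespace(context_len=65536)
flash_cfg = SimpleNamespace(max_position_embeddings=131072)
```

Note that a `context_len` of 0 or None falls through to the next option, since `or` skips falsy values.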

Test Plan

TP8 + FP8KV:

```shell
python -m atom.entrypoints.openai_server --model /data/MiMo-V2.5-Pro -tp 8 --trust-remote-code --kv_cache_dtype fp8
```

TP8 + FP8KV + MTP1:

```shell
python -m atom.entrypoints.openai_server --model /data/MiMo-V2.5-Pro -tp 8 --trust-remote-code --kv_cache_dtype fp8 --method mtp
```

Serving benchmark and accuracy evaluation:

```shell
python -m atom.benchmarks.benchmark_serving \
  --model=/data/MiMo-V2.5-Pro --backend=vllm --base-url=http://localhost:8000 \
  --dataset-name=random \
  --random-input-len=1024 --random-output-len=1024 \
  --random-range-ratio=0.8 \
  --num-prompts=1280 --max-concurrency=128 \
  --request-rate=inf --ignore-eos \
  --save-result --percentile-metrics="ttft,tpot,itl,e2el"
```

```shell
lm_eval --model local-completions \
  --model_args model=/data/MiMo-V2.5-Pro,base_url=http://localhost:8000/v1/completions,num_concurrent=64,max_retries=3,tokenized_requests=False \
  --tasks gsm8k --num_fewshot 5
```

Test Result

Acc
TP8 + FP8KV

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|---------|------------------|--------|-------------|--------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9386 | ± 0.0066 |
|       |         | strict-match     | 5      | exact_match | 0.9348 | ± 0.0068 |

TP4 + FP8KV + MTP1

| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|---------|------------------|--------|-------------|--------|----------|
| gsm8k | 3       | flexible-extract | 5      | exact_match | 0.9401 | ± 0.0065 |
|       |         | strict-match     | 5      | exact_match | 0.9386 | ± 0.0066 |

Perf
TP8 + FP8KV

```
============ Serving Benchmark Result ============
Successful requests:                     1280
Benchmark duration (s):                  378.80
Total input tokens:                      1180188
Total generated tokens:                  1177601
Request throughput (req/s):              3.38
Output token throughput (tok/s):         3108.74
Total Token throughput (tok/s):          6224.30
---------------Time to First Token----------------
Mean TTFT (ms):                          312.11
Median TTFT (ms):                        127.69
P99 TTFT (ms):                           3232.73
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          39.85
Median TPOT (ms):                        40.38
P99 TPOT (ms):                           42.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           39.83
Median ITL (ms):                         34.22
P99 ITL (ms):                            121.85
----------------End-to-end Latency----------------
Mean E2EL (ms):                          36953.47
Median E2EL (ms):                        36939.84
P99 E2EL (ms):                           42933.85
==================================================
```
TP8 + FP8KV + MTP1
```
============ Serving Benchmark Result ============
Successful requests:                     1280
Benchmark duration (s):                  344.32
Total input tokens:                      1180188
Total generated tokens:                  1175988
Request throughput (req/s):              3.72
Output token throughput (tok/s):         3415.39
Total Token throughput (tok/s):          6842.98
---------------Time to First Token----------------
Mean TTFT (ms):                          339.56
Median TTFT (ms):                        135.65
P99 TTFT (ms):                           3382.80
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          35.61
Median TPOT (ms):                        33.81
P99 TPOT (ms):                           49.84
---------------Inter-token Latency----------------
Mean ITL (ms):                           48.19
Median ITL (ms):                         40.22
P99 ITL (ms):                            117.61
----------------End-to-end Latency----------------
Mean E2EL (ms):                          33013.86
Median E2EL (ms):                        31428.81
P99 E2EL (ms):                           49556.51
==================================================
```

cc: @billishyahao

@wufann wufann marked this pull request as draft April 28, 2026 06:53