feat+refactor: SimAI 1.6 GPU Memory Module & Code Quality #243
tianhao909 wants to merge 3 commits into aliyun:master
…n3 inference simulation

Add GPU memory inference and PD-separation (Prefill-Decode disaggregation) support for large-scale model simulation including DeepSeek-671B, Qwen3-MoE-235B, and Qwen3-Next-80B.

Key changes:
- Add parameter counter for MoE/Dense/MLA architectures
- Add memory planner with PD-separation request allocation
- Integrate AICB workload data and HuggingFace model configs
- Add MFU calculator improvements
- Add execution time entity enhancements
- Add run_scenarios.sh for batch simulation

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
…d bilingual docstrings

Code quality improvements across vidur-alibabacloud modules:
- Replace all print() calls with proper logging module usage
- Remove ~390 lines of dead/commented-out code
- Add bilingual (EN/ZH) docstrings to core modules
- Clean up imports and unused variables
- Improve execution time predictor and scheduler code

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
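The `print()`-to-`logging` change listed in this commit can be illustrated with a minimal sketch. The logger name `vidur.example` and the function `report_replicas` are hypothetical and do not come from the PR; only the standard-library `logging` calls are real.

```python
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("vidur.example")


def report_replicas(num_prefill: int, num_decode: int) -> str:
    """Log a replica summary instead of printing it, and return the message."""
    message = f"prefill_replicas={num_prefill}, decode_replicas={num_decode}"
    # Before the refactor this would have been: print("[Cluster]", message)
    # Passing the message as an argument defers formatting until a handler
    # actually emits the record.
    logger.info("[Cluster] %s", message)
    return message


if __name__ == "__main__":
    # Configure output once, at the program entry point, not inside library code.
    logging.basicConfig(level=logging.INFO)
    report_replicas(2, 6)
```

Keeping `basicConfig` at the entry point lets callers of the library decide the log level and destination.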
Restore development rules in vidur-alibabacloud/.gitignore

Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>
Pull request overview
This PR adds SimAI 1.6 PD-separation (prefill/decode disaggregation) support for GPU memory simulation, expands model/device/node SKU configuration to cover additional large models and hardware, and checks in pre-generated AICB workload/HF config data.
Changes:
- Implement PD-aware cluster initialization and replica/world-size bookkeeping.
- Add new model configs (Qwen3-Next-80B, Qwen3-MoE-235B) and new device/node SKU configs (H20/H200/GB200).
- Add HF model config JSONs and AICB workload CSV/JSON artifacts; update `.gitignore` to keep required workload data tracked.
Reviewed changes
Copilot reviewed 63 out of 89 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| vidur-alibabacloud/vidur/entities/cluster.py | Adds PD separation logic to compute per-phase world sizes/EP and initialize replicas accordingly. |
| vidur-alibabacloud/vidur/entities/batch.py | Removes a leftover debug assertion comment. |
| vidur-alibabacloud/vidur/config/node_sku_config.py | Introduces an H20 DGX node SKU config. |
| vidur-alibabacloud/vidur/config/model_config.py | Adds Qwen3-Next-80B and Qwen3-MoE-235B model config dataclasses. |
| vidur-alibabacloud/vidur/config/device_sku_config.py | Adds H20/H200/GB200 device SKU configs and updates H800 throughput fields. |
| vidur-alibabacloud/vidur/config/config.py | Extends ReplicaConfig with PD-specific knobs and EP auto-computation/logging. |
| vidur-alibabacloud/data/hf_configs/qwen3-next-80B-A3B_config.json | Adds HF-style config for Qwen3-Next-80B-A3B. |
| vidur-alibabacloud/data/hf_configs/qwen3-next-80B-A3B_Instruct_FP8_config.json | Adds HF-style config for Qwen3-Next-80B-A3B Instruct FP8. |
| vidur-alibabacloud/data/hf_configs/qwen3-8B_config.json | Adds HF-style config for Qwen3-8B. |
| vidur-alibabacloud/data/hf_configs/qwen3-30B-A3B_config.json | Adds HF-style config for Qwen3-MoE 30B-A3B. |
| vidur-alibabacloud/data/hf_configs/qwen3-235B-A22B_config.json | Adds HF-style config for Qwen3-MoE 235B-A22B. |
| vidur-alibabacloud/data/hf_configs/qwen3-235B-A22B_FP8_config.json | Adds FP8 HF-style config variant for Qwen3-MoE 235B-A22B. |
| vidur-alibabacloud/data/hf_configs/deepseek_v3_config.json | Adds HF-style config for DeepSeek V3. |
| vidur-alibabacloud/data/hf_configs/deepseek_R1_0528_config.json | Adds HF-style config for DeepSeek R1 0528. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv | Adds prefill workload profile for Qwen3-Next-80B. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv | Adds decode workload profile for Qwen3-Next-80B. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Moe-235B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv | Adds prefill workload profile for Qwen3-MoE-235B. |
| vidur-alibabacloud/data/aicb_workload/vidur-Qwen3-Moe-235B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv | Adds decode workload profile for Qwen3-MoE-235B. |
| vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv | Adds prefill workload profile for DeepSeek-671B. |
| vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-decode.csv | Adds decode workload profile for DeepSeek-671B. |
| vidur-alibabacloud/data/aicb_workload/vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs3-seq4096-decode.csv | Adds a smaller CSV decode workload variant for DeepSeek-671B. |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size8-tp1-pp1-ep8-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size6-tp1-pp1-ep6-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size6-tp1-pp1-ep6-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size32-tp1-pp1-ep32-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Next-80B-world_size2-tp1-pp1-ep2-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-Next-80B (ws2). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size8-tp4-pp1-ep8-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-MoE-235B (ws8,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws32,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq100-prefill.csv | Adds cached prefill workload profile for Qwen3-MoE-235B (ws32,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size32-tp4-pp1-ep4-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws32,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size24-tp4-pp1-ep24-bs1-seq106-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws24,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-Qwen3-Moe-235B-world_size24-tp4-pp1-ep24-bs1-seq100-decode.csv | Adds cached decode workload profile for Qwen3-MoE-235B (ws24,tp4). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size64-tp8-pp1-ep8-bs1-seq106-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws64,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size64-tp8-pp1-ep8-bs1-seq100-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws64,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size48-tp8-pp1-ep48-bs1-seq106-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws48,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/vidur-DeepSeek-671B-world_size48-tp8-pp1-ep48-bs1-seq100-decode.csv | Adds cached decode workload profile for DeepSeek-671B (ws48,tp8). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws8-tp1-pp1-ep8-bs1-seq106-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws8-tp1-pp1-ep8-bs1-seq100-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws8). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws6-tp1-pp1-ep6-bs1-seq106-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws6-tp1-pp1-ep6-bs1-seq100-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws6). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws32-tp1-pp1-ep32-bs1-seq106-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/data/aicb_workload/cache/aicb-Qwen3-Next-80B-ws32-tp1-pp1-ep32-bs1-seq100-decode.json | Adds cached AICB JSON for Qwen3-Next-80B (ws32). |
| vidur-alibabacloud/.gitignore | Keeps AICB workload artifacts tracked while ignoring other generated CSV/log outputs. |
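The workload CSV names in the table above appear to follow one pattern: `vidur-<model>-world_size<N>-tp<T>-pp<P>-ep<E>-bs<B>-seq<S>-<phase>.csv`. The sketch below parses such a name into its fields. The pattern is inferred from the listed file names only, and `parse_workload_name` is a hypothetical helper, not a function from the PR.

```python
import re
from typing import Dict, Union

# Pattern inferred from the file names listed in the review table (assumption).
_WORKLOAD_RE = re.compile(
    r"^vidur-(?P<model>.+)-world_size(?P<world_size>\d+)"
    r"-tp(?P<tp>\d+)-pp(?P<pp>\d+)-ep(?P<ep>\d+)"
    r"-bs(?P<bs>\d+)-seq(?P<seq>\d+)-(?P<phase>prefill|decode)\.csv$"
)


def parse_workload_name(filename: str) -> Dict[str, Union[str, int]]:
    """Split a workload CSV file name into model, parallelism, and phase fields."""
    match = _WORKLOAD_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized workload file name: {filename}")
    fields = match.groupdict()
    # Keep model and phase as strings; convert every numeric field to int.
    return {
        key: value if key in ("model", "phase") else int(value)
        for key, value in fields.items()
    }


if __name__ == "__main__":
    name = "vidur-DeepSeek-671B-world_size32-tp1-pp1-ep32-bs4-seq4096-prefill.csv"
    print(parse_workload_name(name))
```

The cached `aicb-*.json` files use a shorter `ws<N>` form and are deliberately not matched by this pattern.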
    # Every replica handles both prefill and decode
    # EP = ws = tp * pp * dp (full cluster world_size)
    # ============================================================
    if rc.pd_node_ratio == 1:

    # 2. pd_node_ratio (calculated by ratio)
    # ============================================================
    elif rc.pd_node_ratio > 0 and rc.pd_node_ratio < 1:

    if metrics_config.write_json_trace:
        self._write_cluster_info_to_file()

    assert num_p > 0 and num_d > 0, (
        f"[Cluster] _num_prefill_replicas={num_p} and "
        f"_num_decode_replicas={num_d} must both be > 0, "
        f"source: {replica_source}")

    assert rc.prefill_world_size > 0 and rc.decode_world_size > 0, (
        f"[Cluster] prefill_ws={rc.prefill_world_size} and "
        f"decode_ws={rc.decode_world_size} must both be > 0")

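The `pd_node_ratio` branches and the replica-count assertions quoted above can be sketched together as one small function. This is an illustration under stated assumptions, not the PR's implementation: it assumes `pd_node_ratio` is the fraction of replicas assigned to prefill, and `split_replicas` is a hypothetical name.

```python
from typing import Tuple


def split_replicas(num_replicas: int, pd_node_ratio: float) -> Tuple[int, int]:
    """Return (num_prefill, num_decode) replica counts.

    ASSUMPTION: pd_node_ratio is the share of replicas used for prefill.
    The real semantics in the PR may differ.
    """
    if pd_node_ratio == 1:
        # No PD separation: every replica handles both prefill and decode.
        return num_replicas, num_replicas
    if not 0 < pd_node_ratio < 1:
        raise ValueError(f"pd_node_ratio must be in (0, 1], got {pd_node_ratio}")
    num_prefill = round(num_replicas * pd_node_ratio)
    num_decode = num_replicas - num_prefill
    # Same check as the quoted assertion, but raised explicitly.
    if num_prefill <= 0 or num_decode <= 0:
        raise ValueError(
            f"prefill={num_prefill} and decode={num_decode} must both be > 0"
        )
    return num_prefill, num_decode


if __name__ == "__main__":
    print(split_replicas(8, 0.25))
```

The sketch raises `ValueError` instead of using `assert` because assertions are skipped when Python runs with `-O`, which would silently disable the validation.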
    # GB200 NVL72
    class GB200DeviceSKUConfig(BaseDeviceSKUConfig):
        fp16_tflops: int = 2500
        fp8_tflops: int = 5000
        total_memory_gb: int = 192

    metadata={"help": "> add: pd_p2p_comm_dtype for pd disaggregation."
              "choices=['fp8', 'float16', 'float32', 'float64', 'bfloat16', 'int8', 'int16', 'int32', 'int64'],"
    },

    # Print ReplicaConfig summary
    logger.info(f"[ReplicaConfig] tp={self.tensor_parallel_size}, pp={self.num_pipeline_stages}, "
                f"per_replica_ws={self.world_size}, ep(temp)={self.expert_model_parallel_size}, "
                f"pd_ratio={self.pd_node_ratio}")
    if self.pd_node_ratio < 1:
        p_tp = self.prefill_tensor_parallel_size or self.tensor_parallel_size
        p_pp = self.prefill_num_pipeline_stages or self.num_pipeline_stages
        d_tp = self.decode_tensor_parallel_size or self.tensor_parallel_size
        d_pp = self.decode_num_pipeline_stages or self.num_pipeline_stages
        logger.info(f"[ReplicaConfig] PD separation enabled: "
                    f"prefill(tp={p_tp}, pp={p_pp}), decode(tp={d_tp}, pp={d_pp})")
        if self.num_prefill_replicas is not None:
            logger.info(f"[ReplicaConfig] User specified num_prefill_replicas={self.num_prefill_replicas}")

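The `x or default` fallback used for the per-phase parallelism settings above can be shown in isolation. The `ParallelKnobs` dataclass below is a hypothetical stand-in that reuses the field names from the quoted snippet; it is not the PR's `ReplicaConfig`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParallelKnobs:
    """Hypothetical stand-in for the PD-related fields of ReplicaConfig."""

    tensor_parallel_size: int = 1
    num_pipeline_stages: int = 1
    prefill_tensor_parallel_size: Optional[int] = None
    prefill_num_pipeline_stages: Optional[int] = None
    decode_tensor_parallel_size: Optional[int] = None
    decode_num_pipeline_stages: Optional[int] = None

    def resolved(self, phase: str) -> Tuple[int, int]:
        """Return (tp, pp) for 'prefill' or 'decode', falling back to shared values."""
        if phase not in ("prefill", "decode"):
            raise ValueError(f"unknown phase: {phase}")
        tp = getattr(self, f"{phase}_tensor_parallel_size")
        pp = getattr(self, f"{phase}_num_pipeline_stages")
        # `or` falls back when the override is None (or 0, which is also falsy).
        return (tp or self.tensor_parallel_size, pp or self.num_pipeline_stages)


if __name__ == "__main__":
    knobs = ParallelKnobs(tensor_parallel_size=4, decode_tensor_parallel_size=8)
    print(knobs.resolved("prefill"), knobs.resolved("decode"))
```

Because `or` treats `0` the same as `None`, an explicit `is None` check would be the safer choice if zero ever became a meaningful override.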
    @dataclass
    class Qwen3235BA22BModelConfig(BaseModelConfig):

        "Qwen2MoeForCausalLM"
      ],
      "attention_dropout": 0.0,
      "bos_token_id": 151643,
      "decoder_sparse_step": 1,
      "eos_token_id": 151643,

PR #243 Copilot Review: Resolution Summary
Summary
This PR introduces the GPU Memory Inference Module with PD-separation (Prefill-Decode disaggregation) support for SimAI 1.6, enabling accurate memory simulation for large-scale models including DeepSeek-671B, Qwen3-MoE-235B, and Qwen3-Next-80B. It also includes code quality improvements across the vidur-alibabacloud modules.
Changes
New Features
Code Quality
- Replace `print()` with the proper `logging` module

Files Changed
- `vidur-alibabacloud/vidur/`: Python code (89 files)
- `vidur-alibabacloud/data/`: AICB workload data + HF model configs
- `vidur-alibabacloud/examples/`: run_scenarios.sh
- `vidur-alibabacloud/.gitignore`: Updated ignore rules

Testing
Checklist
Co-authored-by: tianhao909 <843101550@qq.com>
Co-authored-by: MXtremist <44829997+MXtremist@users.noreply.github.com>