[SOT] Add sot warmup (NVIDIA GPU Only) #2929
Conversation
Thanks for your contribution!
SigureMo left a comment
Please merge the latest develop and run pre-commit:
pre-commit run --files fastdeploy/model_executor/graph_optimization/utils.py fastdeploy/worker/gpu_model_runner.py fastdeploy/worker/gpu_worker.py
SigureMo left a comment
sot_warmup_guard, in_sot_warmup_mode = create_guard(False)
profile_run_guard, in_profile_run_mode = create_guard(False)
Would keeping just profile_run_guard be enough? Both SOT and CUDA Graph could use in_profile_run_mode for this check.
Currently it works like this (in chronological order):

- profile_run_guard indicates whether we are in the profile_run stage; during this stage the dynamic graph is run.
- sot_warmup_guard indicates whether we are in the SOT warmup stage; during this stage the model is converted to static graph and dynamic shapes are marked.
- After the service starts, if SOT static-graph conversion is enabled, the static graph is run and dynamic shapes are no longer marked.

If only profile_run_guard were kept, there would be no way to distinguish the fake data run during the SOT warmup stage from the real data run after the service starts (dynamic shapes are marked only during warmup, not after the service starts).
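The create_guard helper quoted above could look roughly like the sketch below: it creates a mode flag plus a context manager that flips the flag for the duration of a stage. This is a hypothetical reconstruction for illustration, not the actual implementation in fastdeploy/model_executor/graph_optimization/utils.py.

```python
# Hypothetical sketch of the create_guard pattern discussed above.
# Names mirror the quoted snippet; the real FastDeploy code may differ.
from contextlib import contextmanager


def create_guard(default: bool):
    """Return (guard, in_mode): a context manager that sets a mode flag,
    and a function that reads the current flag."""
    state = {"flag": default}

    @contextmanager
    def guard(value: bool = True):
        previous = state["flag"]
        state["flag"] = value
        try:
            yield
        finally:
            # Restore the previous value even if the stage raises.
            state["flag"] = previous

    def in_mode() -> bool:
        return state["flag"]

    return guard, in_mode


sot_warmup_guard, in_sot_warmup_mode = create_guard(False)
profile_run_guard, in_profile_run_mode = create_guard(False)
```

With two independent guards, code inside the model runner can check `in_sot_warmup_mode()` and `in_profile_run_mode()` separately, which is exactly the distinction the comment above relies on.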
Does SOT plan to support multiple hardware backends? Do the model runners for gcu and other hardware need to be adapted?

@gongshaotian Other hardware will also support warmup; let's unify that change in the next PR.
gongshaotian left a comment
LGTM
* [MTP Fix] Fix code and register cpp operators (PaddlePaddle#2965)
* fix rl config local rank (PaddlePaddle#2957)
* [FIX] fix rejection sampling when topp=0 using _SAMPLING_EPS (PaddlePaddle#2967)
* fix rejection sampling when topp=0
* fix
* [SOT] Add sot warmup (NVIDIA GPU Only) (PaddlePaddle#2929)
* add sot warmup
* fix code style
* change batch_size list
* add param to config
* rm free_list settings && set sot_warmup_sizes
* finish debug with dynamic dims by type annotations
* add profile_run guard
* rm sth useless
* support chunk_prefill in fa3
* 【Infer】Improve the performance block_wise_fp8 of triton_moe_backend (PaddlePaddle#2942)
* Update README.md
* Update README.md
* delete max-len (PaddlePaddle#2959)
* [CI] add codestyle_check action (PaddlePaddle#2972)
* [CI] add codestyle_check action
* [CI] Integrate codestyle check via pre-commit in GitHub Actions
* fix mtp bug in pd-split mode (PaddlePaddle#2970)
* [BugFix] Add prefill restrictions for chunked_prefill+VL (PaddlePaddle#2983)
* Fix performance degradation bug of custom_all_reduce (PaddlePaddle#2981)
* FA3 fix bug (PaddlePaddle#2987)
* polish code for prefill restrictions (PaddlePaddle#2991)
* [Feature] Support block scheduler v1 for FD (PaddlePaddle#2928)
* Support FD block scheduler v1
* Fix according to copilot review
* Fix according to review
* Remove is_dummy
* Fix bug when real_bsz=1
* Fix infer first token cost time (Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>)
* update (PaddlePaddle#2978)
* [Code Simplification] fix init_distributed_environment() (PaddlePaddle#2982)
* support c4 attn && fix cache
* fix chunk_prefill
* [benchmark] add quantization for benchmark yaml (PaddlePaddle#2995)
* [Fix] fix mm ep empty run (PaddlePaddle#2999)
* add ci reuse action (PaddlePaddle#2968)
* fix code formatting
* update
* [Feature] multi-source download (PaddlePaddle#2986)
* huggingface download revision
* requirement
* style
* add revision arg
* test
* pre-commit
* [LLM] update function name (PaddlePaddle#2985)
* [BugFix] fix multinode deployment (PaddlePaddle#2977)
* Update benchmark tools (PaddlePaddle#3004)
* update flake8 version to support pre-commit in python3.12 (PaddlePaddle#3000)
* polish code
* [Feature] multi source download (PaddlePaddle#3005)
* Change default download
* change requirements.txt
* modify English Documentation
* documentation
* [GCU] Update to develop (PaddlePaddle#2988)
* [Model] Provide clearer error for missing KV cache quantization scales (PaddlePaddle#3007)
* [Feature] Support_eplb (PaddlePaddle#2997)
* [Fix] fix mm ep
* Update setup.py
* [feat] add disable_chat_template in chat api as a substitute for previous raw_request (PaddlePaddle#3023)
* [fix] pre-commit code check

Co-authored-by: GoldPancake <56388518+Deleter-D@users.noreply.github.com>
Co-authored-by: gaoziyuan <88373061+gzy19990617@users.noreply.github.com>
Co-authored-by: Sunny-bot1 <68891411+Sunny-bot1@users.noreply.github.com>
Co-authored-by: Ryan <zihaohuang@aliyun.com>
Co-authored-by: lizhenyun01 <1500424927@qq.com>
Co-authored-by: chen <103103266+ckl117@users.noreply.github.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: lizexu123 <39205361+lizexu123@users.noreply.github.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: freeliuzc <lzc842650834@gmail.com>
Co-authored-by: Zero Rains <linjunlu@zerorains.top>
Co-authored-by: zhink <33270771+zhink@users.noreply.github.com>
Co-authored-by: chenjian <1435317881@qq.com>
Co-authored-by: bukejiyu <52310069+bukejiyu@users.noreply.github.com>
Co-authored-by: xiegegege <46314656+xiegegege@users.noreply.github.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: YUNSHEN XIE <1084314248@qq.com>
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com>
Co-authored-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: Zhang Yulong <35552275+ZhangYulongg@users.noreply.github.com>
Co-authored-by: EnflameGCU <118410644+EnflameGCU@users.noreply.github.com>
Co-authored-by: littledgg <61149469+littledgg@users.noreply.github.com>
Co-authored-by: 李泳桦 <39643373+liyonghua0910@users.noreply.github.com>
Add a warmup step for SOT.
cc @SigureMo