Collaborator

@DrRyanHuang DrRyanHuang commented Jul 7, 2025

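Previously, the --enable-static-graph-inference argument had no effect on multimodal models. This PR passes the argument through so that it reaches the multimodal model.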

(Requires #2731 ✅️ and #2733 ✅️ to be merged first)


End-to-end performance on the multimodal model ERNIE-4.5-VL-28B-A3B-Base-Paddle with SOT inference enabled, compared with dynamic graph:

| Metric | SOT | Dynamic graph |
| --- | --- | --- |
| Successful requests | 1975 | 1973 |
| Benchmark duration (s) | 2047.86 | 2589.33 |
| Total input tokens | 6574762 | 6566334 |
| Total generated tokens | 1477338 | 1452119 |
| Request throughput (req/s) | 0.96 | 0.76 |
| Output token throughput (tok/s) | 721.41 | 560.81 |
| Total token throughput (tok/s) | 3931.97 | 3096.73 |
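
The relative gains implied by the figures above can be worked out directly. The following is a minimal sketch that recomputes them from the reported numbers; the function name `relative_change` and the dictionary layout are illustrative only and are not part of FastDeploy.

```python
# Figures copied from the benchmark table above: (SOT, dynamic graph).
results = {
    "Benchmark duration (s)": (2047.86, 2589.33),
    "Request throughput (req/s)": (0.96, 0.76),
    "Output token throughput (tok/s)": (721.41, 560.81),
    "Total token throughput (tok/s)": (3931.97, 3096.73),
}


def relative_change(sot: float, dynamic: float) -> float:
    """Percentage change of the SOT value relative to the dynamic-graph value."""
    return (sot - dynamic) / dynamic * 100.0


# Map each metric name to its percentage change.
changes = {name: relative_change(sot, dyn) for name, (sot, dyn) in results.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

On these figures, SOT shortens the benchmark duration by roughly 21% and raises the three throughput metrics by roughly 26% to 29%.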

Test details

# Service launch script
python -m fastdeploy.entrypoints.openai.api_server \
       --model $model_path \
       --port $port \
       --tensor-parallel-size 1 \
       --max-model-len 32768 \
       --max-num-seqs 128 \
       --enable-mm \

Test data: 0512_api9_mm_ds_9974_for_fastdeploy, a 2000-sample dataset, run with --max-concurrency 100

Other environment variables

export SOT_UNSAFE_CACHE_FASTPATH=1
export SOT_ENABLE_0_SIZE_FALLBACK=0 
export FLAGS_specialize_device_in_dy2st=1
export FLAGS_parameters_persistent_mode_in_dy2st=1
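
The two benchmark configurations differ in whether --enable-static-graph-inference is passed. As a rough sketch of how both launch commands could be assembled from the script and environment variables above, the helpers below build the argument list and environment. The helper names, the model path and the port are hypothetical. It is also an assumption that the flag is a plain boolean switch appended to the command, and the PR does not state whether the environment variables were set for both runs.

```python
import os
import shlex
from typing import Dict, List, Optional

# Flag named in the PR description; assumed here to be a boolean switch.
SOT_FLAG = "--enable-static-graph-inference"

# Environment variables listed in the PR description.
SOT_ENV = {
    "SOT_UNSAFE_CACHE_FASTPATH": "1",
    "SOT_ENABLE_0_SIZE_FALLBACK": "0",
    "FLAGS_specialize_device_in_dy2st": "1",
    "FLAGS_parameters_persistent_mode_in_dy2st": "1",
}


def build_launch_command(model_path: str, port: int, enable_sot: bool) -> List[str]:
    """Return the api_server command, optionally with the SOT flag appended."""
    cmd = [
        "python", "-m", "fastdeploy.entrypoints.openai.api_server",
        "--model", model_path,
        "--port", str(port),
        "--tensor-parallel-size", "1",
        "--max-model-len", "32768",
        "--max-num-seqs", "128",
        "--enable-mm",
    ]
    if enable_sot:
        cmd.append(SOT_FLAG)
    return cmd


def build_environment(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the base environment with the listed variables set."""
    env = dict(os.environ if base is None else base)
    env.update(SOT_ENV)
    return env


if __name__ == "__main__":
    # Placeholder model path and port, for illustration only.
    for enable_sot in (True, False):
        print(shlex.join(build_launch_command("/path/to/model", 8000, enable_sot)))
```

The resulting list and environment can be handed to `subprocess.Popen(cmd, env=env)` to start the server for each run.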

cc @SigureMo


paddle-bot bot commented Jul 7, 2025

Thanks for your contribution!

@DrRyanHuang DrRyanHuang changed the title from "[SOT] Enable SOT D2St in Multimodal Model" to "[SOT] Enable SOT Dy2St in Multimodal Model" Jul 7, 2025
Member

@SigureMo SigureMo left a comment


LGTMeow 🐾

Collaborator

@ming1753 ming1753 left a comment


LGTM

@ming1753 ming1753 merged commit c4718fd into PaddlePaddle:develop Jul 9, 2025
3 checks passed
@DrRyanHuang DrRyanHuang deleted the vl_support_sot branch July 9, 2025 04:26
