[Cherry-Pick][Optimize] Qwen2.5-VL vision model #6300

xiaoxiaohehe001 · 2026-02-02T03:36:47Z

[Cherry-Pick][Optimize] Qwen2.5-VL vision model ([Optimize] Qwen2.5-VL vision model with merged linear layers and unif… #6037)

Modifications

Merge gate_proj and up_proj into a single up_gate_proj layer
Use a unified RMSNorm layer
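
For readers unfamiliar with the two changes above, here is a minimal sketch in plain Python of why merging the two projections is numerically safe. It is not the FastDeploy implementation. The SiLU-gated MLP form, the gate-first ordering inside the merged weight, the tiny dimensions, and every helper name (`matvec`, `mlp_separate`, `mlp_merged`, `rms_norm`) are assumptions made only for illustration; the down projection is omitted.

```python
import math
import random


def matvec(weight, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x by the reciprocal of its root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def mlp_separate(x, w_gate, w_up):
    """Two separate projections (two matmuls), then the gated product."""
    gate = matvec(w_gate, x)
    up = matvec(w_up, x)
    return [silu(g) * u for g, u in zip(gate, up)]


def mlp_merged(x, w_up_gate):
    """One projection with the stacked weight, then split the output.

    Assumption: the gate rows come first, then the up rows.
    """
    fused = matvec(w_up_gate, x)
    half = len(fused) // 2
    gate, up = fused[:half], fused[half:]
    return [silu(g) * u for g, u in zip(gate, up)]


random.seed(0)
hidden, inter = 4, 6  # toy sizes, not the real model dimensions
x = [random.uniform(-1, 1) for _ in range(hidden)]
w_gate = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(inter)]
w_up = [[random.uniform(-1, 1) for _ in range(hidden)] for _ in range(inter)]

# Merging is just stacking the rows of the two weights into one matrix.
w_up_gate = w_gate + w_up

normed = rms_norm(x, [1.0] * hidden)
out_separate = mlp_separate(normed, w_gate, w_up)
out_merged = mlp_merged(normed, w_up_gate)
max_diff = max(abs(a - b) for a, b in zip(out_separate, out_merged))
print("max abs difference:", max_diff)
```

The merged form issues one larger matrix multiply instead of two smaller ones, which is where a speedup would be expected to come from; the outputs match because each output row is computed from the same weights either way.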

before

============ Serving Benchmark Result ============
Successful requests:                     2350      
Benchmark duration (s):                  1011.96   
Total input tokens:                      6591406   
Total generated tokens:                  3525000   
Request throughput (req/s):              2.322     
Output token throughput (tok/s):         3483.35   
Total Token throughput (tok/s):          9996.89   
---------------Decode Speed (tok/s)---------------
Mean Decode:                             32.78     
Median Decode:                           32.35     
P99 Decode:                              67.39     
---------------Time to First Token----------------
Mean TTFT (ms):                          8115.94   
Median TTFT (ms):                        7694.97   
P99 TTFT (ms):                           29780.83  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          30.93     
Median TPOT (ms):                        31.06     
P99 TPOT (ms):                           36.83     
---------------Inter-token Latency----------------
Mean ITL (ms):                           30.90     
Median ITL (ms):                         17.51     
P99 ITL (ms):                            610.61    
----------------End-to-end Latency----------------
Mean E2EL (ms):                          54472.78  
Median E2EL (ms):                        54733.45  
P99 E2EL (ms):                           67343.39  
-------------Infer End-to-end Latency-------------
Mean S_E2EL (ms):                        47336.72  
Median S_E2EL (ms):                      47152.32  
P99 S_E2EL (ms):                         56024.31  
==================================================

after

============ Serving Benchmark Result ============
Successful requests:                     2350      
Benchmark duration (s):                  948.40    
Total input tokens:                      6591406   
Total generated tokens:                  3525000   
Request throughput (req/s):              2.478     
Output token throughput (tok/s):         3716.79   
Total Token throughput (tok/s):          10666.82  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             34.02     
Median Decode:                           33.08     
P99 Decode:                              60.80     
---------------Time to First Token----------------
Mean TTFT (ms):                          6422.78   
Median TTFT (ms):                        5483.12   
P99 TTFT (ms):                           26078.27  
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          29.78     
Median TPOT (ms):                        30.20     
P99 TPOT (ms):                           34.54     
---------------Inter-token Latency----------------
Mean ITL (ms):                           29.76     
Median ITL (ms):                         17.39     
P99 ITL (ms):                            542.75    
----------------End-to-end Latency----------------
Mean E2EL (ms):                          51059.96  
Median E2EL (ms):                        51309.52  
P99 E2EL (ms):                           70451.84  
-------------Infer End-to-end Latency-------------
Mean S_E2EL (ms):                        45460.27  
Median S_E2EL (ms):                      45970.20  
P99 S_E2EL (ms):                         51969.68    
==================================================
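
To make the before/after comparison easier to read, the following sketch computes the relative change for a handful of the metrics reported above. The numbers are copied from the two benchmark outputs; the choice of metrics, the `percent_change` helper, and the `HIGHER_IS_BETTER` set are illustrative assumptions, not part of the benchmark tool.

```python
# Values copied from the "before" and "after" benchmark outputs in this PR.
before = {
    "Benchmark duration (s)": 1011.96,
    "Request throughput (req/s)": 2.322,
    "Output token throughput (tok/s)": 3483.35,
    "Mean TTFT (ms)": 8115.94,
    "Mean TPOT (ms)": 30.93,
    "Mean E2EL (ms)": 54472.78,
    "P99 E2EL (ms)": 67343.39,
}
after = {
    "Benchmark duration (s)": 948.40,
    "Request throughput (req/s)": 2.478,
    "Output token throughput (tok/s)": 3716.79,
    "Mean TTFT (ms)": 6422.78,
    "Mean TPOT (ms)": 29.78,
    "Mean E2EL (ms)": 51059.96,
    "P99 E2EL (ms)": 70451.84,
}

# Throughput metrics improve when they go up; all others improve when they go down.
HIGHER_IS_BETTER = {
    "Request throughput (req/s)",
    "Output token throughput (tok/s)",
}


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


changes = {name: percent_change(before[name], after[name]) for name in before}

for name, pct in changes.items():
    improved = pct > 0 if name in HIGHER_IS_BETTER else pct < 0
    verdict = "better" if improved else "worse"
    print(f"{name:34s} {pct:+7.2f}%  {verdict}")
```

On these numbers, throughput rises and mean TTFT falls by roughly a fifth, while P99 E2EL is the one listed metric that gets worse after the change.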

Usage or Command

Accuracy Tests

Checklist

- Add at least one tag to the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code and run pre-commit before committing.
- Add unit tests. If there are none, state the reason in this PR.
- Provide accuracy results.
- If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot · 2026-02-02T03:36:59Z

Thanks for your contribution!

codecov-commenter · 2026-02-02T05:06:34Z

Codecov Report

❌ Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.4@e677cd5). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
...del_executor/models/qwen2_5_vl/dfnrope/modeling.py	95.23%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.4    #6300   +/-   ##
==============================================
  Coverage               ?   57.07%           
==============================================
  Files                  ?      331           
  Lines                  ?    41271           
  Branches               ?     6285           
==============================================
  Hits                   ?    23554           
  Misses                 ?    15872           
  Partials               ?     1845

Flag	Coverage Δ
GPU	`57.07% <95.45%> (?)`



[Cherry-Pick] [Optimize] Qwen2.5-VL vision model with merged linear l…

b61073c

…ayers

xiaoxiaohehe001 had a problem deploying to Metax_ci February 2, 2026 03:36 — with GitHub Actions Failure

xiaoxiaohehe001 changed the title ~~[Cherry-Pick][Optimize] Qwen2.5-VL vision model with merged linear l…~~ [Cherry-Pick][Optimize] Qwen2.5-VL vision model #6037 Feb 2, 2026

xiaoxiaohehe001 changed the title ~~[Cherry-Pick][Optimize] Qwen2.5-VL vision model #6037~~ [Cherry-Pick][Optimize] Qwen2.5-VL vision model https://github.com/PaddlePaddle/FastDeploy/pull/6037 Feb 2, 2026

xiaoxiaohehe001 changed the title ~~[Cherry-Pick][Optimize] Qwen2.5-VL vision model https://github.com/PaddlePaddle/FastDeploy/pull/6037~~ [Cherry-Pick][Optimize] Qwen2.5-VL vision model Feb 2, 2026

xiaoxiaohehe001 changed the title ~~[Cherry-Pick][Optimize] Qwen2.5-VL vision model~~ [Cherry-Pick][Optimize] Qwen2.5-VL vision model Feb 2, 2026

[Cherry-Pick] [Optimize] Qwen2.5-VL vision model with merged linear l…

8963f3f

…ayers

xiaoxiaohehe001 had a problem deploying to Metax_ci February 2, 2026 08:48 — with GitHub Actions Failure

Merge branch 'release/2.4' into release24_qwenvl

32a9010

xiaoxiaohehe001 had a problem deploying to Metax_ci February 2, 2026 11:45 — with GitHub Actions Failure

xiaoxiaohehe001 closed this Feb 2, 2026
