[CP] Glm45 air 2.2 #4072

ckl117 · 2025-09-11T12:34:45Z

* update enable chunked_prefill * update code * update code * update code

…ddle#3794) Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>

* Update config.py * Update ep.py * Update fused_moe_backend_base.py * Update dynamic_weight_manager.py * Update worker_process.py * fix ci

* Update serving_chat.py * Update serving_completion.py * Update serving_completion.py

…ddlePaddle#3817)

…) (PaddlePaddle#3804) * lazy import of Config * support chunked_prefill * support chunked_prefill

…ePaddle#3810) * speed up eb45 * update

* fix scheduler bug * fix

* add moe noaux_tc tactics in triton backend * fix * add dp config

* Update no_proxy environment variable in CI workflow * Install lsof and kill api_server processes Install lsof tool and kill processes using it.

…se (PaddlePaddle#3855) * Update no_proxy environment variable in CI workflow * Install lsof and kill api_server processes Install lsof tool and kill processes using it. * Update dependency versions for stable release * Update CI script to use stable dependencies

…le#3771) (PaddlePaddle#3835) * fix w4afp8 * add centralized configuration * codestyle * fix fa3 append attn

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

* Support for async processor added. * remove yappi code

* [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default

* fix scheduler bug * fix * Update api_server.py

* add reasoning parser plugin * fix finish reason --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>

* [DEBUG] Adapt validation for paddleformers==0.2 in release/2.2 * [CI] update paddleformers==0.2 in release/2.2

* disable scheduler v1 in guided decoding * disable scheduler v1 in guided decoding

* add cache queue port * add cache queue port * add cache queue port

* [Feature] Enable prefix caching as default * [Feature] Enable prefix caching as default * Set prefix caching as default * skip dynamic load * fix kill bug * fix kill bug * fix kill bug * fix ci * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* optimize prefix cache in release22 * optimize prefix cache in release22 * fix worker * fix * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Bug Fix] Fix mm performance degradation * format --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: chenjian <1435317881@qq.com>

* Update paddleformers version to 0.2.2 * Update requirements.txt * Update paddleformers version to >=0.2.3

…ePaddle#3888) * fix the bug for real size 0 in cudagraph * fix cache_messager --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* add reasoning parser plugin * fix finish reason * fix default parser --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* [Feature] support rl_tp_degree * add rl_tp_degree in lmhead * add rl_tp_degree in bias * fix split_axis=0 in bias * fix split_axis in weight * fix bias rl_tp_degree * fix bias rl_tp_degree * change attr to dict --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* update best practice docs * add version and v1 loader info

…Paddle#3972) * add v1/models interface related * add model parameters * default model verification * unit test * check model err_msg * unit test * type annotation * model parameter in response * modify document description * modify document description * unit test * verification * verification update * model_name * pre-commit * update test case * update test case * Update tests/entrypoints/openai/test_serving_models.py * Update tests/entrypoints/openai/test_serving_models.py * Update tests/entrypoints/openai/test_serving_models.py * Update tests/entrypoints/openai/test_serving_models.py * Update fastdeploy/entrypoints/openai/serving_models.py * improve error messages --------- Co-authored-by: yangzichao01 <yangzichao01@baidu.com> Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com> Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

) * update docs * [docs] update readme (PaddlePaddle#4000) * update docs * update readme * update docs * [FIX] Change the name of sparse attn from moba to plas (PaddlePaddle#3845) * update docs * update docs * update docs * update docs * rename moba to plas * code style * update ci * code style * update ci * code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* fix scheduler bug * fix * Update api_server.py * Update multi_api_server.py * [Fix]

PaddlePaddle#4010) * Fixed the issue of metrics file conflicts between multiple instances on a single machine * Use uuid to name the metrics shared folder * Use uuid to name the metrics shared folder

…addlePaddle#3974) * [Feature] Support mixed deployment with yiyan adapter in release2.2 * [Feature] Support mixed deployment with yiyan adapter in release2.2 * fix metrics * add unit test * add unit test * add unit test * add unit test * add unit test * add unit test

* support glm45_air

…8 triton_moe_backend) (PaddlePaddle#4051)

paddle-bot · 2025-09-11T12:34:52Z

Thanks for your contribution!

Jiang-Jia-Jun and others added 30 commits August 31, 2025 21:31

Update FASTDEPLOY_VERSION to 2.2.0

1953c7c

fix ce build job (PaddlePaddle#3777)

2b0a745

fix ce compile task upload error (PaddlePaddle#3788)

0cdbc95

Fix chunked prefill (PaddlePaddle#3778)

a86b35a

* update enable chunked_prefill * update code * update code * update code

[Feature] Setting number of apiserver workers automatically (PaddlePa…

d1d063e

…ddle#3794) Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>

[Feature] support model weight update in ep (PaddlePaddle#3802)

0f42771

* Update config.py * Update ep.py * Update fused_moe_backend_base.py * Update dynamic_weight_manager.py * Update worker_process.py * fix ci

[BugFix] fix max streaming tokens invalid (PaddlePaddle#3799)

cd09384

* Update serving_chat.py * Update serving_completion.py * Update serving_completion.py

[Executor] Fix bug of import paddle with RLHF (PaddlePaddle#3781) (Pa…

a6c8f17

…ddlePaddle#3817)

Update qwen_vl_processor.py (PaddlePaddle#3806)

5cda326

[BugFix] fix error of import paddle.base.core.Config (PaddlePaddle#3761…

1745101

…) (PaddlePaddle#3804) * lazy import of Config * support chunked_prefill * support chunked_prefill

[v1loader]Reduce EB300B model loading time (PaddlePaddle#3700) (Paddl…

f975f7d

…ePaddle#3810) * speed up eb45 * update

[BugFix] fix scheduler (PaddlePaddle#3818)

37cb37b

* fix scheduler bug * fix

add reasoning parser plugin (PaddlePaddle#3820)

1968c65

Update installation method for paddlepaddle-xpu (PaddlePaddle#3834)

42402c8

[BugFix] add moe noaux_tc tactics in triton backend (PaddlePaddle#3821)

05b6591

* add moe noaux_tc tactics in triton backend * fix * add dp config

[XPU]Update XPU CI Case (PaddlePaddle#3844)

abcd214

* Update no_proxy environment variable in CI workflow * Install lsof and kill api_server processes Install lsof tool and kill processes using it.

[Fix bug] Fix the w4afp8 nblock at 256 and add a mask parameter to fa3 append attn (PaddlePadd…

9213a58

…le#3771) (PaddlePaddle#3835) * fix w4afp8 * add centralized configuration * codestyle * fix fa3 append attn

[Bug Fix] Fix bug of multimodal inputs only text (PaddlePaddle#3850)

1432e33

fix port (PaddlePaddle#3865)

b56b015

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

Support for async processor added. (PaddlePaddle#3870)

8c0e7d6

* Support for async processor added. * remove yappi code

fix mem boom in ep (PaddlePaddle#3852)

fbf0e9d

[Bug fix] Fix prompt token ids dtype in v1 (PaddlePaddle#3861)

a0c0351

[bugfix] scheduler (PaddlePaddle#3871)

8550e19

* fix scheduler bug * fix * Update api_server.py

[bug] fix finish reason (PaddlePaddle#3858)

b8d0f1c

* add reasoning parser plugin * fix finish reason --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>

fix DP&&TP (PaddlePaddle#3872)

d40d3a5

[CI] update paddleformers==0.2 in release/2.2 (PaddlePaddle#3828)

afcde19

* [DEBUG] Adapt validation for paddleformers==0.2 in release/2.2 * [CI] update paddleformers==0.2 in release/2.2

[Fix] disable scheduler v1 in guided decoding (PaddlePaddle#3877)

8567ada

* disable scheduler v1 in guided decoding * disable scheduler v1 in guided decoding

paddleformers==0.1.4 (PaddlePaddle#3908)

e9f72df

lizhenyun01 and others added 28 commits September 5, 2025 22:29

[BugFix] fix TaskQueue dp_id in multi node (PaddlePaddle#3919)

2d975e1

update hybrid-mtp-with-ngram (PaddlePaddle#3924)

e2c764f

add cache queue port (PaddlePaddle#3904) (PaddlePaddle#3926)

11b18e5

* add cache queue port * add cache queue port * add cache queue port

[Optimize] optimize prefix cache in release22 (PaddlePaddle#3889)

8d77c1c

* optimize prefix cache in release22 * optimize prefix cache in release22 * fix worker * fix * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

[BugFix] qwen2.5vl enable_thinking=true bug fix (PaddlePaddle#3920)

c6e2a37

[Fix] when prompt token ids is numpy (PaddlePaddle#3944)

b2bb37d

ignore (PaddlePaddle#3949)

051e4a8

[Feature] support hierarchical cache in v1 (PaddlePaddle#3939)

38e734e

[Bug Fix] Fix mm performance degradation (PaddlePaddle#3942)

d6bf6de

* [Bug Fix] Fix mm performance degradation * format --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: chenjian <1435317881@qq.com>

Update paddleformers version to >=0.2.3 (PaddlePaddle#3936)

c7c1627

* Update paddleformers version to 0.2.2 * Update requirements.txt * Update paddleformers version to >=0.2.3

[Cherry-Pick][Bug Fix]fix the bug for real size 0 in cudagraph (Paddl…

d435499

…ePaddle#3888) * fix the bug for real size 0 in cudagraph * fix cache_messager --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

[BugFix] fix default parser (PaddlePaddle#3932)

1023a67

* add reasoning parser plugin * fix finish reason * fix default parser --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

update ci (PaddlePaddle#3953)

8903f93

update env docs for Machete (PaddlePaddle#3960)

fa23692

[docs] update best practice docs for release/2.2 (PaddlePaddle#3970)

36a58f4

* update best practice docs * add version and v1 loader info

[Docs] release 2.2.0 (PaddlePaddle#3991)

9340715

Update docs (PaddlePaddle#3996)

14df2c5

get org_vocab_size from args (PaddlePaddle#3984)

35b8362

Fix down projection weight shape in fused MOE layer (PaddlePaddle#4041)

7272afe

[Fix] fix multi api server log dir (PaddlePaddle#3966)

a6b161b

* fix scheduler bug * fix * Update api_server.py * Update multi_api_server.py * [Fix]

Fixed the issue of metrics file conflicts between multiple instances … (

c4098d5

PaddlePaddle#4010) * Fixed the issue of metrics file conflicts between multiple instances on a single machine * Use uuid to name the metrics shared folder * Use uuid to name the metrics shared folder
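For readers unfamiliar with the approach named in this commit, the idea of giving each instance a uuid-named metrics folder can be sketched as follows. This is an illustration only, not the code from the commit: the function name `make_metrics_dir`, the `metrics_` prefix, and the use of a temporary base directory are all assumptions made for the example.

```python
import os
import tempfile
import uuid


def make_metrics_dir(base_dir: str) -> str:
    """Create and return a per-instance metrics directory under base_dir.

    Hypothetical helper: the name and folder prefix are illustrative.
    """
    # uuid4 is random, so instances started on the same machine
    # get distinct folder names and do not share metrics files.
    path = os.path.join(base_dir, f"metrics_{uuid.uuid4().hex}")
    # exist_ok=False surfaces a collision instead of silently sharing.
    os.makedirs(path, exist_ok=False)
    return path


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    first = make_metrics_dir(base)
    second = make_metrics_dir(base)
    print(first)
    print(second)
```

Each call returns a different path, so two server instances that both call the helper at startup would write to separate folders.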

[Feature] Support zai-org/GLM-4.5-Air BF16 model (PaddlePaddle#3928)

6a96e57

* support glm45_air

[Feature] GLM-45-AIR Support Mix Quantization(Dense wfp8afp8 and wint…

190104e

…8 triton_moe_backend) (PaddlePaddle#4051)

ckl117 closed this Sep 11, 2025

20 participants

[CP] Glm45 air 2.2 #4072

[CP] Glm45 air 2.2 #4072

Uh oh!

Conversation

ckl117 commented Sep 11, 2025

Uh oh!

paddle-bot bot commented Sep 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants