@ckl117
Collaborator

@ckl117 ckl117 commented Sep 11, 2025

cp #4051

Jiang-Jia-Jun and others added 30 commits August 31, 2025 21:31
* update enable chunked_prefill

* update code

* update code

* update code
* Update config.py

* Update ep.py

* Update fused_moe_backend_base.py

* Update dynamic_weight_manager.py

* Update worker_process.py

* fix ci
* Update serving_chat.py

* Update serving_completion.py

* Update serving_completion.py
…) (PaddlePaddle#3804)

* Defer import of Config

* support chunked_prefill

* support chunked_prefill
* add moe noaux_tc tactics in triton backend

* fix

* add dp config
* Update no_proxy environment variable in CI workflow

* Install lsof and kill api_server processes

Install lsof tool and kill processes using it.
…se (PaddlePaddle#3855)

* Update no_proxy environment variable in CI workflow

* Install lsof and kill api_server processes

Install lsof tool and kill processes using it.

* Update dependency versions for stable release

* Update CI script to use stable dependencies
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* Support for async processor added.

* remove yappi code
* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default

* [Feature] Set scheduler v1 as default
* fix scheduler bug

* fix

* Update api_server.py
* add reasoning parser plugin

* fix finish reason

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
* [DEBUG] Adapt validation for paddleformers==0.2 in release/2.2

* [CI] update paddleformers==0.2 in release/2.2
* disable scheduler v1 in guided decoding

* disable scheduler v1 in guided decoding
lizhenyun01 and others added 28 commits September 5, 2025 22:29
* add cache queue port

* add cache queue port

* add cache queue port
* [Feature] Enable prefix caching as default

* [Feature] Enable prefix caching as default

* Set prefix caching as default

* skip dynamic load

* fix kill bug

* fix kill bug

* fix kill bug

* fix ci

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* optimize prefix cache in release22

* optimize prefix cache in release22

* fix worker

* fix

* fix

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [Bug Fix] Fix mm performance degradation

* format

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
Co-authored-by: chenjian <1435317881@qq.com>
* Update paddleformers version to 0.2.2

* Update requirements.txt

* Update paddleformers version to >=0.2.3
…ePaddle#3888)

* fix the bug for real size 0 in cudagraph

* fix cache_messager

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* add reasoning parser plugin

* fix finish reason

* fix default parser

---------

Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [Feature] support rl_tp_degree

* add rl_tp_degree in lmhead

* add rl_tp_degree in bias

* fix split_axis=0 in bias

* fix split_axis in weight

* fix bias rl_tp_degree

* fix bias rl_tp_degree

* change attr to dict

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* update best practice docs

* add version and v1 loader info
…Paddle#3972)

* add v1/models interface related

* add model parameters

* default model verification

* unit test

* check model err_msg

* unit test

* type annotation

* model parameter in response

* modify document description

* modify document description

* unit test

* verification

* verification update

* model_name

* pre-commit

* update test case

* update test case

* Update tests/entrypoints/openai/test_serving_models.py



* Update tests/entrypoints/openai/test_serving_models.py



* Update tests/entrypoints/openai/test_serving_models.py



* Update tests/entrypoints/openai/test_serving_models.py



* Update fastdeploy/entrypoints/openai/serving_models.py



* Improve error messages.

---------

Co-authored-by: yangzichao01 <yangzichao01@baidu.com>
Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com>
Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
)

* Update docs

* 【docs】 update readme (PaddlePaddle#4000)

* Update docs

* update readme

* update docs

* 【FIX】Change the name of sparse attn from moba to plas (PaddlePaddle#3845)

* Update docs

* Update docs

* Update docs

* Update docs

* Rename moba to plas

* code style

* update ci

* code style

* update ci

* code style

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* fix scheduler bug

* fix

* Update api_server.py

* Update multi_api_server.py

* [Fix]
PaddlePaddle#4010)

* Fixed the issue of metrics file conflicts between multiple instances on a single machine

* Use uuid to name the metrics shared folder

* Use uuid to name the metrics shared folder
…addlePaddle#3974)

* [Feature] Support mixed deployment with yiyan adapter in release2.2

* [Feature] Support mixed deployment with yiyan adapter in release2.2

* fix metrics

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test

* add unit test
@paddle-bot

paddle-bot bot commented Sep 11, 2025

Thanks for your contribution!

@ckl117 ckl117 closed this Sep 11, 2025