[CP] Glm45 air 2.2 #4072
Closed
* update enable chunked_prefill * update code * update code * update code
…ddle#3794) Co-authored-by: Jiang-Jia-Jun <jiangjiajun@baidu.com>
* Update config.py * Update ep.py * Update fused_moe_backend_base.py * Update dynamic_weight_manager.py * Update worker_process.py * fix ci
* Update serving_chat.py * Update serving_completion.py * Update serving_completion.py
…) (PaddlePaddle#3804) * lazy import Config * support chunked_prefill * support chunked_prefill
…ePaddle#3810) * speed up eb45 * update
* fix scheduler bug * fix
* add moe noaux_tc tactics in triton backend * fix * add dp config
* Update no_proxy environment variable in CI workflow * Install lsof and kill api_server processes Install lsof tool and kill processes using it.
…se (PaddlePaddle#3855) * Update no_proxy environment variable in CI workflow * Install lsof and kill api_server processes Install lsof tool and kill processes using it. * Update dependency versions for stable release * Update CI script to use stable dependencies
…le#3771) (PaddlePaddle#3835) * fix w4afp8 * add centralized configuration * codestyle * fix fa3 append attn
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* Support for async processor added. * remove yappi code
* [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default * [Feature] Set scheduler v1 as default
* fix scheduler bug * fix * Update api_server.py
* add reasoning parser plugin * fix finish reason --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com>
* [DEBUG] Adapt validation for paddleformers==0.2 in release/2.2 * [CI] update paddleformers==0.2 in release/2.2
* disable scheduler v1 in guided decoding * disable scheduler v1 in guided decoding
* add cache queue port * add cache queue port * add cache queue port
* [Feature] Enable prefix caching as default * [Feature] Enable prefix caching as default * Set prefix caching as default * skip dynamic load * fix kill bug * fix kill bug * fix kill bug * fix ci * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* optimize prefix cache in release22 * optimize prefix cache in release22 * fix worker * fix * fix --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [Bug Fix] Fix mm performance degradation * format --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com> Co-authored-by: chenjian <1435317881@qq.com>
* Update paddleformers version to 0.2.2 * Update requirements.txt * Update paddleformers version to >=0.2.3
…ePaddle#3888) * fix the bug for real size 0 in cudagraph * fix cache_messager --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* add reasoning parser plugin * fix finish reason * fix default parser --------- Co-authored-by: Yuanle Liu <yuanlehome@163.com> Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [Feature] support rl_tp_degree * add rl_tp_degree in lmhead * add rl_tp_degree in bias * fix split_axis=0 in bias * fix split_axis in weight * fix bias rl_tp_degree * fix bias rl_tp_degree * change attr to dict --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* update best practice docs * add version and v1 loader info
…Paddle#3972) * add v1/models interface related * add model parameters * default model verification * unit test * check model err_msg * unit test * type annotation * model parameter in response * modify document description * modify document description * unit test * verification * verification update * model_name * pre-commit * update test case * update test case * Update tests/entrypoints/openai/test_serving_models.py * Update tests/entrypoints/openai/test_serving_models.py * Update tests/entrypoints/openai/test_serving_models.py * Update tests/entrypoints/openai/test_serving_models.py * Update fastdeploy/entrypoints/openai/serving_models.py * Improve error messages. --------- Co-authored-by: yangzichao01 <yangzichao01@baidu.com> Co-authored-by: Yzc216 <101054010+Yzc216@users.noreply.github.com> Co-authored-by: LiqinruiG <37392159+LiqinruiG@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
) * update docs * [docs] update readme (PaddlePaddle#4000) * update docs * update readme * update docs * [FIX] Change the name of sparse attn from moba to plas (PaddlePaddle#3845) * update docs * update docs * update docs * update docs * rename moba to plas * code style * update ci * code style * update ci * code style --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* fix scheduler bug * fix * Update api_server.py * Update multi_api_server.py * [Fix]
PaddlePaddle#4010) * Fixed the issue of metrics file conflicts between multiple instances on a single machine * Use uuid to name the metrics shared folder * Use uuid to name the metrics shared folder
…addlePaddle#3974) * [Feature] Support mixed deployment with yiyan adapter in release2.2 * [Feature] Support mixed deployment with yiyan adapter in release2.2 * fix metrics * add unit test * add unit test * add unit test * add unit test * add unit test * add unit test
* support glm45_air
…8 triton_moe_backend) (PaddlePaddle#4051)
Thanks for your contribution!
cp #4051