
Support EB5 mini block-wise FP8 inference. #3

Open
K11OntheBoat wants to merge 130 commits into zoooo0820:deepgemm_sm100 from K11OntheBoat:eb_sm100_fp8

@K11OntheBoat
Collaborator

No description provided.

ddchenhao66 and others added 30 commits December 3, 2025 19:07
Co-authored-by: ddchenhao66 <dhaochen163.com>
* llguidance

* add requirements_guided_decoding.txt and doc

* fix test_guidance_*.py

* fix test_guidance_*.py && mv

* fix llguidance choice

* test_guidance_*

* rm lazy loader

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* commit

* commit

* commit

* commit

* commit

* commit
…iton and safetensors, del fastsafetensors (PaddlePaddle#5330)

* Remove version constraints for setuptools, triton, and fastsafetensors.

* remove version for uvicorn

* fix according to review
* fix

* fix

* fix test

* fix gpu_model_runner

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…le#5138)

* [Models] Add forward_meta to moe models' forward function

* fix missing param

* fix

* fix

* fix forward_meta

* fix test and remove chunked-MoE-related settings from config

* fix test

* fix

* fix
…addlePaddle#5197)

* [fix] support DP via v1 router and decouple DP and EP

* [fix] fix scripts

* [fix] reset model path

* [fix] DP uses get_output_ep, fix router port type, update scripts

* [merge] merge with latest code

* [chore] remove some debug log

* [fix] fix code style check

* [fix] fix test_multi_api_server for log_dir name

* [chore] reduce logs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…le#5368)

* [doc] update FAQ with logprobs MQ limits and deprecation

* [doc] update FAQ with logprobs MQ limits and deprecation

* update faq
…ePaddle#5367)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [Feature] Support caching the KV cache for output tokens

* fix bug

* fix ci bug

* improve coverage

* enable output caching by default

* fix ci

---------

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* [BugFix] fix data_processor async bug

* fix bug
* Enhance run_ci_xpu.sh with caching and prefill options

* Update model path and configuration in run_ci_xpu.sh

* Add '北朝' keyword to assertion in run_45vl.py

* Enhance process termination logic in run_ci_xpu.sh

* Set timeout for CI_XPU job to 60 minutes

* Remove extra newline in stop_processes function

* Update paddlepaddle-xpu installation command

Comment out the previous paddlepaddle-xpu installation command and replace it with an install of a pinned version, due to an EP parallelism error.

* Update PaddlePaddle installation command
Deleter-D and others added 27 commits December 12, 2025 12:22
…toring. (PaddlePaddle#5518)

* support spec metrics monitor per request

* fix bug

* remove debug log

* fix ut bugs
…cessor/image_preprocessor/image_preprocessor_adaptive.py: add unit tests (PaddlePaddle#5265)

* test: add unit tests for image_preprocessor_adaptive.py (NO.25)

* refactor: merge redundant test functions in test_image_preprocessor_adaptive.py

* fix: fix codestyle issues - remove extra blank lines

* update

* update

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
…n different prefill (PaddlePaddle#5514)

* Distinguish the pipelines for sending kv signal in different prefill

* up
* [XPU] add speculate_step_system_cache

* [XPU] add speculate_step_system_cache

* [XPU] add speculate_get_logits

* delete context

* add ptr check

---------

Co-authored-by: cmcamdy <1027740945@qq.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…PaddlePaddle#5506)

* bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize

* bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize

* Add comments

* [bugfix] PREEMPTED task blocksize

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* add mtp case

* Refactor test_mtp.py for clarity and efficiency

Removed duplicate import of json and simplified spec_config formatting.

---------

Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>
…cessor/ernie4_5_vl_processor.py: add unit tests (PaddlePaddle#5263)

* test: improve ernie4_5_vl_processor.py test coverage

* update

* improve coverage

* update

* fix: correct test expectation for thinking_mode false in test_ernie_vl_processor

* remove test_process_request_dict_comprehensive test case

---------

Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: xiaozude <xiaozude@outlook.com>