Support EB5 mini block-wise FP8 inference. by K11OntheBoat · Pull Request #3 · zoooo0820/FastDeploy

K11OntheBoat · 2025-12-15T11:54:11Z

No description provided.

* llguidance * add requirements_guided_decoding.txt and doc * fix test_guidance_*.py * fix test_guidance_*.py && mv * fix llguidance choice * test_guidance_* * rm lazy loader --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

…addlePaddle#5197) * [fix] support DP via v1 router and decouple DP and EP * [fix] fix scripts * [fix] reset model path * [fix] dp use get_output_ep, fix router port type, update scripts * [merge] merge with latest code * [chore] remove some debug log * [fix] fix code style check * [fix] fix test_multi_api_server for log_dir name * [chore] reduce logs * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Feature] Support cache kv cache for output tokens * fix bug * fix ci bug * improve coverage * enable output caching by default * fix ci --------- Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>

* Enhance run_ci_xpu.sh with caching and prefill options * Update model path and configuration in run_ci_xpu.sh * Add '北朝' keyword to assertion in run_45vl.py * Enhance process termination logic in run_ci_xpu.sh * Set timeout for CI_XPU job to 60 minutes * Remove extra newline in stop_processes function * Update paddlepaddle-xpu installation command Comment out the previous paddlepaddle-xpu installation command and replace it with a specific version installation due to EP parallel error. * Update PaddlePaddle installation command

…cessor/image_preprocessor/image_preprocessor_adaptive.py unit test supplement (PaddlePaddle#5265) * test: add unit tests for image_preprocessor_adaptive.py (NO.25) * refactor: merge redundant test functions in test_image_preprocessor_adaptive.py * fix: fix codestyle issues - remove extra blank lines * update * update --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com> Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>

* [XPU] add speculate_step_system_cache * [XPU] add speculate_step_system_cache * [XPU] add speculate_get_logits * delete context * add ptr check --------- Co-authored-by: cmcamdy <1027740945@qq.com> Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

…PaddlePaddle#5506) * bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize * bugfix reschedule_preempt_task append waiting & PREEMPTED blocksize * comments * [bugfix] PREEMPTED task blocksize * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…cessor/ernie4_5_vl_processor.py unit test supplement (PaddlePaddle#5263) * test: improve ernie4_5_vl_processor.py test coverage * update * improve coverage * update * fix: correct test expectation for thinking_mode false in test_ernie_vl_processor * remove test_process_request_dict_comprehensive test case --------- Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>

ddchenhao66 and others added 30 commits December 3, 2025 19:07

[XPU] xpu support mm prefix cache (PaddlePaddle#5356)

4e8096b

Co-authored-by: ddchenhao66 <dhaochen163.com>

[Feature] support audio tts (PaddlePaddle#5333)

5f8d4ae

[FIX BUG] fix bug in TP in permute_x_fp8_kernel (PaddlePaddle#5350)

a36d60a

* commit * commit * commit * commit * commit * commit

[BugFix] dynamic cache kv block_wise_fp8 not need create layer.cache_k_scale (PaddlePaddle#5362)

be0c960

[Optimization] Remove version constraints for setuptools, uvicorn, triton and safetensors, del fastsafetensors (PaddlePaddle#5330)

96ff402

* Remove version constraints for setuptools, triton, and fastsafetensors. * remove version for uvicorn * fix according to review

fix logprobs (PaddlePaddle#5335)

a52aea0

[Bug fix] fix pooling models (PaddlePaddle#5358)

9460254

* fix * fix * fix test * fix gpu_model_runner --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

[Intel HPU] fix memory fragmentation issue due to warmup process and fix moe all_reduce issue (PaddlePaddle#5357)

209006e

Reduce timeout in unittest (PaddlePaddle#5366)

f5bdb36

[Models] Add forward_meta to moe models' forward function (PaddlePaddle#5138)

5cd17fd

* [Models] Add forward_meta to moe models' forward function * fix missing param * fix * fix * fix forward_meta * fix test and remove chunked MoE related in config * fix test * fix * fix

[Docs] update FAQ with logprobs MQ limits and deprecation (PaddlePaddle#5368)

3697110

* [doc] update FAQ with logprobs MQ limits and deprecation * [doc] update FAQ with logprobs MQ limits and deprecation * update faq

[BugFix] Exit if neither modern nor legacy wheel dir not found (PaddlePaddle#5367)

f88c159

--------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

[CE]change yaml name

b7e1e6c

remove fastsafetensors (PaddlePaddle#5371)

41c63f6

[fix] update check_model_weights_status loop (PaddlePaddle#5249)

b6f8069

fix get_request from scheduler (PaddlePaddle#5369)

7f4fff4

[CI] disable test_schedule_output.py in unit_test (PaddlePaddle#5377)

1b5fd79

deepseek torch (PaddlePaddle#5373)

620d1da

[XPU] [Optimization] [EP] EP communication optimization. (PaddlePaddle#5145)

e927c65

[BugFix] Compatible with asynchronous functions (PaddlePaddle#5378)

dd2e9a1

* [BugFix] fix data_processor asyn bug * fix bug

[XPU] support XDNN downloading function (PaddlePaddle#5365)

7b0b6e4

[Intel HPU] fix bug about RP 5138 (PaddlePaddle#5380)

ebe613c

fix split_rope_cache_kv_encoder in mix mtp (PaddlePaddle#5384)

86b6430

[BugFix] Fix flash_attn_backend

d436640

fix trace log (PaddlePaddle#5386)

1aefbef

[Feature] support Two batch overlap, mainly used in Prefill (PaddlePaddle#5078)

c83dc58

Deleter-D and others added 27 commits December 12, 2025 12:22

[Feature] Support for request-level speculative decoding metrics monitoring. (PaddlePaddle#5518)

909059c

* support spec metrics monitor per request * fix bug * remove debug log * fix ut bugs

[Metax] add ci yaml (PaddlePaddle#5520)

f32e331

Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>

[PD Disaggregation] Distinguish the pipelines for sending kv signal in different prefill (PaddlePaddle#5514)

d67388a

* Distinguish the pipelines for sending kv signal in different prefill * up

fix mtp multi batch (PaddlePaddle#5521)

6cc3cb4

[Models] Add forward_meta to VocabParallelEmbedding of all models (PaddlePaddle#5524)

4eb5533

[XPU] refactor of block_attn param 'pos_emb_type' (PaddlePaddle#5511)

888c4b9

[Doc]add text/vl cinn ce config (PaddlePaddle#5532)

13cc7da

[Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (PaddlePaddle#5486)

a389bb7

[CI][XPU] add mtp case (PaddlePaddle#5537)

9211977

* add mtp case * Refactor test_mtp.py for clarity and efficiency Removed duplicate import of json and simplified spec_config formatting. --------- Co-authored-by: root <root@gajl-bbc-onlinec-com-1511972.gajl.baidu.com>

fix encoder cache bug (PaddlePaddle#5528)

bebd722

[Graph Optimization][CI] Add ERNIE45T 21B sot test (PaddlePaddle#5538)

d01cb27

[Others] Clean code (PaddlePaddle#5543)

722de5a

[Metax] fix release2.4 and support cudagraph (PaddlePaddle#5547)

77f8ba0

Co-authored-by: xiaozude <xiaozude@outlook.com>

add check health in FD (PaddlePaddle#5534)

7b0fdf7

[CE]add pd router and wint4 tp4 config (PaddlePaddle#5554)

97e340e

DeepGEMM: temporarily usable version

a727c33

dense part: e8m0 ok

c65293c

Version in which the EB model runs end to end with E8M0

95328b2

code check

30dbd3a

support 21b-tp2, dev_paddle

ffec0a1

Version with single-node 4.5T EP working

80ac9d3

Restore deleted code; single-node 4.5T EP (non-cudagraph)

dda4bbd

eb tp

8c29bb1

K11OntheBoat force-pushed the eb_sm100_fp8 branch from 7011be9 to 8c29bb1 on December 16, 2025 11:17

Support SM100 block-wise FP8 inference

c1c7658

K11OntheBoat force-pushed the eb_sm100_fp8 branch from 276487f to c1c7658 on December 18, 2025 07:53

K11OntheBoat wants to merge 130 commits into zoooo0820:deepgemm_sm100 from K11OntheBoat:eb_sm100_fp8

20 participants
