[WIP][R3] Support Full Async R3 and PrefixCache #6313

gongshaotian · 2026-02-02T13:00:41Z

Motivation

💡 If this PR is a cherry-pick, the PR title must start with the [Cherry-Pick] label and end with the original PR ID. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

- Add at least one tag to the PR title.
  - Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
  - You can add new tags based on the PR content, but their meaning must be clear.
- Format your code and run pre-commit before committing.
- Add unit tests. If none are added, explain why in this PR.
- Provide accuracy results.
- If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] tag in the PR title.

…addle#5408)
* [RL] Support Rollout Routing Replay
* add routing indices cache
* fix config bug and moe forward bug
* R3 Support GLM
* support eb4.5
* fix merge bug
* Apply suggestion from @Copilot
* Apply suggestion from @Copilot
* Apply suggestion from @Copilot
* Apply suggestion from @Copilot
* add routing replay ci
* support glm topk
* support orther top_k
* fix ci bug
* pre-commit
* only support chatcmpl
* Revert "Revert "[RL] Support Rollout Routing Replay (PaddlePaddle#5321)" (PaddlePaddle#5402)" This reverts commit c45e064.
* Fix XPU and NPU bug
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>

…PaddlePaddle#6120)
* fused put routing
* fix bug
* [draft commit]dynamic dtype
* Updated to accommodate uint8 baseline changes
* fix async put & numpy bug
---------
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>

paddle-bot · 2026-02-02T13:00:51Z

Thanks for your contribution!

bukejiyu and others added 30 commits December 6, 2025 00:47

cp pr5373 pr5379 pr5410 (PaddlePaddle#5411)

7eea23f

[Cherry-Pick][Loader][BugFix] Fix some parameters place on CPU in Pad…

7926add

…dleOCR-VL (PaddlePaddle#5413) (PaddlePaddle#5414) * [BugFix] Fix some parameter place on CPU in PaddleOCR-VL * clean log * fix codestyle

Update setup.py

1dceb1c

[BugFix][Cherry-Pick] fix can not enter into cuda graph (PaddlePaddle…

d4c16aa

…#5423) * fix bug * fix bug

[Cherry-Pick] [BugFix] [RL] remove shutdown_process_group/restart_pro…

31436a3

…cess_group for RL (PaddlePaddle#5433) (PaddlePaddle#5434) * [fix] remove shutdown_process_group/restart_process_group for RL * [chore] remove log * [chore] remove log * [chore] set log to debug level

[BugFix] 0 not into cuda graph to save memory (PaddlePaddle#5426) (Pa…

4b9e2c5

…ddlePaddle#5432)

support dynamic load for normal (PaddlePaddle#5437)

2c55bbc

[Optimization] compulte real max_logprobs in batch (PaddlePaddle#5430) (

b491dcd

PaddlePaddle#5448)

commit (PaddlePaddle#5452)

e9174f2

fix limit_thinking bug (PaddlePaddle#5469)

1776d41

fix attention bug in spec decoding (PaddlePaddle#5481)

c5c43e3

[CI][XPU] ep+prefix cache+chunk prefill (PaddlePaddle#5490)

bcde798

[BugFix] fix instability after clearing weight (PaddlePaddle#5487)

7019afb

* [BugFix] fix instability after clearing weight * [chore] add todo

[CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test

b435639

RL fix (PaddlePaddle#5505)

71781b5

[[Cherry-Pick][BugFix] fix hung when n>1 and --enable-logprob (Paddle…

4e5e36e

…Paddle#5492)(PaddlePaddle#5499) (PaddlePaddle#5498) * [BugFix] fix hung when n>1 and --enable-logprob (PaddlePaddle#5492) * check * check * check

[Cherry-Pick] [BugFix] [RL] skip model executing after clearing/updat…

12e0206

…ing is done (PaddlePaddle#5527) (PaddlePaddle#5523) * [fix] fix ep loop * [fix] another try * [fix] again

[Feature][Optimization] Qwen Support Dynamic block_wise_fp8 cache (Pa…

5bdef76

…ddlePaddle#5486) (PaddlePaddle#5536)

Fix bug for caching output when preempted (PaddlePaddle#5510)

0fa40f5

[Cherry-Pick][BugFix] fix dynamic c8 in v1 loader(PaddlePaddle#5562) (P…

99b4024

…addlePaddle#5519) * fix dyname load bug * update * update

【NewFeature】support load fp8 weight (PaddlePaddle#5566)

9f74233

-

[Cherry-Pick][CI] Adape unit_test due to incompatibility change(Paddl…

53158b7

…ePaddle#5578) (PaddlePaddle#5583) * [CI] Remove test_metrics.py due to incompatible forced merge (PaddlePaddle#5578) * [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576)

[Cherry-Pick][RL] R3 Support RDMA Store(PaddlePaddle#5467) (PaddlePad…

c19af49

…dle#5468) * [RL] R3 support rdma store * refine code * refine notes * disable prefix cache * fix ci bug * support preempted task and put cpu tensor

[Cherry-Pick][CI]Support different inferseed in speculate decoding(Pa…

a7359d1

…ddlePaddle#5568) (PaddlePaddle#5597) * fix mtp entropy drop in RL * optimize usage and fix unit test * optimize padding_sampling_params speed(vectorized)

add detoken switch (PaddlePaddle#5463) (PaddlePaddle#5572)

d67b64d

[Cherry-Pick][CI]Fix write qknorm cache bug in speculative decoding(P…

d7d633a

…addlePaddle#5491) (PaddlePaddle#5617) * [liuzichang spend 10 dyas]fix write qknorm cache bug * fix 'fix cachekv bug''

[Cherry-Pick] Support for request-level speculative decoding metrics …

e56c4dd

…monitoring.(PaddlePaddle#5518) (PaddlePaddle#5614) * support spec metrics monitor per request

[Others] Maintain the mtp branch temporarily. (PaddlePaddle#5446) (Pa…

5300e73

…ddlePaddle#5621)

[Model] tp+ep support v1_loader (PaddlePaddle#5600)

a30a5b4

* [Model] tp+ep support v1_loader * fix * fix mtp_linear * fix mtp_linear * fix * fix * fix v0 loader * fix * Add get_tensor for EP * fix linear weight_loader * fix typo * fix

Deleter-D and others added 26 commits January 27, 2026 10:45

[Cherry-Pick][Speculative Decoding] Support MTP for GLM-4.5-Air (Padd…

c8cf686

…lePaddle#6047, PaddlePaddle#6093) (PaddlePaddle#6219)

[XPU][CI] Release ci update (PaddlePaddle#6212)

1d519b9

* Update download_dependencies.sh

commit

957bd2c

commit

81d77d7

add test tools

53f6fd4

Delete log and refine code

d0b94ec

[Cherry-Pick][Others] enhance deep_ep import and support mixed mode f…

fb7ec62

…lash_mask_attn PaddlePaddle#6238 (PaddlePaddle#6232) * fash_mask_attn support mixed * enhance deep_ep and fix bug * update * fix

[Cherry-Pick] update data_processor & add tool parser plugins#6096 (P…

c424287

…addlePaddle#6193) * cherry pick * bug fix tool_calls (PaddlePaddle#6166) * fix image gen (PaddlePaddle#6175) * fix unit test

1.fix async numpy bug 2. refine code

22f0a5e

profile & refine code

d18d3fc

add note

7e4b3e3

Reapply "[Feature] Unify quant ops (PaddlePaddle#6021)"

2855be9

This reverts commit da9b356.

Revert "[Cherry-Pick] update data_processor & add tool parser plugins…

fccfe57

…#6096 (#…" (PaddlePaddle#6253) This reverts commit c424287.

Merge branch 'r3_prefixcache_2.4' into pr_6203

80b54bf

Update Kernel #13 from zhoutianzi666/pr_6203

d259b56

delete paremeter

a4d4929

fix transpose bug

7b28ea7

[Cherry-Pick][RL] Support GLM MTP RL Model (PaddlePaddle#6223) (Paddl…

4097455

…ePaddle#6256) * support glm mtp rl model * update baseline

fix insert decode task after set stop flag as true

fe80b01

[CI] Remove test_splitwise_scheduler and download latest_wheel explic…

f04ba48

…itly to avoid pip cache issues (PaddlePaddle#6265)

merge 2.4

fa7708b

Success Run StoreWrapper

7bc4f05

WIP support async thread

2e37e94

success run async thread put

a899fc2

fix clear prefix batch bug

20733cd

gongshaotian had a problem deploying to Metax_ci February 2, 2026 13:00 — with GitHub Actions Failure

gongshaotian closed this Feb 2, 2026

gongshaotian had a problem deploying to Metax_ci February 2, 2026 13:01 — with GitHub Actions Failure

20 participants
