[Feature] mm prefix cache #4554

kevincheng2 · 2025-10-23T03:03:54Z

Motivation

Add prefix cache support for multimodal (mm) inputs.

The following arguments must be added at startup:

  python -m fastdeploy.entrypoints.openai.api_server \
       ...
       --enable-prefix-caching \
       --disable-chunked-mm-input

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If no unit tests are added, explain why in this PR.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-10-23T03:03:59Z

Thanks for your contribution!


Copilot

Pull Request Overview

This pull request implements multimodal prefix caching functionality, enabling efficient cache reuse for requests containing images or other multimodal inputs. The key changes involve refactoring the prefix cache manager to handle multimodal data with proper hashing and block management.

Adds multimodal-aware prefix caching with image hash tracking
Introduces disable_chunked_mm_input flag to prevent splitting multimodal inputs across cache blocks
Updates cache hit tracking to use token counts instead of block counts for more accurate metrics
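The three points above can be illustrated with a minimal sketch. It is not the code from `prefix_cache_manager.py`; the names `block_hashes` and `matched_tokens`, the `(start, end, image_hash)` span layout, and the tiny block size are all assumptions made for illustration. The idea shown is that each block's hash chains the parent hash, the block's token IDs, and the hash of any image overlapping the block, so two prompts with identical placeholder tokens but different images stop matching at the image.

```python
import hashlib
from typing import List, Sequence, Set, Tuple

BLOCK_SIZE = 4  # hypothetical value, kept small for readability


def block_hashes(
    token_ids: Sequence[int],
    image_spans: Sequence[Tuple[int, int, str]],
    block_size: int = BLOCK_SIZE,
) -> List[str]:
    """Return one chained hash per full block (hypothetical helper).

    image_spans holds (start, end, image_hash) tuples, end exclusive.
    """
    hashes: List[str] = []
    parent = ""
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        end = start + block_size
        # Image keys of every image that overlaps this block.
        keys = tuple(h for (s, e, h) in image_spans if s < end and e > start)
        payload = repr((parent, tuple(token_ids[start:end]), keys))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


def matched_tokens(
    cached: Set[str], hashes: Sequence[str], block_size: int = BLOCK_SIZE
) -> int:
    """Count cache-hit tokens by walking the chain until the first miss."""
    hit_blocks = 0
    for h in hashes:
        if h not in cached:
            break
        hit_blocks += 1
    return hit_blocks * block_size


# Same token IDs (9 stands in for an image placeholder), different images.
tokens = [1, 2, 3, 4, 9, 9, 9, 9, 5, 6, 7, 8]
hashes_a = block_hashes(tokens, [(4, 8, "imgA")])
hashes_b = block_hashes(tokens, [(4, 8, "imgB")])
hit = matched_tokens(set(hashes_a), hashes_b)
token_hit_rate = hit / len(tokens)
```

In this sketch only the first block is shared between the two prompts, and the hit rate is reported as matched tokens over prompt tokens rather than matched blocks over total blocks.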

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.


| File | Description |
| --- | --- |
| fastdeploy/cache_manager/prefix_cache_manager.py | Implements core multimodal prefix caching logic including `mm_match_block`, `mm_build_path`, and hash computation with image keys |
| fastdeploy/engine/sched/resource_manager_v1.py | Updates cache hit metrics to use token-level granularity instead of block-level |
| fastdeploy/engine/common_engine.py | Simplifies available blocks calculation by removing multimodal-specific logic |
| fastdeploy/engine/args_utils.py | Adds `disable_chunked_mm_input` configuration option and removes restriction preventing prefix caching with multimodal models |
| fastdeploy/config.py | Adds `disable_chunked_mm_input` field to CacheConfig |
| tests/v1/test_prefix_cache.py | Adds comprehensive tests for multimodal prefix caching functionality |
| tests/v1/test_revert_blocks.py | Adds tests for block reversion logic when multimodal inputs are chunked |
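The block reversion mentioned for `tests/v1/test_revert_blocks.py` is not shown on this page, so the following is only a sketch of one plausible reading: if a cache match ends partway through an image's token span, the match is rolled back to the last block boundary at or before the start of that image, so no image is reused partially. The function name `revert_match`, its signature, and the span layout are hypothetical.

```python
from typing import Sequence, Tuple


def revert_match(
    matched_tokens: int,
    image_spans: Sequence[Tuple[int, int]],
    block_size: int,
) -> int:
    """Shrink a match so it never ends strictly inside an image span.

    image_spans holds non-overlapping (start, end) token ranges, end
    exclusive. Hypothetical helper, not the PR's implementation.
    """
    kept = matched_tokens
    # Walk spans from last to first so a rollback that lands inside an
    # earlier image is rolled back again.
    for start, end in sorted(image_spans, reverse=True):
        if start < kept < end:
            kept = (start // block_size) * block_size
    return kept


# An image occupies tokens 6..13; a two-block match (8 tokens) cuts it.
print(revert_match(8, [(6, 14)], block_size=4))
```

A match that covers the whole image, or that ends before the image starts, is returned unchanged.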


* mm prefix cache
* add _revert_match_blocks
* update code
* update code
* update code
* fix bugs
* add test case
* fix bug
* update code
* update reserved_dec_block_ids

mm prefix cache

18d1d0f

kevincheng2 added 9 commits October 23, 2025 11:47

add _revert_match_blocks

99fd7dd

update code

048c2b0

update code

e51017b

update code

b15375e

fix bugs

a837863

add test case

334a674

fix bug

a7ba6a1

Merge branch 'feature/experimental_feature_20250908' into eb_mm_prefix_cache

af0938c

Merge branch 'feature/experimental_feature_20250908' into eb_mm_prefix_cache

40d6ddf

rainyfly previously approved these changes Nov 19, 2025


Jiang-Jia-Jun requested a review from Copilot November 19, 2025 06:33

Copilot started reviewing on behalf of Jiang-Jia-Jun November 19, 2025 06:34

Copilot finished reviewing on behalf of Jiang-Jia-Jun November 19, 2025 06:36

Copilot AI reviewed Nov 19, 2025


update code

994890a

kevincheng2 dismissed rainyfly’s stale review via 994890a November 19, 2025 07:41

update reserved_dec_block_ids

cc9b601

Jiang-Jia-Jun approved these changes Nov 19, 2025


Jiang-Jia-Jun added the skip-ci: coverage label Nov 19, 2025

Jiang-Jia-Jun merged commit 966297e into PaddlePaddle:feature/experimental_feature_20250908 Nov 19, 2025
13 of 14 checks passed

kevincheng2 deleted the eb_mm_prefix_cache branch January 19, 2026 03:45
