@kevincheng2
Collaborator

@kevincheng2 kevincheng2 commented Oct 23, 2025

Motivation

mm prefix cache

The following arguments must be added at startup:

  python -m fastdeploy.entrypoints.openai.api_server \
       ...
       --enable-prefix-caching \
       --disable-chunked-mm-input

Modifications

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Oct 23, 2025

Thanks for your contribution!

rainyfly
rainyfly previously approved these changes Nov 19, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This pull request implements multimodal prefix caching functionality, enabling efficient cache reuse for requests containing images or other multimodal inputs. The key changes involve refactoring the prefix cache manager to handle multimodal data with proper hashing and block management.

  • Adds multimodal-aware prefix caching with image hash tracking
  • Introduces disable_chunked_mm_input flag to prevent splitting multimodal inputs across cache blocks
  • Updates cache hit tracking to use token counts instead of block counts for more accurate metrics
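
To illustrate the first point, here is a minimal sketch of how a per-block cache key might fold in image hashes. It is not the code from this PR: the function name `block_hashes`, the `(start, end, image_hash)` span layout, and the use of SHA-256 are all assumptions made for illustration. The idea is that two requests with identical placeholder token IDs but different images must not share cached blocks.

```python
import hashlib
from typing import List, Sequence, Tuple

# Hypothetical span layout: (start, end, image_hash), with `end` exclusive.
ImageSpan = Tuple[int, int, str]


def block_hashes(
    token_ids: Sequence[int],
    image_spans: Sequence[ImageSpan],
    block_size: int,
) -> List[str]:
    """Return one chained hash per full block of `token_ids`.

    Each block hash covers the parent block hash, the block's token IDs,
    and the hashes of every image that overlaps the block.
    """
    hashes: List[str] = []
    parent = ""
    for b in range(len(token_ids) // block_size):
        start, end = b * block_size, (b + 1) * block_size
        # Collect the hashes of images whose span overlaps this block.
        keys = tuple(h for (s, e, h) in image_spans if s < end and e > start)
        payload = repr((parent, tuple(token_ids[start:end]), keys)).encode()
        parent = hashlib.sha256(payload).hexdigest()
        hashes.append(parent)
    return hashes
```

With this scheme, blocks that precede an image keep the same key across requests, while the blocks that overlap the image, and every block after them through the chained parent hash, differ when the image differs.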

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.

File | Description
fastdeploy/cache_manager/prefix_cache_manager.py | Implements core multimodal prefix caching logic including mm_match_block, mm_build_path, and hash computation with image keys
fastdeploy/engine/sched/resource_manager_v1.py | Updates cache hit metrics to use token-level granularity instead of block-level
fastdeploy/engine/common_engine.py | Simplifies available blocks calculation by removing multimodal-specific logic
fastdeploy/engine/args_utils.py | Adds disable_chunked_mm_input configuration option and removes restriction preventing prefix caching with multimodal models
fastdeploy/config.py | Adds disable_chunked_mm_input field to CacheConfig
tests/v1/test_prefix_cache.py | Adds comprehensive tests for multimodal prefix caching functionality
tests/v1/test_revert_blocks.py | Adds tests for block reversion logic when multimodal inputs are chunked
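
The block reversion mentioned for tests/v1/test_revert_blocks.py can be pictured with the sketch below. It is an assumption about the intended behavior, not the PR's `_revert_match_blocks` implementation: the function name `revert_match`, its signature, and the `(start, end)` span format are hypothetical. The sketch trims a prefix-cache match so that it never ends in the middle of a multimodal input, which is one plausible reading of what `disable_chunked_mm_input` requires.

```python
from typing import Sequence, Tuple


def revert_match(
    matched_tokens: int,
    image_spans: Sequence[Tuple[int, int]],
    block_size: int,
) -> int:
    """Return the usable matched-token count.

    The match is first rounded down to a block boundary. While it ends
    strictly inside an image span (start < matched < end), it is pulled
    back to the last block boundary at or before that image's start.
    """
    matched = (matched_tokens // block_size) * block_size
    changed = True
    while changed:
        changed = False
        for start, end in image_spans:
            if start < matched < end:
                # Assumption: a partially matched image cannot be reused.
                matched = (start // block_size) * block_size
                changed = True
    return matched
```

The returned value is a token count rather than a block count, which lines up with the token-level hit metrics described for resource_manager_v1.py: a hit ratio can then be taken as the returned count divided by the prompt length.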

@Jiang-Jia-Jun Jiang-Jia-Jun merged commit 966297e into PaddlePaddle:feature/experimental_feature_20250908 Nov 19, 2025
13 of 14 checks passed
Deleter-D pushed a commit to Deleter-D/FastDeploy that referenced this pull request Nov 26, 2025
* mm prefix cache

* add _revert_match_blocks

* update code

* update code

* update code

* fix bugs

* add test case

* fix bug

* update code

* update reserved_dec_block_ids
@kevincheng2 kevincheng2 deleted the eb_mm_prefix_cache branch January 19, 2026 03:45

3 participants