[Fearture] Support cache kv cache for output tokens #4535

Merged
Jiang-Jia-Jun merged 9 commits into PaddlePaddle:develop from rainyfly:support_cache_output on Dec 4, 2025

Conversation

@rainyfly (Collaborator) commented Oct 22, 2025

Motivation

  1. In prefix caching, support caching the KV cache of output tokens.

Modifications

  1. Enable caching of output tokens by default when prefix caching is enabled.

Usage or Command

How to enable:
--enable-output-caching

How to disable:
--no-enable-output-caching
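
A paired `--flag` / `--no-flag` switch like the one above can be sketched with Python's standard `argparse.BooleanOptionalAction` (Python 3.9+). This is only an illustration of the flag shape, not FastDeploy's actual argument parser: `build_parser` and the `demo-server` program name are hypothetical, and the default of `True` follows the review summary in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real CLI lives in fastdeploy/engine/args_utils.py.
    parser = argparse.ArgumentParser(prog="demo-server")
    parser.add_argument(
        "--enable-output-caching",
        # BooleanOptionalAction also registers --no-enable-output-caching.
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Cache the KV cache of output tokens when prefix caching is on.",
    )
    return parser


if __name__ == "__main__":
    for argv in ([], ["--enable-output-caching"], ["--no-enable-output-caching"]):
        args = build_parser().parse_args(argv)
        print(argv, "->", args.enable_output_caching)
```

With this shape, omitting the flag and passing `--enable-output-caching` both yield `True`, and only `--no-enable-output-caching` yields `False`.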

Accuracy Tests

None

Checklist

None

paddle-bot bot commented Oct 22, 2025

Thanks for your contribution!

Copilot AI review requested due to automatic review settings December 2, 2025 10:52
Copilot AI (Contributor) left a comment

Pull request overview

This pull request adds support for caching KV cache for output tokens when prefix caching is enabled in the V1 scheduler. The feature aims to improve cache efficiency by allowing the system to cache generated output tokens in addition to input prompt tokens.

Key Changes:

  • Added enable_output_caching configuration flag to control output token caching behavior
  • Implemented automatic caching of output tokens at block boundaries in the token processor
  • Added test coverage for the new caching functionality
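
To make "caching at block boundaries" concrete, here is a minimal sketch of the idea. It is not the code from this PR: `ToyPrefixCache`, `cache_full_blocks`, and `on_token_generated` are hypothetical names, and the chained SHA-256 block key is an assumption about how a prefix cache might identify blocks.

```python
import hashlib
from typing import Dict, List


class ToyPrefixCache:
    """Hypothetical prefix cache keyed by chained block hashes."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.blocks: Dict[str, List[int]] = {}

    def _hash_block(self, parent: str, tokens: List[int]) -> str:
        # Chain the parent hash so a block key identifies the whole prefix.
        payload = parent + ":" + ",".join(map(str, tokens))
        return hashlib.sha256(payload.encode()).hexdigest()

    def cache_full_blocks(self, token_ids: List[int]) -> int:
        """Register every complete block; return how many full blocks exist."""
        parent = ""
        full = len(token_ids) // self.block_size
        for i in range(full):
            block = token_ids[i * self.block_size:(i + 1) * self.block_size]
            parent = self._hash_block(parent, block)
            self.blocks.setdefault(parent, block)
        return full


def on_token_generated(cache: ToyPrefixCache, prompt_ids: List[int],
                       output_ids: List[int], new_token: int) -> bool:
    """Append a token; update the cache only when a block has just filled."""
    output_ids.append(new_token)
    total = len(prompt_ids) + len(output_ids)
    if total % cache.block_size == 0:
        cache.cache_full_blocks(prompt_ids + output_ids)
        return True
    return False
```

In this toy version, a 6-token prompt with a block size of 4 triggers a cache update on the second generated token, when the total reaches 8 tokens and the second block, which mixes prompt and output tokens, becomes full.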

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| fastdeploy/engine/args_utils.py | Added enable_output_caching CLI argument and configuration field (default: True) |
| fastdeploy/config.py | Added enable_output_caching field to CacheConfig class with documentation |
| fastdeploy/engine/sched/resource_manager_v1.py | Implemented cache_output_tokens() method to update cache blocks for output tokens |
| fastdeploy/output/token_processor.py | Integrated output caching logic to automatically cache tokens at block boundaries |
| tests/v1/test_schedule_output.py | Added test_caching_output() test case to verify output token caching behavior |
| tests/output/test_process_batch_output.py | Updated mock CacheConfig class to include new caching configuration fields |
| tests/output/test_get_save_output_v1.py | Updated mock CacheConfig class to include new caching configuration fields |

Important Notes:

  1. PR Title Issue: The title contains a spelling error - "Fearture" should be "Feature"
  2. Code Issues Found: Several bugs were identified in the implementation and test code, including inconsistent flag checking and incorrect return value handling in tests
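
The summary does not spell out the inconsistent flag checks. One common way to avoid that class of bug is to derive the effective setting in a single place, sketched below. `ToyCacheConfig` and `output_caching_active` are hypothetical names, the `enable_prefix_caching` default is an assumption, and the rule that output caching only applies when prefix caching is on is taken from the PR description rather than from the merged code.

```python
from dataclasses import dataclass


@dataclass
class ToyCacheConfig:
    """Hypothetical stand-in for a cache configuration object."""

    enable_prefix_caching: bool = False  # assumed default, for illustration only
    enable_output_caching: bool = True   # default stated in the review summary

    @property
    def output_caching_active(self) -> bool:
        # Single source of truth: call sites check this property instead of
        # re-combining the two flags, so the checks cannot drift apart.
        return self.enable_prefix_caching and self.enable_output_caching


if __name__ == "__main__":
    print(ToyCacheConfig().output_caching_active)
    print(ToyCacheConfig(enable_prefix_caching=True).output_caching_active)
```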

codecov-commenter commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 63.63636% with 4 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@209006e). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/output/token_processor.py | 0.00% | 2 Missing and 1 partial ⚠️ |
| fastdeploy/engine/sched/resource_manager_v1.py | 75.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4535   +/-   ##
==========================================
  Coverage           ?   58.60%           
==========================================
  Files              ?      325           
  Lines              ?    40283           
  Branches           ?     6100           
==========================================
  Hits               ?    23606           
  Misses             ?    14792           
  Partials           ?     1885           
| Flag | Coverage Δ |
| --- | --- |
| GPU | 58.60% <63.63%> (?) |
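
The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual Codecov ratio of hits over hits plus misses plus partials. The patch size of 11 lines is not stated in the report; it is inferred here from "4 lines missing" at 63.63636%.

```python
# Project totals, copied from the coverage diff above.
hits, misses, partials = 23606, 14792, 1885
lines = hits + misses + partials
project_pct = 100 * hits / lines

# Patch figures: 4 uncovered lines at 63.63636% patch coverage.
patch_missing = 4
patch_pct = 63.63636
# Inferred, not reported: total changed lines implied by the two numbers above.
patch_lines = round(patch_missing / (1 - patch_pct / 100))
patch_check = 100 * (patch_lines - patch_missing) / patch_lines

if __name__ == "__main__":
    print(f"lines={lines} project={project_pct:.2f}%")
    print(f"patch_lines={patch_lines} patch={patch_check:.5f}%")
```

The totals are internally consistent: the three counters sum to the reported 40283 lines, and 7 covered lines out of an inferred 11 reproduce the reported patch percentage.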

Jiang-Jia-Jun merged commit 3878a99 into PaddlePaddle:develop on Dec 4, 2025
15 of 19 checks passed