[Fearture] Support cache kv cache for output tokens #4535
Jiang-Jia-Jun merged 9 commits into PaddlePaddle:develop
Thanks for your contribution!
…into support_cache_output
Pull request overview
This pull request adds support for caching KV cache for output tokens when prefix caching is enabled in the V1 scheduler. The feature aims to improve cache efficiency by allowing the system to cache generated output tokens in addition to input prompt tokens.
Key Changes:
- Added `enable_output_caching` configuration flag to control output token caching behavior
- Implemented automatic caching of output tokens at block boundaries in the token processor
- Added test coverage for the new caching functionality
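To illustrate what "caching at block boundaries" could mean, here is a minimal sketch. It is not the code from this PR: `ToyRequest`, `blocks_to_cache`, `on_new_token`, and the `num_cached_blocks` counter are hypothetical names, and the sketch only assumes that a KV block becomes cacheable once it is completely filled with prompt plus output tokens.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a scheduled request; not FastDeploy's class."""
    prompt_token_ids: list
    output_token_ids: list = field(default_factory=list)
    num_cached_blocks: int = 0  # full blocks already registered in the prefix cache


def blocks_to_cache(request: ToyRequest, block_size: int) -> int:
    """Return how many completely filled blocks are not yet cached."""
    total_tokens = len(request.prompt_token_ids) + len(request.output_token_ids)
    full_blocks = total_tokens // block_size
    return full_blocks - request.num_cached_blocks


def on_new_token(request: ToyRequest, token_id: int, block_size: int,
                 enable_output_caching: bool = True) -> int:
    """Append one generated token and report how many blocks were newly cached."""
    request.output_token_ids.append(token_id)
    if not enable_output_caching:
        # Flag off: output tokens never update the cache in this sketch.
        return 0
    newly_full = blocks_to_cache(request, block_size)
    request.num_cached_blocks += newly_full
    return newly_full
```

In this sketch, a request with a 3-token prompt and a block size of 4 triggers a cache update on its first generated token (the block fills), then not again until the fifth.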
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| `fastdeploy/engine/args_utils.py` | Added `enable_output_caching` CLI argument and configuration field (default: True) |
| `fastdeploy/config.py` | Added `enable_output_caching` field to `CacheConfig` class with documentation |
| `fastdeploy/engine/sched/resource_manager_v1.py` | Implemented `cache_output_tokens()` method to update cache blocks for output tokens |
| `fastdeploy/output/token_processor.py` | Integrated output caching logic to automatically cache tokens at block boundaries |
| `tests/v1/test_schedule_output.py` | Added `test_caching_output()` test case to verify output token caching behavior |
| `tests/output/test_process_batch_output.py` | Updated mock `CacheConfig` class to include new caching configuration fields |
| `tests/output/test_get_save_output_v1.py` | Updated mock `CacheConfig` class to include new caching configuration fields |
Important Notes:
- PR Title Issue: The title contains a spelling error - "Fearture" should be "Feature"
- Code Issues Found: Several bugs were identified in the implementation and test code, including inconsistent flag checking and incorrect return value handling in tests
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #4535 +/- ##
==========================================
Coverage ? 58.60%
==========================================
Files ? 325
Lines ? 40283
Branches ? 6100
==========================================
Hits ? 23606
Misses ? 14792
Partials ? 1885
Motivation
Modifications
Usage or Command
How to enable:
--enable-output-caching
How to disable:
--no-enable-output-caching
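The paired `--enable-output-caching` / `--no-enable-output-caching` flags follow the pattern that Python's `argparse.BooleanOptionalAction` (Python 3.9+) produces. The sketch below shows that pattern in isolation. It is an assumption that the flag behaves this way; the actual definition in `fastdeploy/engine/args_utils.py` may differ. The program name `toy-server` is hypothetical, and the default of `True` is taken from the review summary above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with an on/off flag pair; not FastDeploy's real parser."""
    parser = argparse.ArgumentParser(prog="toy-server")  # hypothetical program name
    parser.add_argument(
        "--enable-output-caching",
        # BooleanOptionalAction also registers --no-enable-output-caching.
        action=argparse.BooleanOptionalAction,
        default=True,  # default taken from the review summary
        help="Cache KV blocks for generated output tokens.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"enable_output_caching={args.enable_output_caching}")
```

Run with no flags, the value stays at its default; passing `--no-enable-output-caching` sets it to `False`.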
Accuracy Tests
None
Checklist
None