fix(cache): reduce DeepSeek prefix churn from dynamic tool catalog growth#1576

Closed
caulif wants to merge 2 commits into
Hmbown:mainfrom
caulif:codex/fix-cache-prefix-churn

Conversation

@caulif

@caulif caulif commented May 13, 2026

Summary

DeepSeek's prompt cache only hits when later requests replay the same cached prefix units. In Agent mode, DeepSeek-TUI was still deferring several tools that belong to the normal coding loop, so the model-visible tools array kept growing during read -> edit -> diff/test workflows.

This patch preloads the core coding toolset in Agent mode so those tools are present from turn one instead of being appended later after first use.

What changed

  • Keep these tools loaded by default in Agent mode:
    • write_file
    • edit_file
    • apply_patch
    • git_status
    • git_diff
    • git_show
    • git_log
    • git_blame
    • run_tests
  • Add regression tests that assert:
    • the core coding toolset is not deferred in Agent mode
    • the core coding toolset is active from the first turn
  • Add an analysis document with official links and cache validation:
    • docs/deepseek_cache_prefix_churn.md

Why

DeepSeek's official KV cache docs say prompt caching is enabled by default, and that later requests only hit when they reuse a repeated prefix; because of Sliding Window Attention, matching happens on complete cache prefix units.

Claude Code's official docs apply the same context-stability principle from the opposite direction: tool definitions are deferred and loaded on demand so the cached prefix stays small and rarely changes.

DeepSeek-TUI already sorts tool partitions deterministically and appends deferred activations to the tail, which is good. The remaining problem was that many tools the project itself treats as first-class coding tools were still deferred, so a normal Agent session kept mutating the visible tool catalog.
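The ordering described above can be sketched as follows. This is a minimal illustration, not DeepSeek-TUI's code: `active_tools`, `shared_prefix_len`, and the tool names `grep_files`, `list_dir`, and `read_file` are hypothetical, and it assumes the tools array is serialized ahead of the conversation in the prompt.

```rust
/// Hypothetical helper: preloaded tools sorted by name, followed by deferred
/// tools in the order they were first activated (tail-append).
fn active_tools(preloaded: &[&str], activated: &[&str]) -> Vec<String> {
    let mut tools: Vec<String> = preloaded.iter().map(|s| s.to_string()).collect();
    tools.sort();
    tools.extend(activated.iter().map(|s| s.to_string()));
    tools
}

/// Number of leading entries that two tool lists have in common.
fn shared_prefix_len(a: &[String], b: &[String]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}

fn main() {
    // Assumed (hypothetical) set of tools that were already loaded up front.
    let preloaded = ["read_file", "list_dir", "grep_files"];

    let turn1 = active_tools(&preloaded, &[]);
    let turn2 = active_tools(&preloaded, &["edit_file"]);
    let turn3 = active_tools(&preloaded, &["edit_file", "git_diff"]);

    // Tail-append keeps every earlier entry in place...
    println!("shared leading tools: {}", shared_prefix_len(&turn1, &turn2));
    // ...but each activation still lengthens the catalog, so whatever is
    // serialized after the tools array shifts relative to the cached request.
    println!("catalog sizes: {} -> {} -> {}", turn1.len(), turn2.len(), turn3.len());

    // Preloading the same tools from turn one keeps the catalog identical.
    let all = ["read_file", "list_dir", "grep_files", "edit_file", "git_diff"];
    assert_eq!(active_tools(&all, &[]), active_tools(&all, &[]));
}
```

Tail-append limits the damage to the end of the catalog, but only a catalog that does not grow at all leaves the rest of the prompt aligned with what was cached.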

Validation

Rust verification

  • cargo fmt --all
  • cargo test -p deepseek-tui --bin deepseek-tui agent_mode_preloads_core_coding_toolset -- --nocapture
  • cargo test -p deepseek-tui --bin deepseek-tui initial_active_tools_include_core_coding_toolset_in_agent_mode -- --nocapture
  • cargo test -p deepseek-tui --bin deepseek-tui non_yolo_mode_retains_default_defer_policy -- --nocapture
  • cargo test -p deepseek-tui --bin deepseek-tui active_tool_list_pushes_deferred_activations_to_the_tail -- --nocapture
  • cargo test -p deepseek-tui --bin deepseek-tui deferred_edit_file_first_use_hydrates_schema_without_execution -- --nocapture
  • cargo test -p deepseek-tui --bin deepseek-tui model_tool_catalog -- --nocapture
  • cargo build -p deepseek-tui
  • cargo test --workspace --all-features (passes for this change set's touched areas, but currently reports two unrelated failures in commands::skills::*)

DeepSeek API cache reproduction

Using the official Chat Completions API with a long repeated prefix:

  • Stable tools on repeat:
    • prompt_cache_hit_tokens = 4096
    • prompt_cache_miss_tokens = 18
  • Expanded tool catalog on repeat:
    • prompt_cache_hit_tokens = 3328
    • prompt_cache_miss_tokens = 791

That is a loss of 768 cache-hit tokens and an increase of 773 miss tokens on the repeated request when the visible tool catalog grows between requests.
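The arithmetic behind these figures can be checked with a short sketch. `CacheUsage` is a hypothetical struct that only holds the two counters quoted above; the derived hit rates (roughly 99.6% versus 80.8%) are computed from those numbers and are not additional measurements.

```rust
/// The two cache counters quoted above for a single response.
struct CacheUsage {
    hit: u32,  // prompt_cache_hit_tokens
    miss: u32, // prompt_cache_miss_tokens
}

impl CacheUsage {
    /// Total prompt tokens accounted for by the two counters.
    fn total(&self) -> u32 {
        self.hit + self.miss
    }

    /// Fraction of prompt tokens served from the cache.
    fn hit_rate(&self) -> f64 {
        f64::from(self.hit) / f64::from(self.total())
    }
}

fn main() {
    let stable = CacheUsage { hit: 4096, miss: 18 };
    let expanded = CacheUsage { hit: 3328, miss: 791 };

    println!("lost cache-hit tokens: {}", stable.hit - expanded.hit);
    println!("extra miss tokens: {}", expanded.miss - stable.miss);
    println!(
        "hit rate: {:.1}% -> {:.1}%",
        100.0 * stable.hit_rate(),
        100.0 * expanded.hit_rate()
    );
}
```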

Notes

This is a focused first fix. It does not try to solve MCP churn, compaction strategy, or tool-result payload size.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request improves DeepSeek prompt cache hit rates in Agent mode by preloading core coding tools, such as file editing, git operations, and test execution, rather than deferring them. The changes include new unit tests to verify this behavior and comprehensive documentation detailing the cache churn analysis. The reviewer recommended extracting the list of preloaded tools into a constant for better maintainability and sorting the shell execution tools alphabetically.

Comment on lines 42 to 59
let always_loaded_in_agent_mode = matches!(mode, AppMode::Agent)
    && matches!(
        name,
        "apply_patch"
            | "edit_file"
            | "exec_shell"
            | "exec_shell_wait"
            | "exec_shell_interact"
            | "exec_wait"
            | "exec_interact"
            | "git_blame"
            | "git_diff"
            | "git_log"
            | "git_show"
            | "git_status"
            | "run_tests"
            | "write_file"
    );
Contributor

medium

The list of tools in always_loaded_in_agent_mode is becoming quite large. While the matches! macro is efficient, consider extracting this list into a named constant (e.g., CORE_CODING_TOOLS) to improve maintainability and allow reuse in tests without duplication.

Author

Fixed in cad7154: extracted the Agent preload set into AGENT_MODE_PRELOADED_TOOLS and reused it in the tests so the policy and assertions stay in sync.
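A minimal sketch of what such an extraction could look like, for readers without the commit at hand. Only `AGENT_MODE_PRELOADED_TOOLS`, `AppMode::Agent`, and the tool names come from this thread; the `Plan` variant, the `should_defer` function, and its `default_deferred` parameter are hypothetical stand-ins for the project's actual defer policy.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum AppMode {
    Agent,
    Plan, // assumed second mode, for illustration only
}

/// Shared preload list, reconstructed from the reviewed snippet and alphabetized.
const AGENT_MODE_PRELOADED_TOOLS: &[&str] = &[
    "apply_patch",
    "edit_file",
    "exec_interact",
    "exec_shell",
    "exec_shell_interact",
    "exec_shell_wait",
    "exec_wait",
    "git_blame",
    "git_diff",
    "git_log",
    "git_show",
    "git_status",
    "run_tests",
    "write_file",
];

/// Hypothetical policy: `default_deferred` stands in for whatever the
/// existing rules decide; Agent mode overrides it for preloaded tools.
fn should_defer(mode: AppMode, name: &str, default_deferred: bool) -> bool {
    let preloaded =
        matches!(mode, AppMode::Agent) && AGENT_MODE_PRELOADED_TOOLS.contains(&name);
    default_deferred && !preloaded
}

fn main() {
    // The policy and the assertions read the same constant, so they cannot drift.
    for name in AGENT_MODE_PRELOADED_TOOLS {
        assert!(!should_defer(AppMode::Agent, name, true));
    }
    // Outside Agent mode the default policy is left untouched.
    assert!(should_defer(AppMode::Plan, "edit_file", true));
    println!("{} tools preloaded in Agent mode", AGENT_MODE_PRELOADED_TOOLS.len());
}
```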

Comment on lines 47 to 51
| "exec_shell"
| "exec_shell_wait"
| "exec_shell_interact"
| "exec_wait"
| "exec_interact"
Contributor

medium

The shell execution tools are not strictly sorted alphabetically. For better maintainability as this list grows, consider sorting them (e.g., exec_interact should come before exec_shell).

Suggested change
- | "exec_shell"
- | "exec_shell_wait"
- | "exec_shell_interact"
- | "exec_wait"
- | "exec_interact"
+ | "exec_interact"
+ | "exec_shell"
+ | "exec_shell_interact"
+ | "exec_shell_wait"
+ | "exec_wait"

Author

Fixed in cad7154: the shared Agent preload list is now alphabetized, including the shell execution tools.
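One way to keep such a list alphabetized is a small ordering guard. The sketch below is illustrative only: `is_strictly_sorted` and `sorted_names` are hypothetical helpers, and it uses just the five shell tools from the review comment. Rust compares `&str` values byte by byte, so `exec_shell` sorts before `exec_shell_interact`.

```rust
/// True when every name is strictly less than the one after it
/// (bytewise order, which also rules out duplicates).
fn is_strictly_sorted(names: &[&str]) -> bool {
    names.windows(2).all(|pair| pair[0] < pair[1])
}

/// Returns a sorted copy of the given names.
fn sorted_names<'a>(names: &[&'a str]) -> Vec<&'a str> {
    let mut out = names.to_vec();
    out.sort_unstable();
    out
}

fn main() {
    // Order as it appeared in the reviewed snippet.
    let reviewed = [
        "exec_shell",
        "exec_shell_wait",
        "exec_shell_interact",
        "exec_wait",
        "exec_interact",
    ];
    println!("reviewed order sorted? {}", is_strictly_sorted(&reviewed));

    let fixed = sorted_names(&reviewed);
    println!("sorted order: {:?}", fixed);
    assert!(is_strictly_sorted(&fixed));
}
```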

@Hmbown
Owner

Hmbown commented May 23, 2026

This PR was opened before the v0.8.41 rebrand and is now stale. Feel free to rebase onto current main and reopen. The whale brothers are waiting for you 🐋

@Hmbown Hmbown closed this May 23, 2026