Context
Follow-ups from an external code review of the cache-aligned compaction work that landed in v0.8.10 (#580 / 754e8bd4 / crates/tui/src/compaction.rs).
The review's verdict on the change itself was positive: the should_use_cache_aligned_summary gate is conservative (V4-scale window, ≤ 85% of budget, model identifiable), the 512-token instruction buffer is reasonable, and the test cache_aligned_summary_request_preserves_message_prefix validates the structural invariant the cache needs.
The reviewer flagged one nuance worth tracking and one piece of telemetry that would let us see the win in production.
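To make the gate conditions above concrete, here is a minimal sketch of how such a check could be written. It is not the code in crates/tui/src/compaction.rs: the struct names, the field names, the function name, and the way the 512-token buffer is combined with the 85% ceiling are all assumptions made for illustration, and the "V4-scale" window threshold is left as a caller-supplied parameter because the review does not state it.

```rust
/// Hypothetical inputs to the gate; the real function's signature may differ.
#[derive(Debug, Clone, Copy)]
pub struct GateInput<'a> {
    /// Model identifier, if one could be determined.
    pub model: Option<&'a str>,
    /// Context window of the model, in tokens.
    pub context_window: u64,
    /// Token budget the request is measured against (assumed to be tracked separately).
    pub budget_tokens: u64,
    /// Estimated tokens in the message prefix that would be replayed.
    pub prefix_tokens: u64,
}

/// Hypothetical thresholds. Only the 85% and 512 figures come from the review.
#[derive(Debug, Clone, Copy)]
pub struct GateConfig {
    /// Minimum window that counts as "V4-scale" (value not stated in the review).
    pub min_window_tokens: u64,
    /// Maximum share of the budget the request may use, in percent.
    pub max_budget_pct: u64,
    /// Tokens reserved for the appended summary instruction.
    pub instruction_buffer_tokens: u64,
}

/// Returns true only when every conservative condition holds.
pub fn should_use_cache_aligned_summary_sketch(input: &GateInput, cfg: &GateConfig) -> bool {
    // The model must be identifiable.
    let model_known = input.model.map_or(false, |m| !m.trim().is_empty());
    if !model_known {
        return false;
    }
    // The window must be large enough to replay the prefix at all.
    if input.context_window < cfg.min_window_tokens {
        return false;
    }
    // Prefix plus instruction buffer must stay within the budget ceiling.
    // u128 arithmetic avoids overflow on the multiplications.
    let needed = input.prefix_tokens as u128 + cfg.instruction_buffer_tokens as u128;
    needed * 100 <= input.budget_tokens as u128 * cfg.max_budget_pct as u128
}
```

Any single failing condition sends the request down the fallback path, which is what makes the gate conservative.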
The behavioral fork
The two paths give the model slightly different context for the summary. The cache-aligned path has the model "reading" the conversation as its own history and then being asked to summarize it. The fallback path feeds a reformatted User:/Assistant: transcript with a "You are a helpful assistant that creates concise conversation summaries" system prompt. In practice this shouldn't matter — the V4 models are strong enough that either framing produces equivalent summaries — but it's a behavioral fork worth being aware of.
Concretely:
- Cache-aligned path (build_cache_aligned_summary_request): replays the original message prefix, sends system: None, and appends the instruction as a final user turn.
- Fallback path: reformats the conversation into a flat User:/Assistant: transcript and sends it under a "helpful assistant that creates concise conversation summaries" system prompt.
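The difference between the two request shapes can be sketched as follows. The types and function names here (Role, Message, SummaryRequest, cache_aligned_request, fallback_request) are hypothetical stand-ins, not the types used in crates/tui/src/compaction.rs, and where the fallback path places the instruction relative to the transcript is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub content: String,
}

/// Hypothetical request shape: an optional system prompt plus a message list.
#[derive(Debug, Clone, PartialEq)]
pub struct SummaryRequest {
    pub system: Option<String>,
    pub messages: Vec<Message>,
}

pub const FALLBACK_SYSTEM_PROMPT: &str =
    "You are a helpful assistant that creates concise conversation summaries";

/// Cache-aligned shape: the original prefix is replayed unchanged, no system
/// prompt is sent, and the instruction is appended as a final user turn.
pub fn cache_aligned_request(history: &[Message], instruction: &str) -> SummaryRequest {
    let mut messages = history.to_vec();
    messages.push(Message { role: Role::User, content: instruction.to_string() });
    SummaryRequest { system: None, messages }
}

/// Fallback shape: the history is flattened into a User:/Assistant: transcript
/// and sent as a single user message under the summarizer system prompt.
/// (Putting the instruction before the transcript is an assumption.)
pub fn fallback_request(history: &[Message], instruction: &str) -> SummaryRequest {
    let transcript = history
        .iter()
        .map(|m| {
            let label = match m.role {
                Role::User => "User",
                Role::Assistant => "Assistant",
            };
            format!("{}: {}", label, m.content)
        })
        .collect::<Vec<_>>()
        .join("\n");
    SummaryRequest {
        system: Some(FALLBACK_SYSTEM_PROMPT.to_string()),
        messages: vec![Message {
            role: Role::User,
            content: format!("{}\n\n{}", instruction, transcript),
        }],
    }
}
```

Only the first shape keeps the message prefix identical to the one sent on normal turns, which is the property a prefix cache depends on. The second shape changes both the system prompt and the message contents, so the model sees a transcript about a conversation instead of its own history.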
Two healthy outcomes:
- Document the divergence. Add a doc-comment on should_use_cache_aligned_summary explaining why the framing diverges and that the empirical bar is "summaries are equivalent in practice on V4."
- Decide whether to unify. Either drop the system prompt from the fallback path so both paths look like "summarize your own conversation," or keep both and codify the rationale.
The missing telemetry
Today the footer chip shows cache-hit % for normal turns, but the compaction summary call is the request we most expect to benefit from the cache-aligned change — and we don't have a per-call number for it. Without it the win is unobservable post-deploy.
Suggested:
- Record the cache-hit rate of the summary call itself and surface it once (transcript line, log, or footer chip) when a compaction completes.
- Optional: emit a one-line debug log under RUST_LOG=deepseek_cli::core::compaction=debug with path={cache_aligned|fallback}, prompt_tokens=…, cache_hit_pct=… so we can grep production logs.
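A minimal sketch of the per-call number and the log line suggested above, using only the standard library. The SummaryPath and SummaryUsage types, the cache_hit_tokens field, and the exact separator between fields are assumptions. Real code would take the counts from whatever usage data the API response exposes and emit the line through the crate's existing logging setup at debug level.

```rust
/// Which summary path served the compaction (names mirror the proposed log values).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SummaryPath {
    CacheAligned,
    Fallback,
}

impl SummaryPath {
    pub fn as_str(self) -> &'static str {
        match self {
            SummaryPath::CacheAligned => "cache_aligned",
            SummaryPath::Fallback => "fallback",
        }
    }
}

/// Hypothetical usage counters for the summary call alone.
#[derive(Debug, Clone, Copy)]
pub struct SummaryUsage {
    /// Total prompt tokens sent in the summary request.
    pub prompt_tokens: u64,
    /// Prompt tokens that were served from the provider's cache.
    pub cache_hit_tokens: u64,
}

/// Cache-hit rate of the summary call as a percentage in [0, 100].
/// Returns 0.0 when no prompt tokens were recorded.
pub fn cache_hit_pct(usage: SummaryUsage) -> f64 {
    if usage.prompt_tokens == 0 {
        return 0.0;
    }
    // Clamp so a malformed counter can never report more than 100%.
    let hit = usage.cache_hit_tokens.min(usage.prompt_tokens);
    100.0 * hit as f64 / usage.prompt_tokens as f64
}

/// Builds the greppable one-line record for a completed compaction.
pub fn compaction_log_line(path: SummaryPath, usage: SummaryUsage) -> String {
    format!(
        "path={} prompt_tokens={} cache_hit_pct={:.1}",
        path.as_str(),
        usage.prompt_tokens,
        cache_hit_pct(usage)
    )
}
```

Keeping the formatting in a pure function means the same string can feed a log call, a transcript line, or the footer chip, and it can be unit-tested without a logger.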
Suggested follow-ups for v0.8.11
References
- 754e8bd4, PR "feat(v0.8.10): shell_env hook + toast stack + @-mention frecency + keybindings audit (#456 #439 #441 #559)" #572, motivated by "Are tokens being spent a bit too fast?" #575 / "Suggesting a cheaper strategy for compacting context" #580.
- crates/tui/src/compaction.rs: should_use_cache_aligned_summary, build_cache_aligned_summary_request.
- crates/tui/src/prompts.rs (874e8b4b).