Skip to content

Compaction: align summary framing across cache-aligned and fallback paths + add cache-hit telemetry #584

@Hmbown

Description

@Hmbown

Context

Follow-ups from an external code review of the cache-aligned compaction work that landed in v0.8.10 (#580 / 754e8bd4 / crates/tui/src/compaction.rs).

The review's verdict on the change itself was positive: the should_use_cache_aligned_summary gate is conservative (V4-scale window, ≤ 85% of budget, model identifiable), the 512-token instruction buffer is reasonable, and the test cache_aligned_summary_request_preserves_message_prefix validates the structural invariant the cache needs.

The reviewer flagged one nuance worth tracking and one piece of telemetry that would let us see the win in production.

The behavioral fork

The two paths give the model slightly different context for the summary. The cache-aligned path has the model "reading" the conversation as its own history and then being asked to summarize it. The fallback path feeds a reformatted User:/Assistant: transcript with a "You are a helpful assistant that creates concise conversation summaries" system prompt. In practice this shouldn't matter — the V4 models are strong enough that either framing produces equivalent summaries — but it's a behavioral fork worth being aware of.

Concretely:

  • Cache-aligned path (build_cache_aligned_summary_request): replays the original message prefix, sends system: None, appends the instruction as a final user turn.
  • Fallback path: reformats User:/Assistant: into a flat transcript and sends it under a "helpful assistant that creates concise conversation summaries" system prompt.

Two healthy outcomes:

  1. Document the divergence. Add a doc-comment on should_use_cache_aligned_summary explaining why the framing diverges and that the empirical bar is "summaries are equivalent in practice on V4."
  2. Decide whether to unify. Either drop the system prompt from the fallback path so both paths look like "summarize your own conversation," or keep both and codify the rationale.

The missing telemetry

Today the footer chip shows cache-hit % for normal turns, but the compaction summary call is the request we most expect to benefit from the cache-aligned change — and we don't have a per-call number for it. Without it the win is unobservable post-deploy.

Suggested:

  • Record the cache-hit rate of the summary call itself and surface it once (transcript line, log, or footer chip) when a compaction completes.
  • Optional: emit a one-line debug log under RUST_LOG=deepseek_cli::core::compaction=debug with path={cache_aligned|fallback}, prompt_tokens=…, cache_hit_pct=… so we can grep production logs.

Suggested follow-ups for v0.8.11

  • Add a test (or runtime sanity check in debug builds) asserting that summaries from both paths are roughly equivalent — e.g. length within ±25% and both contain the user's original goal phrase.
  • Decide whether to unify the framing or document the divergence; land whichever you choose with a doc-comment on should_use_cache_aligned_summary.
  • Surface the compaction-summary call's cache-hit rate on completion (transcript, footer chip, or log line) so the V4 cache win is observable.

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    compactionContext management / compactionv0.8.11Targeting v0.8.11

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions