
fix: cache-friendly feed injection ordering in cllama#123

Merged
mostlydev merged 2 commits into master from fix/issue-122-cache-friendly-injection on Apr 7, 2026
Conversation

@mostlydev
Owner

Summary

  • Reverses feed/time injection order in cllama so the static system prompt is the prefix (cacheable) and dynamic content (feeds, timestamp) is appended after
  • Fixes Anthropic prompt cache thrashing that was costing ~$60/week on the tiverton-house pod ($91.56/week → estimated ~$30/week)

Root Cause

InjectAnthropic and InjectOpenAI prepended dynamic content before the static system prompt. Anthropic prompt caching is prefix-matched (5-min TTL), so the volatile prefix invalidated the cache on every request — causing cache_creation_input_tokens at 1.25x cost instead of cache_read_input_tokens at 0.1x cost for the ~120K-token static system prompt.

Changed Files (cllama submodule)

  • internal/feeds/inject.go — swap prepend → append in 3 code paths (OpenAI string, Anthropic string, Anthropic content-blocks)
  • internal/feeds/inject_test.go — update assertions to expect system-prompt-first order
  • internal/proxy/handler_test.go — update 2 handler tests from HasPrefix to Contains + ordering assertion

Test plan

  • All cllama unit tests pass (go test ./... — 11 packages)
  • Deploy to tiverton-house and verify cache_read ratio improves over 24h via cllama session-history JSONL analysis

Fixes #122

🤖 Generated with Claude Code

Points cllama submodule at fix that appends feeds/time after the static
system prompt instead of prepending. Keeps the system prompt as a stable
prefix for Anthropic prompt caching (~$60/week savings on tiverton pod).

Fixes #122
@mostlydev merged commit 99e7feb into master on Apr 7, 2026


Development

Successfully merging this pull request may close these issues.

cllama feed injection breaks Anthropic prompt cache
