
cllama feed injection breaks Anthropic prompt cache #122

@mostlydev

Description


Problem

InjectAnthropic and InjectOpenAI in cllama/internal/feeds/inject.go prepend dynamic content (feeds, timestamp) before the static system prompt. Anthropic's prompt cache is prefix-matched with a 5-minute TTL — since the prefix changes every request, the entire system prompt (~120K tokens for Allen) is re-cached from scratch every time.

Observed on tiverton-house over 7 days:

| Agent | Model | Requests | Cache Create Tokens | Cache Reuse | Weekly Cost |
|---|---|---|---|---|---|
| Allen | Sonnet 4.6 | 624 | 13.5M | 18% | $52.84 |
| Tiverton | Sonnet 4.6 | 467 | 5.4M | 8% | $21.30 |
| Dundas | Haiku 4.5 | 1,243 | 16.7M | 29% | $17.42 |
| **Total** | | 2,334 | | | $91.56 |

Cache creation is ~98% of the bill. Median request gap is 15 min (well beyond the 5-min TTL), so almost every request cold-starts the cache. But even when requests burst within 5 min, the volatile prefix (timestamp + feeds) still invalidates it.

Root Cause

inject.go line 87 (OpenAI) and line 118 (Anthropic):

// OpenAI path — prepends feed before existing system message
first["content"] = feedBlock + "\n\n" + existing

// Anthropic path — prepends feed before existing system prompt  
payload["system"] = feedBlock + "\n\n" + s

handler.go calls inject twice per request:

feeds.InjectAnthropic(payload, feedBlock)                              // feeds prepended
feeds.InjectAnthropic(payload, currentTimeLine(agentCtx, time.Now()))  // time prepended again

Final system prompt order: time (changes every minute) → feeds (change every request) → static system prompt (~120K)
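The cache impact of that ordering can be seen with a small sketch (prompt contents abbreviated; the helper is illustrative, not part of cllama): with a volatile prefix, two consecutive requests share almost nothing.

```go
package main

import (
	"fmt"
	"strings"
)

// sharedPrefixLen returns how many leading bytes two prompts share — roughly
// what a prefix-matched cache like Anthropic's can reuse between requests.
func sharedPrefixLen(a, b string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	static := strings.Repeat("system prompt ", 5) // stand-in for the ~120K-token prompt
	// Current prepend order: time, then feeds, then the static prompt.
	r1 := "12:00\n\nfeeds v1\n\n" + static
	r2 := "12:01\n\nfeeds v2\n\n" + static
	fmt.Println(sharedPrefixLen(r1, r2)) // prints 4 — the prompts diverge at the first changed byte
}
```

The static prompt never enters the shared prefix, so none of it is reusable even when requests land inside the TTL window.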

Fix

Append dynamic content instead of prepending. The static system prompt should come first so Anthropic can cache the prefix:

static system prompt (~120K, stable) → feeds (dynamic) → time (dynamic)

This means:

  1. InjectAnthropic and InjectOpenAI → append feed block after existing system content
  2. Same for the content-blocks path (lines 122-130)
  3. Handler call order stays the same (feeds then time) — both append
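A minimal sketch of the appended injection, assuming the string-valued `payload["system"]` path shown above (the real inject.go signatures may differ):

```go
package main

import "fmt"

// appendToSystem appends a dynamic block after any existing system prompt,
// so the static prefix stays byte-identical across requests.
// Name and shape are illustrative, not the actual inject.go API.
func appendToSystem(payload map[string]any, block string) {
	if s, ok := payload["system"].(string); ok && s != "" {
		payload["system"] = s + "\n\n" + block // append, not prepend
		return
	}
	payload["system"] = block // no existing system: set directly, as today
}

func main() {
	payload := map[string]any{"system": "static system prompt"}
	appendToSystem(payload, "feed block") // feeds first
	appendToSystem(payload, "time line")  // then time — both appended
	fmt.Println(payload["system"] == "static system prompt\n\nfeed block\n\ntime line") // prints true
}
```

With both calls appending, the handler's existing call order produces the target layout: static prompt, then feeds, then time.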

Expected Savings

With static prefix cached, cache_read (0.1x cost) replaces cache_create (1.25x cost) for the bulk of the system prompt on every request where the TTL hasn't expired. Estimated ~$60/week savings on the current tiverton-house pod.
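A rough sanity check on that estimate, using the multipliers above and an assumed Sonnet base input rate of $3 per million tokens (verify against current pricing). This is the upper bound where every Sonnet cache create becomes a cache read; the real saving is lower since the static prefix still re-caches after each TTL expiry:

```go
package main

import "fmt"

func main() {
	// Upper-bound check on the weekly estimate for the two Sonnet agents,
	// using the issue's multipliers (cache write 1.25x, cache read 0.1x).
	// ASSUMPTION: Sonnet base input rate of $3 per million tokens.
	const basePerMTok = 3.0
	createMTok := 13.5 + 5.4 // Allen + Tiverton weekly cache-create tokens, in millions

	writeCost := createMTok * basePerMTok * 1.25 // those tokens billed as cache writes
	readCost := createMTok * basePerMTok * 0.10  // the same tokens billed as cache reads
	fmt.Printf("max weekly Sonnet saving: $%.2f\n", writeCost-readCost)
}
```

That lands in the mid-$60s for Sonnet alone, so the ~$60/week figure is plausible once Haiku and TTL misses are factored in.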

Notes

  • OpenAI path should also append for consistency, though OpenAI caching semantics differ
  • The "no existing system" paths (setting payload["system"] = feedBlock directly) are fine — no reordering needed there
  • Tests in inject_test.go assert prepend order — they'll need updating


Labels: bug, cllama
