## Problem

`renderMemoryBlocks()` injects content that changes on every API call into the system prompt via `experimental.chat.system.transform`, invalidating Anthropic's prompt cache for the entire system prompt.

Two issues compound:
### 1. Timestamp in `renderMemoryMetadata()` changes every call

```ts
// prompt.ts
function renderMemoryMetadata(blocks: MemoryBlock[]): string {
  const now = new Date();
  return `<memory_metadata>
- The current system date is: ${now.toISOString()}
...`;
}
```
`new Date().toISOString()` produces a different string on every invocation. Since Anthropic's prompt caching requires an exact prefix match, this busts the cache every turn.
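A minimal sketch of the failure mode (the renderer below is a simplified stand-in; only the timestamp interpolation matters):

```typescript
// Simplified stand-in for renderMemoryMetadata(): only the timestamp matters here.
function renderMetadata(): string {
  return `<memory_metadata>
- The current system date is: ${new Date().toISOString()}
</memory_metadata>`;
}

const first = renderMetadata();

// Busy-wait ~2ms so the clock is guaranteed to have advanced,
// as it would between two real API calls.
const start = Date.now();
while (Date.now() - start < 2) {}

const second = renderMetadata();

// The rendered strings differ, so an exact-prefix cache can never hit.
console.log(first === second); // false
```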
### 2. Injection at position 1 maximizes cache damage

```ts
// plugin.ts
const insertAt = output.system.length > 0 ? 1 : 0;
output.system.splice(insertAt, 0, xml);
```

The memory blocks (including the changing timestamp) are spliced near the front of the system prompt, right after the provider header. Everything after the insertion point (AGENTS.md, tool definitions, ~30-50K tokens) loses its cache hit.
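The effect on the cacheable prefix can be sketched by comparing the two insertion strategies. Segment names here are illustrative, not opencode's actual prompt layout:

```typescript
// Sketch: how the insertion position affects the cacheable prefix.
const base = ["provider-header", "AGENTS.md", "tool-definitions"];

// Length of the shared leading run of two segment lists —
// a stand-in for how much of the prompt an exact-prefix cache can reuse.
function commonPrefixLen(a: string[], b: string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const spliced = [...base];
spliced.splice(1, 0, "<memory_blocks>…</memory_blocks>"); // current behavior
const pushed = [...base, "<memory_blocks>…</memory_blocks>"]; // proposed behavior

console.log(commonPrefixLen(base, spliced)); // 1 — only the header stays cached
console.log(commonPrefixLen(base, pushed));  // 3 — everything before the insert stays cached
```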
## Impact

Measured on a real workload (same opencode command, same project, before/after enabling this plugin):

| Metric | Without plugin | With plugin |
| --- | --- | --- |
| `cache_read` (sonnet, daily) | 77-186M | 29.8M |
| `cache_write` (sonnet, daily) | 4.7-21M | 25.7M |
| Read/Write ratio | 9:1 – 16:1 | 1.2:1 |
Anthropic rate limits count cache writes at full input token price and cache reads at 1/10th. So while raw token counts appeared similar, the effective rate-limit cost increased roughly 10x per token, causing users to hit rolling rate limits much faster than expected.
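A back-of-envelope check on the low end of those daily figures, assuming reads count at 1/10 weight and writes at full weight as described:

```typescript
// Effective rate-limit weight: writes at full input-token weight, reads at 1/10.
function effectiveWeight(cacheRead: number, cacheWrite: number): number {
  return cacheRead * 0.1 + cacheWrite;
}

// Low end of the measured daily figures from the table above (tokens).
const without = effectiveWeight(77e6, 4.7e6);       // 12.4M weighted tokens
const withPlugin = effectiveWeight(29.8e6, 25.7e6); // ~28.7M weighted tokens

// Despite a smaller raw token count, the weighted total more than doubles,
// because each token shifted from cache_read to cache_write weighs 10x more.
console.log(withPlugin / without); // ~2.3
```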
## Suggested fix

- Remove `new Date()` from the rendered output. The model already receives the current date from the provider's system header. If needed, inject the date once at plugin init, not per call.
- Move the injection to the end of the system prompt (`.push()` instead of `.splice(insertAt, 0, ...)`). Dynamic content at the end preserves the prefix cache for everything before it. The comment says "insert early for salience", but position in the system prompt has minimal effect on attention; the `<memory_blocks>` XML tags provide sufficient salience regardless of position.
- The `lastModified` timestamp only changes when a block is edited, so it is less volatile than `now`, but it still busts the cache when it does change. Consider omitting it from the rendered string as well, or rounding it to the hour.
These changes would preserve the full prompt cache prefix on every turn, restoring the 10-16x cache read/write ratio.
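A sketch of the first two fixes combined, assuming the transform hook's `output.system` is a `string[]` as in the snippets above (the rendering itself is simplified and hypothetical):

```typescript
interface MemoryBlock {
  id: string;
  content: string;
}

// Stable render: no per-call timestamp, so the string is byte-identical
// across turns and can live inside the cached prefix.
function renderMemoryBlocks(blocks: MemoryBlock[]): string {
  const body = blocks
    .map((b) => `<block id="${b.id}">${b.content}</block>`)
    .join("\n");
  return `<memory_blocks>\n${body}\n</memory_blocks>`;
}

// In the transform hook:
function transform(output: { system: string[] }, blocks: MemoryBlock[]): void {
  // Append instead of splicing at position 1: every segment already in
  // output.system keeps its exact prefix and stays cache-hot.
  output.system.push(renderMemoryBlocks(blocks));
}
```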