## Problem

`renderMemoryBlocks()` injects content that changes on every API call into the system prompt via `experimental.chat.system.transform`, invalidating Anthropic's prompt cache for the entire system prompt.

Two issues compound:
### 1. Timestamp in `renderMemoryMetadata()` changes every call

```ts
// prompt.ts
function renderMemoryMetadata(blocks: MemoryBlock[]): string {
  const now = new Date();
  return `<memory_metadata>
- The current system date is: ${now.toISOString()}
...`;
}
```
`new Date().toISOString()` produces a different string on every invocation. Since Anthropic's prompt caching requires an exact prefix match, this busts the cache every turn.
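A minimal sketch of the failure mode (the renderer below is a simplified stand-in; only the timestamp interpolation matters):

```typescript
// Simplified stand-in for renderMemoryMetadata(): only the timestamp matters here.
function renderMetadata(): string {
  return `<memory_metadata>
- The current system date is: ${new Date().toISOString()}
</memory_metadata>`;
}

const first = renderMetadata();

// Busy-wait ~2ms so the clock is guaranteed to have advanced,
// as it would between two real API calls.
const start = Date.now();
while (Date.now() - start < 2) {}

const second = renderMetadata();

// The rendered strings differ, so an exact-prefix cache can never hit.
console.log(first === second); // false
```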
### 2. Injection at position 1 maximizes cache damage

```ts
// plugin.ts
const insertAt = output.system.length > 0 ? 1 : 0;
output.system.splice(insertAt, 0, xml);
```

The memory blocks (including the changing timestamp) are spliced near the front of the system prompt, right after the provider header. Everything after the insertion point (AGENTS.md, tool definitions, ~30-50K tokens) loses its cache hit.
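The effect on the cacheable prefix can be sketched by comparing the two insertion strategies. Segment names here are illustrative, not opencode's actual prompt layout:

```typescript
// Sketch: how the insertion position affects the cacheable prefix.
const base = ["provider-header", "AGENTS.md", "tool-definitions"];

// Length of the shared leading run of two segment lists —
// a stand-in for how much of the prompt an exact-prefix cache can reuse.
function commonPrefixLen(a: string[], b: string[]): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

const spliced = [...base];
spliced.splice(1, 0, "<memory_blocks>…</memory_blocks>"); // current behavior
const pushed = [...base, "<memory_blocks>…</memory_blocks>"]; // proposed behavior

console.log(commonPrefixLen(base, spliced)); // 1 — only the header stays cached
console.log(commonPrefixLen(base, pushed));  // 3 — everything before the insert stays cached
```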
## Impact

Measured on a real workload (same opencode command, same project, before/after enabling this plugin):

| Metric | Without plugin | With plugin |
| --- | --- | --- |
| `cache_read` (sonnet, daily) | 77-186M | 29.8M |
| `cache_write` (sonnet, daily) | 4.7-21M | 25.7M |
| Read/Write ratio | 9:1 – 16:1 | 1.2:1 |
Anthropic rate limits count cache writes at full input token price and cache reads at 1/10th. So while raw token counts appeared similar, the effective rate-limit cost increased roughly 10x per token, causing users to hit rolling rate limits much faster than expected.
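A back-of-envelope check on the low end of those daily figures, assuming reads count at 1/10 weight and writes at full weight as described:

```typescript
// Effective rate-limit weight: writes at full input-token weight, reads at 1/10.
function effectiveWeight(cacheRead: number, cacheWrite: number): number {
  return cacheRead * 0.1 + cacheWrite;
}

// Low end of the measured daily figures from the table above (tokens).
const without = effectiveWeight(77e6, 4.7e6);       // 12.4M weighted tokens
const withPlugin = effectiveWeight(29.8e6, 25.7e6); // ~28.7M weighted tokens

// Despite a smaller raw token count, the weighted total more than doubles,
// because each token shifted from cache_read to cache_write weighs 10x more.
console.log(withPlugin / without); // ~2.3
```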
## Suggested fix

- Remove `new Date()` from the rendered output. The model already receives the current date from the provider's system header. If needed, inject the date once at plugin init, not per call.
- Move the injection to the end of the system prompt (`.push()` instead of `.splice(insertAt, 0, ...)`). Dynamic content at the end preserves the prefix cache for everything before it. The comment says "insert early for salience", but position in the system prompt has minimal effect on attention; the `<memory_blocks>` XML tags provide sufficient salience regardless of position.
- The `lastModified` timestamp only changes when a block is edited, so it is less volatile than `now`, but it still busts the cache when it does change. Consider omitting it from the rendered string as well, or rounding it to the hour.
These changes would preserve the full prompt cache prefix on every turn, restoring the 10-16x cache read/write ratio.
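A sketch of the first two fixes combined, assuming the transform hook's `output.system` is a `string[]` as in the snippets above (the rendering itself is simplified and hypothetical):

```typescript
interface MemoryBlock {
  id: string;
  content: string;
}

// Stable render: no per-call timestamp, so the string is byte-identical
// across turns and can live inside the cached prefix.
function renderMemoryBlocks(blocks: MemoryBlock[]): string {
  const body = blocks
    .map((b) => `<block id="${b.id}">${b.content}</block>`)
    .join("\n");
  return `<memory_blocks>\n${body}\n</memory_blocks>`;
}

// In the transform hook:
function transform(output: { system: string[] }, blocks: MemoryBlock[]): void {
  // Append instead of splicing at position 1: every segment already in
  // output.system keeps its exact prefix and stays cache-hot.
  output.system.push(renderMemoryBlocks(blocks));
}
```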