Problem
Session history (history.jsonl) is append-only with no post-processing hooks. ADR-018 acknowledges retention as future work, and Phase 2/3 plans call for derived memory, summaries, and embeddings — but these are separate concerns in the doc. In practice they're all the same shape: something consumes session history entries and produces a derived artifact or mutates the history.
Rather than building each as a bespoke feature, this issue proposes a general session post-processing layer — a pipeline of pluggable processors that run over session history, with retention/truncation as one of several built-in processors.
Motivating example
On Tiverton, `sentinel` runs a heartbeat poll every ~9 minutes (144 calls/day). Each entry is ~154 KB (full request + system prompt + response). After ~3 weeks the file is 380 MB, growing ~22 MB/day, and it's almost entirely `HEARTBEAT_OK` responses with no long-term value. A simple age/size-based retention processor would solve this. But the same plumbing would let us plug in summarization for traders, embeddings for research agents, and cold archival for everyone.
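The growth rate is just poll cadence times entry size; a quick check of the numbers above (the helper function is illustrative, not cllama code):

```go
package main

import "fmt"

// dailyGrowthMB estimates history growth from poll cadence and entry size.
func dailyGrowthMB(callsPerDay int, entryKB float64) float64 {
	return float64(callsPerDay) * entryKB / 1024
}

func main() {
	// Heartbeat numbers from above: 144 calls/day at ~154 KB each.
	perDay := dailyGrowthMB(144, 154)
	fmt.Printf("~%.0f MB/day, ~%.0f MB over 3 weeks\n", perDay, perDay*21)
}
```

That upper bound (~455 MB at a full three weeks) is in line with the observed 380 MB file.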
Processor types
Different processors, same interface — consume entries, optionally produce derived output, optionally mutate history:
| Processor | Purpose | Output |
| --- | --- | --- |
| `retain` | Drop entries older than N days or when file exceeds N MB | Mutates `history.jsonl` |
| `summarize` | LLM-driven condensation of stale entries into rolling memory | Writes to `.claw-memory/<agent>/` |
| `embed` | Generate embeddings for semantic recall | Writes to embedding store |
| `extract` | Pull structured facts/decisions/tasks into memory | Writes to memory |
| `archive` | Move old entries to cold storage (e.g. xz tarball) | External blob |
| `redact` | Strip secrets/PII before retention or archival | Mutates entries |
| `forward` | Push to external analytics (PostHog LLM analytics, etc.) | External sink |
Phases 2 and 3 of ADR-018 (derived memory, summaries, embeddings) then fall out naturally as implementations of this interface instead of separate subsystems.
Config surface
```yaml
session_history:
  processors:
    - type: retain
      max_age_days: 30
      max_size_mb: 100
    - type: archive
      older_than_days: 30
      destination: ./cold-storage
    - type: summarize
      trigger: on_size_threshold
      threshold_mb: 50
      model: claude-haiku-4-5
      output: .claw-memory/{agent}/summary.md
  agents:
    sentinel:
      processors:
        - type: retain
          max_age_days: 3
    trader-dundas:
      processors:
        - type: summarize
          trigger: nightly
        - type: embed
          trigger: on_write
```
Processor interface (sketch)
```go
type Processor interface {
	Name() string
	Triggers() []Trigger // on_write, on_schedule, on_startup, on_size_threshold
	Process(ctx context.Context, agent string, entries []Entry) (ProcessResult, error)
}

type ProcessResult struct {
	Drop    []EntryID         // entries to remove from history
	Replace map[EntryID]Entry // entries to rewrite (e.g. redaction)
	Derived []Artifact        // memory, embeddings, summaries, archive blobs
}
```
Runs out-of-band from the hot path — the recorder keeps writing synchronously, processors run on a schedule or against a backlog so they never block LLM turns.
Implementation notes
- Recorder lives in `cllama/internal/sessionhistory/recorder.go`
- The existing `history.index.json` (checkpoints every 128 entries) makes time-range seeks cheap — retention and archival can operate incrementally
- Per-agent config plumbs through `CllamaProxyConfig` in `internal/pod/compose_emit.go`
- Running processors in the cllama process keeps them close to the data, but a sidecar model is also viable (read via the existing `/history/{agentID}` HTTP endpoint)
References
- ADR-018: `docs/decisions/018-session-history-and-memory-retention.md` (Phase 1 complete; Phase 2/3 subsumed by this)
- Implementation plan: `docs/plans/2026-03-26-cllama-session-history.md`