feat(agent): add episodic memory persistence/injection across codergen session retries#136
Conversation
Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/74be5335-77b5-446d-b7a7-5495c4069c94 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds “episodic memory” for the codergen agent so tool-attempt history survives session retries/resumes via pipeline context, reducing repeated failing strategies after compaction.
Changes:
- Introduces an
EpisodeLogmodel and session-level episode summary capture/injection in the agent runtime. - Persists the latest
episode_summaryand rollingepisode_summariesin pipeline context and reinjects them into subsequent codergen sessions. - Updates docs + dippin lint reserved-key set, and adds tests covering persistence/injection.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
pipeline/lint_dippin_extra.go |
Reserves new built-in context keys to avoid param/read collisions. |
pipeline/handlers/codergen.go |
Injects prior episode summaries into native sessions and persists new summaries back to context. |
pipeline/handlers/codergen_test.go |
Adds tests validating persistence of episode summaries and reinjection behavior. |
pipeline/context.go |
Defines new built-in context keys for episodic memory. |
docs/pipeline-context-flow.md |
Documents the new built-in context keys and their purpose. |
agent/config.go |
Adds PriorEpisodeSummaries to session configuration. |
agent/session_run.go |
Injects prior attempt summaries into initial session messages; records tool-call episodes. |
agent/session.go |
Stores per-session episode log and publishes EpisodeSummary in SessionResult. |
agent/result.go |
Extends SessionResult with EpisodeSummary. |
agent/memory.go |
Implements episode log, summarization, and JSON (de)serialization helpers. |
agent/memory_test.go |
Tests episode summary generation and parse fallback behavior. |
agent/session_test.go |
Tests tool outcome capture into episode summary and prior summary injection into messages. |
CHANGELOG.md |
Notes the new episodic memory feature in the Unreleased changelog. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n") | ||
| for i, summary := range s.config.PriorEpisodeSummaries { | ||
| if strings.TrimSpace(summary) == "" { | ||
| continue | ||
| } | ||
| b.WriteString(fmt.Sprintf("%d. %s\n", i+1, strings.TrimSpace(summary))) | ||
| } | ||
| s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String()))) |
There was a problem hiding this comment.
When building the "Prior attempts summary" message, the numbering uses the original slice index (i+1) even when empty/whitespace summaries are skipped. This can produce gaps (e.g., "1., 3.") and, if all entries are empty/whitespace, will still inject a header-only user message. Consider filtering to non-empty summaries first (or maintaining a separate counter) and only appending the message if at least one non-empty summary exists.
| b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n") | |
| for i, summary := range s.config.PriorEpisodeSummaries { | |
| if strings.TrimSpace(summary) == "" { | |
| continue | |
| } | |
| b.WriteString(fmt.Sprintf("%d. %s\n", i+1, strings.TrimSpace(summary))) | |
| } | |
| s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String()))) | |
| count := 0 | |
| for _, summary := range s.config.PriorEpisodeSummaries { | |
| trimmed := strings.TrimSpace(summary) | |
| if trimmed == "" { | |
| continue | |
| } | |
| if count == 0 { | |
| b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n") | |
| } | |
| count++ | |
| b.WriteString(fmt.Sprintf("%d. %s\n", count, trimmed)) | |
| } | |
| if count > 0 { | |
| s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String()))) | |
| } |
| func (h *CodergenHandler) injectPriorEpisodes(runCfg pipeline.AgentRunConfig, pctx *pipeline.PipelineContext) []string { | ||
| sc, ok := runCfg.Extra.(*agent.SessionConfig) | ||
| if !ok || sc == nil { | ||
| return nil | ||
| } | ||
| raw, ok := pctx.Get(pipeline.ContextKeyEpisodeSummaries) | ||
| if !ok || strings.TrimSpace(raw) == "" { | ||
| return nil | ||
| } | ||
| sc.PriorEpisodeSummaries = agent.ParseEpisodeSummaries(raw) | ||
| return append([]string(nil), sc.PriorEpisodeSummaries...) | ||
| } | ||
|
|
||
| func (h *CodergenHandler) applyEpisodeContextUpdates(updates map[string]string, sessResult agent.SessionResult, existing []string) { | ||
| if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" { | ||
| return | ||
| } | ||
| updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary | ||
| summaries := append(append([]string(nil), existing...), sessResult.EpisodeSummary) | ||
| updates[pipeline.ContextKeyEpisodeSummaries] = agent.SerializeEpisodeSummaries(summaries) | ||
| } |
There was a problem hiding this comment.
episode_summaries is appended to on every session completion and then fully reinjected into subsequent sessions. As written, this list can grow without bound across retries/resumes, directly increasing prompt size and risking context-window bloat over long runs. Consider enforcing a cap (e.g., keep last N summaries and/or trim to a max total byte/rune budget) both when injecting prior episodes and when persisting updates back into context.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa480103bf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| statusSummary := summarizeEpisodeOutput(output, isError) | ||
| l.Entries = append(l.Entries, EpisodeEntry{ | ||
| Tool: tool, | ||
| Args: compactJSON(args), |
There was a problem hiding this comment.
Truncate tool arguments before recording episodic memory
EpisodeLog.Record stores compactJSON(args) verbatim, and Summary() later injects that Args string back into retry prompts/context. For tools like write/edit, arguments can contain full file contents, so a single run can persist very large payloads (and sensitive text) into episode_summary/episode_summaries, quickly ballooning prompt size and causing retry sessions to hit context limits or leak content unnecessarily.
Useful? React with 👍 / 👎.
| result.ToolTimings = s.toolTimings | ||
| result.ContextUtilization = tracker.Utilization() | ||
| result.EpisodeSummary = s.episodeLog.Summary() |
There was a problem hiding this comment.
Populate EpisodeSummary before returning run errors
Session.Run now computes EpisodeSummary only in the success tail, after runTurnLoop returns nil. When a retryable error happens after some tool calls (for example a transient provider failure on a later turn), the early return result, err path skips this assignment, so CodergenHandler.handleRunError receives an empty summary and cannot persist prior attempt memory for the next retry.
Useful? React with 👍 / 👎.
|
@copilot address the reviewer feedback |
Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>
Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>
Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>
Implemented the reviewer feedback in the latest commits: prior-attempt injection now filters/renumbers non-empty summaries and skips header-only injection, and |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" { | ||
| return | ||
| } | ||
| updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary |
There was a problem hiding this comment.
applyEpisodeContextUpdates returns early when sessResult.EpisodeSummary is empty, which means a subsequent run with no tool calls will not update ctx.episode_summary and the context will retain a stale value from a previous run. Consider always writing ContextKeyEpisodeSummary (set to "" when no episodes) so the key accurately reflects the most recent session, and only append to episode_summaries when the new summary is non-empty.
| if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" { | |
| return | |
| } | |
| updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary | |
| if updates == nil { | |
| return | |
| } | |
| updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary | |
| if strings.TrimSpace(sessResult.EpisodeSummary) == "" { | |
| return | |
| } |
| func normalizeEpisodeSummaries(in []string) []string { | ||
| out := make([]string, 0, len(in)) | ||
| runeLens := make([]int, 0, len(in)) | ||
| for _, s := range in { | ||
| s = strings.TrimSpace(s) | ||
| if s != "" { | ||
| out = append(out, s) | ||
| runeLens = append(runeLens, len([]rune(s))) | ||
| } | ||
| } | ||
| if len(out) > maxEpisodeSummaryCount { | ||
| start := len(out) - maxEpisodeSummaryCount | ||
| out = out[start:] | ||
| runeLens = runeLens[start:] | ||
| } | ||
| totalRunes := 0 | ||
| for _, n := range runeLens { | ||
| totalRunes += n | ||
| } | ||
| for len(out) > 1 && totalRunes > maxEpisodeSummaryTotalRunes { |
There was a problem hiding this comment.
normalizeEpisodeSummaries only drops items while len(out) > 1, so a single very large summary can still exceed maxEpisodeSummaryTotalRunes and be serialized/injected unbounded. To actually cap growth, consider truncating any individual summary that exceeds the rune budget (or enforcing a per-summary rune limit) so even a single entry cannot blow up context size.
| // Record appends a tool-call episode. | ||
| func (l *EpisodeLog) Record(tool, args, output string, isError bool) { | ||
| statusSummary := summarizeEpisodeOutput(output, isError) | ||
| l.Entries = append(l.Entries, EpisodeEntry{ | ||
| Tool: tool, | ||
| Args: compactJSON(args), | ||
| Success: !isError, | ||
| Summary: statusSummary, | ||
| }) |
There was a problem hiding this comment.
EpisodeLog.Record stores compactJSON(args) without any length cap, and EpisodeLog.Summary prints args inline. Tool-call arguments can be arbitrarily large (e.g., write/edit content), which can make EpisodeSummary huge and defeat the episodic-memory size caps. Consider truncating/summarizing args (and/or omitting high-entropy fields like content) before storing/printing so episode summaries stay predictably small.
| if len(s.config.PriorEpisodeSummaries) > 0 { | ||
| var b strings.Builder | ||
| nonEmpty := make([]string, 0, len(s.config.PriorEpisodeSummaries)) | ||
| for _, summary := range s.config.PriorEpisodeSummaries { | ||
| trimmed := strings.TrimSpace(summary) | ||
| if trimmed == "" { | ||
| continue | ||
| } | ||
| nonEmpty = append(nonEmpty, trimmed) | ||
| } | ||
| if len(nonEmpty) > 0 { | ||
| b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n") | ||
| for i, summary := range nonEmpty { | ||
| b.WriteString(fmt.Sprintf("%d. %s\n", i+1, summary)) | ||
| } | ||
| s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String()))) | ||
| } | ||
| } |
There was a problem hiding this comment.
The prior-episode injection builds a numbered list but treats each prior summary as a single string. If a prior summary contains newlines (e.g., EpisodeLog.Summary is multi-line for multiple tool calls), only the first line gets the attempt number prefix and subsequent lines can start with their own "2."/"3." numbering, which is ambiguous in the injected prompt. Consider formatting multi-line summaries with indentation or a clear per-attempt header so nested numbering can't be misread.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Summary renders a compact multiline summary for injection into future sessions. | ||
| func (l EpisodeLog) Summary() string { | ||
| if len(l.Entries) == 0 { | ||
| return "" | ||
| } | ||
| var b strings.Builder | ||
| for i, e := range l.Entries { | ||
| status := "success" | ||
| if !e.Success { | ||
| status = "fail" | ||
| } | ||
| fmt.Fprintf(&b, "%d. %s args=%s outcome=%s summary=%s", i+1, e.Tool, e.Args, status, e.Summary) | ||
| if i < len(l.Entries)-1 { | ||
| b.WriteByte('\n') | ||
| } | ||
| } | ||
| return b.String() | ||
| } |
There was a problem hiding this comment.
EpisodeLog.Summary renders one line per tool call with no cap on number of entries or total size. In sessions with many tool calls this can create a very large episode_summary that then gets persisted and potentially injected into future prompts. Consider enforcing a max entry count and/or max total runes for the rendered summary (dropping oldest entries and/or truncating), similar to the caps used for episode_summaries.
| totalRunes -= runeLens[0] | ||
| out = out[1:] | ||
| runeLens = runeLens[1:] | ||
| } |
There was a problem hiding this comment.
normalizeEpisodeSummaries enforces maxEpisodeSummaryTotalRunes only while len(out) > 1; a single very large summary can still exceed the rune budget and be persisted/injected. Given episode summaries can grow large (e.g., if args include big payloads), consider truncating a single over-budget entry to the budget (or dropping it) so the cap is always honored.
| } | |
| } | |
| if len(out) == 1 && totalRunes > maxEpisodeSummaryTotalRunes { | |
| r := []rune(out[0]) | |
| out[0] = string(r[:maxEpisodeSummaryTotalRunes]) | |
| runeLens[0] = maxEpisodeSummaryTotalRunes | |
| } |
| if len(s.config.PriorEpisodeSummaries) > 0 { | ||
| var b strings.Builder | ||
| nonEmpty := make([]string, 0, len(s.config.PriorEpisodeSummaries)) | ||
| for _, summary := range s.config.PriorEpisodeSummaries { | ||
| trimmed := strings.TrimSpace(summary) | ||
| if trimmed == "" { | ||
| continue | ||
| } | ||
| nonEmpty = append(nonEmpty, trimmed) | ||
| } | ||
| if len(nonEmpty) > 0 { | ||
| b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n") | ||
| for i, summary := range nonEmpty { | ||
| b.WriteString(fmt.Sprintf("%d. %s\n", i+1, summary)) | ||
| } | ||
| s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String()))) |
There was a problem hiding this comment.
The prior-episodes injection prefixes each prior summary with its own numeric index, but each per-session EpisodeLog.Summary is itself a numbered (potentially multi-line) list. This can produce confusing output like "1. 1. read..." followed by unindented continuation lines ("2. ...") that look like a new attempt rather than a tool call. Consider switching to an "Attempt N:" header and indenting the summary lines (or storing per-session summaries as single-line text) so numbering remains unambiguous.
|
@copilot handle all feedback on this PR |
Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/050361c9-e9f9-41d0-998f-0d7931f9732c Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>
Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/050361c9-e9f9-41d0-998f-0d7931f9732c Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>
Handled all feedback in this PR. Included fixes for stale |
episode_summarycontext behavior when current session has no episodesEpisodeLog.Summarysize and clarify injected prior-attempt formatting for multiline summariesagentandpipeline/handlersgo build ./...andgo test ./... -shortparallel_validationand address any valid findings