feat(agent): add episodic memory persistence/injection across codergen session retries by Copilot · Pull Request #136 · 2389-research/tracker

Copilot · 2026-04-21T16:57:22Z

Review all new Copilot reviewer threads and map each to a concrete code/test change
Fix stale episode_summary context behavior when current session has no episodes
Enforce per-entry episode summary bounds so single oversized entries cannot bypass rune budget
Bound rendered EpisodeLog.Summary size and clarify injected prior-attempt formatting for multiline summaries
Add/adjust targeted tests for each reviewer concern
Run targeted tests for agent and pipeline/handlers
Run full go build ./... and go test ./... -short
Run parallel_validation and address any valid findings
Reply to PR comment with addressing commit hash

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/74be5335-77b5-446d-b7a7-5495c4069c94 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot

Pull request overview

Adds “episodic memory” for the codergen agent so tool-attempt history survives session retries/resumes via pipeline context, reducing repeated failing strategies after compaction.

Changes:

Introduces an EpisodeLog model and session-level episode summary capture/injection in the agent runtime.
Persists the latest episode_summary and rolling episode_summaries in pipeline context and reinjects them into subsequent codergen sessions.
Updates docs + dippin lint reserved-key set, and adds tests covering persistence/injection.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`pipeline/lint_dippin_extra.go`	Reserves new built-in context keys to avoid param/read collisions.
`pipeline/handlers/codergen.go`	Injects prior episode summaries into native sessions and persists new summaries back to context.
`pipeline/handlers/codergen_test.go`	Adds tests validating persistence of episode summaries and reinjection behavior.
`pipeline/context.go`	Defines new built-in context keys for episodic memory.
`docs/pipeline-context-flow.md`	Documents the new built-in context keys and their purpose.
`agent/config.go`	Adds `PriorEpisodeSummaries` to session configuration.
`agent/session_run.go`	Injects prior attempt summaries into initial session messages; records tool-call episodes.
`agent/session.go`	Stores per-session episode log and publishes `EpisodeSummary` in `SessionResult`.
`agent/result.go`	Extends `SessionResult` with `EpisodeSummary`.
`agent/memory.go`	Implements episode log, summarization, and JSON (de)serialization helpers.
`agent/memory_test.go`	Tests episode summary generation and parse fallback behavior.
`agent/session_test.go`	Tests tool outcome capture into episode summary and prior summary injection into messages.
`CHANGELOG.md`	Notes the new episodic memory feature in the Unreleased changelog.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-21T17:59:57Z

+		b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
+		for i, summary := range s.config.PriorEpisodeSummaries {
+			if strings.TrimSpace(summary) == "" {
+				continue
+			}
+			b.WriteString(fmt.Sprintf("%d. %s\n", i+1, strings.TrimSpace(summary)))
+		}
+		s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))


When building the "Prior attempts summary" message, the numbering uses the original slice index (i+1) even when empty/whitespace summaries are skipped. This can produce gaps (e.g., "1., 3.") and, if all entries are empty/whitespace, will still inject a header-only user message. Consider filtering to non-empty summaries first (or maintaining a separate counter) and only appending the message if at least one non-empty summary exists.

Suggested change

b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")

for i, summary := range s.config.PriorEpisodeSummaries {

if strings.TrimSpace(summary) == "" {

continue

}

b.WriteString(fmt.Sprintf("%d. %s\n", i+1, strings.TrimSpace(summary)))

}

s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))

count := 0

for _, summary := range s.config.PriorEpisodeSummaries {

trimmed := strings.TrimSpace(summary)

if trimmed == "" {

continue

}

if count == 0 {

b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")

}

count++

b.WriteString(fmt.Sprintf("%d. %s\n", count, trimmed))

}

if count > 0 {

s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))

}

Copilot · 2026-04-21T17:59:57Z

+func (h *CodergenHandler) injectPriorEpisodes(runCfg pipeline.AgentRunConfig, pctx *pipeline.PipelineContext) []string {
+	sc, ok := runCfg.Extra.(*agent.SessionConfig)
+	if !ok || sc == nil {
+		return nil
+	}
+	raw, ok := pctx.Get(pipeline.ContextKeyEpisodeSummaries)
+	if !ok || strings.TrimSpace(raw) == "" {
+		return nil
+	}
+	sc.PriorEpisodeSummaries = agent.ParseEpisodeSummaries(raw)
+	return append([]string(nil), sc.PriorEpisodeSummaries...)
+}
+
+func (h *CodergenHandler) applyEpisodeContextUpdates(updates map[string]string, sessResult agent.SessionResult, existing []string) {
+	if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" {
+		return
+	}
+	updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary
+	summaries := append(append([]string(nil), existing...), sessResult.EpisodeSummary)
+	updates[pipeline.ContextKeyEpisodeSummaries] = agent.SerializeEpisodeSummaries(summaries)
+}


episode_summaries is appended to on every session completion and then fully reinjected into subsequent sessions. As written, this list can grow without bound across retries/resumes, directly increasing prompt size and risking context-window bloat over long runs. Consider enforcing a cap (e.g., keep last N summaries and/or trim to a max total byte/rune budget) both when injecting prior episodes and when persisting updates back into context.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa480103bf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-21T18:04:59Z

+	statusSummary := summarizeEpisodeOutput(output, isError)
+	l.Entries = append(l.Entries, EpisodeEntry{
+		Tool:    tool,
+		Args:    compactJSON(args),


Truncate tool arguments before recording episodic memory

EpisodeLog.Record stores compactJSON(args) verbatim, and Summary() later injects that Args string back into retry prompts/context. For tools like write/edit, arguments can contain full file contents, so a single run can persist very large payloads (and sensitive text) into episode_summary/episode_summaries, quickly ballooning prompt size and causing retry sessions to hit context limits or leak content unnecessarily.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-21T18:04:59Z

 	result.ToolTimings = s.toolTimings
 	result.ContextUtilization = tracker.Utilization()
+	result.EpisodeSummary = s.episodeLog.Summary()


Populate EpisodeSummary before returning run errors

Session.Run now computes EpisodeSummary only in the success tail, after runTurnLoop returns nil. When a retryable error happens after some tool calls (for example a transient provider failure on a later turn), the early return result, err path skips this assignment, so CodergenHandler.handleRunError receives an empty summary and cannot persist prior attempt memory for the next retry.

Useful? React with 👍 / 👎.

clintecker · 2026-04-21T18:11:43Z

@copilot address the reviewer feedback

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot · 2026-04-21T18:19:42Z

@copilot address the reviewer feedback

Implemented the reviewer feedback in the latest commits: prior-attempt injection now filters/renumbers non-empty summaries and skips header-only injection, and episode_summaries is now bounded (count + rune budget) on both parse/inject and serialize/persist paths. Added focused tests for both behaviors. Addressed in commit aedf3d4 (plus supporting tests in d6782a6/489a61f).

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-21T18:22:37Z

+	if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" {
+		return
+	}
+	updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary


applyEpisodeContextUpdates returns early when sessResult.EpisodeSummary is empty, which means a subsequent run with no tool calls will not update ctx.episode_summary and the context will retain a stale value from a previous run. Consider always writing ContextKeyEpisodeSummary (set to "" when no episodes) so the key accurately reflects the most recent session, and only append to episode_summaries when the new summary is non-empty.

Suggested change

if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" {

return

}

updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary

if updates == nil {

return

}

updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary

if strings.TrimSpace(sessResult.EpisodeSummary) == "" {

return

}

Copilot · 2026-04-21T18:22:38Z

+func normalizeEpisodeSummaries(in []string) []string {
+	out := make([]string, 0, len(in))
+	runeLens := make([]int, 0, len(in))
+	for _, s := range in {
+		s = strings.TrimSpace(s)
+		if s != "" {
+			out = append(out, s)
+			runeLens = append(runeLens, len([]rune(s)))
+		}
+	}
+	if len(out) > maxEpisodeSummaryCount {
+		start := len(out) - maxEpisodeSummaryCount
+		out = out[start:]
+		runeLens = runeLens[start:]
+	}
+	totalRunes := 0
+	for _, n := range runeLens {
+		totalRunes += n
+	}
+	for len(out) > 1 && totalRunes > maxEpisodeSummaryTotalRunes {


normalizeEpisodeSummaries only drops items while len(out) > 1, so a single very large summary can still exceed maxEpisodeSummaryTotalRunes and be serialized/injected unbounded. To actually cap growth, consider truncating any individual summary that exceeds the rune budget (or enforcing a per-summary rune limit) so even a single entry cannot blow up context size.

Copilot · 2026-04-21T18:22:38Z

+// Record appends a tool-call episode.
+func (l *EpisodeLog) Record(tool, args, output string, isError bool) {
+	statusSummary := summarizeEpisodeOutput(output, isError)
+	l.Entries = append(l.Entries, EpisodeEntry{
+		Tool:    tool,
+		Args:    compactJSON(args),
+		Success: !isError,
+		Summary: statusSummary,
+	})


EpisodeLog.Record stores compactJSON(args) without any length cap, and EpisodeLog.Summary prints args inline. Tool-call arguments can be arbitrarily large (e.g., write/edit content), which can make EpisodeSummary huge and defeat the episodic-memory size caps. Consider truncating/summarizing args (and/or omitting high-entropy fields like content) before storing/printing so episode summaries stay predictably small.

Copilot · 2026-04-21T18:22:38Z

+	if len(s.config.PriorEpisodeSummaries) > 0 {
+		var b strings.Builder
+		nonEmpty := make([]string, 0, len(s.config.PriorEpisodeSummaries))
+		for _, summary := range s.config.PriorEpisodeSummaries {
+			trimmed := strings.TrimSpace(summary)
+			if trimmed == "" {
+				continue
+			}
+			nonEmpty = append(nonEmpty, trimmed)
+		}
+		if len(nonEmpty) > 0 {
+			b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
+			for i, summary := range nonEmpty {
+				b.WriteString(fmt.Sprintf("%d. %s\n", i+1, summary))
+			}
+			s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))
+		}
+	}


The prior-episode injection builds a numbered list but treats each prior summary as a single string. If a prior summary contains newlines (e.g., EpisodeLog.Summary is multi-line for multiple tool calls), only the first line gets the attempt number prefix and subsequent lines can start with their own "2."/"3." numbering, which is ambiguous in the injected prompt. Consider formatting multi-line summaries with indentation or a clear per-attempt header so nested numbering can't be misread.

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-21T18:30:18Z

+// Summary renders a compact multiline summary for injection into future sessions.
+func (l EpisodeLog) Summary() string {
+	if len(l.Entries) == 0 {
+		return ""
+	}
+	var b strings.Builder
+	for i, e := range l.Entries {
+		status := "success"
+		if !e.Success {
+			status = "fail"
+		}
+		fmt.Fprintf(&b, "%d. %s args=%s outcome=%s summary=%s", i+1, e.Tool, e.Args, status, e.Summary)
+		if i < len(l.Entries)-1 {
+			b.WriteByte('\n')
+		}
+	}
+	return b.String()
+}


EpisodeLog.Summary renders one line per tool call with no cap on number of entries or total size. In sessions with many tool calls this can create a very large episode_summary that then gets persisted and potentially injected into future prompts. Consider enforcing a max entry count and/or max total runes for the rendered summary (dropping oldest entries and/or truncating), similar to the caps used for episode_summaries.

Copilot · 2026-04-21T18:30:18Z

+		totalRunes -= runeLens[0]
+		out = out[1:]
+		runeLens = runeLens[1:]
+	}


normalizeEpisodeSummaries enforces maxEpisodeSummaryTotalRunes only while len(out) > 1; a single very large summary can still exceed the rune budget and be persisted/injected. Given episode summaries can grow large (e.g., if args include big payloads), consider truncating a single over-budget entry to the budget (or dropping it) so the cap is always honored.

Suggested change

}

}

if len(out) == 1 && totalRunes > maxEpisodeSummaryTotalRunes {

r := []rune(out[0])

out[0] = string(r[:maxEpisodeSummaryTotalRunes])

runeLens[0] = maxEpisodeSummaryTotalRunes

}

Copilot · 2026-04-21T18:30:19Z

+	if len(s.config.PriorEpisodeSummaries) > 0 {
+		var b strings.Builder
+		nonEmpty := make([]string, 0, len(s.config.PriorEpisodeSummaries))
+		for _, summary := range s.config.PriorEpisodeSummaries {
+			trimmed := strings.TrimSpace(summary)
+			if trimmed == "" {
+				continue
+			}
+			nonEmpty = append(nonEmpty, trimmed)
+		}
+		if len(nonEmpty) > 0 {
+			b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
+			for i, summary := range nonEmpty {
+				b.WriteString(fmt.Sprintf("%d. %s\n", i+1, summary))
+			}
+			s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))


The prior-episodes injection prefixes each prior summary with its own numeric index, but each per-session EpisodeLog.Summary is itself a numbered (potentially multi-line) list. This can produce confusing output like "1. 1. read..." followed by unindented continuation lines ("2. ...") that look like a new attempt rather than a tool call. Consider switching to an "Attempt N:" header and indenting the summary lines (or storing per-session summaries as single-line text) so numbering remains unambiguous.

clintecker · 2026-04-21T18:50:17Z

@copilot handle all feedback on this PR

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/050361c9-e9f9-41d0-998f-0d7931f9732c Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot · 2026-04-21T18:57:36Z

@copilot handle all feedback on this PR

Handled all feedback in this PR. Included fixes for stale episode_summary clearing, bounded per-session episode rendering/args, single-entry oversize truncation behavior, and unambiguous multiline prior-attempt injection formatting, plus focused tests. Main addressing commits: 6cc29f3 and de667b7. No UI changes were made in this PR.

Initial plan

0575e4b

Copilot AI assigned Copilot and clintecker Apr 21, 2026

Copilot AI linked an issue Apr 21, 2026 that may be closed by this pull request

feat(agent): episodic memory across sessions #96

Closed

4 tasks

Copilot started work on behalf of clintecker April 21, 2026 16:57 View session

feat(agent): add episodic memory across session retries

fa48010

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/74be5335-77b5-446d-b7a7-5495c4069c94 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add episodic memory logging for multi-step reasoning~~ feat(agent): add episodic memory persistence/injection across codergen session retries Apr 21, 2026

Copilot finished work on behalf of clintecker April 21, 2026 17:06

Copilot AI requested a review from clintecker April 21, 2026 17:06

clintecker requested a review from Copilot April 21, 2026 17:56

Copilot started reviewing on behalf of clintecker April 21, 2026 17:56 View session

Copilot AI reviewed Apr 21, 2026

View reviewed changes

clintecker marked this pull request as ready for review April 21, 2026 18:01

chatgpt-codex-connector Bot reviewed Apr 21, 2026

View reviewed changes

Copilot started work on behalf of clintecker April 21, 2026 18:11 View session

fix episodic memory reviewer feedback

d6782a6

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

clintecker requested a review from Copilot April 21, 2026 18:17

test+perf: refine episode summary normalization

489a61f

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot started reviewing on behalf of clintecker April 21, 2026 18:17 View session

chore: polish episodic memory feedback fixes

aedf3d4

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/cdc2fd79-c3fa-4d52-b1b8-7459a91874c8 Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot finished work on behalf of clintecker April 21, 2026 18:22

Copilot AI reviewed Apr 21, 2026

View reviewed changes

Merge branch 'main' into copilot/add-structured-episode-log

72f78c9

clintecker requested a review from Copilot April 21, 2026 18:24

Copilot started reviewing on behalf of clintecker April 21, 2026 18:24 View session

Copilot AI reviewed Apr 21, 2026

View reviewed changes

Copilot started work on behalf of clintecker April 21, 2026 18:50 View session

Copilot AI and others added 2 commits April 21, 2026 18:54

fix episodic memory follow-up review feedback

6cc29f3

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/050361c9-e9f9-41d0-998f-0d7931f9732c Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

chore: refine episodic memory normalization path

de667b7

Agent-Logs-Url: https://github.com/2389-research/tracker/sessions/050361c9-e9f9-41d0-998f-0d7931f9732c Co-authored-by: clintecker <26985+clintecker@users.noreply.github.com>

Copilot finished work on behalf of clintecker April 21, 2026 18:58

clintecker merged commit 960264b into main Apr 21, 2026
1 check passed

clintecker deleted the copilot/add-structured-episode-log branch April 21, 2026 19:12

clintecker mentioned this pull request Apr 21, 2026

release: v0.21.0 #146

Merged

3 tasks

-	}
+	}
+	if len(out) == 1 && totalRunes > maxEpisodeSummaryTotalRunes {
+		r := []rune(out[0])
+		out[0] = string(r[:maxEpisodeSummaryTotalRunes])
+		runeLens[0] = maxEpisodeSummaryTotalRunes
+	}

Conversation

Copilot AI commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

clintecker commented Apr 21, 2026

Uh oh!

Copilot AI commented Apr 21, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

Uh oh!

clintecker commented Apr 21, 2026

Uh oh!

Copilot AI commented Apr 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Apr 21, 2026 •

edited

Loading