feat(agent): add episodic memory persistence/injection across codergen session retries #136

Merged
clintecker merged 8 commits into main from copilot/add-structured-episode-log on Apr 21, 2026

Conversation

Contributor

Copilot AI commented Apr 21, 2026

  • Review all new Copilot reviewer threads and map each to a concrete code/test change
  • Fix stale episode_summary context behavior when current session has no episodes
  • Enforce per-entry episode summary bounds so single oversized entries cannot bypass rune budget
  • Bound rendered EpisodeLog.Summary size and clarify injected prior-attempt formatting for multiline summaries
  • Add/adjust targeted tests for each reviewer concern
  • Run targeted tests for agent and pipeline/handlers
  • Run full go build ./... and go test ./... -short
  • Run parallel_validation and address any valid findings
  • Reply to PR comment with addressing commit hash

Copilot AI linked an issue Apr 21, 2026 that may be closed by this pull request
4 tasks
Copilot AI changed the title from "[WIP] Add episodic memory logging for multi-step reasoning" to "feat(agent): add episodic memory persistence/injection across codergen session retries" Apr 21, 2026
Copilot AI requested a review from clintecker April 21, 2026 17:06
@clintecker clintecker requested a review from Copilot April 21, 2026 17:56
Contributor

Copilot AI left a comment

Pull request overview

Adds “episodic memory” for the codergen agent so tool-attempt history survives session retries/resumes via pipeline context, reducing repeated failing strategies after compaction.

Changes:

  • Introduces an EpisodeLog model and session-level episode summary capture/injection in the agent runtime.
  • Persists the latest episode_summary and rolling episode_summaries in pipeline context and reinjects them into subsequent codergen sessions.
  • Updates docs + dippin lint reserved-key set, and adds tests covering persistence/injection.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| pipeline/lint_dippin_extra.go | Reserves new built-in context keys to avoid param/read collisions. |
| pipeline/handlers/codergen.go | Injects prior episode summaries into native sessions and persists new summaries back to context. |
| pipeline/handlers/codergen_test.go | Adds tests validating persistence of episode summaries and reinjection behavior. |
| pipeline/context.go | Defines new built-in context keys for episodic memory. |
| docs/pipeline-context-flow.md | Documents the new built-in context keys and their purpose. |
| agent/config.go | Adds PriorEpisodeSummaries to session configuration. |
| agent/session_run.go | Injects prior attempt summaries into initial session messages; records tool-call episodes. |
| agent/session.go | Stores per-session episode log and publishes EpisodeSummary in SessionResult. |
| agent/result.go | Extends SessionResult with EpisodeSummary. |
| agent/memory.go | Implements episode log, summarization, and JSON (de)serialization helpers. |
| agent/memory_test.go | Tests episode summary generation and parse fallback behavior. |
| agent/session_test.go | Tests tool outcome capture into episode summary and prior summary injection into messages. |
| CHANGELOG.md | Notes the new episodic memory feature in the Unreleased changelog. |


Comment thread agent/session_run.go Outdated
Comment on lines +30 to +37
	b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
	for i, summary := range s.config.PriorEpisodeSummaries {
		if strings.TrimSpace(summary) == "" {
			continue
		}
		b.WriteString(fmt.Sprintf("%d. %s\n", i+1, strings.TrimSpace(summary)))
	}
	s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))

Copilot AI Apr 21, 2026

When building the "Prior attempts summary" message, the numbering uses the original slice index (i+1) even when empty/whitespace summaries are skipped. This can produce gaps (e.g., "1., 3.") and, if all entries are empty/whitespace, will still inject a header-only user message. Consider filtering to non-empty summaries first (or maintaining a separate counter) and only appending the message if at least one non-empty summary exists.

Suggested change
-	b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
-	for i, summary := range s.config.PriorEpisodeSummaries {
-		if strings.TrimSpace(summary) == "" {
-			continue
-		}
-		b.WriteString(fmt.Sprintf("%d. %s\n", i+1, strings.TrimSpace(summary)))
-	}
-	s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))
+	count := 0
+	for _, summary := range s.config.PriorEpisodeSummaries {
+		trimmed := strings.TrimSpace(summary)
+		if trimmed == "" {
+			continue
+		}
+		if count == 0 {
+			b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
+		}
+		count++
+		b.WriteString(fmt.Sprintf("%d. %s\n", count, trimmed))
+	}
+	if count > 0 {
+		s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))
+	}

Comment on lines +510 to +530
func (h *CodergenHandler) injectPriorEpisodes(runCfg pipeline.AgentRunConfig, pctx *pipeline.PipelineContext) []string {
	sc, ok := runCfg.Extra.(*agent.SessionConfig)
	if !ok || sc == nil {
		return nil
	}
	raw, ok := pctx.Get(pipeline.ContextKeyEpisodeSummaries)
	if !ok || strings.TrimSpace(raw) == "" {
		return nil
	}
	sc.PriorEpisodeSummaries = agent.ParseEpisodeSummaries(raw)
	return append([]string(nil), sc.PriorEpisodeSummaries...)
}

func (h *CodergenHandler) applyEpisodeContextUpdates(updates map[string]string, sessResult agent.SessionResult, existing []string) {
	if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" {
		return
	}
	updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary
	summaries := append(append([]string(nil), existing...), sessResult.EpisodeSummary)
	updates[pipeline.ContextKeyEpisodeSummaries] = agent.SerializeEpisodeSummaries(summaries)
}

Copilot AI Apr 21, 2026

episode_summaries is appended to on every session completion and then fully reinjected into subsequent sessions. As written, this list can grow without bound across retries/resumes, directly increasing prompt size and risking context-window bloat over long runs. Consider enforcing a cap (e.g., keep last N summaries and/or trim to a max total byte/rune budget) both when injecting prior episodes and when persisting updates back into context.
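A minimal sketch of the kind of cap suggested here, assuming a newest-first retention policy. The helper name `capSummaries` and its `maxCount`/`maxRunes` parameters are hypothetical and are not identifiers from this PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// capSummaries is a hypothetical helper, not the PR's implementation.
// It drops blank entries, keeps at most maxCount of the newest summaries,
// then drops the oldest remaining ones until the total rune count fits maxRunes.
func capSummaries(in []string, maxCount, maxRunes int) []string {
	out := make([]string, 0, len(in))
	for _, s := range in {
		if t := strings.TrimSpace(s); t != "" {
			out = append(out, t)
		}
	}
	if maxCount < 0 {
		maxCount = 0
	}
	if len(out) > maxCount {
		out = out[len(out)-maxCount:]
	}
	total := 0
	for _, s := range out {
		total += utf8.RuneCountInString(s)
	}
	// Oldest entries go first; a lone oversized entry is dropped as well.
	for len(out) > 0 && total > maxRunes {
		total -= utf8.RuneCountInString(out[0])
		out = out[1:]
	}
	return out
}

func main() {
	history := []string{"attempt one", "  ", "attempt two", "attempt three"}
	fmt.Println(capSummaries(history, 2, 100)) // prints "[attempt two attempt three]"
}
```

Applying the same helper on both the inject path and the persist path would keep the stored value and the prompt bounded by the same limits. Note that this sketch drops a single over-budget entry rather than truncating it, which is one of two reasonable policies.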

@clintecker clintecker marked this pull request as ready for review April 21, 2026 18:01

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa480103bf

Comment thread agent/memory.go Outdated
	statusSummary := summarizeEpisodeOutput(output, isError)
	l.Entries = append(l.Entries, EpisodeEntry{
		Tool: tool,
		Args: compactJSON(args),

[P1] Truncate tool arguments before recording episodic memory

EpisodeLog.Record stores compactJSON(args) verbatim, and Summary() later injects that Args string back into retry prompts/context. For tools like write/edit, arguments can contain full file contents, so a single run can persist very large payloads (and sensitive text) into episode_summary/episode_summaries, quickly ballooning prompt size and causing retry sessions to hit context limits or leak content unnecessarily.
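As an illustration of the truncation being asked for, here is a minimal rune-based cap. `truncateRunes` and its `limit` parameter are hypothetical names, and the trailing marker format is an assumption made for this sketch.

```go
package main

import "fmt"

// truncateRunes is a hypothetical helper, not part of the PR.
// It keeps at most limit runes of s and appends a short marker recording how
// much was cut, so the result can exceed limit by the length of that marker.
// Slicing by rune avoids splitting a multi-byte UTF-8 character.
func truncateRunes(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	r := []rune(s)
	if len(r) <= limit {
		return s
	}
	return string(r[:limit]) + fmt.Sprintf("...[+%d runes]", len(r)-limit)
}

func main() {
	args := `{"path":"main.go","content":"package main // imagine a full file here"}`
	fmt.Println(truncateRunes(args, 32))
}
```

Calling something like this on the compacted arguments before they are stored would bound each entry regardless of which tool produced it. It does not address the sensitivity concern on its own, since the first `limit` runes are still persisted.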


Comment thread agent/session.go
Comment on lines 199 to +201
	result.ToolTimings = s.toolTimings
	result.ContextUtilization = tracker.Utilization()
	result.EpisodeSummary = s.episodeLog.Summary()

[P2] Populate EpisodeSummary before returning run errors

Session.Run now computes EpisodeSummary only in the success tail, after runTurnLoop returns nil. When a retryable error happens after some tool calls (for example a transient provider failure on a later turn), the early return result, err path skips this assignment, so CodergenHandler.handleRunError receives an empty summary and cannot persist prior attempt memory for the next retry.
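One possible shape for this fix is a deferred assignment on a named result, so the summary is populated on every return path. The `session` and `result` types below are simplified stand-ins invented for this sketch; they are not the PR's `Session` or `SessionResult`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// result and session are hypothetical, stripped-down stand-ins.
type result struct{ EpisodeSummary string }

type session struct {
	episodes  []string
	failLater bool // simulates a transient provider failure on a later turn
}

// run uses a named result plus defer so that EpisodeSummary is filled in
// on both the success path and the early error return.
func (s *session) run() (res result, err error) {
	defer func() { res.EpisodeSummary = strings.Join(s.episodes, "\n") }()

	s.episodes = append(s.episodes, "read main.go: success")
	if s.failLater {
		return res, errors.New("transient provider failure")
	}
	s.episodes = append(s.episodes, "edit main.go: success")
	return res, nil
}

func main() {
	s := &session{failLater: true}
	res, err := s.run()
	fmt.Printf("err=%v summary=%q\n", err, res.EpisodeSummary)
}
```

An explicit assignment immediately before each `return` would work too; the deferred form is shown only because it is harder to miss a path when new returns are added later.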


@clintecker
Collaborator

@copilot address the reviewer feedback

@clintecker clintecker requested a review from Copilot April 21, 2026 18:17
Contributor Author

Copilot AI commented Apr 21, 2026

@copilot address the reviewer feedback

Implemented the reviewer feedback in the latest commits: prior-attempt injection now filters/renumbers non-empty summaries and skips header-only injection, and episode_summaries is now bounded (count + rune budget) on both parse/inject and serialize/persist paths. Added focused tests for both behaviors. Addressed in commit aedf3d4 (plus supporting tests in d6782a6/489a61f).

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.



Comment thread pipeline/handlers/codergen.go Outdated
Comment on lines +524 to +527
	if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" {
		return
	}
	updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary

Copilot AI Apr 21, 2026

applyEpisodeContextUpdates returns early when sessResult.EpisodeSummary is empty, which means a subsequent run with no tool calls will not update ctx.episode_summary and the context will retain a stale value from a previous run. Consider always writing ContextKeyEpisodeSummary (set to "" when no episodes) so the key accurately reflects the most recent session, and only append to episode_summaries when the new summary is non-empty.

Suggested change
-	if updates == nil || strings.TrimSpace(sessResult.EpisodeSummary) == "" {
-		return
-	}
-	updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary
+	if updates == nil {
+		return
+	}
+	updates[pipeline.ContextKeyEpisodeSummary] = sessResult.EpisodeSummary
+	if strings.TrimSpace(sessResult.EpisodeSummary) == "" {
+		return
+	}

Comment thread agent/memory.go
Comment on lines +102 to +121
func normalizeEpisodeSummaries(in []string) []string {
	out := make([]string, 0, len(in))
	runeLens := make([]int, 0, len(in))
	for _, s := range in {
		s = strings.TrimSpace(s)
		if s != "" {
			out = append(out, s)
			runeLens = append(runeLens, len([]rune(s)))
		}
	}
	if len(out) > maxEpisodeSummaryCount {
		start := len(out) - maxEpisodeSummaryCount
		out = out[start:]
		runeLens = runeLens[start:]
	}
	totalRunes := 0
	for _, n := range runeLens {
		totalRunes += n
	}
	for len(out) > 1 && totalRunes > maxEpisodeSummaryTotalRunes {

Copilot AI Apr 21, 2026

normalizeEpisodeSummaries only drops items while len(out) > 1, so a single very large summary can still exceed maxEpisodeSummaryTotalRunes and be serialized/injected unbounded. To actually cap growth, consider truncating any individual summary that exceeds the rune budget (or enforcing a per-summary rune limit) so even a single entry cannot blow up context size.

Comment thread agent/memory.go
Comment on lines +28 to +36
// Record appends a tool-call episode.
func (l *EpisodeLog) Record(tool, args, output string, isError bool) {
	statusSummary := summarizeEpisodeOutput(output, isError)
	l.Entries = append(l.Entries, EpisodeEntry{
		Tool:    tool,
		Args:    compactJSON(args),
		Success: !isError,
		Summary: statusSummary,
	})

Copilot AI Apr 21, 2026

EpisodeLog.Record stores compactJSON(args) without any length cap, and EpisodeLog.Summary prints args inline. Tool-call arguments can be arbitrarily large (e.g., write/edit content), which can make EpisodeSummary huge and defeat the episodic-memory size caps. Consider truncating/summarizing args (and/or omitting high-entropy fields like content) before storing/printing so episode summaries stay predictably small.
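To make the "omit high-entropy fields" option concrete, the sketch below replaces long top-level string values in JSON arguments with a length placeholder. `redactLargeFields` and its `limit` parameter are hypothetical, and the sketch assumes tool arguments are a flat JSON object; nested values and non-object input are left as they are.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// redactLargeFields is a hypothetical helper, not the PR's code.
// For a JSON object, it replaces any top-level string value longer than
// limit runes with a placeholder that records only the original length.
// Input that is not a JSON object is returned unchanged.
func redactLargeFields(args string, limit int) string {
	var fields map[string]any
	if err := json.Unmarshal([]byte(args), &fields); err != nil {
		return args
	}
	for k, v := range fields {
		if s, ok := v.(string); ok {
			if n := utf8.RuneCountInString(s); n > limit {
				fields[k] = fmt.Sprintf("[%d runes omitted]", n)
			}
		}
	}
	out, err := json.Marshal(fields) // map keys are emitted in sorted order
	if err != nil {
		return args
	}
	return string(out)
}

func main() {
	args := `{"path":"a.go","content":"0123456789"}`
	fmt.Println(redactLargeFields(args, 5))
	// prints {"content":"[10 runes omitted]","path":"a.go"}
}
```

Compared with plain truncation, this keeps short, informative fields such as the path intact while removing file contents entirely, which also helps with the sensitivity concern.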

Comment thread agent/session_run.go
Comment on lines +28 to +45
	if len(s.config.PriorEpisodeSummaries) > 0 {
		var b strings.Builder
		nonEmpty := make([]string, 0, len(s.config.PriorEpisodeSummaries))
		for _, summary := range s.config.PriorEpisodeSummaries {
			trimmed := strings.TrimSpace(summary)
			if trimmed == "" {
				continue
			}
			nonEmpty = append(nonEmpty, trimmed)
		}
		if len(nonEmpty) > 0 {
			b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
			for i, summary := range nonEmpty {
				b.WriteString(fmt.Sprintf("%d. %s\n", i+1, summary))
			}
			s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))
		}
	}

Copilot AI Apr 21, 2026

The prior-episode injection builds a numbered list but treats each prior summary as a single string. If a prior summary contains newlines (e.g., EpisodeLog.Summary is multi-line for multiple tool calls), only the first line gets the attempt number prefix and subsequent lines can start with their own "2."/"3." numbering, which is ambiguous in the injected prompt. Consider formatting multi-line summaries with indentation or a clear per-attempt header so nested numbering can't be misread.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.



Comment thread agent/memory.go
Comment on lines +39 to +56
// Summary renders a compact multiline summary for injection into future sessions.
func (l EpisodeLog) Summary() string {
	if len(l.Entries) == 0 {
		return ""
	}
	var b strings.Builder
	for i, e := range l.Entries {
		status := "success"
		if !e.Success {
			status = "fail"
		}
		fmt.Fprintf(&b, "%d. %s args=%s outcome=%s summary=%s", i+1, e.Tool, e.Args, status, e.Summary)
		if i < len(l.Entries)-1 {
			b.WriteByte('\n')
		}
	}
	return b.String()
}

Copilot AI Apr 21, 2026

EpisodeLog.Summary renders one line per tool call with no cap on number of entries or total size. In sessions with many tool calls this can create a very large episode_summary that then gets persisted and potentially injected into future prompts. Consider enforcing a max entry count and/or max total runes for the rendered summary (dropping oldest entries and/or truncating), similar to the caps used for episode_summaries.
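A sketch of an entry-count cap on the rendered summary, assuming the newest tool calls are the most useful to keep. The `entry` type, `renderRecent`, and `maxEntries` are hypothetical names for this illustration, and the entry is reduced to two fields for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// entry is a hypothetical, reduced stand-in for an episode entry.
type entry struct {
	Tool    string
	Success bool
}

// renderRecent is a hypothetical helper, not the PR's Summary method.
// It renders only the newest maxEntries entries and prefixes a note with the
// number of older entries left out. Original positions are kept as the line
// numbers so the omission is visible in the output.
func renderRecent(entries []entry, maxEntries int) string {
	if len(entries) == 0 || maxEntries <= 0 {
		return ""
	}
	var b strings.Builder
	skipped := 0
	if len(entries) > maxEntries {
		skipped = len(entries) - maxEntries
		fmt.Fprintf(&b, "(%d earlier tool calls omitted)\n", skipped)
	}
	for i, e := range entries[skipped:] {
		status := "success"
		if !e.Success {
			status = "fail"
		}
		fmt.Fprintf(&b, "%d. %s outcome=%s\n", skipped+i+1, e.Tool, status)
	}
	return strings.TrimRight(b.String(), "\n")
}

func main() {
	log := []entry{{"read", true}, {"edit", false}, {"test", false}}
	fmt.Println(renderRecent(log, 2))
}
```

A total-rune budget could be layered on top of this in the same way the summaries list is bounded; the count cap alone does not protect against a few very long lines.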

Comment thread agent/memory.go
		totalRunes -= runeLens[0]
		out = out[1:]
		runeLens = runeLens[1:]
	}

Copilot AI Apr 21, 2026

normalizeEpisodeSummaries enforces maxEpisodeSummaryTotalRunes only while len(out) > 1; a single very large summary can still exceed the rune budget and be persisted/injected. Given episode summaries can grow large (e.g., if args include big payloads), consider truncating a single over-budget entry to the budget (or dropping it) so the cap is always honored.

Suggested change
-	}
+	}
+	if len(out) == 1 && totalRunes > maxEpisodeSummaryTotalRunes {
+		r := []rune(out[0])
+		out[0] = string(r[:maxEpisodeSummaryTotalRunes])
+		runeLens[0] = maxEpisodeSummaryTotalRunes
+	}

Comment thread agent/session_run.go
Comment on lines +33 to +48
	if len(s.config.PriorEpisodeSummaries) > 0 {
		var b strings.Builder
		nonEmpty := make([]string, 0, len(s.config.PriorEpisodeSummaries))
		for _, summary := range s.config.PriorEpisodeSummaries {
			trimmed := strings.TrimSpace(summary)
			if trimmed == "" {
				continue
			}
			nonEmpty = append(nonEmpty, trimmed)
		}
		if len(nonEmpty) > 0 {
			b.WriteString("Prior attempts summary (avoid repeating failed approaches):\n")
			for i, summary := range nonEmpty {
				b.WriteString(fmt.Sprintf("%d. %s\n", i+1, summary))
			}
			s.messages = append(s.messages, llm.UserMessage(strings.TrimSpace(b.String())))

Copilot AI Apr 21, 2026

The prior-episodes injection prefixes each prior summary with its own numeric index, but each per-session EpisodeLog.Summary is itself a numbered (potentially multi-line) list. This can produce confusing output like "1. 1. read..." followed by unindented continuation lines ("2. ...") that look like a new attempt rather than a tool call. Consider switching to an "Attempt N:" header and indenting the summary lines (or storing per-session summaries as single-line text) so numbering remains unambiguous.
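The "Attempt N:" layout proposed here could look roughly like the following. `formatPriorAttempts` is a hypothetical function name and the two-space indent is an arbitrary choice for this sketch; only the header string is taken from the code under review.

```go
package main

import (
	"fmt"
	"strings"
)

// formatPriorAttempts is a hypothetical helper, not the PR's implementation.
// Each non-empty summary gets an "Attempt N:" header and its lines are
// indented beneath it, so a summary's own "1.", "2." numbering cannot be
// mistaken for a new attempt. It returns "" when nothing is left to inject.
func formatPriorAttempts(summaries []string) string {
	var b strings.Builder
	n := 0
	for _, s := range summaries {
		s = strings.TrimSpace(s)
		if s == "" {
			continue
		}
		n++
		fmt.Fprintf(&b, "Attempt %d:\n", n)
		for _, line := range strings.Split(s, "\n") {
			b.WriteString("  " + strings.TrimSpace(line) + "\n")
		}
	}
	if n == 0 {
		return ""
	}
	header := "Prior attempts summary (avoid repeating failed approaches):\n"
	return header + strings.TrimRight(b.String(), "\n")
}

func main() {
	prior := []string{"1. read ok\n2. edit fail", "  ", "1. test fail"}
	fmt.Println(formatPriorAttempts(prior))
}
```

Returning an empty string when every summary is blank lets the caller skip appending a header-only message, which matches the earlier review feedback on this block.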

@clintecker
Collaborator

@copilot handle all feedback on this PR

Contributor Author

Copilot AI commented Apr 21, 2026

@copilot handle all feedback on this PR

Handled all feedback in this PR. Included fixes for stale episode_summary clearing, bounded per-session episode rendering/args, single-entry oversize truncation behavior, and unambiguous multiline prior-attempt injection formatting, plus focused tests. Main addressing commits: 6cc29f3 and de667b7. No UI changes were made in this PR.

@clintecker clintecker merged commit 960264b into main Apr 21, 2026
1 check passed
@clintecker clintecker deleted the copilot/add-structured-episode-log branch April 21, 2026 19:12
@clintecker clintecker mentioned this pull request Apr 21, 2026
3 tasks
Development

Successfully merging this pull request may close these issues.

feat(agent): episodic memory across sessions

3 participants