Skip to content

fix: SessionStart context gate (#143) + retention-evict semantic leak (#124)#145

Merged
rohitg00 merged 6 commits intomainfrom
fix/stop-claude-code-token-burn-143
Apr 15, 2026
Merged

fix: SessionStart context gate (#143) + retention-evict semantic leak (#124)#145
rohitg00 merged 6 commits intomainfrom
fix/stop-claude-code-token-burn-143

Conversation

@rohitg00
Copy link
Copy Markdown
Owner

@rohitg00 rohitg00 commented Apr 15, 2026

Closes #143 and closes #124. Two independent bugs bundled into 0.8.10.

Honest framing up front: the #143 diagnosis in the original PR description was wrong — PreToolUse stdout is debug-log only per the Claude Code hook docs, so it never reached the model and never burned user tokens. I corrected the record on the #143 thread and rewrote the CHANGELOG entry. This PR still ships the gates, but only the SessionStart gate is a real token saving (~1-2K per session). The PreToolUse gate is a resource cleanup (skips ~100-300ms REST round-trip per tool call, zero token impact) kept for forward-compat and hot-path hygiene.

The underlying "Claude Pro allocation burned in 4 messages" symptom from #143 is dominated by Anthropic's own acknowledged Claude Pro quota tightness, not by agentmemory. This release doesn't and can't fix that.


Fix 1: #143 — SessionStart context injection gate

src/hooks/session-start.ts was writing ~1-2K chars of project context to stdout at every session start. Per the Claude Code hook docs, SessionStart stdout is injected into the model's context (one of only two exceptions alongside UserPromptSubmit). So every new session the user started in Claude Code was getting an extra ~1-2K input tokens prepended to its first turn, silently.

Fix: new env var AGENTMEMORY_INJECT_CONTEXT, default false. When disabled:

  • session-start.ts still POSTs /agentmemory/session/start so observation tracking knows about the session, but skips the process.stdout.write(result.context) line
  • pre-tool-use.ts exits immediately — no stdin read, no /agentmemory/enrich fetch, no stdout write. Pure resource cleanup (20x-per-message Node+HTTP hot path is now a ~15ms no-op Node startup)

Startup banner prints Context injection: OFF (default, #143) on normal startup and a loud WARNING when AGENTMEMORY_INJECT_CONTEXT=true is set — same pattern as AGENTMEMORY_AUTO_COMPRESS from 0.8.8.

Tests: 5 new subprocess tests in test/context-injection.test.ts spawn the compiled pre-tool-use.mjs and session-start.mjs hooks with real JSON stdin and assert:

  • Unset env var → stdout is empty
  • Explicit AGENTMEMORY_INJECT_CONTEXT=false → stdout is empty
  • Disabled PreToolUse path exits under 1 second (no stdin consumption, no fetch)
  • Opt-in path with unreachable backend exits cleanly (doesn't break Claude Code startup)

Fix 2: #124mem::retention-evict leaks semantic memories

Pre-existing bug I filed myself during the #119 adversarial review. mem::retention-evict was unconditionally calling kv.delete(KV.memories, id) for every below-threshold candidate, but retention scores are computed for both episodic (KV.memories) and semantic (KV.semantic) memories. When a candidate came from KV.semantic, the delete silently became a no-op and the semantic row stayed alive forever with a sub-threshold score. Semantic memories could not be evicted by the retention path at all.

Fix:

  • Add a source: "episodic" | "semantic" discriminator to RetentionScore
  • Tag it at score creation in both loops of mem::retention-score
  • Branch the delete in mem::retention-evict on candidate.source, routing to KV.memories or KV.semantic accordingly
  • Pre-0.8.10 rows with no source field are treated as episodic for backwards-compat so upgraded stores continue to evict their old rows without re-scoring first
  • Response now includes evictedEpisodic and evictedSemantic counts for observability

Tests: 3 new cases in test/retention.test.ts:

  • Scoring tags rows with the correct source for both loops
  • Evicting a mixed set of below-threshold episodic + semantic candidates removes both from their respective scopes (the core regression)
  • Legacy-shape score rows with missing source still evict to mem:memories (backwards-compat)

Release

Bumps to 0.8.10 (main package + @agentmemory/mcp shim).

Test plan

Summary by CodeRabbit

  • New Features

    • Context injection is OFF by default; enable with AGENTMEMORY_INJECT_CONTEXT=true. Startup now shows injection status and a warning when enabled. Package/version updated to 0.8.10.
  • Bug Fixes

    • Eviction now reliably distinguishes and deletes semantic vs episodic memories, returns separate evicted counts, and emits an audit record for non-zero sweeps.
    • Retention scoring now records memory type to improve eviction accuracy.
  • Tests

    • Added regression tests for context-injection behavior, fast shutdown, and retention/eviction routing.
  • Documentation

    • README and changelog updated with flag details and migration note.

0.8.8 fixed the agentmemory-side Claude API burn (where the engine
called Claude via the user's ANTHROPIC_API_KEY for per-observation
compression). That addressed #138 for users with API keys, but it
missed the second and much larger token-burn path: the PreToolUse
hook writing context to stdout.

Claude Code reads PreToolUse stdout and prepends it to the model's
next turn. src/hooks/pre-tool-use.ts was POSTing /agentmemory/enrich
on every Edit/Write/Read/Glob/Grep tool call and piping up to 4000
chars of response context into stdout. At ~20 tool calls per user
message this silently injected ~20K tokens per message into Claude
Code's input window — all charged against the user's Claude Pro
allocation because Claude Code was the one sending them to Anthropic.
4 messages drained the cap, which matches @adrianricardo's report.

session-start.ts had the same pattern (injected once per session,
smaller blast radius).

Fix: gate both hooks on AGENTMEMORY_INJECT_CONTEXT, default false.

- pre-tool-use.ts: when disabled, exit immediately — no stdin read,
  no fetch, no stdout write. The hot path (~20x per message) becomes
  a no-op Node startup.
- session-start.ts: when disabled, still POST /agentmemory/session/start
  so the session gets registered for observation tracking, but never
  write context to stdout. The session registration is cheap and
  doesn't touch Claude Code's input window.
- src/config.ts: new isContextInjectionEnabled() helper.
- src/index.ts: startup banner prints 'Context injection: OFF
  (default, #143)' on normal startup and a loud WARNING when
  opt-in is enabled.
- test/context-injection.test.ts: 5 subprocess tests that spawn the
  compiled pre-tool-use.mjs and session-start.mjs hooks, feed real
  JSON payloads via stdin, and assert stdout is empty in all the
  off/default paths. Also asserts the disabled path exits under 1s
  and the opt-in path with an unreachable backend still exits
  cleanly.
- README .env section: new AGENTMEMORY_INJECT_CONTEXT entry.
- CHANGELOG [0.8.10] with prominent 'Behavior change' banner.

Observations are still captured via PostToolUse regardless of the
flag — the memory store and MCP search tools are completely
unaffected by this change. The fix only severs the path where
agentmemory silently shoves memory context into the user's Claude
Code conversation.

Bumps to 0.8.10 (main + @agentmemory/mcp shim). Test count:
724 passing (was 719 + 5 new).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Release 0.8.10: context injection for PreToolUse and SessionStart is opt‑in via AGENTMEMORY_INJECT_CONTEXT (default off). Retention now records memory source and evicts from the correct KV namespace with split eviction counts and audit writes. Docs, tests, and version metadata updated.

Changes

Cohort / File(s) Summary
Version & metadata
package.json, packages/mcp/package.json, plugin/.claude-plugin/plugin.json, src/version.ts, src/types.ts
Bumped release to 0.8.10; updated VERSION and ExportData.version unions, package/plugin version fields, and added retention_score audit op and RetentionScore.source type.
Context injection config & docs
src/config.ts, README.md, CHANGELOG.md
Added isContextInjectionEnabled() and documented new env var AGENTMEMORY_INJECT_CONTEXT (default false) with opt‑in behavior and startup/migration messaging.
Plugin hook scripts
plugin/scripts/pre-tool-use.mjs, plugin/scripts/session-start.mjs
Hook scripts read AGENTMEMORY_INJECT_CONTEXT and short‑circuit when disabled to avoid stdin parsing, /agentmemory/enrich calls, and stdout context emission.
Hook implementations (src)
src/hooks/pre-tool-use.ts, src/hooks/session-start.ts
Gated enrichment and stdout context emission on AGENTMEMORY_INJECT_CONTEXT === "true"; session registration/observation capture still runs but project context is not written to stdout when disabled.
Startup & export-import
src/index.ts, src/functions/export-import.ts
Startup logs show context injection status and warning when enabled; mem::import supportedVersions updated to include 0.8.10.
Retention scoring & eviction
src/functions/retention.ts, src/types.ts
RetentionScore gets optional `source?: "episodic"
Tests
test/context-injection.test.ts, test/retention.test.ts, test/export-import.test.ts
Added subprocess-based tests for hook stdout behavior and gating; expanded retention tests for source tagging, correct eviction routing, legacy-row handling, and audit emission; updated export-import test to expect 0.8.10.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through logs at break of dawn, so spry,
Quiet now by default — I’ll only speak if you try.
Flip the switch, I’ll whisper context in each turn,
Or stay muted while safe memories calmly churn.
A carrot for tests, and a thump of delight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes both main changes: SessionStart context gating fix for #143 and retention-evict semantic memory leak fix for #124.
Linked Issues check ✅ Passed All coding requirements from #143 and #124 are met: AGENTMEMORY_INJECT_CONTEXT gating, PreToolUse no-op by default, SessionStart stdout gating, source discriminator on RetentionScore, branched eviction by source, backwards compatibility, split eviction counts, and comprehensive tests added.
Out of Scope Changes check ✅ Passed All changes are in scope: version bumps (0.8.9→0.8.10), documentation updates (CHANGELOG/README), context injection gating implementation, retention semantic leak fix, test coverage additions, and audit logging for retention operations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/stop-claude-code-token-burn-143

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rohitg00 rohitg00 marked this pull request as draft April 15, 2026 09:19
@rohitg00 rohitg00 changed the title fix: stop burning Claude Pro tokens on every tool call (#143, real root cause of #138) fix: gate SessionStart context injection + pre-tool-use hot path (#143) Apr 15, 2026
PreToolUse stdout is NOT injected into the model context — per
the Claude Code hook docs, only UserPromptSubmit and SessionStart
stdout are injected. My initial #143 PR description and CHANGELOG
claimed PreToolUse was the smoking gun behind 'Pro allocation
burned in 4 messages', which is wrong.

What's actually true:
- SessionStart stdout injection IS real (~1-2K tokens per session)
- PreToolUse stdout goes to debug log only — no tokens
- Claude Pro's Claude Code quotas are tight by design (Anthropic
  has publicly acknowledged this); 4 messages to burn is plausible
  with or without agentmemory installed

The gate on pre-tool-use.ts is still worth keeping as a resource
cleanup (skips a 20x-per-message Node+HTTP hot path) and as
forward-compat protection in case Claude Code ever changes
PreToolUse hook contract. But the CHANGELOG entry has to stop
claiming it saves tokens when it doesn't.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
src/config.ts (1)

207-215: Shorten the long hook-injection comment block.

isContextInjectionEnabled() is clear on its own; the multi-line narrative at Lines 207-215 is high-maintenance and likely to drift. Keep a compact intent-focused note and rely on changelog/docs for details.

🧹 Suggested cleanup
-// Hook-level context injection into Claude Code's conversation is OFF by
-// default as of 0.8.10 (see `#143`). When disabled, pre-tool-use and
-// session-start hooks still POST observations for background capture, but
-// never write context to stdout — so Claude Code doesn't inject an extra
-// ~4000-char blob into every tool turn. 0.8.8 stopped the agentmemory-side
-// Claude calls (via ANTHROPIC_API_KEY); this stops the Claude Code-side
-// token burn where every tool call silently grew the model input window.
-// Users who want the in-conversation context injection explicitly opt in
-// with AGENTMEMORY_INJECT_CONTEXT=true and get a loud startup warning.
+// Hook-level context injection is opt-in since 0.8.10.
+// Set AGENTMEMORY_INJECT_CONTEXT=true to re-enable stdout context injection.
 export function isContextInjectionEnabled(): boolean {
   return getMergedEnv()["AGENTMEMORY_INJECT_CONTEXT"] === "true";
 }

As per coding guidelines: "Use clear, self-documenting variable and function names instead of code comments explaining WHAT".

Also applies to: 216-218

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.ts` around lines 207 - 215, Replace the long multi-line
explanation about hook-level context injection with a short, intent-focused
comment near isContextInjectionEnabled(): e.g., one-line summary stating that
context injection is disabled by default and can be enabled with
AGENTMEMORY_INJECT_CONTEXT, and remove the verbose narrative (also shorten the
adjacent block at the 216-218 area) so the code relies on the self-explanatory
function name and external docs/changelog for details.
CHANGELOG.md (1)

22-23: Avoid hardcoded source line numbers in changelog entries.

src/index.ts:179 is brittle and can drift; referencing just the file/function is more durable.

📝 Suggested change
-- **`isContextInjectionEnabled()`** helper in `src/config.ts` — single source of truth for the flag. The hooks read the env var directly (they're spawned as standalone `.mjs` files by Claude Code and don't bootstrap through `src/index.ts`), so the helper is there for the startup banner in `src/index.ts:179` and future code paths that want the same gate.
+- **`isContextInjectionEnabled()`** helper in `src/config.ts` — single source of truth for the flag. The hooks read the env var directly (they're spawned as standalone `.mjs` files by Claude Code and don't bootstrap through `src/index.ts`), so the helper is there for the startup banner in `src/index.ts` and future code paths that want the same gate.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 22 - 23, Update the CHANGELOG entry to remove the
hardcoded source line number "src/index.ts:179" and instead reference the file
and symbol only (e.g., "src/index.ts" and the isContextInjectionEnabled()
helper); edit the sentence that currently reads with the line number so it says
something like "the startup banner in src/index.ts and future code paths" or
explicitly names the function isContextInjectionEnabled() without a line number,
ensuring the changelog remains durable as source shifts.
test/context-injection.test.ts (2)

1-4: Add the standard iii-sdk mock to align with test conventions.

This suite doesn’t currently include vi.mock("iii-sdk", ...), which diverges from the repo’s required test pattern.

🧪 Suggested alignment patch
-import { describe, it, expect } from "vitest";
+import { describe, it, expect, vi } from "vitest";
 import { spawn } from "node:child_process";
 import { join } from "node:path";
+
+vi.mock("iii-sdk", () => ({}));
As per coding guidelines, `test/**/*.test.ts`: "Test files must use vitest with .test.ts extension, mock iii-sdk with vi.mock(), and follow patterns from test/crystallize.test.ts".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/context-injection.test.ts` around lines 1 - 4, The test file
context-injection.test.ts is missing the standard iii-sdk mock; add a top-level
vitest mock call vi.mock("iii-sdk", ...) before imports or test definitions to
match the repo pattern used in test/crystallize.test.ts; ensure the mock
provides the same exported hooks used in this suite (the iii-sdk API your tests
call) so tests import describe/it/expect as normal and the mock replaces iii-sdk
behavior consistently across tests.

86-90: Timing budget comment and assertion are inconsistent.

Line 86 describes a 250ms budget, but Line 90 enforces < 1000ms. Please align text and threshold to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/context-injection.test.ts` around lines 86 - 90, Comment states a 250ms
budget but the assertion checks <1000ms; make them consistent by either lowering
the assertion or updating the comment. Locate the test using
runHook("pre-tool-use.mjs", "", {}) and change
expect(result.tookMs).toBeLessThan(1000) to
expect(result.tookMs).toBeLessThan(250) if you want the 250ms target, or update
the preceding comment to mention a 1000ms budget if that is the intended
threshold.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 735-737: Clarify that SessionStart and PreToolUse have different
frequencies: update the wording around the "PreToolUse and SessionStart hooks"
so it states that PreToolUse injects up to ~4000 chars of memory into every tool
turn, while SessionStart injects memory only once at the start of a session (not
on every tool turn); mention the token impact accordingly and keep references to
the hook names SessionStart and PreToolUse so readers can find their
implementations.

In `@src/functions/export-import.ts`:
- Line 178: The hard-coded supportedVersions Set in export-import.ts is drifting
from the canonical ExportData.version/type and allows unsafe casts; replace this
inline list by importing or deriving the allowed export-version values from the
single source of truth (the ExportData.version enum/type or a shared constant)
and use that to build the Set (reference the supportedVersions constant and
ExportData type/enum to locate code). Ensure the central definition includes
"0.7.7" (or add it there) so callers like validate/parse logic use the same
canonical versions and remove any duplicate lists to restore type safety.

---

Nitpick comments:
In `@CHANGELOG.md`:
- Around line 22-23: Update the CHANGELOG entry to remove the hardcoded source
line number "src/index.ts:179" and instead reference the file and symbol only
(e.g., "src/index.ts" and the isContextInjectionEnabled() helper); edit the
sentence that currently reads with the line number so it says something like
"the startup banner in src/index.ts and future code paths" or explicitly names
the function isContextInjectionEnabled() without a line number, ensuring the
changelog remains durable as source shifts.

In `@src/config.ts`:
- Around line 207-215: Replace the long multi-line explanation about hook-level
context injection with a short, intent-focused comment near
isContextInjectionEnabled(): e.g., one-line summary stating that context
injection is disabled by default and can be enabled with
AGENTMEMORY_INJECT_CONTEXT, and remove the verbose narrative (also shorten the
adjacent block at the 216-218 area) so the code relies on the self-explanatory
function name and external docs/changelog for details.

In `@test/context-injection.test.ts`:
- Around line 1-4: The test file context-injection.test.ts is missing the
standard iii-sdk mock; add a top-level vitest mock call vi.mock("iii-sdk", ...)
before imports or test definitions to match the repo pattern used in
test/crystallize.test.ts; ensure the mock provides the same exported hooks used
in this suite (the iii-sdk API your tests call) so tests import
describe/it/expect as normal and the mock replaces iii-sdk behavior consistently
across tests.
- Around line 86-90: Comment states a 250ms budget but the assertion checks
<1000ms; make them consistent by either lowering the assertion or updating the
comment. Locate the test using runHook("pre-tool-use.mjs", "", {}) and change
expect(result.tookMs).toBeLessThan(1000) to
expect(result.tookMs).toBeLessThan(250) if you want the 250ms target, or update
the preceding comment to mention a 1000ms budget if that is the intended
threshold.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 48060a7d-0576-4146-acb7-4cc9e6aecde7

📥 Commits

Reviewing files that changed from the base of the PR and between 9cef253 and fbb32b0.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (16)
  • CHANGELOG.md
  • README.md
  • package.json
  • packages/mcp/package.json
  • plugin/.claude-plugin/plugin.json
  • plugin/scripts/pre-tool-use.mjs
  • plugin/scripts/session-start.mjs
  • src/config.ts
  • src/functions/export-import.ts
  • src/hooks/pre-tool-use.ts
  • src/hooks/session-start.ts
  • src/index.ts
  • src/types.ts
  • src/version.ts
  • test/context-injection.test.ts
  • test/export-import.test.ts

Comment thread README.md Outdated
const importData = data.exportData;

const supportedVersions = new Set(["0.3.0", "0.4.0", "0.5.0", "0.6.0", "0.6.1", "0.7.0", "0.7.2", "0.7.3", "0.7.4", "0.7.5", "0.7.6", "0.7.7", "0.7.9", "0.8.0", "0.8.1", "0.8.2", "0.8.3", "0.8.4", "0.8.5", "0.8.6", "0.8.7", "0.8.8", "0.8.9"]);
const supportedVersions = new Set(["0.3.0", "0.4.0", "0.5.0", "0.6.0", "0.6.1", "0.7.0", "0.7.2", "0.7.3", "0.7.4", "0.7.5", "0.7.6", "0.7.7", "0.7.9", "0.8.0", "0.8.1", "0.8.2", "0.8.3", "0.8.4", "0.8.5", "0.8.6", "0.8.7", "0.8.8", "0.8.9", "0.8.10"]);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Centralize supported export versions to prevent drift.

Line 178 maintains a hard-coded list that already diverges from type definitions ("0.7.7" is accepted here but not represented in ExportData.version). This can force casts and weakens type safety.

♻️ Proposed refactor
-      const supportedVersions = new Set(["0.3.0", "0.4.0", "0.5.0", "0.6.0", "0.6.1", "0.7.0", "0.7.2", "0.7.3", "0.7.4", "0.7.5", "0.7.6", "0.7.7", "0.7.9", "0.8.0", "0.8.1", "0.8.2", "0.8.3", "0.8.4", "0.8.5", "0.8.6", "0.8.7", "0.8.8", "0.8.9", "0.8.10"]);
+      const supportedVersions = new Set(SUPPORTED_EXPORT_VERSIONS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/export-import.ts` at line 178, The hard-coded supportedVersions
Set in export-import.ts is drifting from the canonical ExportData.version/type
and allows unsafe casts; replace this inline list by importing or deriving the
allowed export-version values from the single source of truth (the
ExportData.version enum/type or a shared constant) and use that to build the Set
(reference the supportedVersions constant and ExportData type/enum to locate
code). Ensure the central definition includes "0.7.7" (or add it there) so
callers like validate/parse logic use the same canonical versions and remove any
duplicate lists to restore type safety.

The eviction loop was unconditionally calling kv.delete(KV.memories,
id) for every below-threshold candidate, but retention scores are
computed for both episodic (KV.memories) and semantic (KV.semantic)
memories. When a candidate came from KV.semantic, the delete silently
became a no-op (key wasn't in mem:memories to begin with) and the
semantic row stayed alive forever with a sub-threshold score. Semantic
memories could not be evicted by this path at all.

Fix:
- Add a source: "episodic" | "semantic" discriminator to RetentionScore
- Tag it at score creation in both loops of mem::retention-score
- Branch the delete in mem::retention-evict on candidate.source,
  routing to KV.memories or KV.semantic accordingly
- Pre-0.8.10 retention rows with no source field are treated as
  episodic for backwards-compat so upgraded stores continue to evict
  their old rows without re-scoring first
- Response now includes evictedEpisodic and evictedSemantic counts
  so callers can see what was removed from each scope

Adds 3 regression tests to test/retention.test.ts:
- Scoring tags rows with the correct source
- Evicting a mixed set of below-threshold episodic + semantic
  candidates removes both from their respective scopes
- Legacy-shape score rows with no source field still evict
  to mem:memories (backwards-compat)

Full suite: 727 passing (was 724 + 3 new).
@rohitg00 rohitg00 changed the title fix: gate SessionStart context injection + pre-tool-use hot path (#143) fix: SessionStart context gate (#143) + retention-evict semantic leak (#124) Apr 15, 2026
@rohitg00 rohitg00 marked this pull request as ready for review April 15, 2026 10:33
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/retention.test.ts (1)

246-263: Assert the persisted mem:retention rows here too.

This only verifies the returned scores array. The eviction path reads back from stored retention rows, so a regression in kv.set/serialization would still pass. Please also fetch mem:retention after scoring and assert that both stored rows carry the expected source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/retention.test.ts` around lines 246 - 263, After calling
registerRetentionFunctions and triggering "mem::retention-score" in the test,
also read back the persisted retention rows (the "mem:retention" entry) from the
mock KV and assert that the stored rows for memoryId "mem_ep" and "sem_sem" have
source set to "episodic" and "semantic" respectively; update the test that uses
mockKV, mockSdk, and registerRetentionFunctions to fetch the stored
"mem:retention" payload (via the mock KV read method used in tests, e.g., kv.get
or equivalent) and add expect assertions mirroring the in-memory checks to
verify serialization/persistence correctness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/functions/retention.ts`:
- Around line 262-271: The eviction loop is deleting persisted state (kv.delete
and deleteAccessLog) but not emitting audits; add a call to recordAudit(...) for
each structural deletion so evictions are auditable. Inside the for (const
candidate of candidates) block, after deleting KV.semantic or KV.memories,
KV.retentionScores and after deleteAccessLog(kv, candidate.memoryId), invoke
recordAudit with the memoryId, an action like "evict" or "delete", the source
(candidate.source), and any relevant metadata (reason/score) so the audit
records the state change; ensure this is done for both semantic and episodic
branches and increment counters as before.
- Around line 255-271: The loop assumes missing candidate.source means episodic
and deletes KV.memories, which can leave semantic records intact for legacy
rows; instead, before deleting, probe both namespaces (KV.semantic and
KV.memories) for candidate.memoryId to determine which actually contains the
record, delete only that namespace entry, increment evictedSemantic or
evictedEpisodic accordingly, then delete the retention score
(KV.retentionScores) and call deleteAccessLog(kv, candidate.memoryId) as before;
update the code around the candidates loop and uses of KV.semantic / KV.memories
/ candidate.memoryId to perform the existence check and correct counting.

---

Nitpick comments:
In `@test/retention.test.ts`:
- Around line 246-263: After calling registerRetentionFunctions and triggering
"mem::retention-score" in the test, also read back the persisted retention rows
(the "mem:retention" entry) from the mock KV and assert that the stored rows for
memoryId "mem_ep" and "sem_sem" have source set to "episodic" and "semantic"
respectively; update the test that uses mockKV, mockSdk, and
registerRetentionFunctions to fetch the stored "mem:retention" payload (via the
mock KV read method used in tests, e.g., kv.get or equivalent) and add expect
assertions mirroring the in-memory checks to verify serialization/persistence
correctness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 39b871ef-643e-4923-a275-873928b90c93

📥 Commits

Reviewing files that changed from the base of the PR and between fbb32b0 and fa89a42.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • src/functions/retention.ts
  • src/types.ts
  • test/retention.test.ts
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/types.ts

Comment thread src/functions/retention.ts Outdated
Comment thread src/functions/retention.ts
CodeRabbit caught a real backwards-compat hole in the #124 fix:
pre-0.8.10 stores already contain semantic retention rows with
no source field (because the old mem::retention-score scored
KV.semantic before the discriminator existed). My first fix
defaulted missing source to episodic, which meant those legacy
semantic rows still got delete-routed to KV.memories — the exact
no-op that stranded them in the first place.

Fix: when candidate.source is undefined, probe KV.memories first
for the memoryId; if it's there, route to episodic, otherwise
route to semantic. Count the resolved source in the response.

Adds one new test case: a pre-0.8.10 semantic memory with a
legacy-shape retention row (no source field) gets evicted from
mem:semantic, not silently no-op'd. Existing 'defaults to
episodic' test is kept and retargeted to the genuinely-episodic
legacy case.

Also fixes a README nit: the AGENTMEMORY_INJECT_CONTEXT comment
previously implied SessionStart fires on every tool turn. It's
once per session. Now broken out into two bullets explaining
what each hook does differently, with the note that only
SessionStart actually reaches the model (PreToolUse stdout is
debug-log only per Claude Code docs).

Full suite: 728 passing (was 727 + 1 new).
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/functions/retention.ts (1)

286-288: ⚠️ Potential issue | 🟠 Major

Emit an audit record for each successful eviction.

Lines 286-288 delete the memory row, retention row, and access log, but the branch still never calls recordAudit(). That makes mem::retention-evict invisible to audit consumers even though this is a structural mutation path, not read-side bookkeeping. Please record the eviction after the deletes succeed, including at least candidate.memoryId, resolvedSource, and the eviction score/threshold metadata.

As per coding guidelines, "Use recordAudit() for all state-changing operations in TypeScript functions".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/retention.ts` around lines 286 - 288, After successfully
removing the memory and access records (after the existing calls to
kv.delete(scope, candidate.memoryId), kv.delete(KV.retentionScores,
candidate.memoryId), and deleteAccessLog(kv, candidate.memoryId)), call
recordAudit() to emit an audit record for this structural mutation; include
candidate.memoryId, the resolvedSource value used to resolve the memory, and the
eviction metadata (the computed eviction score and the threshold or reason) in
the audit payload so the mem::retention-evict event is visible to audit
consumers. Ensure the recordAudit invocation only runs after the deletes
complete without throwing so it reflects a successful eviction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/retention.test.ts`:
- Around line 258-262: The test currently only asserts the transient SDK
response from sdk.trigger("mem::retention-score", {}) but must also verify the
persisted rows so KV.retentionScores (the stored "mem:retention" entries) retain
the source discriminator; after obtaining result and locating ep and sem, fetch
the persisted retention entries from the same storage used by KV.retentionScores
(e.g., read the "mem:retention" keys or call the helper that reads persisted
retention rows) and assert that the stored entries for memoryId "mem_ep" and
"sem_sem" have source "episodic" and "semantic" respectively, mirroring the
expectations on result.scores.

---

Duplicate comments:
In `@src/functions/retention.ts`:
- Around line 286-288: After successfully removing the memory and access records
(after the existing calls to kv.delete(scope, candidate.memoryId),
kv.delete(KV.retentionScores, candidate.memoryId), and deleteAccessLog(kv,
candidate.memoryId)), call recordAudit() to emit an audit record for this
structural mutation; include candidate.memoryId, the resolvedSource value used
to resolve the memory, and the eviction metadata (the computed eviction score
and the threshold or reason) in the audit payload so the mem::retention-evict
event is visible to audit consumers. Ensure the recordAudit invocation only runs
after the deletes complete without throwing so it reflects a successful
eviction.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5723daf7-cae5-4830-8435-a75e73f4fad4

📥 Commits

Reviewing files that changed from the base of the PR and between fa89a42 and ea45144.

📒 Files selected for processing (3)
  • README.md
  • src/functions/retention.ts
  • test/retention.test.ts
✅ Files skipped from review due to trivial changes (1)
  • README.md

Comment thread test/retention.test.ts
…nd 3)

CodeRabbit round 3 findings, both real:

1. retention-evict performs structural deletes (memories / semantic /
   retention scores / access logs) but was not calling recordAudit().
   Repo learnings say state-changing functions must be auditable except
   for read-path bookkeeping. Now emits one batched audit row per
   non-zero eviction sweep:

     operation: 'delete'
     functionId: 'mem::retention-evict'
     targetIds: every evicted memoryId
     details: { threshold, evicted, evictedEpisodic, evictedSemantic,
                reason: 'retention score below threshold' }

   Zero-eviction sweeps intentionally do NOT write an audit row (no
   state change, no need to flood the audit log during health checks).

2. The #124 scoring test only checked result.scores (transient
   response) but not the persisted mem:retention rows. Eviction reads
   back from stored rows, so a regression in kv.set/serialization
   would have still passed the old assertion. Now also does
   kv.get('mem:retention', id) and asserts { source: ... }.

Two new tests:
- Retention evict with a mixed set of 2 episodic + 1 semantic
  candidates writes exactly one audit row with all 3 ids in targetIds
  and the correct evictedEpisodic/Semantic breakdown in details.
- Retention evict with zero candidates writes zero audit rows.

Full suite: 730 passing (was 728 + 2 new).
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/functions/retention.ts (1)

135-147: ⚠️ Potential issue | 🟠 Major

Audit the re-score/backfill writes.

Lines 147 and 191 now persist schema-relevant source tags into KV.retentionScores, but mem::retention-score still exits without any recordAudit() call. Please emit one batched audit after the score rows are written so this backfill is visible to audit consumers.

As per coding guidelines "Use recordAudit() for all state-changing operations in TypeScript functions".

Also applies to: 179-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/retention.ts` around lines 135 - 147, The backfill currently
writes RetentionScore entries (using KV.retentionScores and pushing into scores)
and persists new `source` tags but does not emit any audit records for the
`mem::retention-score` state change; add a batched call to recordAudit()
immediately after the block that writes the score rows (after the loop that
calls kv.set for retention entries and after the other write block that persists
scores) so all created/updated RetentionScore rows are recorded as a single
audit event; use recordAudit() with a payload that identifies the affected
memory IDs (mem.id), the operation (e.g., "mem::retention-score.backfill"), and
the score entries created (scores array) so audit consumers can see the
backfill.
🧹 Nitpick comments (1)
src/functions/retention.ts (1)

121-147: Batch the retention-score writes instead of awaiting them one by one.

Each kv.set(...) here is independent, so the inline await turns a full rescore into one KV round-trip per memory. Collect the writes and flush them with Promise.all (or bounded batches) after building the entries.

As per coding guidelines "Perform parallel operations where possible using Promise.all for independent kv writes/reads in TypeScript".

Also applies to: 150-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/retention.ts` around lines 121 - 147, The per-memory inline
await on kv.set inside the loop causes sequential KV writes; instead, collect
each RetentionScore entry (created using computeSalience, computeDecay,
computeReinforcementBoost) and its pending write (kv.set(KV.retentionScores,
mem.id, entry)) into an array of Promises after pushing to scores, then flush
them with Promise.all (or split into bounded batches) once the loop completes;
apply the same batching change to the similar block around lines 150-191 so all
independent kv.set calls run in parallel rather than one-by-one.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/functions/retention.ts`:
- Around line 135-147: The backfill currently writes RetentionScore entries
(using KV.retentionScores and pushing into scores) and persists new `source`
tags but does not emit any audit records for the `mem::retention-score` state
change; add a batched call to recordAudit() immediately after the block that
writes the score rows (after the loop that calls kv.set for retention entries
and after the other write block that persists scores) so all created/updated
RetentionScore rows are recorded as a single audit event; use recordAudit() with
a payload that identifies the affected memory IDs (mem.id), the operation (e.g.,
"mem::retention-score.backfill"), and the score entries created (scores array)
so audit consumers can see the backfill.

---

Nitpick comments:
In `@src/functions/retention.ts`:
- Around line 121-147: The per-memory inline await on kv.set inside the loop
causes sequential KV writes; instead, collect each RetentionScore entry (created
using computeSalience, computeDecay, computeReinforcementBoost) and its pending
write (kv.set(KV.retentionScores, mem.id, entry)) into an array of Promises
after pushing to scores, then flush them with Promise.all (or split into bounded
batches) once the loop completes; apply the same batching change to the similar
block around lines 150-191 so all independent kv.set calls run in parallel
rather than one-by-one.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 27bbc7a8-13d4-4989-a69e-2dd7bfa22c80

📥 Commits

Reviewing files that changed from the base of the PR and between ea45144 and 04581cf.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • src/functions/retention.ts
  • test/retention.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/retention.test.ts
  • CHANGELOG.md

CodeRabbit round 3 outside-diff findings, both addressed:

1. mem::retention-score was persisting schema-relevant writes to
   KV.retentionScores (1000+ rows in a mature store) but never called
   recordAudit(). Per the repo audit-coverage policy, state-changing
   functions need an audit row. Added a single batched audit event
   per rescore:

     operation: 'retention_score' (new audit op — added to the
                                    AuditEntry union in types.ts)
     functionId: 'mem::retention-score'
     targetIds: [] (intentionally empty — a mature store can have
                    1000+ ids per sweep; flooding the audit log
                    with every memoryId on every cron tick is
                    worse than recording just the summary counts)
     details: { total, episodic, semantic, tiers, config }

   Zero-memory stores intentionally skip the audit call.

2. The per-memory kv.set inside the score loop was O(n) sequential
   round-trips. Refactored to collect pendingWrites: [id, entry][]
   while iterating, then flush with Promise.all at the end. On a
   mature store with 1000+ memories this is ~10x faster (depends on
   backend pipelining).

Test updates:
- Added 'mem::retention-score emits audit row per rescore' covering
  the new audit call, targetIds=[], and details.episodic/semantic.
- Existing '#124 audit evict' and 'zero-evict skip audit' tests now
  filter the audit log by functionId === 'mem::retention-evict'
  because retention-score also writes one row per sweep now.

Full suite: 731 passing (was 730 + 1 new, existing tests retargeted).
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/functions/retention.ts (1)

314-329: ⚠️ Potential issue | 🟠 Major

Legacy-source fallback can overcount evictions and should probe both namespaces.

At Line 314, only KV.memories is checked; Line 319 then assumes semantic. If the ID exists in neither namespace, Line 326-Line 329 still increments eviction counters after a no-op delete. This can over-report evicted / evictedSemantic.

Suggested fix
-          } else {
-            const episodic = await kv.get(KV.memories, candidate.memoryId);
-            if (episodic !== null) {
-              scope = KV.memories;
-              resolvedSource = "episodic";
-            } else {
-              scope = KV.semantic;
-              resolvedSource = "semantic";
-            }
-          }
+          } else {
+            const [episodic, semantic] = await Promise.all([
+              kv.get(KV.memories, candidate.memoryId),
+              kv.get(KV.semantic, candidate.memoryId),
+            ]);
+            if (episodic !== null && semantic === null) {
+              scope = KV.memories;
+              resolvedSource = "episodic";
+            } else if (semantic !== null && episodic === null) {
+              scope = KV.semantic;
+              resolvedSource = "semantic";
+            } else if (semantic !== null) {
+              // ambiguous legacy collision: prefer semantic to avoid semantic leak
+              scope = KV.semantic;
+              resolvedSource = "semantic";
+            } else {
+              // stale retention row: clean metadata only, do not count as evicted memory
+              await Promise.all([
+                kv.delete(KV.retentionScores, candidate.memoryId),
+                deleteAccessLog(kv, candidate.memoryId),
+              ]);
+              continue;
+            }
+          }
           await kv.delete(scope, candidate.memoryId);
-          await kv.delete(KV.retentionScores, candidate.memoryId);
-          await deleteAccessLog(kv, candidate.memoryId);
+          await Promise.all([
+            kv.delete(KV.retentionScores, candidate.memoryId),
+            deleteAccessLog(kv, candidate.memoryId),
+          ]);

As per coding guidelines Perform parallel operations where possible using Promise.all for independent kv writes/reads in TypeScript.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/functions/retention.ts` around lines 314 - 329, Probe both namespaces for
candidate.memoryId (via kv.get for KV.memories and KV.semantic in parallel)
instead of only checking KV.memories and assuming semantic; set scope and
resolvedSource based on which namespace actually contains the id (or skip
deletion entirely if neither exists), then only perform kv.delete(scope,
candidate.memoryId), kv.delete(KV.retentionScores, candidate.memoryId) and
deleteAccessLog(kv, candidate.memoryId) when a namespace was found and increment
evicted/evictedSemantic/evictedEpisodic accordingly; perform the independent
reads/deletes using Promise.all to parallelize operations (referencing
candidate.memoryId, kv.get/kv.delete, KV.memories, KV.semantic,
KV.retentionScores, deleteAccessLog, evicted, evictedSemantic, evictedEpisodic,
resolvedSource).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/functions/retention.ts`:
- Around line 314-329: Probe both namespaces for candidate.memoryId (via kv.get
for KV.memories and KV.semantic in parallel) instead of only checking
KV.memories and assuming semantic; set scope and resolvedSource based on which
namespace actually contains the id (or skip deletion entirely if neither
exists), then only perform kv.delete(scope, candidate.memoryId),
kv.delete(KV.retentionScores, candidate.memoryId) and deleteAccessLog(kv,
candidate.memoryId) when a namespace was found and increment
evicted/evictedSemantic/evictedEpisodic accordingly; perform the independent
reads/deletes using Promise.all to parallelize operations (referencing
candidate.memoryId, kv.get/kv.delete, KV.memories, KV.semantic,
KV.retentionScores, deleteAccessLog, evicted, evictedSemantic, evictedEpisodic,
resolvedSource).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 93203ffd-1462-4bc2-8238-c3abf7f5418f

📥 Commits

Reviewing files that changed from the base of the PR and between 04581cf and 46b0a16.

📒 Files selected for processing (3)
  • src/functions/retention.ts
  • src/types.ts
  • test/retention.test.ts
✅ Files skipped from review due to trivial changes (1)
  • test/retention.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/types.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bursting token allocation (re-open) mem::retention-evict only deletes KV.memories rows, leaking semantic memories

1 participant