fix(visibility): surface lessons in smart-search + tally per-store in diagnose by efenex · Pull Request #473 · rohitg00/agentmemory

efenex · 2026-05-17T21:22:13Z

Summary

Two related UX gaps in the memory layer's reflection surfaces. A consumer that calls memory_lesson_save and gets success:true reasonably expects to find the lesson via memory_smart_search ("did my save land?") and to see it counted in memory_diagnose ("what's actually in the store?"). Today neither is true:

smart-search queries only KV.observations — lessons are invisible
diagnose's memories category counts only KV.memories — a store with thousands of lessons / summaries / semantic facts can read as "memories: 0"

The trust-shock that prompted this fix: an MCP consumer saved a lesson, got success:true, then could not surface it through memory_smart_search (got only obs_* IDs), nor see any sign of it in memory_diagnose (reported memories: 0). Hypothesized the save was disconnected. It was actually persisted — just invisible to the most-prominent retrieval and health surfaces.

A: smart-search returns lessons

New optional project and includeLessons (default true) params on mem::smart-search
Delegates lesson scoring to existing mem::lesson-recall via sdk.trigger — no duplicate scoring logic; future scoring tweaks in mem::lesson-recall auto-propagate
Lessons come back in a separate lessons field on the response, not merged into results. Existing consumers reading result.results are unaffected. New consumers can read result.lessons (typed CompactLessonResult[])
Content truncated to 240 chars in compact mode (full content via mem::lesson-recall directly when needed)
Lesson-recall failures are soft-handled: log warn, return empty lessons, observation results still flow

B: diagnose adds per-store tally categories

New categories: lessons, summaries, semantic, procedural, crystals, insights
Mirrors the existing memories pattern: count + light consistency check (confidence range for scored memories; non-empty title/narrative/steps for the rest)
Each new category added to ALL_CATEGORIES so --categories lessons filtering works as expected
Tombstoned lessons (deleted: true) are excluded from the live count and reported separately

Backward compatibility

mem::smart-search: response shape additive — adds optional lessons field. Existing readers of results see no behavior change.
mem::diagnose: empty-system pass count goes from 8 → 14 (8 original + 6 new). Existing tests for the existing categories untouched.

Test plan

npx vitest run test/smart-search.test.ts test/diagnostics.test.ts — 43 cases pass

New cases:

file	added
`test/smart-search.test.ts`	lesson inclusion, content-preview truncation, `includeLessons=false` opt-out, project filter passthrough, soft-fail on recall error, soft-fail on non-success response
`test/diagnostics.test.ts`	pass + warn cases for each of the 6 new categories, categories filter accepts new names

Files

src/types.ts — new CompactLessonResult interface
src/functions/smart-search.ts — lesson recall + merge (single-call observation path + expand mode unchanged)
src/functions/diagnostics.ts — six new category blocks inserted before the existing mesh block
test/smart-search.test.ts — 6 new cases
test/diagnostics.test.ts — 7 new cases; empty-system pass count bumped 8→14

Out of scope

Merging lessons into the ranked results array (would require normalizing observation vs lesson scoring to a comparable scale — currently the BM25/vector hybrid score and the lesson confidence/recency score live on different bases). Keeping them as a separate lessons field avoids forcing that decision and gives the caller full flexibility to render however they want.
Including summaries, semantic facts, or insights in smart-search results. Same scoring-comparability concern, plus most consumers will want to query those tiers explicitly via their dedicated endpoints rather than incidentally in a search call.

Summary by CodeRabbit

New Features
- Diagnostics expanded to cover lessons, summaries, semantic, procedural, crystals, and insights with per-category checks and pass/warn reporting
- Smart search compact responses optionally include lesson recall results (enabled by default; can be disabled)
Types
- Added a compact lesson result shape to standardize lesson previews in responses
Tests
- Added comprehensive tests for new diagnostics categories and lesson inclusion behavior in smart search

… diagnose Two related UX gaps in the memory layer's reflection surfaces. A consumer that calls `memory_lesson_save` and gets `success:true` reasonably expects to find the lesson via `memory_smart_search` ("did my save land?") and to see it counted in `memory_diagnose` ("what's in the store?"). Neither was true: lessons live in their own KV store (`KV.lessons`), and both diagnostic surfaces only looked at `KV.observations` / `KV.memories`. A 4,350-lesson store could read as "memories: 0" on diagnose and return zero hits on smart_search — the trust-shock that prompted this fix. A) mem::smart-search: also return lessons in the compact response. - New optional `project` and `includeLessons` (default true) params. - Delegates lesson scoring to the existing mem::lesson-recall via sdk.trigger, so confidence + recency weighting stays consistent with mem::lesson-recall (no duplicate scoring logic). - Lessons come back in a separate `lessons` field on the response, not merged into `results`. Existing consumers reading `results` are unaffected; new consumers can read `result.lessons` too. - Content truncated to 240 chars in compact mode (full content remains available via mem::lesson-recall directly). - Lesson-recall failures are soft: log + return empty lessons, observation results still flow through. B) mem::diagnose: add per-store tally categories for lessons, summaries, semantic, procedural, crystals, insights. Mirrors the existing `memories` pattern: count + light consistency check (confidence range for scored memories; non-empty title/narrative/ steps for the rest). Each new category is in ALL_CATEGORIES so `--categories lessons` filtering works as expected. The empty-system pass count goes from 8 to 14 (8 original + 6 new stores). Test updated accordingly. - src/types.ts: add CompactLessonResult - src/functions/smart-search.ts: lesson recall + merge (single-call path unchanged, expand mode unchanged) - src/functions/diagnostics.ts: six new category blocks before mesh - test/smart-search.test.ts: 6 new cases (lesson inclusion, content preview truncation, includeLessons=false opt-out, project filter passthrough, soft-fail on recall error / non-success response) - test/diagnostics.test.ts: 7 new pass/warn cases for each new category + filter check; empty-system pass count bumped 8→14 43/43 tests pass.

vercel · 2026-05-17T21:22:17Z

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-05-17T21:22:26Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1937abfd-7191-4a60-94da-8252f77995e2

📥 Commits

Reviewing files that changed from the base of the PR and between 44a3638 and bbfaa16.

📒 Files selected for processing (2)

src/functions/diagnostics.ts
test/diagnostics.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/functions/diagnostics.ts

📝 Walkthrough

Walkthrough

Adds CompactLessonResult, includes optional lesson-recall results in compact mem::smart-search responses, and extends mem::diagnose with validation checks for lessons, summaries, semantic, procedural, crystals, and insights stores.

Changes

Lesson Visibility Feature

Layer / File(s)	Summary
CompactLessonResult type contract `src/types.ts`	Exports `CompactLessonResult` interface with lesson identifiers, content, confidence/score fields, creation time, optional project, and tags.
Smart search lesson recall integration `src/functions/smart-search.ts`, `test/smart-search.test.ts`	Extends `registerSmartSearchFunction` to run concurrent observation hybrid search and lesson recall via `mem::lesson-recall`, capping lesson results (max 10); adds `includeLessons` flag (default true) and `recallLessons` helper that validates response shape, truncates content to a preview, maps fields to `CompactLessonResult`, and handles errors; compact response may include optional `lessons` and logs lesson counts. Tests cover inclusion, truncation, conditional omission, project forwarding, and resilience to recall failures.
Diagnostic validation for memory store categories `src/functions/diagnostics.ts`, `test/diagnostics.test.ts`	Extends `ALL_CATEGORIES` to include `lessons`, `summaries`, `semantic`, `procedural`, `crystals`, and `insights`. `mem::diagnose` now validates each store for common issues: confidence ranges (lessons/insights/semantic), missing/blank summary titles, empty procedural steps, and empty crystal narratives; accumulates warnings per record, emits `*-ok` when no issues (lessons OK includes tombstone count), and supports category filtering. Tests exercise per-store pass/warn behavior, category filtering, and defensive handling of corrupted row shapes.

Sequence Diagram(s)

See diagram above in the hidden review stack artifact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A memory grows wise with lessons learned,
Smart searches now recall what was earned,
Diagnostics check each store with care,
Confidence ranges, no blanks anywhere—
The system hops forward, tidy and prepared!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: adding lesson visibility to smart-search and per-store tallying to diagnose, matching the core objectives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/functions/diagnostics.ts (1)

370-372: ⚡ Quick win

Remove inline WHAT-comments and express intent via naming.

These comments describe behavior already evident from code and should be replaced with clearer identifiers.

Suggested patch

-        // Counts only live lessons (deleted=true rows are tombstoned).
-        // Catches bad confidence values that would silently break recall
-        // scoring (memory_lesson_recall multiplies by confidence).
         const lessons = await kv.list<Lesson>(KV.lessons);
-        const live = lessons.filter((l) => !l.deleted);
+        const liveLessons = lessons.filter((l) => !l.deleted);
         let lessonIssues = 0;
-        for (const l of live) {
+        for (const l of liveLessons) {
@@
-            message: `All ${live.length} lessons are healthy (${lessons.length - live.length} tombstoned)`,
+            message: `All ${liveLessons.length} lessons are healthy (${lessons.length - liveLessons.length} tombstoned)`,

As per coding guidelines, "Do not use code comments explaining WHAT — use clear naming instead".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/diagnostics.ts` around lines 370 - 372, Remove the inline
WHAT-comments and express intent via clearer names: replace generic
variable/flag names used in the live-lesson counting logic with descriptive
identifiers (e.g., rename a boolean/field used to exclude deleted rows to
isTombstoned or includeOnlyLiveLessons, rename the counter/aggregate to
liveLessonCount or countOnlyLiveLessons, and rename any filter/validator for
confidence to validConfidence or isConfidenceInRange); update the function or
helper used for recall scoring (reference memory_lesson_recall) to call these
new predicates/helpers so the code reads self-documentingly and the comments can
be deleted.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/diagnostics.ts`:
- Around line 377-378: The validators must be hardened: when validating label
strings (the places that call .trim()) add a runtime type guard so you only call
.trim() if typeof value === 'string' (otherwise record a validation error into
checks) and avoid mutating non-strings; and when validating numeric confidence
(the l.confidence checks) replace the simple range test with an explicit
finite-number check using Number.isFinite (e.g. reject if
!Number.isFinite(l.confidence) || l.confidence < 0 || l.confidence > 1) so
NaN/Infinity are treated as errors; apply these changes in the validator code
referenced by mem::diagnose where l.confidence and the .trim() calls occur and
push appropriate entries into checks for malformed row shapes.

---

Nitpick comments:
In `@src/functions/diagnostics.ts`:
- Around line 370-372: Remove the inline WHAT-comments and express intent via
clearer names: replace generic variable/flag names used in the live-lesson
counting logic with descriptive identifiers (e.g., rename a boolean/field used
to exclude deleted rows to isTombstoned or includeOnlyLiveLessons, rename the
counter/aggregate to liveLessonCount or countOnlyLiveLessons, and rename any
filter/validator for confidence to validConfidence or isConfidenceInRange);
update the function or helper used for recall scoring (reference
memory_lesson_recall) to call these new predicates/helpers so the code reads
self-documentingly and the comments can be deleted.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0c4b66f0-ea68-469b-a831-a3fd1b2a12d5

📥 Commits

Reviewing files that changed from the base of the PR and between 9061da5 and 44a3638.

📒 Files selected for processing (5)

src/functions/diagnostics.ts
src/functions/smart-search.ts
src/types.ts
test/diagnostics.test.ts
test/smart-search.test.ts

…tg00#473 review) CodeRabbit flagged two patterns in the per-store validators added in the parent commit: 1. .trim() on .title / .narrative was unconditional — a corrupted row with title=null or title=42 would throw, abort the whole diagnose run, and silently skip every later category. Add typeof guards. 2. confidence range checks were `< 0 || > 1` which silently passes NaN and Infinity (NaN < 0 is false, NaN > 1 is false → "healthy"). Add Number.isFinite(...) prefix so corrupted scored rows surface as warnings instead. Applied across all 6 new validators: lesson confidence, summary title, semantic confidence, crystal narrative, insight confidence. Tests added in test/diagnostics.test.ts under "defensive row-shape handling": NaN confidence on a lesson, null summary title (verifies diagnose still completes and later categories still execute), undefined crystal narrative, Infinity / NaN on insight + semantic. 34/34 tests pass.

efenex · 2026-05-17T22:15:27Z

Thanks for the review — pushed bbfaa16:

All 5 confidence checks (lesson / semantic / insight) now prefix with Number.isFinite(...) so NaN and Infinity surface as warnings instead of silently passing the range check
summary.title and crystal.narrative use typeof x !== "string" before .trim() so corrupted rows produce a warn rather than throwing and aborting the whole mem::diagnose run
Error messages updated to read "finite number in 0..1"
4 new test cases under defensive row-shape handling in test/diagnostics.test.ts, including one that explicitly verifies a later category check still executes when an earlier one had a bad row

34/34 tests pass.

rohitg00 · 2026-05-19T18:30:14Z

Merged + shipping in v0.9.21. Thanks @efenex — closes the lesson round-trip with #458: save → indexed → recalled via smart-search → counted in diagnose. The trust-shock pattern (save success but invisible to recall) was a real contract violation.

@cl0ckt0wer

Quality + integration wave. Bundles 11 PRs since v0.9.20: Contributor feature: - #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer) Bug fixes (9): - #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440) - #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456) - #487 Windows hook path quoting (@honor2030, closes #477) - #517 viewer IME composition guard (@jonathanzhan1975) - #472 chunk large sessions for LLM context window (@efenex) - #473 surface lessons in smart-search + diagnose tally (@efenex) - #486 declare all Hermes plugin hooks (@honor2030) - #500 rebuildIndex non-blocking on boot (@efenex) - #504 batched embed in rebuildIndex (25h -> 3h) (@efenex) - #491 cli skip onboarding without tty (@honor2030) Upstream-installer revert: - #546 drop --next workaround now that iii-hq/iii#1660 shipped 1067/1067 tests pass across 95 files.

Follow-up to rohitg00#473. The Claude Code SessionStart hook (and seven other hooks that POST observations) computed `project = data.cwd || process.cwd()` — i.e. the absolute path of the working directory. But every other code path that tags records with a project uses the basename: - mem::replay::import-jsonl preserves whatever the JSONL file contains, which is the basename Claude Code itself stores in transcripts - humans calling memory_lesson_save naturally pass the short name ("gitops-assistant", not "/Users/me/work/foo/gitops-assistant") Result: the auto-inject context block at session start ran a project-equality filter that matched zero of the bulk of the operator's data. A 4351-lesson, 52-summary store rendered as a 3-block context payload with no Lessons Learned section at all (just session summaries from the rare records tagged with the full cwd). Two-sided fix: 1. Hooks now resolve project via a shared `src/hooks/_project.ts` helper. Resolution order: - AGENTMEMORY_PROJECT_NAME env var (operator escape hatch, settable per-repo via Claude Code's .claude/settings.json `env` block) - basename(`git rev-parse --show-toplevel`) — handles sessions started in a subdirectory of the repo, survives moving the repo - basename(cwd) — final fallback for non-git directories All eight hooks that POST a project field were updated (session-start, prompt-submit, post-tool-use, post-tool-failure, notification, subagent-start, subagent-stop, task-completed). 2. mem::context tolerates the mixed-vintage corpus that this regression has already produced. The project-match for both lessons and sessions now uses a basename-equality fallback: exact match wins, then basename(stored) === basename(requested). Means existing records tagged with the full cwd continue to surface for callers passing the basename, and vice versa, without requiring re-import. Documented trade-off: same-named repos in different paths now collapse to one logical project at retrieval time (e.g. /work/foo and /personal/foo). Operators who need to keep them separate set AGENTMEMORY_PROJECT_NAME explicitly per repo. Behavior is asserted in test/context-lessons.test.ts to make any future tightening a known break, not an accidental one. Tests: 17/17 in the directly impacted files. Test layout: - test/hook-project.test.ts: AGENTMEMORY_PROJECT_NAME precedence, whitespace handling, git-toplevel basename when inside a repo, cwd basename when not in a repo, subdir-of-repo handling - test/context-lessons.test.ts: full-path-caller matches basename-tagged lesson, basename-caller matches full-path-tagged lesson, same-basename-different-path collapse The remaining 10 failures in the full suite are pre-existing on main (embedding-provider / auto-compress / fetch-timeout env-pollution tests that fail when ~/.agentmemory/.env has API keys set); unrelated to this change. Filed against rohitg00#474.

coderabbitai Bot reviewed May 17, 2026

View reviewed changes

Comment thread src/functions/diagnostics.ts Outdated

This was referenced May 17, 2026

agentmemory stop disconnects worker but leaves iii engine running with stale function registrations #474

Open

fix(hooks): resolve project name to repo basename, not raw cwd #475

Open

rohitg00 mentioned this pull request May 19, 2026

chore(release): v0.9.21 #550

Closed

rohitg00 merged commit cb2e013 into rohitg00:main May 19, 2026
1 of 2 checks passed

rohitg00 mentioned this pull request May 19, 2026

chore(release): v0.9.21 #551

Merged

nickdesi mentioned this pull request May 19, 2026

docs: memory_lesson_save MCP tool exists but is completely undocumented (confidence, decay, reinforcement lifecycle) #552

Open

coderabbitai Bot mentioned this pull request May 20, 2026

feat(smart-search): boost title/narrative matches on 'who/what is X' queries #571

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(visibility): surface lessons in smart-search + tally per-store in diagnose#473

fix(visibility): surface lessons in smart-search + tally per-store in diagnose#473
rohitg00 merged 2 commits into
rohitg00:mainfrom
efenex:fix/lesson-visibility

efenex commented May 17, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented May 17, 2026

Uh oh!

coderabbitai Bot commented May 17, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

efenex commented May 17, 2026

Uh oh!

Uh oh!

rohitg00 commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

efenex commented May 17, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

A: smart-search returns lessons

B: diagnose adds per-store tally categories

Backward compatibility

Test plan

Files

Out of scope

Summary by CodeRabbit

Uh oh!

vercel Bot commented May 17, 2026

Uh oh!

coderabbitai Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

efenex commented May 17, 2026

Uh oh!

Uh oh!

rohitg00 commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

efenex commented May 17, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 17, 2026 •

edited

Loading