Skip to content

fix(visibility): surface lessons in smart-search + tally per-store in diagnose#473

Merged
rohitg00 merged 2 commits into
rohitg00:mainfrom
efenex:fix/lesson-visibility
May 19, 2026
Merged

fix(visibility): surface lessons in smart-search + tally per-store in diagnose#473
rohitg00 merged 2 commits into
rohitg00:mainfrom
efenex:fix/lesson-visibility

Conversation

@efenex
Copy link
Copy Markdown
Contributor

@efenex efenex commented May 17, 2026

Summary

Two related UX gaps in the memory layer's reflection surfaces. A consumer that calls memory_lesson_save and gets success:true reasonably expects to find the lesson via memory_smart_search ("did my save land?") and to see it counted in memory_diagnose ("what's actually in the store?"). Today neither is true:

  • smart-search queries only KV.observations — lessons are invisible
  • diagnose's memories category counts only KV.memories — a store with thousands of lessons / summaries / semantic facts can read as "memories: 0"

The trust-shock that prompted this fix: an MCP consumer saved a lesson, got success:true, then could not surface it through memory_smart_search (got only obs_* IDs), nor see any sign of it in memory_diagnose (reported memories: 0). Hypothesized the save was disconnected. It was actually persisted — just invisible to the most-prominent retrieval and health surfaces.

A: smart-search returns lessons

  • New optional project and includeLessons (default true) params on mem::smart-search
  • Delegates lesson scoring to existing mem::lesson-recall via sdk.trigger — no duplicate scoring logic; future scoring tweaks in mem::lesson-recall auto-propagate
  • Lessons come back in a separate lessons field on the response, not merged into results. Existing consumers reading result.results are unaffected. New consumers can read result.lessons (typed CompactLessonResult[])
  • Content truncated to 240 chars in compact mode (full content via mem::lesson-recall directly when needed)
  • Lesson-recall failures are soft-handled: log warn, return empty lessons, observation results still flow

B: diagnose adds per-store tally categories

  • New categories: lessons, summaries, semantic, procedural, crystals, insights
  • Mirrors the existing memories pattern: count + light consistency check (confidence range for scored memories; non-empty title/narrative/steps for the rest)
  • Each new category added to ALL_CATEGORIES so --categories lessons filtering works as expected
  • Tombstoned lessons (deleted: true) are excluded from the live count and reported separately

Backward compatibility

  • mem::smart-search: response shape additive — adds optional lessons field. Existing readers of results see no behavior change.
  • mem::diagnose: empty-system pass count goes from 8 → 14 (8 original + 6 new). Existing tests for the existing categories untouched.

Test plan

  • npx vitest run test/smart-search.test.ts test/diagnostics.test.ts — 43 cases pass

New cases:

file added
test/smart-search.test.ts lesson inclusion, content-preview truncation, includeLessons=false opt-out, project filter passthrough, soft-fail on recall error, soft-fail on non-success response
test/diagnostics.test.ts pass + warn cases for each of the 6 new categories, categories filter accepts new names

Files

  • src/types.ts — new CompactLessonResult interface
  • src/functions/smart-search.ts — lesson recall + merge (single-call observation path + expand mode unchanged)
  • src/functions/diagnostics.ts — six new category blocks inserted before the existing mesh block
  • test/smart-search.test.ts — 6 new cases
  • test/diagnostics.test.ts — 7 new cases; empty-system pass count bumped 8→14

Out of scope

  • Merging lessons into the ranked results array (would require normalizing observation vs lesson scoring to a comparable scale — currently the BM25/vector hybrid score and the lesson confidence/recency score live on different bases). Keeping them as a separate lessons field avoids forcing that decision and gives the caller full flexibility to render however they want.
  • Including summaries, semantic facts, or insights in smart-search results. Same scoring-comparability concern, plus most consumers will want to query those tiers explicitly via their dedicated endpoints rather than incidentally in a search call.

Summary by CodeRabbit

  • New Features

    • Diagnostics expanded to cover lessons, summaries, semantic, procedural, crystals, and insights with per-category checks and pass/warn reporting
    • Smart search compact responses optionally include lesson recall results (enabled by default; can be disabled)
  • Types

    • Added a compact lesson result shape to standardize lesson previews in responses
  • Tests

    • Added comprehensive tests for new diagnostics categories and lesson inclusion behavior in smart search

Review Change Stack

… diagnose

Two related UX gaps in the memory layer's reflection surfaces. A consumer
that calls `memory_lesson_save` and gets `success:true` reasonably expects
to find the lesson via `memory_smart_search` ("did my save land?") and
to see it counted in `memory_diagnose` ("what's in the store?"). Neither
was true: lessons live in their own KV store (`KV.lessons`), and both
diagnostic surfaces only looked at `KV.observations` / `KV.memories`.
A 4,350-lesson store could read as "memories: 0" on diagnose and return
zero hits on smart_search — the trust-shock that prompted this fix.

A) mem::smart-search: also return lessons in the compact response.
   - New optional `project` and `includeLessons` (default true) params.
   - Delegates lesson scoring to the existing mem::lesson-recall via
     sdk.trigger, so confidence + recency weighting stays consistent
     with mem::lesson-recall (no duplicate scoring logic).
   - Lessons come back in a separate `lessons` field on the response,
     not merged into `results`. Existing consumers reading `results`
     are unaffected; new consumers can read `result.lessons` too.
   - Content truncated to 240 chars in compact mode (full content
     remains available via mem::lesson-recall directly).
   - Lesson-recall failures are soft: log + return empty lessons,
     observation results still flow through.

B) mem::diagnose: add per-store tally categories for lessons,
   summaries, semantic, procedural, crystals, insights. Mirrors the
   existing `memories` pattern: count + light consistency check
   (confidence range for scored memories; non-empty title/narrative/
   steps for the rest). Each new category is in ALL_CATEGORIES so
   `--categories lessons` filtering works as expected.

The empty-system pass count goes from 8 to 14 (8 original + 6 new
stores). Test updated accordingly.

- src/types.ts: add CompactLessonResult
- src/functions/smart-search.ts: lesson recall + merge (single-call
  path unchanged, expand mode unchanged)
- src/functions/diagnostics.ts: six new category blocks before mesh
- test/smart-search.test.ts: 6 new cases (lesson inclusion, content
  preview truncation, includeLessons=false opt-out, project filter
  passthrough, soft-fail on recall error / non-success response)
- test/diagnostics.test.ts: 7 new pass/warn cases for each new
  category + filter check; empty-system pass count bumped 8→14

43/43 tests pass.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 17, 2026

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1937abfd-7191-4a60-94da-8252f77995e2

📥 Commits

Reviewing files that changed from the base of the PR and between 44a3638 and bbfaa16.

📒 Files selected for processing (2)
  • src/functions/diagnostics.ts
  • test/diagnostics.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/functions/diagnostics.ts

📝 Walkthrough

Walkthrough

Adds CompactLessonResult, includes optional lesson-recall results in compact mem::smart-search responses, and extends mem::diagnose with validation checks for lessons, summaries, semantic, procedural, crystals, and insights stores.

Changes

Lesson Visibility Feature

Layer / File(s) Summary
CompactLessonResult type contract
src/types.ts
Exports CompactLessonResult interface with lesson identifiers, content, confidence/score fields, creation time, optional project, and tags.
Smart search lesson recall integration
src/functions/smart-search.ts, test/smart-search.test.ts
Extends registerSmartSearchFunction to run concurrent observation hybrid search and lesson recall via mem::lesson-recall, capping lesson results (max 10); adds includeLessons flag (default true) and recallLessons helper that validates response shape, truncates content to a preview, maps fields to CompactLessonResult, and handles errors; compact response may include optional lessons and logs lesson counts. Tests cover inclusion, truncation, conditional omission, project forwarding, and resilience to recall failures.
Diagnostic validation for memory store categories
src/functions/diagnostics.ts, test/diagnostics.test.ts
Extends ALL_CATEGORIES to include lessons, summaries, semantic, procedural, crystals, and insights. mem::diagnose now validates each store for common issues: confidence ranges (lessons/insights/semantic), missing/blank summary titles, empty procedural steps, and empty crystal narratives; accumulates warnings per record, emits *-ok when no issues (lessons OK includes tombstone count), and supports category filtering. Tests exercise per-store pass/warn behavior, category filtering, and defensive handling of corrupted row shapes.

Sequence Diagram(s)

See diagram above in the hidden review stack artifact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A memory grows wise with lessons learned,
Smart searches now recall what was earned,
Diagnostics check each store with care,
Confidence ranges, no blanks anywhere—
The system hops forward, tidy and prepared!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: adding lesson visibility to smart-search and per-store tallying to diagnose, matching the core objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/functions/diagnostics.ts (1)

370-372: ⚡ Quick win

Remove inline WHAT-comments and express intent via naming.

These comments describe behavior already evident from code and should be replaced with clearer identifiers.

Suggested patch
-        // Counts only live lessons (deleted=true rows are tombstoned).
-        // Catches bad confidence values that would silently break recall
-        // scoring (memory_lesson_recall multiplies by confidence).
         const lessons = await kv.list<Lesson>(KV.lessons);
-        const live = lessons.filter((l) => !l.deleted);
+        const liveLessons = lessons.filter((l) => !l.deleted);
         let lessonIssues = 0;
-        for (const l of live) {
+        for (const l of liveLessons) {
@@
-            message: `All ${live.length} lessons are healthy (${lessons.length - live.length} tombstoned)`,
+            message: `All ${liveLessons.length} lessons are healthy (${lessons.length - liveLessons.length} tombstoned)`,

As per coding guidelines, "Do not use code comments explaining WHAT — use clear naming instead".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/diagnostics.ts` around lines 370 - 372, Remove the inline
WHAT-comments and express intent via clearer names: replace generic
variable/flag names used in the live-lesson counting logic with descriptive
identifiers (e.g., rename a boolean/field used to exclude deleted rows to
isTombstoned or includeOnlyLiveLessons, rename the counter/aggregate to
liveLessonCount or countOnlyLiveLessons, and rename any filter/validator for
confidence to validConfidence or isConfidenceInRange); update the function or
helper used for recall scoring (reference memory_lesson_recall) to call these
new predicates/helpers so the code reads self-documentingly and the comments can
be deleted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/diagnostics.ts`:
- Around line 377-378: The validators must be hardened: when validating label
strings (the places that call .trim()) add a runtime type guard so you only call
.trim() if typeof value === 'string' (otherwise record a validation error into
checks) and avoid mutating non-strings; and when validating numeric confidence
(the l.confidence checks) replace the simple range test with an explicit
finite-number check using Number.isFinite (e.g. reject if
!Number.isFinite(l.confidence) || l.confidence < 0 || l.confidence > 1) so
NaN/Infinity are treated as errors; apply these changes in the validator code
referenced by mem::diagnose where l.confidence and the .trim() calls occur and
push appropriate entries into checks for malformed row shapes.

---

Nitpick comments:
In `@src/functions/diagnostics.ts`:
- Around line 370-372: Remove the inline WHAT-comments and express intent via
clearer names: replace generic variable/flag names used in the live-lesson
counting logic with descriptive identifiers (e.g., rename a boolean/field used
to exclude deleted rows to isTombstoned or includeOnlyLiveLessons, rename the
counter/aggregate to liveLessonCount or countOnlyLiveLessons, and rename any
filter/validator for confidence to validConfidence or isConfidenceInRange);
update the function or helper used for recall scoring (reference
memory_lesson_recall) to call these new predicates/helpers so the code reads
self-documentingly and the comments can be deleted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0c4b66f0-ea68-469b-a831-a3fd1b2a12d5

📥 Commits

Reviewing files that changed from the base of the PR and between 9061da5 and 44a3638.

📒 Files selected for processing (5)
  • src/functions/diagnostics.ts
  • src/functions/smart-search.ts
  • src/types.ts
  • test/diagnostics.test.ts
  • test/smart-search.test.ts

Comment thread src/functions/diagnostics.ts Outdated
…tg00#473 review)

CodeRabbit flagged two patterns in the per-store validators added in
the parent commit:

1. .trim() on .title / .narrative was unconditional — a corrupted row
   with title=null or title=42 would throw, abort the whole diagnose
   run, and silently skip every later category. Add typeof guards.

2. confidence range checks were `< 0 || > 1` which silently passes
   NaN and Infinity (NaN < 0 is false, NaN > 1 is false → "healthy").
   Add Number.isFinite(...) prefix so corrupted scored rows surface
   as warnings instead.

Applied across all 6 new validators: lesson confidence, summary title,
semantic confidence, crystal narrative, insight confidence.

Tests added in test/diagnostics.test.ts under "defensive row-shape
handling": NaN confidence on a lesson, null summary title (verifies
diagnose still completes and later categories still execute),
undefined crystal narrative, Infinity / NaN on insight + semantic.

34/34 tests pass.
@efenex
Copy link
Copy Markdown
Contributor Author

efenex commented May 17, 2026

Thanks for the review — pushed bbfaa16:

  • All 5 confidence checks (lesson / semantic / insight) now prefix with Number.isFinite(...) so NaN and Infinity surface as warnings instead of silently passing the range check
  • summary.title and crystal.narrative use typeof x !== "string" before .trim() so corrupted rows produce a warn rather than throwing and aborting the whole mem::diagnose run
  • Error messages updated to read "finite number in 0..1"
  • 4 new test cases under defensive row-shape handling in test/diagnostics.test.ts, including one that explicitly verifies a later category check still executes when an earlier one had a bad row

34/34 tests pass.

@rohitg00 rohitg00 mentioned this pull request May 19, 2026
@rohitg00 rohitg00 merged commit cb2e013 into rohitg00:main May 19, 2026
1 of 2 checks passed
@rohitg00
Copy link
Copy Markdown
Owner

Merged + shipping in v0.9.21. Thanks @efenex — closes the lesson round-trip with #458: save → indexed → recalled via smart-search → counted in diagnose. The trust-shock pattern (save success but invisible to recall) was a real contract violation.

@rohitg00 rohitg00 mentioned this pull request May 19, 2026
rohitg00 added a commit that referenced this pull request May 19, 2026
Quality + integration wave. Bundles 11 PRs since v0.9.20:

Contributor feature:
- #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer)

Bug fixes (9):
- #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440)
- #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456)
- #487 Windows hook path quoting (@honor2030, closes #477)
- #517 viewer IME composition guard (@jonathanzhan1975)
- #472 chunk large sessions for LLM context window (@efenex)
- #473 surface lessons in smart-search + diagnose tally (@efenex)
- #486 declare all Hermes plugin hooks (@honor2030)
- #500 rebuildIndex non-blocking on boot (@efenex)
- #504 batched embed in rebuildIndex (25h -> 3h) (@efenex)
- #491 cli skip onboarding without tty (@honor2030)

Upstream-installer revert:
- #546 drop --next workaround now that iii-hq/iii#1660 shipped

1067/1067 tests pass across 95 files.
efenex added a commit to efenex/agentmemory that referenced this pull request May 20, 2026
Follow-up to rohitg00#473. The Claude Code SessionStart hook (and seven other
hooks that POST observations) computed `project = data.cwd || process.cwd()` —
i.e. the absolute path of the working directory. But every other code
path that tags records with a project uses the basename:

- mem::replay::import-jsonl preserves whatever the JSONL file contains,
  which is the basename Claude Code itself stores in transcripts
- humans calling memory_lesson_save naturally pass the short name
  ("gitops-assistant", not "/Users/me/work/foo/gitops-assistant")

Result: the auto-inject context block at session start ran a
project-equality filter that matched zero of the bulk of the operator's
data. A 4351-lesson, 52-summary store rendered as a 3-block context
payload with no Lessons Learned section at all (just session summaries
from the rare records tagged with the full cwd).

Two-sided fix:

1. Hooks now resolve project via a shared `src/hooks/_project.ts`
   helper. Resolution order:
   - AGENTMEMORY_PROJECT_NAME env var (operator escape hatch, settable
     per-repo via Claude Code's .claude/settings.json `env` block)
   - basename(`git rev-parse --show-toplevel`) — handles sessions
     started in a subdirectory of the repo, survives moving the repo
   - basename(cwd) — final fallback for non-git directories

   All eight hooks that POST a project field were updated (session-start,
   prompt-submit, post-tool-use, post-tool-failure, notification,
   subagent-start, subagent-stop, task-completed).

2. mem::context tolerates the mixed-vintage corpus that this regression
   has already produced. The project-match for both lessons and sessions
   now uses a basename-equality fallback: exact match wins, then
   basename(stored) === basename(requested). Means existing records
   tagged with the full cwd continue to surface for callers passing
   the basename, and vice versa, without requiring re-import.

Documented trade-off: same-named repos in different paths now collapse
to one logical project at retrieval time (e.g. /work/foo and
/personal/foo). Operators who need to keep them separate set
AGENTMEMORY_PROJECT_NAME explicitly per repo. Behavior is asserted in
test/context-lessons.test.ts to make any future tightening a known
break, not an accidental one.

Tests: 17/17 in the directly impacted files. Test layout:

- test/hook-project.test.ts: AGENTMEMORY_PROJECT_NAME precedence,
  whitespace handling, git-toplevel basename when inside a repo,
  cwd basename when not in a repo, subdir-of-repo handling
- test/context-lessons.test.ts: full-path-caller matches
  basename-tagged lesson, basename-caller matches full-path-tagged
  lesson, same-basename-different-path collapse

The remaining 10 failures in the full suite are pre-existing on main
(embedding-provider / auto-compress / fetch-timeout env-pollution
tests that fail when ~/.agentmemory/.env has API keys set); unrelated
to this change.

Filed against rohitg00#474.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants