fix(visibility): surface lessons in smart-search + tally per-store in diagnose#473
Conversation
… diagnose
Two related UX gaps in the memory layer's reflection surfaces. A consumer
that calls `memory_lesson_save` and gets `success:true` reasonably expects
to find the lesson via `memory_smart_search` ("did my save land?") and
to see it counted in `memory_diagnose` ("what's in the store?"). Neither
was true: lessons live in their own KV store (`KV.lessons`), and both
diagnostic surfaces only looked at `KV.observations` / `KV.memories`.
A 4,350-lesson store could read as "memories: 0" on diagnose and return
zero hits on smart_search — the trust-shock that prompted this fix.
A) mem::smart-search: also return lessons in the compact response.
- New optional `project` and `includeLessons` (default true) params.
- Delegates lesson scoring to the existing mem::lesson-recall via
sdk.trigger, so confidence + recency weighting stays consistent
with mem::lesson-recall (no duplicate scoring logic).
- Lessons come back in a separate `lessons` field on the response,
not merged into `results`. Existing consumers reading `results`
are unaffected; new consumers can read `result.lessons` too.
- Content truncated to 240 chars in compact mode (full content
remains available via mem::lesson-recall directly).
- Lesson-recall failures are soft: log + return empty lessons,
observation results still flow through.
B) mem::diagnose: add per-store tally categories for lessons,
summaries, semantic, procedural, crystals, insights. Mirrors the
existing `memories` pattern: count + light consistency check
(confidence range for scored memories; non-empty title/narrative/
steps for the rest). Each new category is in ALL_CATEGORIES so
`--categories lessons` filtering works as expected.
The empty-system pass count goes from 8 to 14 (8 original + 6 new
stores). Test updated accordingly.
- src/types.ts: add CompactLessonResult
- src/functions/smart-search.ts: lesson recall + merge (single-call
path unchanged, expand mode unchanged)
- src/functions/diagnostics.ts: six new category blocks before mesh
- test/smart-search.test.ts: 6 new cases (lesson inclusion, content
preview truncation, includeLessons=false opt-out, project filter
passthrough, soft-fail on recall error / non-success response)
- test/diagnostics.test.ts: 7 new pass/warn cases for each new
category + filter check; empty-system pass count bumped 8→14
43/43 tests pass.
|
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds CompactLessonResult, includes optional lesson-recall results in compact mem::smart-search responses, and extends mem::diagnose with validation checks for lessons, summaries, semantic, procedural, crystals, and insights stores. ChangesLesson Visibility Feature
Sequence Diagram(s)See diagram above in the hidden review stack artifact. Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/functions/diagnostics.ts (1)
370-372: ⚡ Quick winRemove inline WHAT-comments and express intent via naming.
These comments describe behavior already evident from code and should be replaced with clearer identifiers.
Suggested patch
- // Counts only live lessons (deleted=true rows are tombstoned). - // Catches bad confidence values that would silently break recall - // scoring (memory_lesson_recall multiplies by confidence). const lessons = await kv.list<Lesson>(KV.lessons); - const live = lessons.filter((l) => !l.deleted); + const liveLessons = lessons.filter((l) => !l.deleted); let lessonIssues = 0; - for (const l of live) { + for (const l of liveLessons) { @@ - message: `All ${live.length} lessons are healthy (${lessons.length - live.length} tombstoned)`, + message: `All ${liveLessons.length} lessons are healthy (${lessons.length - liveLessons.length} tombstoned)`,As per coding guidelines, "Do not use code comments explaining WHAT — use clear naming instead".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/diagnostics.ts` around lines 370 - 372, Remove the inline WHAT-comments and express intent via clearer names: replace generic variable/flag names used in the live-lesson counting logic with descriptive identifiers (e.g., rename a boolean/field used to exclude deleted rows to isTombstoned or includeOnlyLiveLessons, rename the counter/aggregate to liveLessonCount or countOnlyLiveLessons, and rename any filter/validator for confidence to validConfidence or isConfidenceInRange); update the function or helper used for recall scoring (reference memory_lesson_recall) to call these new predicates/helpers so the code reads self-documentingly and the comments can be deleted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/functions/diagnostics.ts`:
- Around line 377-378: The validators must be hardened: when validating label
strings (the places that call .trim()) add a runtime type guard so you only call
.trim() if typeof value === 'string' (otherwise record a validation error into
checks) and avoid mutating non-strings; and when validating numeric confidence
(the l.confidence checks) replace the simple range test with an explicit
finite-number check using Number.isFinite (e.g. reject if
!Number.isFinite(l.confidence) || l.confidence < 0 || l.confidence > 1) so
NaN/Infinity are treated as errors; apply these changes in the validator code
referenced by mem::diagnose where l.confidence and the .trim() calls occur and
push appropriate entries into checks for malformed row shapes.
---
Nitpick comments:
In `@src/functions/diagnostics.ts`:
- Around line 370-372: Remove the inline WHAT-comments and express intent via
clearer names: replace generic variable/flag names used in the live-lesson
counting logic with descriptive identifiers (e.g., rename a boolean/field used
to exclude deleted rows to isTombstoned or includeOnlyLiveLessons, rename the
counter/aggregate to liveLessonCount or countOnlyLiveLessons, and rename any
filter/validator for confidence to validConfidence or isConfidenceInRange);
update the function or helper used for recall scoring (reference
memory_lesson_recall) to call these new predicates/helpers so the code reads
self-documentingly and the comments can be deleted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0c4b66f0-ea68-469b-a831-a3fd1b2a12d5
📒 Files selected for processing (5)
src/functions/diagnostics.tssrc/functions/smart-search.tssrc/types.tstest/diagnostics.test.tstest/smart-search.test.ts
…tg00#473 review) CodeRabbit flagged two patterns in the per-store validators added in the parent commit: 1. .trim() on .title / .narrative was unconditional — a corrupted row with title=null or title=42 would throw, abort the whole diagnose run, and silently skip every later category. Add typeof guards. 2. confidence range checks were `< 0 || > 1` which silently passes NaN and Infinity (NaN < 0 is false, NaN > 1 is false → "healthy"). Add Number.isFinite(...) prefix so corrupted scored rows surface as warnings instead. Applied across all 6 new validators: lesson confidence, summary title, semantic confidence, crystal narrative, insight confidence. Tests added in test/diagnostics.test.ts under "defensive row-shape handling": NaN confidence on a lesson, null summary title (verifies diagnose still completes and later categories still execute), undefined crystal narrative, Infinity / NaN on insight + semantic. 34/34 tests pass.
|
Thanks for the review — pushed bbfaa16:
34/34 tests pass. |
Quality + integration wave. Bundles 11 PRs since v0.9.20: Contributor feature: - #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer) Bug fixes (9): - #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440) - #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456) - #487 Windows hook path quoting (@honor2030, closes #477) - #517 viewer IME composition guard (@jonathanzhan1975) - #472 chunk large sessions for LLM context window (@efenex) - #473 surface lessons in smart-search + diagnose tally (@efenex) - #486 declare all Hermes plugin hooks (@honor2030) - #500 rebuildIndex non-blocking on boot (@efenex) - #504 batched embed in rebuildIndex (25h -> 3h) (@efenex) - #491 cli skip onboarding without tty (@honor2030) Upstream-installer revert: - #546 drop --next workaround now that iii-hq/iii#1660 shipped 1067/1067 tests pass across 95 files.
Follow-up to rohitg00#473. The Claude Code SessionStart hook (and seven other hooks that POST observations) computed `project = data.cwd || process.cwd()` — i.e. the absolute path of the working directory. But every other code path that tags records with a project uses the basename: - mem::replay::import-jsonl preserves whatever the JSONL file contains, which is the basename Claude Code itself stores in transcripts - humans calling memory_lesson_save naturally pass the short name ("gitops-assistant", not "/Users/me/work/foo/gitops-assistant") Result: the auto-inject context block at session start ran a project-equality filter that matched zero of the bulk of the operator's data. A 4351-lesson, 52-summary store rendered as a 3-block context payload with no Lessons Learned section at all (just session summaries from the rare records tagged with the full cwd). Two-sided fix: 1. Hooks now resolve project via a shared `src/hooks/_project.ts` helper. Resolution order: - AGENTMEMORY_PROJECT_NAME env var (operator escape hatch, settable per-repo via Claude Code's .claude/settings.json `env` block) - basename(`git rev-parse --show-toplevel`) — handles sessions started in a subdirectory of the repo, survives moving the repo - basename(cwd) — final fallback for non-git directories All eight hooks that POST a project field were updated (session-start, prompt-submit, post-tool-use, post-tool-failure, notification, subagent-start, subagent-stop, task-completed). 2. mem::context tolerates the mixed-vintage corpus that this regression has already produced. The project-match for both lessons and sessions now uses a basename-equality fallback: exact match wins, then basename(stored) === basename(requested). Means existing records tagged with the full cwd continue to surface for callers passing the basename, and vice versa, without requiring re-import. Documented trade-off: same-named repos in different paths now collapse to one logical project at retrieval time (e.g. /work/foo and /personal/foo). Operators who need to keep them separate set AGENTMEMORY_PROJECT_NAME explicitly per repo. Behavior is asserted in test/context-lessons.test.ts to make any future tightening a known break, not an accidental one. Tests: 17/17 in the directly impacted files. Test layout: - test/hook-project.test.ts: AGENTMEMORY_PROJECT_NAME precedence, whitespace handling, git-toplevel basename when inside a repo, cwd basename when not in a repo, subdir-of-repo handling - test/context-lessons.test.ts: full-path-caller matches basename-tagged lesson, basename-caller matches full-path-tagged lesson, same-basename-different-path collapse The remaining 10 failures in the full suite are pre-existing on main (embedding-provider / auto-compress / fetch-timeout env-pollution tests that fail when ~/.agentmemory/.env has API keys set); unrelated to this change. Filed against rohitg00#474.
Summary
Two related UX gaps in the memory layer's reflection surfaces. A consumer that calls
memory_lesson_saveand getssuccess:truereasonably expects to find the lesson viamemory_smart_search("did my save land?") and to see it counted inmemory_diagnose("what's actually in the store?"). Today neither is true:KV.observations— lessons are invisiblememoriescategory counts onlyKV.memories— a store with thousands of lessons / summaries / semantic facts can read as "memories: 0"The trust-shock that prompted this fix: an MCP consumer saved a lesson, got
success:true, then could not surface it throughmemory_smart_search(got onlyobs_*IDs), nor see any sign of it inmemory_diagnose(reportedmemories: 0). Hypothesized the save was disconnected. It was actually persisted — just invisible to the most-prominent retrieval and health surfaces.A: smart-search returns lessons
projectandincludeLessons(defaulttrue) params onmem::smart-searchmem::lesson-recallviasdk.trigger— no duplicate scoring logic; future scoring tweaks inmem::lesson-recallauto-propagatelessonsfield on the response, not merged intoresults. Existing consumers readingresult.resultsare unaffected. New consumers can readresult.lessons(typedCompactLessonResult[])mem::lesson-recalldirectly when needed)lessons, observation results still flowB: diagnose adds per-store tally categories
lessons,summaries,semantic,procedural,crystals,insightsmemoriespattern: count + light consistency check (confidence range for scored memories; non-empty title/narrative/steps for the rest)ALL_CATEGORIESso--categories lessonsfiltering works as expecteddeleted: true) are excluded from the live count and reported separatelyBackward compatibility
mem::smart-search: response shape additive — adds optionallessonsfield. Existing readers ofresultssee no behavior change.mem::diagnose: empty-system pass count goes from 8 → 14 (8 original + 6 new). Existing tests for the existing categories untouched.Test plan
npx vitest run test/smart-search.test.ts test/diagnostics.test.ts— 43 cases passNew cases:
test/smart-search.test.tsincludeLessons=falseopt-out, project filter passthrough, soft-fail on recall error, soft-fail on non-success responsetest/diagnostics.test.tsFiles
src/types.ts— newCompactLessonResultinterfacesrc/functions/smart-search.ts— lesson recall + merge (single-call observation path + expand mode unchanged)src/functions/diagnostics.ts— six new category blocks inserted before the existingmeshblocktest/smart-search.test.ts— 6 new casestest/diagnostics.test.ts— 7 new cases; empty-system pass count bumped 8→14Out of scope
resultsarray (would require normalizing observation vs lesson scoring to a comparable scale — currently the BM25/vector hybrid score and the lesson confidence/recency score live on different bases). Keeping them as a separatelessonsfield avoids forcing that decision and gives the caller full flexibility to render however they want.smart-searchresults. Same scoring-comparability concern, plus most consumers will want to query those tiers explicitly via their dedicated endpoints rather than incidentally in a search call.Summary by CodeRabbit
New Features
Types
Tests