feat(smart-search): boost title/narrative matches on 'who/what is X' queries#571
feat(smart-search): boost title/narrative matches on 'who/what is X' queries#571efenex wants to merge 2 commits into
Conversation
For "who is X" / "what is X" / "what does X mean" queries, BM25 ranks busier observations above the records that actually name X — the question-scaffolding tokens add noise that dilutes the true match signal. Pre-existing regression test: docs/plans/v4-lineage-test-case- careful-generator.md (Gap exposed there, but the fix lives in smart- search rather than lineage since smart-search is the lessons-first ranker used by the recall paths). Approach: detect the question pattern at handler entry, extract the concept phrase, deepen the BM25 sweep to limit*3 so the boost has candidates, then post-multiply combinedScore by 2.0 for title matches and 1.3 for narrative matches, re-sort, trim to limit. Lessons whose content names the concept get the same 2.0 title-boost. Single-token / 6+ token phrases are skipped (degenerate). Original ordering is preserved on non-named-concept queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds extraction of named concepts from "who is / what is / what does X mean" queries and uses the concept to expand observation fetches and apply multiplicative boosts when the concept appears in observation titles/narratives or lesson content, then re-sorts and trims results. ChangesNamed-Concept Query Detection and Ranking Boost
Sequence DiagramsequenceDiagram
participant Query
participant extractNamedConcept
participant hybridSearch
participant lessonRecall
participant boostProcessor
participant returnSorted
Query->>extractNamedConcept: parse query -> concept or null
extractNamedConcept-->>Query: concept|null
Query->>hybridSearch: run observation search (expanded limit if concept)
Query->>lessonRecall: run lesson recall (pass boostPhrase)
hybridSearch-->>boostProcessor: observations with combinedScore
lessonRecall-->>boostProcessor: lessons with boostMatched flag
boostProcessor->>boostProcessor: multiply observation combinedScore for title/body matches
boostProcessor->>boostProcessor: multiply lesson score when boostMatched or content includes concept
boostProcessor->>returnSorted: re-sort and truncate to limit
returnSorted-->>Query: final observations and lessons
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/smart-search.test.ts (1)
331-335: ⚡ Quick winTighten this assertion so dual-match regressions actually fail.
obsNamedalready contains"careful generator"in bothtitleandnarrative, but the test only assertsscore > 1.0. That still passes with a single applied boost, so it won't catch the bug in the new re-ranker. Either remove the narrative match from the fixture for a pure title-only case, or assert the full expected multiplier for a dual-match case.Also applies to: 387-389
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/smart-search.test.ts` around lines 331 - 335, The test fixture obsNamed created via makeObs currently contains "careful generator" in both title and narrative, which makes the weak assertion (score > 1.0) insufficient; either remove the phrase from the narrative so the fixture is a title-only match and keep the simple assertion, or tighten the assertion to check the full expected boosted score for a dual-match (compute and assert the exact expected multiplier/threshold instead of >1.0). Update the corresponding duplicate assertions mentioned (around the second occurrence at lines 387-389) to use the same fix and reference obsNamed/makeObs when locating the fixture and assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/functions/smart-search.ts`:
- Around line 151-156: The current boost logic uses the truncated preview in
rawLessons, so named-concept matching misses occurrences beyond the 240-char
cutoff; update the scoring to operate on the full lesson text before any preview
truncation by either (A) running this phrase includes check against the
untruncated field returned by recallLessons (e.g., use the original full content
property such as fullContent or contentFull instead of the previewed content) or
(B) change recallLessons to preserve a fullContent field on each lesson and use
that field in the map that adjusts score (referencing rawLessons, lessons,
phrase, and NAMED_CONCEPT_TITLE_BOOST). Ensure the boost is applied using the
full text and only truncate for presentation after ranking is complete.
- Around line 143-145: The current logic in smart-search that sets mult using an
if/else if (checking title.includes(phrase) then else if
narrative.includes(phrase)) prevents applying both NAMED_CONCEPT_TITLE_BOOST and
NAMED_CONCEPT_BODY_BOOST when both title and narrative match; change it to
compute the multiplier by starting mult = 1 and multiplying by
NAMED_CONCEPT_TITLE_BOOST if title.includes(phrase) and by
NAMED_CONCEPT_BODY_BOOST if narrative.includes(phrase), then return r unchanged
when mult === 1 else return { ...r, combinedScore: r.combinedScore * mult } so
dual matches get the product of both boosts (use the existing symbols title,
narrative, phrase, mult, NAMED_CONCEPT_TITLE_BOOST, NAMED_CONCEPT_BODY_BOOST, r,
combinedScore).
---
Nitpick comments:
In `@test/smart-search.test.ts`:
- Around line 331-335: The test fixture obsNamed created via makeObs currently
contains "careful generator" in both title and narrative, which makes the weak
assertion (score > 1.0) insufficient; either remove the phrase from the
narrative so the fixture is a title-only match and keep the simple assertion, or
tighten the assertion to check the full expected boosted score for a dual-match
(compute and assert the exact expected multiplier/threshold instead of >1.0).
Update the corresponding duplicate assertions mentioned (around the second
occurrence at lines 387-389) to use the same fix and reference obsNamed/makeObs
when locating the fixture and assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c24364d3-8993-4417-a12c-9c0c02cd7c30
📒 Files selected for processing (2)
src/functions/smart-search.tstest/smart-search.test.ts
…l content CodeRabbit caught two issues on rohitg00#571: 1. The boost branch used `if (title) ... else if (narrative) ...`, capping observations that contain the concept in BOTH fields at the title-only 2.0× multiplier. The feature is specified as multiplicative — title-and-narrative matches now compound to 2.0 × 1.3 = 2.6×. Single-field matches behave as before. 2. The lesson boost path was scanning the 240-char preview emitted by recallLessons, not the lesson's full pre-truncation content. Any concept that appeared past the preview boundary silently missed the boost. Fix: thread the concept phrase into recallLessons via a new `boostPhrase` parameter. The function now decides match against `content + context` BEFORE truncation, stamps each result with `boostMatched: boolean`, and the smart-search caller uses that flag instead of re-scanning the preview. `boostMatched` added as an optional field on CompactLessonResult. Callers that don't pass `boostPhrase` get `boostMatched: false` — the smart-search caller falls back to scanning the (truncated) content for the phrase if `boostMatched` is absent, preserving the pre-fix behavior for any non-smart-search caller of recallLessons. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
For named-concept queries ("who is the careful generator?", "what is a circuit breaker", "what does eventual consistency mean?"), the BM25 hybrid ranker scores busier observations above records that name the concept directly — question scaffolding tokens ("who", "is", "the") add noise that dilutes the true match signal. The record that defines the concept ranks below records that mention it incidentally.
What it does
Non-named-concept queries are untouched.
Why this lives in smart-search and not lineage
`mem::lineage` is chronologically-ordered and multi-channel; this is a ranking concern that affects the primary recall path (smart-search), which is what `memory_recall` / `memory_smart_search` MCP tools land on. Lineage benefits from upstream improvements in BM25 score, so this lift propagates.
Test plan
Related
🤖 Generated with Claude Code
Summary by CodeRabbit