fix(retrieval): rank MENTIONED_IN chunks by cosine in MultiPath Path C #259
Path C in retrieve_chunks used `COLLECT(c)[..3]` with no ORDER BY, so hub entities (which can be MENTIONED_IN hundreds of chunks) returned an arbitrary 3, almost never including the chunks most relevant to the current query. Add an ORDER BY on `vec.cosineDistance(c.embedding, query_vector)` before the COLLECT so per-entity chunk selection is query-aware.
Refs #258
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
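The selection rule can be illustrated outside the database. The following is a minimal plain-Python sketch of "order by cosine distance, then keep the first 3", not the Cypher this PR emits. The chunk dictionaries, their `id` and `embedding` keys, and the helper name `top_k_by_cosine` are assumptions made up for the illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, 1 - cos(angle); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k_by_cosine(chunks, query_vector, k=3):
    """Hypothetical helper: pick the k chunks closest to the query."""
    # Skip chunks with no stored embedding, like an IS NOT NULL guard.
    candidates = [c for c in chunks if c.get("embedding") is not None]
    # Sort closest-first, then slice: the analogue of ORDER BY before COLLECT(c)[..3].
    candidates.sort(key=lambda c: cosine_distance(c["embedding"], query_vector))
    return candidates[:k]


# Toy data: five chunks linked to one hub entity, one without an embedding.
chunks = [
    {"id": "a", "embedding": [0.0, 1.0]},
    {"id": "b", "embedding": [1.0, 0.0]},
    {"id": "c", "embedding": None},
    {"id": "d", "embedding": [0.9, 0.1]},
    {"id": "e", "embedding": [0.5, 0.5]},
]
selected_ids = [c["id"] for c in top_k_by_cosine(chunks, [1.0, 0.0])]
print(selected_ids)  # ['b', 'd', 'e']
```

Without the sort, the slice would return whichever three chunks happen to come first, which is the behavior being fixed.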
Summary
`retrieve_chunks` Path C uses `COLLECT(c)[..3]` with no `ORDER BY`. For hub entities (the main product name, generic verbs like "install", high-frequency Cypher keywords) this picks an arbitrary 3 out of hundreds, almost never the chunks relevant to the current query.
This PR adds an `ORDER BY vec.cosineDistance(c.embedding, query_vector)` before the `COLLECT` so per-entity chunk selection is query-aware. Closes the retrieval half of #258.
Repro / Evidence
Tested against the FalkorDB docs knowledge graph (1,356 chunks, 2,945 entities). The `FalkorDB` entity has 486 chunks linked via `MENTIONED_IN`. Six of those are install-relevant (chunks 0, 1, 2, 3, 6, 7 of `getting-started/index.md`).
P(an unordered 3 of 486 contains an install chunk): each slot has a 6/486 ≈ 1.2% chance of being an install chunk, so ≈ 3.7% overall. Measured behavior matches: standalone `How do I install FalkorDB?` reliably failed.
Before (arbitrary 3):
After (cosine-ranked 3):
Same query, same graph, same retrieval strategy; only `chunk_retrieval.py:Path C` changed.
The change
`query_vector` is already passed into `retrieve_chunks` (it is used by Path B, raw vector search), so it's a parameter-add, not a new signature. Chunks without stored embeddings are excluded (rare; `c.embedding IS NOT NULL` guard).
Tests
Added `test_mentioned_in_ranks_chunks_by_cosine` to `tests/test_multi_path_retrieval.py`, asserting the emitted Cypher contains `vec.cosineDistance`, `ORDER BY`, and passes the query vector as a parameter. Full retrieval test suite passes (161/161, 1 skip).
Test plan
- `pytest tests/test_multi_path_retrieval.py`: 45/45
- `pytest -k 'retrieval or multi_path or chunk'`: 161/161 + 1 skip
🤖 Generated with Claude Code
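The hub-entity odds in Repro / Evidence can be worked out from the stated counts (6 install-relevant chunks among the 486 linked to `FalkorDB`, 3 chunks kept). This is a small sketch that assumes the unordered pick behaves like a uniform random draw without replacement. That is an idealization: an arbitrary database order is not necessarily random.

```python
from math import comb

total_chunks = 486     # chunks linked to the FalkorDB entity via MENTIONED_IN
relevant_chunks = 6    # install-relevant chunks among them
picked = 3             # size of the COLLECT(c)[..3] slice

# Chance that any single slot holds an install-relevant chunk.
per_slot = relevant_chunks / total_chunks

# Chance that none of the 3 picked chunks is relevant (hypergeometric),
# and its complement: at least one relevant chunk is picked.
p_none = comb(total_chunks - relevant_chunks, picked) / comb(total_chunks, picked)
p_at_least_one = 1 - p_none

print(f"{per_slot:.1%} per slot, {p_at_least_one:.1%} overall")
# prints: 1.2% per slot, 3.7% overall
```

Under that assumption, an unranked slice misses every install chunk more than 96% of the time, which is consistent with the install query failing reliably before the change.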