fix(retrieval): replace window function with per-fact_type HNSW queries #540
Open
fabioscarsi wants to merge 2 commits into vectorize-io:main from
Conversation
Replace ROW_NUMBER() OVER (PARTITION BY fact_type ...) with separate per-fact_type queries using ORDER BY embedding <=> $1 LIMIT n, enabling HNSW index scans instead of sequential scans.

Key changes:
- Per-fact_type semantic queries with HNSW-friendly ORDER BY ... LIMIT
- Parallel execution via asyncio.gather() when pool is available
- ef_search=200 and 5x overfetch for approximate recall compensation
- New Alembic migration creates partial HNSW indexes per fact_type

Reduces buffer reads from ~450 MB to ~50 MB per recall on 170K+ deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes #539
The problem
`retrieve_semantic_bm25_combined()` uses a window function (`ROW_NUMBER() OVER (PARTITION BY fact_type ...)`). This pattern prevents pgvector from using the HNSW index and forces a full sequential scan of all vectors. On databases with 100K+ memory_units, every recall scans the entire table: hundreds of MB of buffer reads per query (451 MB observed on our deployment).
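The problematic shape looks roughly like this (table and column names are taken from the PR description; the exact query in `retrieve_semantic_bm25_combined()` differs):

```sql
-- Window function over the whole table: the planner must compute the
-- distance for every row before it can rank within each partition,
-- so the HNSW index is unusable and a sequential scan is forced.
SELECT id, fact_type, distance FROM (
    SELECT id, fact_type,
           embedding <=> $1 AS distance,
           ROW_NUMBER() OVER (PARTITION BY fact_type
                              ORDER BY embedding <=> $1) AS rn
    FROM memory_units
) ranked
WHERE rn <= $2;
```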
The impact is not limited to specific configurations.
Technical cause
pgvector can only use the HNSW index when the query has the form:
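That is, a bare distance-operator `ORDER BY` with a `LIMIT` (sketch only; identifiers follow the PR description):

```sql
SELECT id, embedding <=> $1 AS distance
FROM memory_units
WHERE fact_type = $2          -- matches a partial index predicate
ORDER BY embedding <=> $1     -- distance operator directly in ORDER BY
LIMIT $3;                     -- bounded result set: HNSW scan applies
```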
The presence of `PARTITION BY` in the window function forces the planner to execute a sequential scan and then sort and partition the results.

A global HNSW index with post-filtering by `fact_type` does not work: minority classes (e.g., `experience`, with ~3K nodes) receive near-zero results because the index returns the nearest nodes regardless of fact_type, and the WHERE filter discards them.

Solution
The core change replaces the full-table vector scan with targeted HNSW index lookups, then applies the existing RRF fusion and graph retrieval pipeline unchanged.
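The RRF stage itself is untouched by this PR; for readers unfamiliar with it, here is a minimal sketch (the constant `k = 60` is the conventional default and may differ from the project's actual fusion parameters):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse([["a", "b", "c"], ["a", "c", "b"]])
# "a" tops both input rankings, so it wins the fused ranking
```

Because each per-fact_type HNSW query returns its own ranked list, RRF naturally fuses them with the BM25 and graph axes without any change to the fusion code.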
Separate queries per `fact_type` with `ORDER BY embedding <=> $1 LIMIT n` enable the HNSW index scan for each query.

Key changes:
- Per-fact_type semantic queries with HNSW-friendly `ORDER BY ... LIMIT`
- Parallel execution via `asyncio.gather()` using separate pool connections, reducing total semantic retrieval time to that of the slowest single fact_type query
- `ef_search=200` and 5x overfetch to compensate for approximate recall
- New Alembic migration creating the partial HNSW indexes per fact_type

Note on `SET hnsw.ef_search`: we use `SET` + `RESET` instead of `SET LOCAL` because asyncpg in autocommit mode ignores transaction-local settings.

Alignment with project design principles
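A runnable sketch of the two mechanics above, with a fake connection object standing in for asyncpg pool connections (`with_ef_search`, `semantic_recall`, `FakeConn`, and the fact_type names are illustrative, not the project's real identifiers):

```python
import asyncio

async def with_ef_search(conn, query, *args, ef_search=200):
    """Run one query with a session-level hnsw.ef_search, then restore it.

    SET LOCAL would be silently ignored by asyncpg in autocommit mode
    (there is no enclosing transaction), hence SET ... / RESET.
    """
    await conn.execute(f"SET hnsw.ef_search = {int(ef_search)}")
    try:
        return await conn.fetch(query, *args)
    finally:
        await conn.execute("RESET hnsw.ef_search")

# Minimal fake connection recording the statements it receives.
class FakeConn:
    def __init__(self):
        self.statements = []
    async def execute(self, sql):
        self.statements.append(sql)
    async def fetch(self, query, *args):
        self.statements.append(query)
        return [f"hit-for:{args[0]}"] if args else []

async def semantic_recall(fact_types):
    # Shape only: $2 (query embedding) and $3 (limit) are bound in the real code.
    query = ("SELECT id FROM memory_units WHERE fact_type = $1 "
             "ORDER BY embedding <=> $2 LIMIT $3")
    # One HNSW query per fact_type, one connection each, run concurrently:
    # total latency is roughly that of the slowest single query.
    conns = [FakeConn() for _ in fact_types]
    results = await asyncio.gather(
        *(with_ef_search(conn, query, ft) for conn, ft in zip(conns, fact_types))
    )
    # Flatten per-type result lists before RRF fusion
    return [row for per_type in results for row in per_type]

rows = asyncio.run(semantic_recall(["world_fact", "experience"]))
```

The `try`/`finally` guarantees `RESET` runs even when the query raises, so a pooled connection is never returned with a lingering `ef_search` override.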
Hindsight's recall architecture uses parallel multi-axis retrieval (semantic, BM25, graph, temporal) fused via RRF. This patch extends the same principle to the semantic axis itself: instead of one monolithic embedding scan across all fact_types, we run parallel per-fact_type HNSW traversals.
This is the same pattern already used in `_find_semantic_seeds()`, which leverages `ORDER BY embedding <=> $1 LIMIT n` for HNSW-accelerated retrieval. The patch applies existing patterns consistently to the main retrieval path — it does not introduce new architectural concepts.

Prerequisites (migration note)
This PR includes an Alembic migration (`a3b4c5d6e7f8_add_partial_hnsw_indexes.py`) that auto-creates the required partial HNSW indexes on upgrade — no manual intervention required. The migration runs automatically at startup and is idempotent: if the indexes already exist, the operation is a no-op.

For reference, the indexes created are:
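The DDL itself is not reproduced in this description; per the description its shape is one partial HNSW index per fact_type (the index name and operator class below are assumptions, the operator class chosen to match the `<=>` cosine-distance operator):

```sql
CREATE INDEX IF NOT EXISTS idx_memory_units_hnsw_experience
ON memory_units USING hnsw (embedding vector_cosine_ops)
WHERE fact_type = 'experience';
-- ...one such statement per fact_type
```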
Note: The migration uses `CREATE INDEX IF NOT EXISTS` (not `CONCURRENTLY`) because Alembic migrations run inside a transaction. For large existing deployments, operators may prefer to create the indexes manually with `CONCURRENTLY` before upgrading, to avoid blocking writes during the index build.

Validation data
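For that manual path, the statement shape would add `CONCURRENTLY`, which cannot run inside a transaction and must therefore be issued outside Alembic, e.g. via psql (the index name is an assumption for illustration):

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_memory_units_hnsw_experience
ON memory_units USING hnsw (embedding vector_cosine_ops)
WHERE fact_type = 'experience';
```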
Quality check (overlap with pre-patch results)
The patch has been running in production for 48h on a deployment with ~170K memory_units across two banks with no quality regressions observed. Pre-deploy formal validation:
The 95% minimum overlap is on `experience` (3.3K nodes, the sparsest HNSW graph).

EXPLAIN ANALYZE post-deployment
Pre-patch: global index, 451 MB buffers, 1029 ms, sequential scan forced by window function.
Note on the post-patch observation (429 ms): the value reflects a cold cache at measurement time (1,609 pages read from disk). The relevant figure for comparison is the buffer count: 52 MB vs 451 MB pre-patch. With a warm cache the execution time drops proportionally.
Real-world benchmark (bank with ~170K memory_units)
Under consolidation load the improvement is even more pronounced, because the I/O cascade is eliminated.
Risks and rollback
`pool` is an optional parameter with a sequential fallback.

Changed files
- `hindsight-api/hindsight_api/engine/search/retrieval.py` — per-fact_type HNSW queries in `retrieve_semantic_bm25_combined()`
- `hindsight-api/hindsight_api/alembic/versions/a3b4c5d6e7f8_add_partial_hnsw_indexes.py` — migration to create partial indexes

Notes
`retrieval.py` is identical between the two versions (empty diff), zero conflicts.