Vector Index HNSW Graphs Not Persisting to Disk
Summary
HNSW graphs for vector indexes are never persisted to disk, causing them to be rebuilt from scratch on every database restart. This results in 40-230 seconds of warmup time per index (6+ minutes total for multiple indexes) before vector searches become performant.
Impact
Severity: High
Component: LSMVectorIndex (engine)
Affects: Production deployments with vector search
Performance Impact
- Cold start: 40-230 seconds per vector index to rebuild HNSW graph
- Warm performance: <0.01 seconds (as expected)
- Scales with: Number of vectors (213k vectors = 41s, 817k vectors = 230s)
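As a rough check on how rebuild time scales, the two data points above can be turned into a rebuild rate. The helper below is only illustrative arithmetic (the class and method names are made up for this report): 213k vectors in 41s is about 5,200 vectors/s, while 817k vectors in 230s is about 3,550 vectors/s, so across these two points the rebuild cost grows somewhat faster than linearly with index size.

```java
// Illustrative arithmetic only; not part of ArcadeDB.
final class RebuildRateSketch {
  /** Vectors indexed per second for one observed graph rebuild. */
  static double vectorsPerSecond(long vectors, double seconds) {
    return vectors / seconds;
  }

  /** How much slower the larger rebuild is than a purely linear extrapolation of the smaller one. */
  static double slowdownVsLinear(long smallVectors, double smallSeconds, long largeVectors, double largeSeconds) {
    double linearEstimate = smallSeconds * ((double) largeVectors / smallVectors);
    return largeSeconds / linearEstimate;
  }
}
```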
User Experience
For a database with 4 vector indexes totaling 1.57M vectors:
- Initial session (creating indexes): Graphs built successfully in ~20 minutes ✓
- After database restart: 6 minutes warmup required - graphs rebuilt from scratch ✗
- Expected behavior: Should load persisted graphs in <5 seconds ✗
Root Cause
Two related bugs in LSMVectorIndex.java:
Bug 1: graphFile never closed (line 2131)
The close() method calls mutable.close() but never calls graphFile.close(), preventing graph data from being flushed to disk.
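A minimal sketch of the intended close ordering, using hypothetical stand-in classes rather than the real LSMVectorIndex types: every owned component is closed, null handles are skipped, and a failure in one close does not prevent the others from running. The names `FakeComponent` and `closeAll` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; these are not ArcadeDB's actual classes.
final class CloseSketch {
  /** Stand-in for a component whose buffered data is flushed on close. */
  static final class FakeComponent {
    final String name;
    boolean closed = false;

    FakeComponent(String name) {
      this.name = name;
    }

    void close() {
      closed = true; // a real component would flush pending pages here
    }
  }

  /** Closes every non-null component in order and returns the names that were closed. */
  static List<String> closeAll(FakeComponent... components) {
    List<String> closedNames = new ArrayList<>();
    RuntimeException first = null;
    for (FakeComponent component : components) {
      if (component == null)
        continue; // e.g. graphFile was never discovered or created
      try {
        component.close();
        closedNames.add(component.name);
      } catch (RuntimeException e) {
        // Remember the first failure but keep closing the remaining components.
        if (first == null)
          first = e;
        else
          first.addSuppressed(e);
      }
    }
    if (first != null)
      throw first;
    return closedNames;
  }
}
```

In terms of this report, the equivalent change would be for close() to close graphFile alongside mutable instead of closing mutable alone.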
Bug 2: discoverAndLoadGraphFile() fails (lines 489-527)
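When loading an existing index, discovery returns null, as if the graph file were not registered in FileManager, even when graph files should exist. The logs show the lookup aborting with a NullPointerException on a null file entry.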
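The NullPointerException in the logs suggests the discovery loop dereferences a null file entry before it gets a chance to match anything. A null-tolerant lookup might look like the sketch below. `FileEntry`, the name-prefix match, and the `hypgraph` extension used in the usage note are all hypothetical; the real matching rules in discoverAndLoadGraphFile() may differ.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins; these are not ArcadeDB's actual classes.
final class GraphDiscoverySketch {
  /** Stand-in for a file registered with the file manager. */
  static final class FileEntry {
    final String componentName;
    final String extension;

    FileEntry(String componentName, String extension) {
      this.componentName = componentName;
      this.extension = extension;
    }
  }

  /** Returns the first registered file with the given extension whose name starts with the index name. */
  static Optional<FileEntry> findGraphFile(List<FileEntry> registered, String indexName, String graphExtension) {
    for (FileEntry file : registered) {
      if (file == null)
        continue; // skip empty slots instead of aborting the whole scan
      if (graphExtension.equals(file.extension) && file.componentName.startsWith(indexName))
        return Optional.of(file);
    }
    return Optional.empty();
  }
}
```

With a list such as `[bucket file, null, graph file]`, a call like `findGraphFile(files, "Question_0_1", "hypgraph")` skips the null slot and still reaches the graph file, whereas a loop without the null check would stop at the second entry.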
Consequence: Database restart → discovery fails → graphFile = null → first search rebuilds graph (40-230s) → persistence skipped (line 1049) → close without flushing → cycle repeats
Observed Behavior
Test dataset: Stack Overflow medium (5.5M records, 1.6M embeddings across 4 indexes, 7GB database)
- Initial session: Graphs built successfully, vector searches <0.01s
- After restart: Discovery fails, graphs rebuilt (6 minutes total), then fast again
- Expected: Should load persisted graphs in <5 seconds
Log Evidence
On database restart (after closing and reopening):
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for Question_0_562816397325031:
Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for Answer_0_562819015335584:
Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for Comment_0_562823873513569:
Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for User_0_562829620264774:
Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
On first vector search:
2025-12-11 20:57:37.360 INFO [LSMVectorIndex] Building graph with 213730 vectors for index: Question_0_562816397325031
2025-12-11 20:58:17.950 INFO [LSMVectorIndex] JVector graph index built successfully
2025-12-11 20:58:17.952 SEVER [LSMVectorIndex] PERSIST: graphFile is NULL, cannot persist graph for index: Question_0_562816397325031
Suggested Fixes
Fix 1: Add graphFile.close() call at line 2131 after mutable.close()
Fix 2: Debug discovery logic - add logging to show expected vs actual file names in FileManager
Fix 3: Add fallback file system check if FileManager search fails
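For Fix 3, a file system fallback could be as simple as scanning the database directory when the FileManager lookup comes back empty. The sketch below uses only `java.nio.file`. The naming rule (file name starts with the index name and ends with the graph extension) and the extension passed in are assumptions, not the actual ArcadeDB file layout.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper; the file naming convention here is an assumption.
final class GraphFileFallbackSketch {
  /** True if the file name looks like a graph file for the given index. */
  static boolean isGraphFileName(String fileName, String indexName, String graphExtension) {
    return fileName.startsWith(indexName) && fileName.endsWith("." + graphExtension);
  }

  /** Scans the database directory for a matching graph file; empty if none or if the directory is missing. */
  static Optional<Path> findOnDisk(Path databaseDir, String indexName, String graphExtension) {
    if (!Files.isDirectory(databaseDir))
      return Optional.empty();
    try (Stream<Path> files = Files.list(databaseDir)) {
      return files
          .filter(Files::isRegularFile)
          .filter(p -> isGraphFileName(p.getFileName().toString(), indexName, graphExtension))
          .findFirst();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A file found this way would still need to be registered with FileManager before use; this sketch covers only the lookup step.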
Workaround
Keep database connections open to maintain in-memory graphs (acceptable for long-running analytics, not suitable for microservices/serverless).
Code References (LSMVectorIndex.java)
- Constructor (load existing): lines 305-320
- close(): lines 2100-2137
- discoverAndLoadGraphFile(): lines 489-527
- buildGraphFromScratch(): lines 950-1060