Vector Index HNSW Graphs Not Persisting to Disk #2915

@tae898

Summary

HNSW graphs for vector indexes are never persisted to disk, causing them to be rebuilt from scratch on every database restart. This results in 40-230 seconds of warmup time per index (6+ minutes total for multiple indexes) before vector searches become performant.

Impact

Severity: High
Component: LSMVectorIndex (engine)
Affects: Production deployments with vector search

Performance Impact

  • Cold start: 40-230 seconds per vector index to rebuild HNSW graph
  • Warm performance: <0.01 seconds (as expected)
  • Scales with: Number of vectors (213k vectors = 41s, 817k vectors = 230s)

User Experience

For a database with 4 vector indexes totaling 1.57M vectors:

  • Initial session (creating indexes): Graphs built successfully in ~20 minutes ✓
  • After database restart: 6 minutes of warmup while graphs are rebuilt from scratch ✗
  • Expected behavior: persisted graphs load in <5 seconds; this does not happen ✗

Root Cause

Two related bugs in LSMVectorIndex.java:

Bug 1: graphFile never closed (line 2131)
The close() method calls mutable.close() but never calls graphFile.close(), preventing graph data from being flushed to disk.

Bug 2: discoverAndLoadGraphFile() fails (lines 489-527)
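When loading an existing index, discovery does not find the graph file in FileManager even when graph files should exist. The warnings in the log below show the lookup hitting a null file entry and throwing a NullPointerException on file.getFileExtension(), so graphFile stays null.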

Consequence: Database restart → discovery fails → graphFile = null → first search rebuilds graph (40-230s) → persistence skipped (line 1049) → close without flushing → cycle repeats
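
The "close without flushing" step of this cycle can be illustrated with a minimal sketch. It is not the actual LSMVectorIndex code: SketchResource, RecordingResource and VectorIndexCloseSketch are hypothetical stand-ins, and only the field names mutable and graphFile are taken from the description above. The idea is that close() releases the graph file in a finally block, so it is closed even when closing the mutable component throws, and is skipped safely when discovery left it null.

```java
import java.util.List;

// Hypothetical stand-in for anything that must be closed (not an ArcadeDB type).
interface SketchResource {
  void close();
}

// Test double that records the order in which resources are closed.
final class RecordingResource implements SketchResource {
  private final String name;
  private final List<String> closed;
  private final boolean failOnClose;

  RecordingResource(String name, List<String> closed, boolean failOnClose) {
    this.name = name;
    this.closed = closed;
    this.failOnClose = failOnClose;
  }

  @Override
  public void close() {
    closed.add(name);
    if (failOnClose)
      throw new IllegalStateException(name + " failed to close");
  }
}

// Sketch of an index owning two resources, mirroring mutable and graphFile.
final class VectorIndexCloseSketch {
  private final SketchResource mutable;
  private final SketchResource graphFile; // null when discovery failed

  VectorIndexCloseSketch(SketchResource mutable, SketchResource graphFile) {
    this.mutable = mutable;
    this.graphFile = graphFile;
  }

  void close() {
    try {
      mutable.close();
    } finally {
      // Runs even if mutable.close() throws, so the graph file is never skipped.
      if (graphFile != null)
        graphFile.close();
    }
  }
}
```

Whether closing the real graphFile is enough to flush the graph depends on how the component writes its pages, which this sketch does not model.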

Observed Behavior

Test dataset: Stack Overflow medium (5.5M records, 1.6M embeddings across 4 indexes, 7GB database)

  • Initial session: Graphs built successfully, vector searches <0.01s
  • After restart: Discovery fails, graphs rebuilt (6 minutes total), then fast again
  • Expected: Should load persisted graphs in <5 seconds

Log Evidence

On database restart (after closing and reopening):

2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for Question_0_562816397325031: 
  Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for Answer_0_562819015335584: 
  Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for Comment_0_562823873513569: 
  Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null
2025-12-11 20:57:26.870 WARNI [LSMVectorIndex] Error discovering graph file for User_0_562829620264774: 
  Cannot invoke "com.arcadedb.engine.ComponentFile.getFileExtension()" because "file" is null

On first vector search:

2025-12-11 20:57:37.360 INFO  [LSMVectorIndex] Building graph with 213730 vectors for index: Question_0_562816397325031
2025-12-11 20:58:17.950 INFO  [LSMVectorIndex] JVector graph index built successfully
2025-12-11 20:58:17.952 SEVER [LSMVectorIndex] PERSIST: graphFile is NULL, cannot persist graph for index: Question_0_562816397325031
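
The rebuild time for one index can be read directly from the two INFO timestamps above. The sketch below parses them and derives a build rate; the class and method names are illustrative only, and the timestamp pattern is inferred from the log lines shown here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class RebuildTimingSketch {
  // Pattern inferred from the log lines, e.g. "2025-12-11 20:57:37.360".
  private static final DateTimeFormatter LOG_TIME =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

  // Time between the "Building graph" and "built successfully" log entries.
  static Duration elapsed(String startStamp, String endStamp) {
    return Duration.between(
        LocalDateTime.parse(startStamp, LOG_TIME),
        LocalDateTime.parse(endStamp, LOG_TIME));
  }

  // Average number of vectors inserted into the graph per second.
  static double vectorsPerSecond(long vectors, Duration buildTime) {
    return vectors / (buildTime.toMillis() / 1000.0);
  }
}
```

For the Question index, elapsed("2025-12-11 20:57:37.360", "2025-12-11 20:58:17.950") is 40.59 seconds, which for 213730 vectors works out to roughly 5,270 vectors per second.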

Suggested Fixes

Fix 1: Add graphFile.close() call at line 2131 after mutable.close()

Fix 2: Debug discovery logic - add logging to show expected vs actual file names in FileManager

Fix 3: Add fallback file system check if FileManager search fails
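
Fixes 2 and 3 could be combined along the lines of the sketch below. It is written under assumptions, not against the real code: RegisteredFile stands in for the file manager's entries, the "vecgraph" extension used in examples is made up, and matching on "name starts with the index name" is a guess at the naming scheme. The first method skips null entries instead of dereferencing them (the NullPointerException in the log), and the second scans the database directory when the registered list yields nothing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical stand-in for a file registered with the file manager.
final class RegisteredFile {
  final String name;
  final String extension;

  RegisteredFile(String name, String extension) {
    this.name = name;
    this.extension = extension;
  }
}

final class GraphFileDiscoverySketch {
  // Searches the registered files, tolerating null entries in the list.
  static Optional<RegisteredFile> findRegistered(
      List<RegisteredFile> registered, String indexName, String graphExt) {
    for (RegisteredFile file : registered) {
      if (file == null)
        continue; // guard against the null entry reported in the log
      if (graphExt.equals(file.extension) && file.name.startsWith(indexName))
        return Optional.of(file);
    }
    return Optional.empty();
  }

  // Fallback: look for a matching file directly in the database directory.
  static Optional<Path> findOnDisk(Path databaseDir, String indexName, String graphExt) {
    try (Stream<Path> entries = Files.list(databaseDir)) {
      return entries
          .filter(Files::isRegularFile)
          .filter(p -> {
            String fileName = p.getFileName().toString();
            return fileName.startsWith(indexName) && fileName.endsWith("." + graphExt);
          })
          .findFirst();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would try findRegistered first and only fall back to findOnDisk when it returns empty, logging the index name and extension it searched for so that expected and actual file names can be compared.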

Workaround

Keep database connections open to maintain in-memory graphs (acceptable for long-running analytics, not suitable for microservices/serverless).
