feat: cleanup partial idx files when merging distributed vector index #5729
Code Review

Summary: This PR adds cleanup of temporary partial_* directories after distributed vector index merge operations.

P1 Issues

1. Missing Test Coverage: The new cleanup_partial_vector_dirs function lacks test coverage. This is a file deletion operation that affects the index directory structure. Consider adding a test that creates a mock index directory with partial_* subdirectories, calls finalize_distributed_merge, and verifies that the partial directories are removed while other files remain.
2. Potential Race Condition: The cleanup happens after v2_writer.finish().await but before the function returns. If a concurrent process is still reading from partial_* directories (e.g., during centroid extraction at lines 1917-1957), deletion could cause read failures. Consider whether the centroid extraction should happen before cleanup, or whether some coordination is needed.

Minor Observations

- The function docstring says it always returns Ok, but the signature returns Result. The outer caller ignores errors anyway via if let Err, so this is cosmetic but could be clarified.

Overall, the approach is reasonable: cleanup failures are logged but do not block index finalization, which is the right tradeoff.
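The test suggested in item 1 could look roughly like the sketch below. It is not the PR's code: cleanup_partial_dirs is a hypothetical, std::fs-only stand-in for cleanup_partial_vector_dirs (the real function presumably works through the project's own storage layer and is reached via finalize_distributed_merge), and the file names index.idx and auxiliary.idx are placeholders for "other files that must survive".

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for `cleanup_partial_vector_dirs`: removes every
/// direct child directory of `index_dir` whose name starts with `partial_`.
/// Returns how many directories were removed and which paths failed, so a
/// caller can log failures without aborting finalization.
fn cleanup_partial_dirs(index_dir: &Path) -> io::Result<(usize, Vec<PathBuf>)> {
    let mut removed = 0;
    let mut failed = Vec::new();
    for entry in fs::read_dir(index_dir)? {
        let entry = entry?;
        let is_partial = entry.file_name().to_string_lossy().starts_with("partial_");
        if is_partial && entry.file_type()?.is_dir() {
            match fs::remove_dir_all(entry.path()) {
                Ok(()) => removed += 1,
                // Best effort: remember the failure and keep going.
                Err(_) => failed.push(entry.path()),
            }
        }
    }
    Ok((removed, failed))
}

/// Builds a scratch index directory containing two partial_* subdirectories
/// and two placeholder "final" files that cleanup must leave untouched.
fn make_mock_index_dir(tag: &str) -> io::Result<PathBuf> {
    let dir = std::env::temp_dir()
        .join(format!("mock_index_{}_{}", tag, std::process::id()));
    if dir.exists() {
        fs::remove_dir_all(&dir)?;
    }
    fs::create_dir_all(dir.join("partial_0"))?;
    fs::create_dir_all(dir.join("partial_1"))?;
    fs::write(dir.join("partial_0").join("data.bin"), b"x")?;
    fs::write(dir.join("index.idx"), b"merged")?;
    fs::write(dir.join("auxiliary.idx"), b"aux")?;
    Ok(dir)
}
```

A test body would call make_mock_index_dir, run the cleanup, and then assert that both partial_* directories are gone while index.idx and auxiliary.idx still exist. Returning the failed paths rather than an error for each one mirrors the "log but do not block" behaviour described above.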
No description provided.