#2915 fix: ensure Jvector HNSW graph file is closed and flushed to disk on database close #2916
Summary of Changes

This pull request resolves critical issues related to the persistence and discovery of HNSW graph files within the LSMVectorIndex implementation. By ensuring that graph data is correctly flushed to disk upon database closure and can be reliably reloaded, it significantly enhances the robustness and data integrity of the vector indexing subsystem. The changes also include improved logging for easier debugging of graph file discovery and comprehensive new tests to validate these fixes.
Code Review
This pull request addresses a critical issue by ensuring the JVector HNSW graph file is properly closed and flushed to disk, preventing data loss on database shutdown. The changes also improve the discovery of graph files and add comprehensive tests to verify the fixes. The implementation is solid, and the tests are thorough. I've provided a couple of suggestions to enhance logging and modernize a test utility method for better maintainability.
    private void deleteDirectory(java.io.File directory) {
      if (directory.exists()) {
        final java.io.File[] files = directory.listFiles();
        if (files != null) {
          for (final java.io.File file : files) {
            if (file.isDirectory()) {
              deleteDirectory(file);
            } else {
              file.delete();
            }
          }
        }
        directory.delete();
      }
    }
This helper method for recursively deleting a directory can be simplified and made more robust by using the modern java.nio.file.Files.walk API. This approach avoids manual recursion and is generally preferred for traversing file trees.
    private void deleteDirectory(java.io.File directory) {
      if (directory.exists()) {
        // Files.walk returns a stream that must be closed, hence try-with-resources.
        try (java.util.stream.Stream<java.nio.file.Path> walk = java.nio.file.Files.walk(directory.toPath())) {
          // Reverse order visits children before their parent directories.
          walk.sorted(java.util.Comparator.reverseOrder())
              .map(java.nio.file.Path::toFile)
              .forEach(java.io.File::delete);
        } catch (java.io.IOException e) {
          System.err.println("Error deleting directory " + directory.getAbsolutePath() + ": " + e.getMessage());
        }
      }
    }
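One caveat with the suggestion above: java.io.File.delete() reports failure only through its boolean return value, which forEach discards, so a file that cannot be removed goes unnoticed. The following is a minimal sketch of a stricter variant, not the code merged in this PR. The class name DeleteDirectoryExample and the choice to rethrow are assumptions made for illustration; it uses Files.delete, which throws on failure.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical example class; not part of the PR.
public class DeleteDirectoryExample {

  // Recursively deletes root and everything below it.
  // Does nothing if root does not exist; throws IOException if any entry cannot be deleted.
  static void deleteDirectory(Path root) throws IOException {
    if (Files.notExists(root)) {
      return;
    }
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parent directories.
      walk.sorted(Comparator.reverseOrder()).forEach(path -> {
        try {
          Files.delete(path);
        } catch (IOException e) {
          // Lambdas cannot throw checked exceptions, so wrap and unwrap below.
          throw new UncheckedIOException(e);
        }
      });
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("delete-demo");
    Files.createDirectories(root.resolve("a/b"));
    Files.writeString(root.resolve("a/b/file.txt"), "data");
    deleteDirectory(root);
    System.out.println("exists after delete: " + Files.exists(root));
  }
}
```

For a test cleanup helper, swallowing the failure as in the suggestion may be acceptable; the stricter form is mainly useful when a leftover directory would affect a later test.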
CI Insights: all jobs passed in the CI run for ff6ea24.
Improvements to code quality and maintainability:

* Enhanced exception logging: include the full stack trace when closing the graph file, for better debugging. Changed from logging only the error message to passing the exception object to LogManager for complete context.
* Refactored deleteDirectory() helper: replaced manual recursive directory traversal with the java.nio.file.Files.walk() API. This approach is more robust and follows common Java practice for file tree operations.
  - Uses try-with-resources for proper resource management
  - Sorts in reverse order to delete files before directories
  - Provides better exception handling with IOException

All existing tests continue to pass (22/22).

Addresses review comments from PR #2916.
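To illustrate the logging change described in this commit, here is a minimal sketch of the difference between logging only the message and passing the exception object. It uses java.util.logging as a stand-in because the LogManager signature is not shown in this conversation; the class CloseLoggingExample and the helper closeAndLog are hypothetical names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example class; java.util.logging stands in for the project's LogManager.
public class CloseLoggingExample {
  private static final Logger LOG = Logger.getLogger(CloseLoggingExample.class.getName());

  // Closes the resource and returns true on success.
  // On failure, logs the exception object (so the stack trace is kept) and returns false.
  static boolean closeAndLog(Closeable resource, String name) {
    try {
      resource.close();
      return true;
    } catch (IOException e) {
      // Before: only e.getMessage() would be logged, losing the stack trace.
      // After: the throwable itself is handed to the logger.
      LOG.log(Level.WARNING, "Error closing " + name, e);
      return false;
    }
  }

  public static void main(String[] args) {
    Closeable failing = () -> { throw new IOException("simulated close failure"); };
    closeAndLog(failing, "graph file");
  }
}
```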
ec05c5b to ff6ea24
This pull request addresses critical issues with the persistence and discovery of HNSW graph files in the LSMVectorIndex implementation, ensuring that graph data is properly flushed to disk and can be reliably recovered after database restarts. It also adds comprehensive tests to verify these behaviors and improve the robustness of the vector index subsystem.

Persistence and resource management improvements:
* The graph file (graphFile) is properly closed and flushed to disk when the LSMVectorIndex is closed, preventing data loss and resource leaks. Error handling was added to log any exceptions during the close operation.

Logging and debugging enhancements:
* Logging in discoverAndLoadGraphFile() was improved, making it easier to trace file lookup issues and understand index initialization behavior.

Testing and verification:
* Tests were added to LSMVectorIndexTest.java to verify these fixes; the underlying problem was that graphFile.close() was not previously called.
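The close-and-flush behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the LSMVectorIndex code: DemoIndex is a hypothetical stand-in that owns a FileChannel, and the real graph file type and its API are not shown in this conversation.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example class; not part of the PR.
public class GraphFileCloseExample {

  // Stand-in for an index that owns an on-disk graph file.
  static final class DemoIndex implements AutoCloseable {
    private final FileChannel graphFile;

    DemoIndex(Path path) throws IOException {
      graphFile = FileChannel.open(path,
          StandardOpenOption.CREATE,
          StandardOpenOption.WRITE,
          StandardOpenOption.TRUNCATE_EXISTING);
    }

    void write(byte[] data) throws IOException {
      ByteBuffer buffer = ByteBuffer.wrap(data);
      while (buffer.hasRemaining()) {
        graphFile.write(buffer);
      }
    }

    @Override
    public void close() throws IOException {
      try {
        // Ask the OS to push content and metadata to the storage device.
        graphFile.force(true);
      } finally {
        // Release the file handle even if the flush fails.
        graphFile.close();
      }
    }
  }

  // Writes the payload, closes the index, then reads the file back from disk.
  static byte[] writeCloseAndReload(Path path, byte[] payload) throws IOException {
    try (DemoIndex index = new DemoIndex(path)) {
      index.write(payload);
    }
    return Files.readAllBytes(path);
  }

  public static void main(String[] args) throws IOException {
    Path path = Files.createTempFile("graph-demo", ".bin");
    byte[] reloaded = writeCloseAndReload(path, "hnsw".getBytes(StandardCharsets.UTF_8));
    System.out.println("reloaded bytes: " + reloaded.length);
    Files.deleteIfExists(path);
  }
}
```

A round-trip check of this shape (write, close, reopen, compare) is one way a test can confirm that data survives a close; whether the tests in LSMVectorIndexTest.java take this form is not stated here.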