feat(rust): add serialize/deserialize support for CAGRA index #1840
Merged
rapids-bot[bot] merged 1 commit into rapidsai:main on Apr 15, 2026
Member
/ok to test b66d204
Member
/ok to test f530475
Member
/ok to test 2c5f58b
Ludu-nuvai added a commit to Nuvai/cuvs that referenced this pull request on Mar 23, 2026
… indexes

Cherry-picks CAGRA serialization from upstream PR rapidsai#1840 (zbennett10), adapted for the Index<'a> lifetime API from PR rapidsai#1869. Adds new Nuvai-authored serialize/deserialize methods for IVF-PQ and IVF-Flat, following the same pattern (CString filename, FFI call to cuvsIvfPqSerialize/Deserialize and cuvsIvfFlatSerialize/Deserialize).

Includes round-trip tests for CAGRA and IVF-PQ serialization, plus hnswlib export test for CAGRA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7 tasks
Contributor
Looks good to me otherwise, thank you for your contribution!
Add idiomatic Rust bindings for CAGRA index serialization and deserialization, wrapping the existing C API functions:

- `Index::serialize()` - Save index to file with optional dataset
- `Index::serialize_to_hnswlib()` - Save in hnswlib-compatible format
- `Index::deserialize()` - Load index from file

All three methods accept `impl AsRef<Path>` for the filename, following common Rust filesystem-API conventions, and surface any CString conversion failure as a new `Error::InvalidArgument` variant rather than panicking via `expect`.

Test helpers (`build_test_index`, `search_and_verify_self_neighbors`) are factored out so the index-build + self-neighbor search boilerplate is shared between `test_cagra`, `test_cagra_multiple_searches` and the three new serialization tests. A negative test asserts that a path containing an interior NUL byte is rejected with `InvalidArgument` instead of panicking.

Closes rapidsai#1479

Signed-off-by: Zach Bennett <zach@worldflowai.com>
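The path handling described in this commit message can be sketched roughly as follows. The helper name `path_to_cstring` appears in the reviewed diff, but its body is not shown in this conversation, so everything below is an assumption: the `Error` enum is a hypothetical stand-in for the crate's error type, and the `String` payload on `InvalidArgument` is invented for illustration.

```rust
use std::ffi::CString;
use std::path::Path;

/// Hypothetical stand-in for the crate's error type. Only the variant
/// named in the commit message is modeled; the String payload is assumed.
#[derive(Debug, PartialEq)]
pub enum Error {
    InvalidArgument(String),
}

/// Convert a filesystem path into a NUL-terminated C string for an FFI call,
/// returning an error instead of panicking when the conversion fails.
pub fn path_to_cstring(path: &Path) -> Result<CString, Error> {
    // Paths are not guaranteed to be valid UTF-8, so reject those explicitly.
    let s = path.to_str().ok_or_else(|| {
        Error::InvalidArgument(format!("path is not valid UTF-8: {}", path.display()))
    })?;
    // CString::new fails when the input contains an interior NUL byte.
    CString::new(s)
        .map_err(|e| Error::InvalidArgument(format!("invalid path for FFI: {}", e)))
}
```

A caller would then pass `c_filename.as_ptr()` to the C function while `c_filename` is still in scope, which keeps the pointer valid for the duration of the call.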
2c5f58b to
af67123
Contributor
Author
@yan-zaretskiy this one is ready for another pair of eyes when you're ready - thanks!
zbennett10 commented on Apr 14, 2026
    /// * `res` - Resources to use
    /// * `filename` - The path of the file that stores the index
    pub fn deserialize<P: AsRef<Path>>(res: &Resources, filename: P) -> Result<Index> {
        let c_filename = path_to_cstring(filename.as_ref())?;
Contributor
Author
As primarily a C programmer, it always warms my heart to see any mention of C strings. :)
yan-zaretskiy approved these changes on Apr 14, 2026
Contributor
/ok to test af67123
cjnolet approved these changes on Apr 15, 2026
Member
/merge
Closes #1479
Description
Adds serialization and deserialization support to the Rust CAGRA index bindings. The underlying C API functions (`cuvsCagraSerialize`, `cuvsCagraDeserialize`, `cuvsCagraSerializeToHnswlib`) already exist but were not exposed through the Rust crate.

Methods added to `cagra::Index`:
- `serialize(&self, res, filename, include_dataset)`: Save the CAGRA index to file, with the option to include or exclude the dataset
- `serialize_to_hnswlib(&self, res, filename)`: Save the CAGRA index in hnswlib-compatible format
- `deserialize(res, filename) -> Result<Index>`: Load a CAGRA index from file

Design decisions:
- `&self` (not `self`) for serialize methods so the index remains usable after serialization. This differs from the Vamana pattern but is more ergonomic for the common case of serializing a hot index.
- `CString` for safe C string interop with the FFI layer.
- `check_cuvs`.

Tests:
- `test_cagra_serialize_deserialize`: Round-trip test: build → serialize (with dataset) → deserialize → search → verify nearest neighbors match
- `test_cagra_serialize_without_dataset`: Verifies serialization works with `include_dataset=false`
- `test_cagra_serialize_to_hnswlib`: Verifies hnswlib format export produces a non-empty file

Documentation:
- `cagra` module docs in `mod.rs`

Checklist
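For readers without a GPU at hand, the round-trip shape described in this PR (build, serialize through `&self`, deserialize, search, verify self-neighbors) can be illustrated with a CPU-only toy. This is a sketch under stated assumptions: `ToyIndex` is a hypothetical stand-in, not the cuVS `cagra::Index`; its file layout is invented for the example and is unrelated to the format written by `cuvsCagraSerialize`; and brute-force search replaces the graph search.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical stand-in for an index: brute-force search over f32 rows.
/// It is NOT the cuVS `cagra::Index` and shares none of its file format.
#[derive(Debug, PartialEq)]
pub struct ToyIndex {
    dim: usize,
    rows: Vec<f32>,
}

impl ToyIndex {
    /// "Build" the index by storing a row-major dataset of width `dim`.
    pub fn build(dim: usize, rows: Vec<f32>) -> Self {
        assert!(dim > 0 && rows.len() % dim == 0, "rows must be a multiple of dim");
        ToyIndex { dim, rows }
    }

    /// Borrowing receiver (`&self`): the index stays usable after the call.
    /// Layout (an assumption of this sketch): u64 LE `dim`, then f32 LE values.
    pub fn serialize<P: AsRef<Path>>(&self, filename: P) -> io::Result<()> {
        let mut bytes = (self.dim as u64).to_le_bytes().to_vec();
        for v in &self.rows {
            bytes.extend_from_slice(&v.to_le_bytes());
        }
        fs::write(filename, bytes)
    }

    /// Load an index previously written by `serialize`.
    pub fn deserialize<P: AsRef<Path>>(filename: P) -> io::Result<Self> {
        let bytes = fs::read(filename)?;
        let bad = || io::Error::new(io::ErrorKind::InvalidData, "malformed toy index file");
        if bytes.len() < 8 || (bytes.len() - 8) % 4 != 0 {
            return Err(bad());
        }
        let mut header = [0u8; 8];
        header.copy_from_slice(&bytes[..8]);
        let dim = u64::from_le_bytes(header) as usize;
        let rows: Vec<f32> = bytes[8..]
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        if dim == 0 || rows.len() % dim != 0 {
            return Err(bad());
        }
        Ok(ToyIndex { dim, rows })
    }

    /// Row id of the stored vector closest to `query` (squared L2 distance).
    pub fn nearest(&self, query: &[f32]) -> usize {
        let mut best = (0usize, f32::INFINITY);
        for (i, row) in self.rows.chunks_exact(self.dim).enumerate() {
            let d: f32 = row.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            if d < best.1 {
                best = (i, d);
            }
        }
        best.0
    }
}
```

Because `serialize` borrows rather than consumes, a test can keep searching the original index after writing it, then compare against the deserialized copy. With a consuming `self` receiver, the original would be moved and that comparison would require rebuilding the index.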