feat(rust): add serialize/deserialize support for CAGRA index #1840

Merged
rapids-bot[bot] merged 1 commit into rapidsai:main from zbennett10:feat/rust-cagra-serialization on Apr 15, 2026

Conversation

@zbennett10
Contributor

Closes #1479

Description

Adds serialization and deserialization support to the Rust CAGRA index bindings. The underlying C API functions (cuvsCagraSerialize, cuvsCagraDeserialize, cuvsCagraSerializeToHnswlib) already exist but were not exposed through the Rust crate.

Methods added to cagra::Index:

  • serialize(&self, res, filename, include_dataset) — Save the CAGRA index to file, with option to include or exclude the dataset
  • serialize_to_hnswlib(&self, res, filename) — Save the CAGRA index in hnswlib-compatible format
  • deserialize(res, filename) -> Result<Index> — Load a CAGRA index from file

Design decisions:

  • Uses &self (not self) for serialize methods so the index remains usable after serialization. This differs from the Vamana pattern but is more ergonomic for the common case of serializing a hot index.
  • Uses CString for safe C string interop with the FFI layer.
  • Follows the existing error handling pattern via check_cuvs.
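
The `&self` choice above can be illustrated with a minimal sketch. `ToyIndex` and its methods are hypothetical stand-ins written for this example, not the cuvs `cagra::Index` API; the sketch only shows why borrowing keeps a live index usable after it has been serialized, whereas a consuming method would move it away from the caller.

```rust
// Toy stand-in for an index; NOT the cuvs `cagra::Index` type (assumption).
struct ToyIndex {
    graph: Vec<u32>,
}

impl ToyIndex {
    // Borrowing `&self` leaves the caller's value intact after the call.
    fn serialize(&self) -> Vec<u8> {
        self.graph.iter().flat_map(|v| v.to_le_bytes()).collect()
    }

    // Consuming `self` moves the index into the call; the caller loses it.
    fn serialize_consuming(self) -> Vec<u8> {
        self.serialize()
    }

    // Number of entries in the toy graph.
    fn len(&self) -> usize {
        self.graph.len()
    }
}

// Serializes a "hot" index and keeps using it afterwards.
fn snapshot_then_keep_using(index: &ToyIndex) -> (Vec<u8>, usize) {
    let bytes = index.serialize();
    // `index` is still valid here because `serialize` only borrowed it.
    (bytes, index.len())
}
```

With `serialize_consuming`, any later use of the same binding would be rejected by the borrow checker, so a caller would have to deserialize the file again to keep searching.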

Tests:

  • test_cagra_serialize_deserialize — Round-trip test: build → serialize (with dataset) → deserialize → search → verify nearest neighbors match
  • test_cagra_serialize_without_dataset — Verifies serialization works with include_dataset=false
  • test_cagra_serialize_to_hnswlib — Verifies hnswlib format export produces a non-empty file
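
The round-trip shape described above (build, serialize, deserialize, search, verify) can be sketched without a GPU. Everything here is an assumption made for illustration: `ToyIndex`, its ad hoc file layout, and the brute-force `nearest` search are not the cuvs implementation or the PR's actual test code.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Toy stand-in for an index: a row-major dataset of `dim`-wide f32 vectors.
#[derive(Debug, PartialEq)]
struct ToyIndex {
    dim: usize,
    data: Vec<f32>,
}

impl ToyIndex {
    // Writes an 8-byte little-endian `dim` header followed by the f32 values.
    fn serialize(&self, path: &Path) -> io::Result<()> {
        let mut bytes = (self.dim as u64).to_le_bytes().to_vec();
        for v in &self.data {
            bytes.extend_from_slice(&v.to_le_bytes());
        }
        fs::write(path, bytes)
    }

    // Reads the layout written by `serialize`, rejecting malformed files.
    fn deserialize(path: &Path) -> io::Result<ToyIndex> {
        let bytes = fs::read(path)?;
        let bad = |msg: &str| io::Error::new(io::ErrorKind::InvalidData, msg.to_string());
        if bytes.len() < 8 || (bytes.len() - 8) % 4 != 0 {
            return Err(bad("truncated file"));
        }
        let mut header = [0u8; 8];
        header.copy_from_slice(&bytes[..8]);
        let dim = u64::from_le_bytes(header) as usize;
        if dim == 0 {
            return Err(bad("zero dimension"));
        }
        let data = bytes[8..]
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect();
        Ok(ToyIndex { dim, data })
    }

    // Brute-force nearest neighbor by squared L2 distance; returns the row id.
    fn nearest(&self, query: &[f32]) -> usize {
        let mut best = (0usize, f32::INFINITY);
        for (i, row) in self.data.chunks_exact(self.dim).enumerate() {
            let d: f32 = row.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
            if d < best.1 {
                best = (i, d);
            }
        }
        best.0
    }
}

// Build -> serialize -> deserialize -> search -> verify self-neighbors.
fn round_trip_preserves_neighbors(path: &Path) -> io::Result<bool> {
    let built = ToyIndex { dim: 2, data: vec![0.0, 0.0, 5.0, 5.0, -3.0, 4.0] };
    built.serialize(path)?;
    let loaded = ToyIndex::deserialize(path)?;
    let self_neighbors = built
        .data
        .chunks_exact(built.dim)
        .enumerate()
        .all(|(i, row)| loaded.nearest(row) == i);
    Ok(self_neighbors && loaded == built)
}
```

Querying the reloaded index with the original rows and expecting each row to be its own nearest neighbor gives a check that does not depend on the file format.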

Documentation:

  • Added serialization example to cagra module docs in mod.rs

Checklist

  • I am familiar with the Contributing Guidelines
  • New or existing tests cover these changes
  • The documentation is up to date with these changes

@copy-pr-bot

copy-pr-bot bot commented Feb 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aamijar added the Rust, non-breaking (Introduces a non-breaking change), and improvement (Improves an existing functionality) labels on Feb 23, 2026
@aamijar moved this to In Progress in Unstructured Data Processing on Feb 23, 2026
@aamijar
Member

aamijar commented Feb 23, 2026

/ok to test b66d204

@aamijar
Member

aamijar commented Feb 23, 2026

/ok to test f530475

@aamijar
Member

aamijar commented Feb 25, 2026

/ok to test 2c5f58b

Ludu-nuvai added a commit to Nuvai/cuvs that referenced this pull request Mar 23, 2026
… indexes

Cherry-picks CAGRA serialization from upstream PR rapidsai#1840 (zbennett10),
adapted for the Index<'a> lifetime API from PR rapidsai#1869.

Adds new Nuvai-authored serialize/deserialize methods for IVF-PQ and
IVF-Flat, following the same pattern (CString filename, FFI call to
cuvsIvfPqSerialize/Deserialize and cuvsIvfFlatSerialize/Deserialize).

Includes round-trip tests for CAGRA and IVF-PQ serialization, plus
hnswlib export test for CAGRA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread rust/cuvs/src/cagra/index.rs Outdated
Comment thread rust/cuvs/src/cagra/index.rs Outdated
@yan-zaretskiy
Contributor

Looks good to me otherwise, thank you for your contribution!

Add idiomatic Rust bindings for CAGRA index serialization and
deserialization, wrapping the existing C API functions:

- `Index::serialize()` - Save index to file with optional dataset
- `Index::serialize_to_hnswlib()` - Save in hnswlib-compatible format
- `Index::deserialize()` - Load index from file

All three methods accept `impl AsRef<Path>` for the filename, following
common Rust filesystem-API conventions, and surface any CString conversion
failure as a new `Error::InvalidArgument` variant rather than panicking
via `expect`.

Test helpers (`build_test_index`, `search_and_verify_self_neighbors`)
are factored out so the index-build + self-neighbor search boilerplate
is shared between `test_cagra`, `test_cagra_multiple_searches` and the
three new serialization tests. A negative test asserts that a path
containing an interior NUL byte is rejected with `InvalidArgument`
instead of panicking.

Closes rapidsai#1479

Signed-off-by: Zach Bennett <zach@worldflowai.com>
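
The path handling described in the commit message above can be sketched roughly as follows. The names `path_to_cstring` and `Error::InvalidArgument` appear in this PR, but the body, the error payload, and the UTF-8 restriction below are assumptions made for illustration and may differ from the merged code.

```rust
use std::ffi::CString;
use std::path::Path;

// Hypothetical error type; the PR only names an `InvalidArgument` variant.
#[derive(Debug, PartialEq)]
enum Error {
    InvalidArgument(String),
}

// Assumed shape of a helper like the `path_to_cstring` seen in the diff.
fn path_to_cstring(path: &Path) -> Result<CString, Error> {
    // Non-UTF-8 paths are rejected here for simplicity (assumption).
    let s = path
        .to_str()
        .ok_or_else(|| Error::InvalidArgument("path is not valid UTF-8".to_string()))?;
    // `CString::new` fails if the bytes contain an interior NUL, so the
    // failure becomes an error value instead of a panic via `expect`.
    CString::new(s)
        .map_err(|e| Error::InvalidArgument(format!("path contains a NUL byte: {}", e)))
}
```

A caller that accepts `impl AsRef<Path>` would pass `filename.as_ref()` to such a helper and propagate the error with `?`, as the `deserialize` snippet in the review thread does.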
@zbennett10
Contributor Author

@yan-zaretskiy this one is ready for another pair of eyes when you're ready - thanks!

/// * `res` - Resources to use
/// * `filename` - The path of the file that stores the index
pub fn deserialize<P: AsRef<Path>>(res: &Resources, filename: P) -> Result<Index> {
    let c_filename = path_to_cstring(filename.as_ref())?;
Contributor Author

As primarily a C programmer, it always warms my heart to see any mention of C strings. :)

@yan-zaretskiy
Contributor

/ok to test af67123

@cjnolet
Member

cjnolet commented Apr 15, 2026

/merge

@rapids-bot bot merged commit 7a08cc7 into rapidsai:main on Apr 15, 2026
51 checks passed
@github-project-automation bot moved this from In Progress to Done in Unstructured Data Processing on Apr 15, 2026

Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), Rust

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[FEA] Add support for index serialization/deserialization in rust

4 participants