feat: migrate vector SDKs to segment workflow #6271

Closed
Xuanwo wants to merge 1 commit into xuanwo/remove-vector-index-staging from xuanwo/vector-sdk-segment-workflow

Conversation

@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Mar 24, 2026

This follow-up PR moves the Python and Java vector distributed indexing entrypoints and tests to the segment workflow introduced in #6269.

It keeps the core staging-removal refactor in the base PR and limits this change to SDK-facing APIs, JNI/PyO3 glue, and binding-level tests.

@github-actions github-actions bot added the enhancement, python, and java labels Mar 24, 2026
@github-actions
Contributor

PR Review

Clean migration of Java and Python vector SDK entrypoints to the segment workflow. The test simplification is a big win — removing the manual UUID/field-id/transaction boilerplate makes the distributed indexing tests much more readable.

Issues

P1: inner_commit_existing_index_segments doesn't reload the dataset after commit

In inner_commit_existing_index_segments (blocking_dataset.rs), load_indices_by_name is called on the same guard after commit_existing_index_segments mutates the dataset. That looks correct for returning metadata. However, the Java-side tests assert dataset.listIndexes().contains(...) on the original dataset handle. Does commit_existing_index_segments update the dataset in place, or does the caller need to reload? If it does not update in place, the final assertTrue in those tests would be reading stale state (it may still pass if the Rust side mutates the inner Dataset). Worth confirming that this is intentional.
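To make the concern concrete, here is a minimal sketch of the two behaviours in question. It uses a hypothetical in-memory Store and Handle; none of these names come from the PR or from the real bindings, and the sketch makes no claim about which behaviour commit_existing_index_segments actually has.

```python
# Hypothetical model of a versioned dataset; NOT the real SDK API.
class Store:
    """Maps each version number to the set of index names visible at it."""

    def __init__(self):
        self.versions = {1: frozenset()}
        self.latest = 1

    def commit_index(self, base_version, name):
        # Every commit creates a new version on top of the base version.
        names = self.versions[base_version] | {name}
        self.latest += 1
        self.versions[self.latest] = frozenset(names)
        return self.latest


class Handle:
    """Hypothetical dataset handle pinned to a single version."""

    def __init__(self, store, version):
        self.store = store
        self.version = version

    def list_indexes(self):
        return sorted(self.store.versions[self.version])

    def commit_detached(self, name):
        # Commits, but leaves this handle on its old version (stale afterwards).
        return self.store.commit_index(self.version, name)

    def commit_in_place(self, name):
        # Commits and moves this handle to the newly created version.
        self.version = self.store.commit_index(self.version, name)

    def reload(self):
        # Explicitly catch up to the latest committed version.
        self.version = self.store.latest
```

Under the detached variant, an assertion on the original handle only holds after reload(); under the in-place variant it holds immediately. The Java tests pass without a reload only if the binding follows the second pattern.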

P1: segment_template validation may be too strict or too lax

segment_template validates that all segments share the same name, fields, and dataset_version, but it does not validate index_version. If segments can carry different index_version values, index_segment_to_metadata uses the per-segment value, which is correct. It is still worth documenting why index_version is intentionally excluded from the template consistency check.
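The kind of documentation being asked for could look like the sketch below. It is a hypothetical Python rendering of the check as described in this review, not the Rust implementation; the Segment type, its field types, and the function name pick_template are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical segment record; field types are assumed."""
    name: str
    fields: Tuple[int, ...]
    dataset_version: int
    index_version: int


# Attributes that every segment of one logical index must agree on.
TEMPLATE_ATTRS = ("name", "fields", "dataset_version")


def pick_template(segments: Sequence[Segment]) -> Segment:
    """Return the first segment after checking the others against it."""
    if not segments:
        raise ValueError("at least one segment is required")
    first = segments[0]
    for seg in segments[1:]:
        for attr in TEMPLATE_ATTRS:
            if getattr(seg, attr) != getattr(first, attr):
                raise ValueError(f"segments disagree on {attr}")
        # index_version is deliberately NOT compared here: in this sketch it
        # is treated as per-segment data that is carried through unchanged.
    return first
```

Listing the compared attributes in one place, with a comment on the excluded one, makes the omission of index_version read as a decision and not an oversight.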

Nit

The Python into_builder → builder rename is correct (&self doesn't consume ownership), good catch.


🤖 Generated with Claude Code

@Xuanwo Xuanwo closed this Mar 24, 2026
@Xuanwo Xuanwo deleted the xuanwo/vector-sdk-segment-workflow branch March 27, 2026 09:06
Labels

enhancement New feature or request java python
