test: fix flaky distributed vector build results test #6268
Xuanwo merged 1 commit into lance-format:main
Review: LGTM. Clean one-line fix for test flakiness. Setting [...] No issues found.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Hey @Xuanwo @wjones127, could you take a look at this PR when you have a chance? This makes the distributed vector build test probe all IVF partitions before comparing Top-K row ids, which should stabilize the test and unblock CI.
@Xuanwo Thanks for the review, appreciate it! 🙌
## Summary

This fixes a flaky regression test added in #6220 (`b80fbb3231cf58dd50e5670f9c56d309999bbd73`).

The affected test is:

- `test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results`

Recent failures showed up in both of these runs:

- https://github.com/lance-format/lance/actions/runs/23450834811 `main` at `244c721504c6ef0b4c2f9700a342509976898d6e`
- https://github.com/lance-format/lance/actions/runs/23460892697 #6263

In those failures, different platforms / index variants failed:

- `linux-arm / case_1_ivf_flat` on `main`
- `linux-build / case_2_ivf_pq` on #6263

That points to an existing flaky test.

## Root Cause

The test compared the exact Top-K `_rowid` results between:

- a single-segment index build, and
- a distributed multi-segment index build

However, the query path used by the test is ANN (`ANNIVFPartition`) under the default probing behavior. With partial probing, the candidate set can differ slightly between single-segment and multi-segment layouts, especially near the tail of Top-K. That makes exact `_rowid` equality too strict for this test and causes intermittent failures.

## Fix

Make the test probe all IVF partitions before comparing Top-K row ids:

- add `.minimum_nprobes(TWO_FRAG_NUM_PARTITIONS)` to the test query

This keeps the existing strong assertion (`ids_single == ids_split`) but removes the probing-related source of nondeterminism.

## Testing

Local verification:

- `export PROTOC=/opt/homebrew/opt/protobuf@3/bin/protoc`
- `cargo test -p lance test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results -- --nocapture`

Observed result:

- `case_1_ivf_flat ... ok`
- `case_2_ivf_pq ... ok`
- `case_3_ivf_sq ... ok`

I also verified during debugging that with full probing enabled, repeated runs of the previously flaky `ivf_flat` / `ivf_pq` cases became stable.
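To illustrate the root cause and the fix described above, here is a toy sketch in plain Rust. It is not Lance's index code: `ToyIvf`, `compare_layouts`, the 1-D data, and the centroids are all hypothetical. In particular, the two layouts below differ in their centroids purely for illustration; that is an assumption of the sketch, not a claim about how Lance's single-segment and multi-segment builds differ. The sketch uses exact distances and only shows that partial probing can return different Top-K ids for the same data, while probing every partition cannot.

```rust
/// Toy 1-D IVF index (hypothetical, for illustration only).
/// Each partition holds the row ids assigned to one centroid.
struct ToyIvf {
    centroids: Vec<f64>,
    partitions: Vec<Vec<usize>>,
}

impl ToyIvf {
    /// Assign every row to its nearest centroid.
    fn build(data: &[f64], centroids: &[f64]) -> Self {
        let mut partitions = vec![Vec::new(); centroids.len()];
        for (row_id, &value) in data.iter().enumerate() {
            let nearest = (0..centroids.len())
                .min_by(|&a, &b| {
                    (value - centroids[a])
                        .abs()
                        .total_cmp(&(value - centroids[b]).abs())
                })
                .expect("at least one centroid");
            partitions[nearest].push(row_id);
        }
        Self { centroids: centroids.to_vec(), partitions }
    }

    /// Probe the `nprobes` partitions closest to `query`, then return the
    /// row ids of the `k` nearest candidates found in those partitions.
    fn search(&self, data: &[f64], query: f64, k: usize, nprobes: usize) -> Vec<usize> {
        // Rank partitions by centroid distance to the query.
        let mut order: Vec<usize> = (0..self.centroids.len()).collect();
        order.sort_by(|&a, &b| {
            (query - self.centroids[a])
                .abs()
                .total_cmp(&(query - self.centroids[b]).abs())
        });
        // Only rows in the probed partitions are candidates.
        let mut candidates: Vec<usize> = order
            .iter()
            .take(nprobes)
            .flat_map(|&p| self.partitions[p].iter().copied())
            .collect();
        candidates.sort_by(|&a, &b| {
            (query - data[a]).abs().total_cmp(&(query - data[b]).abs())
        });
        candidates.truncate(k);
        candidates
    }
}

/// Returns ((partial_a, partial_b), (full_a, full_b)) for two hypothetical
/// layouts of the same ten rows.
fn compare_layouts() -> ((Vec<usize>, Vec<usize>), (Vec<usize>, Vec<usize>)) {
    let data: Vec<f64> = (0..10).map(|i| i as f64).collect();
    let (query, k) = (4.6, 3);
    let a = ToyIvf::build(&data, &[2.0, 7.0]);
    let b = ToyIvf::build(&data, &[1.0, 4.5, 8.0]);
    // Partial probing: one partition per layout.
    let partial = (a.search(&data, query, k, 1), b.search(&data, query, k, 1));
    // Full probing: every partition, analogous in spirit to the fix above.
    let full = (
        a.search(&data, query, k, a.centroids.len()),
        b.search(&data, query, k, b.centroids.len()),
    );
    (partial, full)
}
```

In this toy setup, probing a single partition yields `[5, 6, 7]` for the first layout and `[5, 4, 6]` for the second, so an exact id comparison fails. Probing all partitions yields `[5, 4, 6]` for both, which is why the strict `ids_single == ids_split` style of assertion becomes safe once every partition is searched.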