perf: optimize stable row_id index build from O(rows) to O(fragments) #6310
Merged
BubbleCal merged 4 commits into lance-format:main, Mar 27, 2026
Conversation
Fix two performance bottlenecks in `RowIdIndex` construction that caused extreme cold-start latency when `enable_stable_row_id` is enabled:

1. `decompose_sequence` (lance-table/src/rowids/index.rs): previously expanded every `Range` segment element-by-element to check deletions, then re-compressed back. For `Range` segments with no deletions this is pure waste: the input is a `Range` and the output is the same `Range`. Add an O(1) fast path that constructs index chunks directly. Complexity: O(total_rows) → O(num_fragments).

2. `load_row_id_index` (lance/src/dataset/rowids.rs): used a linear search (`.find()`) over all fragments for each fragment_id, giving O(N²) total. Also spawned all N futures via `try_join_all` and called `get_deletion_vector()` even when no deletion file exists. Fix: a `HashMap` for O(1) lookup, `buffer_unordered` for controlled concurrency, and skipping the deletion vector load when unnecessary.

Benchmarks on real S3 datasets:

- 968M rows, 3540 fragments: 18.9s → 150ms (126x faster)
- 4.26B rows, 18243 fragments: 220s → 89ms (2471x faster)

All existing tests pass. Correctness verified against production data with boundary checks, random sampling, and data consistency validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract a shared `build_chunk_from_pairs` helper to eliminate duplication between `decompose_segment_no_deletions` and `decompose_segment_with_deletions`
- Use `.unzip()` instead of manual map/collect for splitting pairs
- Return `Option<IndexChunk>` instead of `Option<Vec<IndexChunk>>` to avoid unnecessary single-element `Vec` allocations
- Fix clippy warnings: redundant `.into_iter()`, `len() == 0` -> `is_empty()`
- Remove redundant comments that restate what the code does
- Remove OPTIMIZATION.md (content moved to PR description)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BubbleCal reviewed, Mar 27, 2026
BubbleCal approved these changes, Mar 27, 2026
## Summary

When `enable_stable_row_id` is enabled, the first `take_rows()` call triggers a full `RowIdIndex` build. On large datasets this cold start was extremely slow (18.9s for 968M rows, 220s for 4.26B rows).

## Root Causes
### 1. O(total_rows) segment expansion in `decompose_sequence`

The original code expanded every `U64Segment` element-by-element, even for `Range` segments with no deletions. For a `Range(0..273711)` with no deletions, this meant 273K iterations, deletion-vector checks, temporary allocations, and re-compression, only to produce the same `Range` back. Across 18,243 fragments averaging 233K rows, this totaled 4.26 billion iterations with ~32 GB of temporary allocations.

### 2. O(N²) fragment lookup in `load_row_id_index`

The original code called `fragments.iter().find()` (an O(N) linear search) for each of N fragments, resulting in O(N²) comparisons. `try_join_all` spawned all N futures at once, overwhelming the async runtime, and `get_deletion_vector()` was called unconditionally, even for fragments without deletion files.

## Solution
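Root Cause 1 admits a direct construction: a `Range` segment with no deletions maps one-to-one onto a contiguous run of row addresses. A minimal sketch of that fast path, using simplified stand-in types rather than Lance's actual `U64Segment` and chunk representation:

```rust
use std::ops::Range;

// Simplified stand-ins for illustration; the real types live in lance-table.
enum Segment {
    Range(Range<u64>), // contiguous stable row ids
    Array(Vec<u64>),   // arbitrary stable row ids
}

struct IndexChunk {
    row_ids: Range<u64>,   // stable row ids covered by this chunk
    row_addrs: Range<u64>, // corresponding physical row addresses
}

// Fast path: a Range segment with no deletions maps 1:1 onto an address
// range, so the chunk can be built in O(1) without expanding the segment
// element-by-element.
fn decompose_segment(seg: &Segment, addr_start: u64, has_deletions: bool) -> Option<IndexChunk> {
    match seg {
        Segment::Range(r) if !has_deletions => Some(IndexChunk {
            row_ids: r.clone(),
            row_addrs: addr_start..addr_start + (r.end - r.start),
        }),
        // Slow path (deletions present, or a non-range segment) would
        // expand and filter element-by-element; elided here.
        _ => None,
    }
}

fn main() {
    let seg = Segment::Range(0..273_711);
    let chunk = decompose_segment(&seg, 0, false).unwrap();
    assert_eq!(chunk.row_ids, 0..273_711);
    assert_eq!(chunk.row_addrs, 0..273_711);
    println!("chunk covers {} rows", chunk.row_ids.end - chunk.row_ids.start);
}
```

The key design point is that the fast path never materializes individual row ids, so its cost is independent of fragment size.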
**Fix 1: O(1) fast path for `Range` segments without deletions.** When a fragment has no deletions and its row_id sequence is a `Range`, construct the index chunk directly without iterating.

**Fix 2: HashMap lookup + conditional deletion vector loading.** Use a `HashMap<u32, &FileFragment>` for O(1) lookup, `buffer_unordered` for controlled concurrency, and skip `get_deletion_vector()` when there is no deletion file.

## Results

- 968M rows, 3540 fragments: 18.9s → 150ms (126x faster)
- 4.26B rows, 18243 fragments: 220s → 89ms (2471x faster)
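Fix 2 above is a standard pre-indexing pattern. A minimal sketch with a simplified fragment record (field names hypothetical; the `buffer_unordered` concurrency part is omitted since it needs the async `futures` crate):

```rust
use std::collections::HashMap;

// Simplified fragment record for illustration.
struct Fragment {
    id: u32,
    has_deletion_file: bool,
}

// Before: an O(N) linear scan per lookup, O(N^2) over all fragments.
fn find_linear(fragments: &[Fragment], id: u32) -> Option<&Fragment> {
    fragments.iter().find(|f| f.id == id)
}

// After: build the map once in O(N); each subsequent lookup is O(1).
fn build_map(fragments: &[Fragment]) -> HashMap<u32, &Fragment> {
    fragments.iter().map(|f| (f.id, f)).collect()
}

fn main() {
    let fragments: Vec<Fragment> = (0..1000)
        .map(|id| Fragment { id, has_deletion_file: id % 7 == 0 })
        .collect();

    let by_id = build_map(&fragments);
    for f in &fragments {
        // O(1) lookup replaces the linear search.
        let found = by_id[&f.id];
        assert_eq!(found.id, f.id);
        // Skip the (expensive) deletion-vector load when there is no file.
        if !found.has_deletion_file {
            continue;
        }
        // ... load the deletion vector here ...
    }
    assert_eq!(find_linear(&fragments, 500).unwrap().id, 500);
    println!("indexed {} fragments", by_id.len());
}
```

Building the map costs one pass up front, which is negligible next to the N linear scans it replaces.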
## Test plan

`test_large_range_segments_no_deletions` validates fast-path correctness at boundaries and performance (100 fragments × 250K rows completes in < 1s).

🤖 Generated with Claude Code
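The boundary aspect of that test can be illustrated with a minimal sketch of the invariant it checks: chunks built for consecutive fragments must tile the row-address space with no gaps or overlaps (simplified, not the actual test code):

```rust
// Build contiguous (start, end) chunk ranges for `num` fragments of
// `rows` rows each, mirroring the 100 x 250K layout in the test plan.
fn build_chunks(num: u64, rows: u64) -> Vec<(u64, u64)> {
    (0..num).map(|i| (i * rows, (i + 1) * rows)).collect()
}

// Verify the chunks tile the space: each chunk ends exactly where the
// next one begins, so there are no gaps and no overlaps.
fn boundaries_ok(chunks: &[(u64, u64)]) -> bool {
    chunks.windows(2).all(|pair| pair[0].1 == pair[1].0)
}

fn main() {
    let chunks = build_chunks(100, 250_000);
    assert!(boundaries_ok(&chunks));
    // Total coverage equals fragments x rows-per-fragment.
    assert_eq!(chunks.last().unwrap().1, 25_000_000);
    println!("validated {} chunk boundaries", chunks.len() - 1);
}
```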