
emilk/fix write starvation#12

Closed
andrea-reale wants to merge 699 commits into rerun from
emilk/fix-write-starvation

Conversation

@andrea-reale
Member

jackye1995 and others added 30 commits February 9, 2026 16:16
The files are generated with `make licenses`, which is currently expected
to be run manually. In the future, some automation could be added.
lance-format#5867)

closes lance-format#5682

changes:
- Treat element-level NULLs in LABEL_LIST as non-matches so
array_has_any/array_has_all return TRUE/FALSE when the list itself is
non-NULL.
- Allow nullable list literals in `LabelListQuery::to_expr` to prevent
`explain_plan()` panics.
- Add Python tests covering element-level NULLs, list-level NULLs,
NULL-literal filters and explain behavior.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Introduce two commonly useful CodeX workflows:
1. patch a merged PR onto a specific release branch
2. fix a CI workflow that is currently breaking the main branch
`array_has_any/all(labels, [])` yields an empty label-list query;
LabelList index merge (`set_union` / `set_intersection`) `unwraps` the
first element and panics on an empty iterator.

Skip LABEL_LIST index parsing for empty lists to avoid the panic and
fall back to normal execution.
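The failure mode and the guard can be sketched in Python (illustrative only; the real code is the Rust LabelList index merge):

```python
from functools import reduce

# Illustrative sketch (not the actual Rust code): folding a set union by
# unwrapping the first element of an iterator fails when the label list is
# empty, as in `array_has_any(labels, [])`.
def set_union_unsafe(sets):
    it = iter(sets)
    acc = next(it)  # raises StopIteration on an empty iterator (the panic)
    for s in it:
        acc = acc | s
    return acc

# The fix's spirit: detect the empty case up front and fall back safely.
def set_union_safe(sets):
    return reduce(lambda a, b: a | b, sets, set())
```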
…ance-format#5913)

Fix two copy-paste typos in `OrderableScalarValue::cmp` panic messages
During IVF shuffle, we have a FileWriter per partition and each
accumulates page metadata in memory over the course of the shuffle. With
large datasets and large numbers of partitions, this memory grows over
time to dominate the memory cost of IVF shuffle.

This patch adds optional functionality to the FileWriter that serializes
page metadata to a spill file and enables it by default in the IVF
shuffler.
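The idea can be sketched with a stdlib-only Python analogue (the class name and shape are hypothetical; the real change is in the Rust FileWriter):

```python
import json
import tempfile

# Hypothetical sketch: append each page's metadata to a spill file as it is
# produced, instead of accumulating all of it in memory until the writer
# finishes. Only the final read-back materializes the records.
class SpillingPageMetadata:
    def __init__(self):
        self._spill = tempfile.TemporaryFile(mode="w+t")
        self.num_pages = 0

    def add(self, meta):
        self._spill.write(json.dumps(meta) + "\n")
        self.num_pages += 1

    def finish(self):
        self._spill.seek(0)
        return [json.loads(line) for line in self._spill]
```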
This removes a buffer in the shuffler that accumulated batches for
batched writes to temporary storage. This was configured with a public
buffer_size parameter hence the breaking change.

Previously, when we shuffled data we accumulated this many batches for
each partition in memory and then flushed them all to disk at once. This
may have been intended as an optimization in the original implementation
of the shuffler, which supported external shuffling through arbitrary
object storage. However, the shuffler was subsequently hardcoded to use
local disk (where this kind of buffering serves no benefit) and even on
remote object storage, we already have a layer of buffering in the
storage writer.

Instead of buffering batches, just write them directly to the
FileWriter. This results in much more predictable memory usage and also
faster index builds.
There are various ways to write data, and several of them are currently
failing for JSON data because it requires a conversion from logical to
physical representation. I'd like to rework this more generally, but I
want to get the current implementation working first. This fixes one of
the merge_insert paths, which was failing because the field metadata is
lost as part of the operation (so the data appears to be a normal string
column and is not converted).
This PR also adds support for `mtune=apple-a13` when building
lance-linalg for iOS, which is the earliest supported iOS architecture
at the time of writing.

This is pulled out of lance-format#5866
after the initial bug was fixed in
lance-format#5747 and lancedb updated to
Lance 2.x
)

This adds support for progress reporting on index builds, allowing
callers to determine whether a build is progressing, what stage it is
in, and in some cases how far into the stage.

Breaking: updates some index builder function signatures to include a
progress callback implementation. A noop implementation is included in
the patch.
This will ensure JSON columns are properly converted in paths like
add_columns
Add a Python API to support using FTS as a filter for vector search, or
using vector_query as a filter for FTS search. Related to
lance-format#4928.
closes lance-format#5895

`Allow|Block OR` previously assumed `NULL` rows were always included in
the `block.selected` set; when they were not, `NULL`s could be dropped
and `FALSE`/`NULL` were mixed, leading to incorrect results.

This change computes `TRUE/FALSE/NULL` explicitly for `Allow|Block OR`
and derives `NULL/selected` sets from those.
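The three-valued logic can be sketched over row-id sets (illustrative Python; the actual implementation operates on Lance's selection structures):

```python
# Each predicate partitions the row ids into TRUE / FALSE / NULL sets.
# Under SQL three-valued logic, OR is TRUE if either side is TRUE, FALSE
# only if both sides are FALSE, and NULL otherwise.
def or_three_valued(t1, f1, n1, t2, f2, n2):
    true_rows = t1 | t2
    false_rows = f1 & f2
    null_rows = (n1 | n2) - true_rows - false_rows
    return true_rows, false_rows, null_rows
```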
This allows Java to also pass in a Session shared across datasets,
similarly to Python and Rust. The Session can then be used for
engine-side caching implementations in Spark and Trino.
…subproject list (lance-format#5847)

See lance-format#5848 for more
background and vote

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
Link: https://github.com/lance-format/lance/actions/runs/21917803271

Summary of failure:
- build-no-lock failed because rustc 1.90.0 is too old for updated
aws-smithy dependencies (requires 1.91).

Fixes applied:
- Bumped the pinned Rust toolchain to 1.91.0 to satisfy dependency MSRV
in no-lock builds.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
We lack docs for how to create and query an FTS index in Lance. This PR
adds detailed docs, based on the latest v1.0.4 `pylance` API. All code
snippets have been tested with the latest stable release and I gave it
another pass with GPT 5.2 to ensure accuracy per the latest Rust code in
[tokenizer.rs](https://github.com/lance-format/lance/blob/f77769778aa93f47572187b83ccd7b6638dc39a3/rust/lance-index/src/scalar/inverted/tokenizer.rs#L180).
1. allow backporting multiple PRs in the same run
2. use a more meaningful PR title for CI fixes
…5908)

## Summary

- Expose `enable_stable_row_ids` parameter in `LanceDataset.commit()`
and `commit_transaction()`, allowing atomic creation of datasets with
stable row IDs via the commit path.
- Thread the parameter through Python → PyO3 →
`CommitBuilder.use_stable_row_ids()`.

Closes lance-format#5906

## Test plan

- [x] `cargo check -p pylance` passes
- [x] `cargo clippy -p pylance` passes with no warnings
- [x] New test `test_commit_with_stable_row_ids` verifies that
`commit(Overwrite, enable_stable_row_ids=True)` creates a dataset with
sequential stable row IDs across append
)

`BlockList|BlockList` OR handled NULLs incorrectly: when one side was
TRUE and the other was NULL, the result could stay NULL, leading to
wrong query results.

Fix by computing FALSE rows explicitly and deriving NULL rows from
three‑valued logic.
The main use case of this PR is to allow engines like Spark and Trino to
push down an aggregate into the Lance scanner in a distributed worker when
possible. Today we technically already support `COUNT(*)` pushdown
through `scanner.count_rows()` to count the rows of each fragment in a
distributed fashion; this is a more generic version of that. My plan is
to allow an engine to pass a Substrait Aggregate expression to the
scanner in the worker to support pushing down other aggregations like
`SUM`, `MAX`, `MIN`.

Another alternative I considered is to update the `dataset.sql()` API to
accept a full Substrait plan so we can execute a plan with an aggregate,
and to update the distributed worker to run a SQL statement instead of
running the scanner. But implementing this feature in the scanner feels
more aligned with how engines implement distributed execution:
basically, whatever can be executed by a single worker in a distributed
environment (predicate pushdown, column projection, aggregate pushdown)
should be supported by the scanner.

Note that with this change we could technically remove
`create_count_plan`, since it is just a subcase of
`create_aggregate_plan`, but we are not doing that in this PR. Once we
agree on this direction, I will open a separate PR to refactor it.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
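The partial-aggregate shape this enables can be sketched as follows (hypothetical helper names; the real path goes through Substrait expressions and the scanner):

```python
# Each distributed worker scans its fragments and emits partial aggregates;
# the coordinator merges them. COUNT(*) pushdown is the existing subcase.
def worker_partial(rows, column):
    vals = [r[column] for r in rows]
    return {"count": len(vals), "sum": sum(vals),
            "max": max(vals, default=None)}

def coordinator_merge(partials):
    maxes = [p["max"] for p in partials if p["max"] is not None]
    return {"count": sum(p["count"] for p in partials),
            "sum": sum(p["sum"] for p in partials),
            "max": max(maxes, default=None)}
```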
This PR should be rebased after
lance-format#5664

---------

Co-authored-by: majin.nathan <majin.nathan@bytedance.com>
1. use relative release note for beta releases. Example fix:
https://github.com/jackye1995/lance/releases/tag/v4.0.0-beta.5
2. ensure RC voting period is correct. Example fix:
jackye1995#73
3. implement minor release in release branch: 
- Example proper failure when trying to create minor release in release
branch when main branch is not at major version:
https://github.com/jackye1995/lance/actions/runs/21975317405/job/63485677361
- Example successful minor release in release branch:
jackye1995#80,
https://github.com/jackye1995/lance/releases/edit/v3.1.0-rc.1,
https://github.com/jackye1995/lance/releases/edit/v3.1.0
…#5915)

This changes Dataset::sample to sort its random indices. Supplying
sorted inputs to take results in a 50% reduction in peak memory
consumption. This change causes the IVF training stage of IVF-PQ index
builds to take approximately half as much memory.
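The change itself is small; a stdlib sketch of the idea (function name hypothetical):

```python
import random

# Hypothetical sketch: draw k distinct random row indices, then sort them
# before handing them to a take()-style gather. With ascending indices the
# reader can stream pages in order and release them as it goes, which is
# where the peak-memory reduction comes from.
def sorted_sample_indices(n_rows, k, seed=0):
    rng = random.Random(seed)
    indices = rng.sample(range(n_rows), k)
    indices.sort()
    return indices
```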
…format#5950)

Make sure we always only do a metadata projection to avoid scanning data
when doing a count.

Also:
1. remove `create_count_plan` since `count_rows` is now just a case of
aggregate, no longer need a dedicated query plan.
2. remove duplicated calls to `aggregate_required_columns` by storing
required columns directly at aggregate construction time.
This creates a new LocalWriter that wraps tokio::fs::File in a BufWriter
for local file writes. ObjectStore::create() now returns one of these
when working against local storage, and an ObjectWriter for remote
storage.

Prior to this commit, local writes (e.g for shuffling) went through a
local object writer implementation that required a 5MB buffer per writer
and also simulated multipart upload machinery. For local writing, this
is slower than necessary and uses a lot of memory in situations where
many writers are open at once.

This change results in a substantial memory reduction and incremental
speedup for IVF shuffle.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
…rmat#5935)

When we sample data for IVF training we go down one of two paths,
depending on whether the target column is nullable or not. For the
not-nullable path, we use dataset.sample. For the nullable path, we do a
block-level sampling of batches, then filter out all batches that
contain all-null rows, and interleave the result into the output.

The interleaving step requires two copies of the filtered batches to be
held in RAM.

This commit adds a specialization for the case of nullable fixed-length
array columns. We now pre-size an output vector, and process the input
scan as a stream. For each batch, we copy not-null values into the
output vector, and drop the batch. This saves one copy of the filtered
output and halves the memory consumption of the sampling step.
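The streaming copy can be sketched as follows (conceptual Python; the real code operates on Arrow fixed-size-list batches):

```python
# Process batches as a stream: copy not-null values into one output buffer
# and drop each batch immediately, instead of filtering all batches and
# interleaving them (which holds two copies of the filtered data in RAM).
def sample_not_null(batches, target):
    out = []  # pre-sized output vector in the real implementation
    for batch in batches:  # batch: list of Optional values
        for v in batch:
            if v is not None:
                out.append(v)
                if len(out) == target:
                    return out
    return out
```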
Xuanwo and others added 27 commits March 25, 2026 00:03
…mat#6269)

This refactors distributed vector indexing to remove the staging-root
workflow and treat worker outputs as segments written directly under
`indices/<segment_uuid>/`. Segment planning and build now operate on
segments directly, and vector indexing no longer uses
`merge_index_metadata`.

The API and docs are updated around `with_segments(...)` /
`plan_segments(...)`, and the focused distributed vector tests were
updated to cover the new workflow.

Java's API will be affected
…nce-format#6278)

Worker processes were opening the dataset without dataset_options,
silently dropping any options (storage_options, version,
index_cache_size, etc.) set by the caller. Also remove the debug print
statement.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This is basically the same problem encountered in
lance-format#5929

However, the solution there (and in other indices fixed since then) has
been to prune indices during the update phase to remove old fragments.
Unfortunately, FTS indices may be too large for this pruning to make
sense (it would require a complete scan through the old index), though
I haven't verified how bad the performance would be.

This PR instead keeps track of which fragments have been invalidated. It
then uses this as a block-list at search time.
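The block-list idea can be sketched as follows (illustrative; the real hit and posting structures differ):

```python
# Keep a set of invalidated fragment ids instead of rewriting the FTS
# index; filter posting-list hits against it at search time.
def filter_hits(hits, invalidated):
    # hits: iterable of (fragment_id, row_offset, score) tuples
    return [h for h in hits if h[0] not in invalidated]
```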
…tions (lance-format#6196)

#### Index and transaction
- `create_table_index`
- `list_table_indices`
- `describe_table_index_stats`
- `describe_transaction`
- `create_table_scalar_index`
- `drop_table_index`

---------

Co-authored-by: zhangyue19921010 <zhangyue.1010@bytedance.com>
This does not hook the throttle up anywhere yet, that will come in a
future PR.

Closes lance-format#6237
Closes lance-format#6238
This change enables phrase queries to match across stop-word gaps.

Example:
For `doc="love the format"` indexed with `remove_stop_words=True`, the
index does not store the stop word the.

With this change, users can still match the document with the phrase
query `q="love the format"`. In this mode, all stop words are treated as
equivalent placeholders for phrase matching, so `q="love a format"` will
also match the same document.

This makes queries that contain stop words 3x to 10x faster, at the
cost of a small loss in accuracy.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
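A minimal sketch of the matching rule (illustrative Python; the real tokenizer and postings differ):

```python
STOP_WORDS = {"the", "a", "an", "of"}

def index_with_gaps(text):
    # Stop words are not stored, but their positions remain as gaps.
    return {pos: tok for pos, tok in enumerate(text.lower().split())
            if tok not in STOP_WORDS}

def phrase_matches(indexed, query):
    q = query.lower().split()
    last = max(indexed) if indexed else -1
    for start in range(last + 2 - len(q)):
        # Stop words in the query act as equivalent placeholders: any gap
        # (or any token) at that position is accepted.
        if all(indexed.get(start + i) == tok
               for i, tok in enumerate(q) if tok not in STOP_WORDS):
            return True
    return False
```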
## Summary
- `IndicesBuilder` rejected uint8 vector columns and didn't include
"hamming" in its allowed distance types, even though the underlying Rust
`train_ivf_model` supports hamming via k-modes
- Relaxes `_normalize_column` to accept unsigned integer value types
alongside floats
- Adds "hamming" to `_normalize_distance_type`'s allowed list
- Adds `test_ivf_centroids_hamming` test with uint8 vectors


## Test plan
- [ ] `test_ivf_centroids_hamming` — end-to-end IVF training with uint8
vectors and hamming distance
- [ ] Existing `test_ivf_centroids` tests still pass (float path
unchanged)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…format#6176)

This adds a dedicated benchmark for distributed vector index
finalization and a small query-side metric to count `find_partitions`
calls. Together they give us a baseline for analyzing the current
single-node merge bottleneck and for evaluating future segmented-index
work.

As context, the new `distributed_merge_only_ivf_pq` benchmark already
shows that finalize cost grows much faster than input bytes as shard
count and partition count increase. In the local filesystem benchmark,
the mean finalize time grows from about `64 ms` at `8 shards / 256
partitions` to about `2.87 s` at `128 shards / 1024 partitions`.

---

Based on this benchmark, I noticed that our current logic performs
poorly as the number of shards increases.

<img width="2750" height="1584"
alt="2026-03-12-distributed-merge-trend-matplotlib"
src="https://github.com/user-attachments/assets/361aa371-5941-431d-964a-8ea1e2a086d4"
/>
This moves `DatasetIndexExt` and the dataset-facing index segment types
out of `lance-index` and into `lance`, so the public dataset index
management API lives in the dataset layer instead of the lower-level
index implementation crate.


Close lance-format#6221
…#6302)

This fixes the main-branch CI failure introduced after lance-format#6280 moved
`DatasetIndexExt` into `lance`. `rust/lance-namespace-impls/src/dir.rs`
still imported and referenced the old `lance_index::DatasetIndexExt`
path, which broke the Java JNI workflow when it compiled the namespace
implementation.
In the Rust SDK we have a `NamespaceError` that contains a message and
an error code. Additionally, in both the Java and Python SDKs we have
logic to parse this type (and error codes) to produce native Java
exceptions. The problem is that in `namespace-impls` we never use them.
Therefore the Rust `NamespaceError` type is never used, and the
Java/Python SDK conversion code that parses and throws native errors is
essentially dead code. In this PR, we update `namespace-impls` to
actually throw the correct error types.

Closes lance-format#6240

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
…ance-format#6102)

## Summary

Implements a parallel async scanner alongside the existing blocking
`LanceScanner` to prevent thread starvation in Java query engines like
Presto and Trino.

This PR adds **AsyncScanner** - a non-blocking alternative to
`LanceScanner` that uses `CompletableFuture` for true async I/O
operations.

## Motivation

Query engines like Presto and Trino rely on non-blocking I/O to
efficiently multiplex thousands of concurrent queries on a limited
thread pool. The current `LanceScanner` blocks Java threads during Rust
I/O operations, causing thread starvation and poor performance in these
environments.

## Key Features

✅ **Non-blocking I/O**: Spawns Tokio tasks instead of blocking Java
threads
✅ **CompletableFuture API**: Native Java async patterns for seamless
integration
✅ **Persistent JNI dispatcher**: Single thread attached to JVM for
zero-overhead callbacks
✅ **Task-based architecture**: Uses task IDs instead of JNI refs to
prevent memory leaks
✅ **Full feature parity**: Supports filters, projections, vector search,
FTS, aggregates
✅ **Clean cancellation**: Proper task cleanup without resource leaks  
✅ **Parallel to existing API**: No breaking changes, `LanceScanner`
unchanged

## Architecture

The implementation uses the **"Task ID + Dispatcher" pattern**:

1. **Java manages futures**: `ConcurrentHashMap<taskId,
CompletableFuture<Long>>` maps task IDs to pending requests
2. **Rust spawns async tasks**: Returns immediately while Tokio handles
I/O in background
3. **Lock-free completion channel**: Carries `(taskId, resultPointer)`
from Tokio to dispatcher
4. **Persistent dispatcher thread**: Attaches to JVM once, completes
Java futures via cached JNI method IDs

### Components

**Rust (`java/lance-jni/src/`):**
- `dispatcher.rs` - Persistent JNI thread with cached method IDs for
callbacks
- `task_tracker.rs` - Thread-safe task registry using `RwLock<HashMap>`
- `async_scanner.rs` - AsyncScanner with Tokio task spawning and JNI
exports
- `lib.rs` - Modified to add `JNI_OnLoad` hook for dispatcher
initialization

**Java (`java/src/main/java/org/lance/ipc/`):**
- `AsyncScanner.java` - CompletableFuture-based async API with task
management

**Tests (`java/src/test/java/org/lance/`):**
- `AsyncScannerTest.java` - 6 comprehensive examples demonstrating usage
patterns

## Usage Examples

### Basic async scan
```java
ScanOptions options = new ScanOptions.Builder().batchSize(20L).build();

try (AsyncScanner scanner = AsyncScanner.create(dataset, options, allocator)) {
    CompletableFuture<ArrowReader> future = scanner.scanBatchesAsync();
    ArrowReader reader = future.get(10, TimeUnit.SECONDS);
    
    while (reader.loadNextBatch()) {
        // Process batches without blocking
    }
}
```

### Multiple concurrent scans (key benefit for Presto/Trino)
```java
List<CompletableFuture<Integer>> futures = new ArrayList<>();

for (int i = 0; i < 100; i++) {
    AsyncScanner scanner = AsyncScanner.create(dataset, options, allocator);
    futures.add(scanner.scanBatchesAsync()
        .thenApply(reader -> processInBackground(reader)));
}

// All scans run in parallel without blocking threads!
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
    .get(30, TimeUnit.SECONDS);
```

## Testing

```bash
cd java
./mvnw test -Dtest=AsyncScannerTest
./mvnw compile  # Full build verification
```

All tests pass ✅

## Compatibility

- ✅ No breaking changes to existing APIs
- ✅ `LanceScanner` remains unchanged
- ✅ Uses same `ScanOptions` for consistency
- ✅ Opt-in: users choose blocking or async based on their needs

## Performance Benefits

For query engines running hundreds of concurrent queries:
- **Before**: Thread pool exhaustion as threads block on I/O
- **After**: Threads immediately return, I/O happens in background

## Checklist

- [x] Rust code compiles (`cargo check`)
- [x] Java code compiles (`./mvnw compile`)
- [x] Code formatted (`./mvnw spotless:apply && cargo fmt`)
- [x] Comprehensive tests added (`AsyncScannerTest.java`)
- [x] Documentation in code examples
- [x] No breaking changes

## Related Issues

Addresses the need for non-blocking I/O in Java query engines that
integrate with LanceDB.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…e-format#6300)

`convert_json_arrow_type` only handled scalar types
(int/float/utf8/binary), causing deserialization failures for any schema
containing list, struct, large_binary, large_utf8, fixed_size_list, map,
decimal, or date/time types.

This made `arrow_type_to_json` and `convert_json_arrow_type` asymmetric:
serialization worked for all types but deserialization rejected most of
them with "Unsupported Arrow type".

In practice this broke the DuckDB lance extension's fast
schema-from-REST path — tables with list/struct columns fell back to
opening the S3 dataset for every DESCRIBE, making SHOW ALL TABLES ~20x
slower than necessary.

Add support for: float16, large_utf8, large_binary, fixed_size_binary,
decimal32/64/128/256, date32, date64, timestamp, duration, list,
large_list, fixed_size_list, struct, and map.

Add a roundtrip test covering all supported types.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
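The symmetry requirement can be sketched with a toy recursive codec (hypothetical representation; the real code maps Arrow DataTypes to JSON):

```python
# A nested type is either a scalar name ("int64", "utf8", ...) or a
# (kind, child) pair like ("list", "int64"). Serialization already
# recursed; the fix makes deserialization recurse the same way.
def type_to_json(t):
    if isinstance(t, str):
        return {"type": t}
    kind, child = t
    return {"type": kind, "child": type_to_json(child)}

def json_to_type(j):
    if "child" not in j:
        return j["type"]
    return (j["type"], json_to_type(j["child"]))
```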
## Summary
- add `Dataset::version_id()` in Rust to return the checked-out manifest
version without building full version metadata
- route Python `dataset.version` and default checkout refs through the
new fast path
- route Java `Dataset.version()` through a new JNI version-id accessor
while keeping `getVersion()` unchanged
- extend Rust, Python, and Java tests to cover current, updated, and
historical version reads

## Testing
- `cargo test -p lance test_version_id_fast_path`
- `cargo check --manifest-path java/lance-jni/Cargo.toml`
- `cargo check --manifest-path python/Cargo.toml`
- Python pytest target not run in this environment (`pytest`
unavailable)
- Java Maven test target not run in this environment (JRE unavailable)
…at#6270)

This change makes the logical-index and physical-segment split explicit
in the user-facing index APIs without breaking existing behavior.
`describe_indices` remains the logical view, `describe_index_segments`
becomes the explicit physical-segment view, and index statistics now
expose `num_segments` / `segments` alongside the legacy fields for
compatibility.

The Rust, Python, and Java bindings now use the same model so
segment-aware callers do not need to infer semantics from raw manifest
metadata. I validated the Rust path with `cargo test -p lance
test_optimize_delta_indices -- --nocapture` and the Java path with
`./mvnw -q -Dtest=DatasetTest#testDescribeIndicesByName test`.
This updates the documented minimum voting period for core stable major
releases from 1 week to 3 days. It also updates the RC voting discussion
automation so generated vote windows stay aligned with the documented
policy.

This follows the discussion in
lance-format#6147, where feedback
converged on narrowing the proposal to core major releases only while
keeping patch and subproject stable releases unchanged.
…format#6251)

This tightens the new multi-segment vector index path added in lance-format#6220.

It enforces disjoint fragment coverage when committing a segment set,
adds regression coverage that grouped segment coverage matches the union
of its source shard coverage, and verifies that remap only touches
segments covering affected fragments.

It also adds cleanup coverage for both replaced committed segments and
stale uncommitted `_indices/<uuid>` artifacts, and documents these
contracts in the distributed indexing guide.
)

This PR builds on lance-format#6294 and exposes the remaining pieces needed to
construct non-shared centroid vector index builds.

It adds fragment-scoped IVF/PQ training in Rust and exports the same
training flow to Python, so users can train per-segment artifacts and
feed them into the existing distributed build path.
…ance-format#6312)

Closes lance-format#6311 

## What

Add an optional `storage_options` keyword argument to `IvfModel.save()`,
`IvfModel.load()`, `PqModel.save()`, and `PqModel.load()`, and forward
it to the underlying `LanceFileWriter` / `LanceFileReader`.

## Why

These methods accept cloud storage URIs (e.g. `s3://`, `gs://`) but
previously had no way to pass credentials or backend-specific
configuration. Users were forced to rely on environment variables for
authentication, which breaks down when dealing with multiple storage
backends that require different credentials.

Both `LanceFileWriter` and `LanceFileReader` already support
`storage_options` — this change simply threads the parameter through.

## Change Summary

- **`python/python/lance/indices/ivf.py`**: Add `storage_options`
parameter to `IvfModel.save()` and `IvfModel.load()`, forward to
`LanceFileWriter` / `LanceFileReader`.
- **`python/python/lance/indices/pq.py`**: Same change for
`PqModel.save()` and `PqModel.load()`.

## Usage

```python
from lance.indices.ivf import IvfModel
from lance.indices.pq import PqModel

opts = {"aws_access_key_id": "...", "aws_secret_access_key": "...", "region": "us-east-1"}

# Save
ivf_model.save("s3://bucket/ivf.lance", storage_options=opts)
pq_model.save("s3://bucket/pq.lance", storage_options=opts)

# Load
ivf_model = IvfModel.load("s3://bucket/ivf.lance", storage_options=opts)
pq_model = PqModel.load("s3://bucket/pq.lance", storage_options=opts)
```
…lance-format#6310)

## Summary

When `enable_stable_row_id` is enabled, the first `take_rows()` call
triggers a full `RowIdIndex` build. On large datasets this cold start
was extremely slow (18.9s for 968M rows, 220s for 4.26B rows).

### Root Causes

**1. O(total_rows) segment expansion in `decompose_sequence`**

The original code expanded every `U64Segment` element-by-element, even
for `Range` segments with no deletions. For a `Range(0..273711)` with no
deletions, this meant 273K iterations, deletion vector checks, temporary
allocations, and re-compression — only to produce the same Range back.
Across 18,243 fragments averaging 233K rows, this totaled **4.26 billion
iterations** with ~32 GB of temporary allocations.

**2. O(N²) fragment lookup in `load_row_id_index`**

The original code called `fragments.iter().find()` (O(N) linear search)
for each of N fragments, resulting in O(N²) comparisons. `try_join_all`
spawned all N futures at once, overwhelming the async runtime.
`get_deletion_vector()` was called unconditionally even for fragments
without deletion files.

### Solution

**Fix 1: O(1) fast path for Range segments without deletions.** When a
fragment has no deletions and its row_id sequence is a `Range`,
construct the index chunk directly without iterating.

**Fix 2: HashMap lookup + conditional deletion vector loading.** Use a
`HashMap<u32, &FileFragment>` for O(1) lookup, `buffer_unordered` for
controlled concurrency, and skip `get_deletion_vector()` when there's no
deletion file.
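Fix 2's lookup change can be sketched as follows (illustrative; names are hypothetical):

```python
# Old shape: a linear find() per fragment id -> O(N) each, O(N^2) total.
def pair_linear(fragment_ids, fragments):
    return [next(f for f in fragments if f["id"] == fid)
            for fid in fragment_ids]

# New shape: build a dict once (O(N)), then O(1) per lookup.
def pair_hashmap(fragment_ids, fragments):
    by_id = {f["id"]: f for f in fragments}
    return [by_id[fid] for fid in fragment_ids]
```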

### Results

| Dataset | Before | After | Speedup |
|---------|--------|-------|---------|
| 968M rows, 3,540 fragments | 18.9s | 150ms | **126x** |
| 4.26B rows, 18,243 fragments | 220s | 89ms | **2,471x** |

## Test plan

- [x] All existing rowids tests pass (45 tests)
- [x] Clippy clean
- [x] New test `test_large_range_segments_no_deletions` validates fast
path correctness at boundaries and performance (100 fragments × 250K
rows completes < 1s)
- [ ] Verify on real datasets with deletions to ensure slow path
correctness

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes lance-format#6239

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Add a "Conflict Handling" section to the performance guide explaining
how concurrent operation conflicts affect throughput, with common
conflict examples and a link to the transaction spec
- Add a "Fragment Reuse Index" subsection to the performance guide
describing how the FRI avoids compaction/index conflicts, with a Python
API snippet
- Add an "Impacts" section to the FRI spec covering conflict resolution
changes, index load cost, and FRI growth/cleanup

## Test plan
- [ ] Verify doc links resolve correctly (conflict resolution anchor,
FRI spec link)
- [ ] Review rendered markdown for formatting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@andrea-reale andrea-reale marked this pull request as draft March 30, 2026 07:29
