The files are generated with `make licenses`, which is currently expected to be run manually. In the future, this could be automated.
(lance-format#5867) closes lance-format#5682

Changes:
- Treat element-level NULLs in LABEL_LIST as non-matches so `array_has_any`/`array_has_all` return TRUE/FALSE when the list itself is non-NULL.
- Allow nullable list literals in `LabelListQuery::to_expr` to prevent `explain_plan()` panics.
- Add Python tests covering element-level NULLs, list-level NULLs, NULL-literal filters, and explain behavior.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Introduce 2 CodeX workflows that could be commonly used:
1. patch a merged PR to a specific release branch
2. fix a CI workflow that is currently breaking the main branch
`array_has_any/all(labels, [])` yields an empty label-list query; the LabelList index merge (`set_union` / `set_intersection`) `unwrap`s the first element and panics on an empty iterator. Skip LABEL_LIST index parsing for empty lists to avoid the panic and fall back to normal execution.
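The failure mode and the guard can be sketched in Python (a toy model, not the actual Rust index code): folding posting sets fails on an empty input the same way `unwrap` does on an empty iterator, so an empty label list must bypass the index entirely.

```python
from functools import reduce

def merge_label_postings(posting_sets, op):
    """Toy model of the LabelList index merge: `op` is set.union for
    array_has_any or set.intersection for array_has_all."""
    sets = list(posting_sets)
    if not sets:
        # Empty label list: skip the index and fall back to normal
        # execution instead of reducing over an empty iterator.
        return None
    # reduce() with no initializer raises on an empty sequence,
    # mirroring the unwrap panic this PR avoids.
    return reduce(op, sets)

# array_has_any over two labels whose posting sets overlap on row 11
hits = merge_label_postings([{10, 11}, {11, 12}], set.union)
```

Returning `None` here stands in for "no index-backed plan"; the real fix skips index parsing so the query planner never reaches the merge.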
(lance-format#5913) Fix two copy-paste typos in `OrderableScalarValue::cmp` panic messages.
During IVF shuffle, we have a FileWriter per partition and each accumulates page metadata in memory over the course of the shuffle. With large datasets and large numbers of partitions, this memory grows over time to dominate the memory cost of IVF shuffle. This patch adds optional functionality to the FileWriter that serializes page metadata to a spill file and enables it by default in the IVF shuffler.
This removes a buffer in the shuffler that accumulated batches for batched writes to temporary storage. This was configured with a public buffer_size parameter hence the breaking change. Previously, when we shuffled data we accumulated this many batches for each partition in memory and then flushed them all to disk at once. This may have been intended as an optimization in the original implementation of the shuffler, which supported external shuffling through arbitrary object storage. However, the shuffler was subsequently hardcoded to use local disk (where this kind of buffering serves no benefit) and even on remote object storage, we already have a layer of buffering in the storage writer. Instead of buffering batches, just write them directly to the FileWriter. This results in much more predictable memory usage and also faster index builds.
There are various ways to write data, and several of them currently fail for JSON data because JSON requires a conversion from a logical to a physical representation. I'd like to rework this more generally, but I want to get the current implementation working first. This fixes one of the merge_insert paths, which currently fails because the field metadata is lost as part of the operation (so the data appears to be a normal string column and is not converted).
This PR also adds support for `mtune=apple-a13` when building lance-linalg for iOS, which is the earliest supported iOS architecture at the time of writing. This is pulled out of lance-format#5866 after the initial bug was fixed in lance-format#5747 and lancedb updated to Lance 2.x.
This adds support for progress reporting on index builds, allowing callers to determine whether a build is progressing, what stage it is in, and in some cases how far into the stage it is. Breaking: updates some index builder function signatures to include a progress callback implementation. A no-op implementation is included in the patch.
This will ensure JSON columns are properly converted in paths like add_columns
Adds a Python API to support using FTS as a filter for vector search, or using `vector_query` as a filter for FTS search. Related to lance-format#4928.
closes lance-format#5895

`Allow|Block OR` previously assumed `NULL` rows were always included in the `block.selected` set; when they were not, `NULL`s could be dropped and `FALSE`/`NULL` were mixed, leading to incorrect results. This change computes `TRUE/FALSE/NULL` explicitly for `Allow|Block OR` and derives the `NULL`/selected sets from those.
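For intuition, SQL OR is three-valued. A minimal sketch (not lance's actual prefilter code) of computing TRUE/FALSE/NULL explicitly per row and only then deriving the selected and NULL sets, which is the shape of the fix:

```python
def or3(a, b):
    """SQL three-valued OR over True / False / None (NULL):
    True dominates, then NULL, then False."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def combine_or(lhs, rhs):
    """lhs/rhs map row_id -> True/False/None. Compute OR explicitly
    per row, then derive the selected and NULL sets from the results
    instead of mixing FALSE and NULL rows together."""
    out = {r: or3(lhs[r], rhs[r]) for r in lhs}
    selected = {r for r, v in out.items() if v is True}
    nulls = {r for r, v in out.items() if v is None}
    return selected, nulls

# A NULL row OR'd with FALSE must stay NULL, not be silently dropped:
sel, nulls = combine_or({1: None, 2: True, 3: False},
                        {1: False, 2: False, 3: False})
```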
This allows Java to also pass in a Session shared across datasets, similar to Python and Rust. The Session can then be used for engine-side caching implementations in Spark and Trino.
…subproject list (lance-format#5847)

See lance-format#5848 for more background and the vote.

---------

Co-authored-by: Weston Pace <weston.pace@gmail.com>
Link: https://github.com/lance-format/lance/actions/runs/21917803271

Summary of failure:
- build-no-lock failed because rustc 1.90.0 is too old for updated aws-smithy dependencies (requires 1.91).

Fixes applied:
- Bumped the pinned Rust toolchain to 1.91.0 to satisfy dependency MSRV in no-lock builds.

---------

Co-authored-by: Jack Ye <yezhaoqin@gmail.com>
We lack docs for how to create and query an FTS index in Lance. This PR adds detailed docs, based on the latest v1.0.4 `pylance` API. All code snippets have been tested with the latest stable release and I gave it another pass with GPT 5.2 to ensure accuracy per the latest Rust code in [tokenizer.rs](https://github.com/lance-format/lance/blob/f77769778aa93f47572187b83ccd7b6638dc39a3/rust/lance-index/src/scalar/inverted/tokenizer.rs#L180).
1. allow backporting multiple PRs in the same run
2. use a more meaningful PR title for CI fixes
(lance-format#5908)

## Summary

- Expose the `enable_stable_row_ids` parameter in `LanceDataset.commit()` and `commit_transaction()`, allowing atomic creation of datasets with stable row IDs via the commit path.
- Thread the parameter through Python → PyO3 → `CommitBuilder.use_stable_row_ids()`.

Closes lance-format#5906

## Test plan

- [x] `cargo check -p pylance` passes
- [x] `cargo clippy -p pylance` passes with no warnings
- [x] New test `test_commit_with_stable_row_ids` verifies that `commit(Overwrite, enable_stable_row_ids=True)` creates a dataset with sequential stable row IDs across appends
The main use case of this PR is to allow engines like Spark and Trino to push down an aggregate into the Lance scanner in a distributed worker when possible. Today we technically already support `COUNT(*)` pushdown through `scanner.count_rows()` to count the rows of each fragment in a distributed fashion; this is a more generic version of that. My plan is to allow an engine to pass a Substrait Aggregate expression to the scanner in the worker, to support pushing down other aggregations like `SUM`, `MAX`, `MIN`.

Another alternative I considered is to just update the `dataset.sql()` API to accept a full Substrait plan so we can execute a plan with an aggregate, and update the distributed worker to run a SQL statement instead of running the scanner. But doing this feature in the scanner feels more aligned with how engines implement distributed execution. Basically, whatever can be executed by a single worker in a distributed environment (predicate pushdown, column projection, aggregate pushdown) should be supported by the scanner.

Note that with this change, we could technically remove `create_count_plan` since it's just a subcase of `create_aggregate_plan`, but we are not doing that in this PR. Once we agree on this direction, I will do a separate PR to refactor that.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
This PR should be rebased after lance-format#5664 --------- Co-authored-by: majin.nathan <majin.nathan@bytedance.com>
1. use relative release notes for beta releases. Example fix: https://github.com/jackye1995/lance/releases/tag/v4.0.0-beta.5
2. ensure the RC voting period is correct. Example fix: jackye1995#73
3. implement minor releases in release branches:
   - Example of a proper failure when trying to create a minor release in a release branch while the main branch is not at a major version: https://github.com/jackye1995/lance/actions/runs/21975317405/job/63485677361
   - Example of a successful minor release in a release branch: jackye1995#80, https://github.com/jackye1995/lance/releases/edit/v3.1.0-rc.1, https://github.com/jackye1995/lance/releases/edit/v3.1.0
(lance-format#5915) This changes `Dataset::sample` to sort its random indices. Supplying sorted inputs to `take` results in a 50% reduction in peak memory consumption. This change causes the IVF training stage of IVF-PQ index builds to take approximately half as much memory.
(lance-format#5950)

Make sure we always only do a metadata projection, to avoid scanning data when doing a count. Also:
1. remove `create_count_plan` since `count_rows` is now just a case of aggregate, so a dedicated query plan is no longer needed.
2. remove duplicated calls to `aggregate_required_columns` by storing the required columns directly at aggregate construction time.
This creates a new LocalWriter that wraps `tokio::fs::File` in a `BufWriter` for local file writes. `ObjectStore::create()` now returns one of these when working against local storage, and an ObjectWriter for remote storage. Prior to this commit, local writes (e.g. for shuffling) went through a local object writer implementation that required a 5MB buffer per writer and also simulated multipart upload machinery. For local writing, this is slower than necessary and uses a lot of memory in situations where many writers are open at once. This change results in a substantial memory reduction and an incremental speedup for IVF shuffle.

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
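The benefit of plain buffering can be seen with a toy Python counter (illustrative only; the actual change wraps tokio's file in Rust's `BufWriter`): many small logical writes collapse into a handful of writes against the underlying file.

```python
import io

class CountingSink(io.RawIOBase):
    """Counts raw write() calls as a stand-in for per-write syscall cost."""
    def __init__(self):
        super().__init__()
        self.calls = 0

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        return len(b)

sink = CountingSink()
buf = io.BufferedWriter(sink, buffer_size=64 * 1024)
for _ in range(1000):
    buf.write(b"x" * 16)   # 1000 small logical writes (16 KB total)
buf.flush()
# sink.calls is far smaller than 1000 thanks to buffering
```

Compare this with a fixed 5MB staging buffer per writer: with hundreds of per-partition writers open at once during a shuffle, a small OS-level buffer per file is much cheaper for the same effect.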
(lance-format#5935)

When we sample data for IVF training we go down one of two paths, depending on whether the target column is nullable or not. For the not-nullable path, we use dataset.sample. For the nullable path, we do a block-level sampling of batches, then filter out all batches that contain all-null rows, and interleave the result into the output. The interleaving step requires two copies of the filtered batches to be held in RAM. This commit adds a specialization for the case of nullable fixed-length array columns. We now pre-size an output vector, and process the input scan as a stream. For each batch, we copy non-null values into the output vector, and drop the batch. This saves one copy of the filtered output and halves the memory consumption of the sampling step.
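The streaming copy described above can be sketched in Python (toy types; the real code operates on Arrow fixed-size-list batches): pre-size one output, copy only non-null rows per batch, and drop each batch as soon as it is consumed.

```python
def sample_non_null(batches, dim, target_rows):
    """batches: iterable of (flat_values, validity) pairs, where
    flat_values holds dim values per row and validity[i] marks row i
    as non-null. Non-null rows are copied into a single output vector
    (the real implementation pre-sizes it to target_rows * dim)."""
    out = []
    for flat_values, validity in batches:
        for row, ok in enumerate(validity):
            if not ok:
                continue  # skip null rows instead of filter + interleave
            out.extend(flat_values[row * dim:(row + 1) * dim])
            if len(out) >= target_rows * dim:
                return out
        # the batch goes out of scope here, so only one copy of the
        # filtered data is ever held
    return out

vecs = sample_non_null(
    [([1, 2, 3, 4], [True, False]), ([5, 6, 7, 8], [False, True])],
    dim=2, target_rows=2)
```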
(lance-format#6269)

This refactors distributed vector indexing to remove the staging-root workflow and treat worker outputs as segments written directly under `indices/<segment_uuid>/`. Segment planning and build now operate on segments directly, and vector indexing no longer uses `merge_index_metadata`. The API and docs are updated around `with_segments(...)` / `plan_segments(...)`, and the focused distributed vector tests were updated to cover the new workflow. Java's API will be affected.
(lance-format#6278)

Worker processes were opening the dataset without dataset_options, silently dropping any options (storage_options, version, index_cache_size, etc.) set by the caller. Also remove the debug print statement.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
This is basically the same problem encountered in lance-format#5929. However, the solution there (and in other indices fixed since then) has been to prune indices during the update phase to remove old fragments. Unfortunately, FTS indices may be too large for this pruning to make sense (it would require a complete scan through the old index), though I haven't verified how bad the performance would be. This PR instead keeps track of which fragments have been invalidated, then uses this as a block-list at search time.
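The search-time block-list idea is simple enough to sketch (hypothetical shapes; the real FTS posting lists and fragment bookkeeping live in Rust): rather than rewriting a large index, stale hits are filtered out as results are produced.

```python
def filter_hits(hits, invalidated):
    """hits: (fragment_id, row_offset, score) tuples produced by the
    FTS index; invalidated: set of fragment ids whose rows have been
    rewritten since the index was built. Dropping stale hits at search
    time avoids a full scan-and-rewrite of the index on every update."""
    return [h for h in hits if h[0] not in invalidated]

hits = [(0, 5, 1.2), (3, 1, 0.9), (0, 7, 0.4)]
live = filter_hits(hits, invalidated={3})
```

The trade-off is per-query filtering cost and dead weight in the index, in exchange for cheap updates.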
…tions (lance-format#6196)

#### Index and transaction

- `create_table_index`
- `list_table_indices`
- `describe_table_index_stats`
- `describe_transaction`
- `create_table_scalar_index`
- `drop_table_index`

---------

Co-authored-by: zhangyue19921010 <zhangyue.1010@bytedance.com>
This does not hook the throttle up anywhere yet; that will come in a future PR.

Closes lance-format#6237
Closes lance-format#6238
This change enables phrase queries to match across stop-word gaps. Example: for `doc="love the format"` indexed with `remove_stop_words=True`, the index does not store the stop word `the`. With this change, users can still match the document with the phrase query `q="love the format"`. In this mode, all stop words are treated as equivalent placeholders for phrase matching, so `q="love a format"` will also match the same document. This makes queries that contain stop words 3x~10x faster at the cost of a little accuracy.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
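The placeholder semantics can be modeled with a tiny single-document, single-occurrence toy index (a sketch under those assumptions, not lance's inverted index): non-stop tokens keep their original positions so gaps mark removed stop words, and stop words in the query match any such gap.

```python
STOP = {"the", "a", "an", "of"}

def index_doc(text):
    """remove_stop_words=True: store only non-stop tokens, keeping
    their original positions so gaps mark where stop words were."""
    return {tok: pos for pos, tok in enumerate(text.split())
            if tok not in STOP}

def phrase_match(index, query):
    """Non-stop query tokens must line up at their relative offsets;
    stop-word slots are placeholders matching whatever was removed."""
    toks = query.split()
    anchors = [(i, t) for i, t in enumerate(toks) if t not in STOP]
    if not anchors:
        return False
    first_i, first_t = anchors[0]
    if first_t not in index:
        return False
    start = index[first_t] - first_i  # candidate doc position of query[0]
    if start < 0:
        return False
    return all(t in index and index[t] == start + i for i, t in anchors)
```

Because the placeholders are interchangeable, `"love a format"` matches a document indexed from `"love the format"`, which is exactly the small accuracy loss the PR description mentions.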
## Summary

- `IndicesBuilder` rejected uint8 vector columns and didn't include "hamming" in its allowed distance types, even though the underlying Rust `train_ivf_model` supports hamming via k-modes
- Relaxes `_normalize_column` to accept unsigned integer value types alongside floats
- Adds "hamming" to `_normalize_distance_type`'s allowed list
- Adds `test_ivf_centroids_hamming` test with uint8 vectors

## Test plan

- [ ] `test_ivf_centroids_hamming` — end-to-end IVF training with uint8 vectors and hamming distance
- [ ] Existing `test_ivf_centroids` tests still pass (float path unchanged)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(lance-format#6176)

This adds a dedicated benchmark for distributed vector index finalization and a small query-side metric to count `find_partitions` calls. Together they give us a baseline for analyzing the current single-node merge bottleneck and for evaluating future segmented-index work. As context, the new `distributed_merge_only_ivf_pq` benchmark already shows that finalize cost grows much faster than input bytes as shard count and partition count increase. In the local filesystem benchmark, the mean finalize time grows from about `64 ms` at `8 shards / 256 partitions` to about `2.87 s` at `128 shards / 1024 partitions`.

---

Based on this benchmark, I noticed that our current logic performs poorly as the number of shards increases.

<img width="2750" height="1584" alt="2026-03-12-distributed-merge-trend-matplotlib" src="https://github.com/user-attachments/assets/361aa371-5941-431d-964a-8ea1e2a086d4" />
This moves `DatasetIndexExt` and the dataset-facing index segment types out of `lance-index` and into `lance`, so the public dataset index management API lives in the dataset layer instead of the lower-level index implementation crate. Closes lance-format#6221
(lance-format#6302)

This fixes the main-branch CI failure introduced after lance-format#6280 moved `DatasetIndexExt` into `lance`. `rust/lance-namespace-impls/src/dir.rs` still imported and referenced the old `lance_index::DatasetIndexExt` path, which broke the Java JNI workflow when it compiled the namespace implementation.
In the Rust SDK we have a `NamespaceError` that contains a message and an error code. Additionally, in both the Java and Python SDKs we have logic to parse this type (and error codes) to produce native Java exceptions. The problem is that in the `namespace-impls` we never use them. Therefore the Rust `NamespaceError` type is never used, and the Java/Python SDK conversion code that parses and throws native errors is essentially dead code. In this PR, we update the `namespace-impls` to actually throw the correct error types. Closes lance-format#6240

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Esteban Gutierrez <esteban@lancedb.com>
(lance-format#6102)

## Summary

Implements a parallel async scanner alongside the existing blocking `LanceScanner` to prevent thread starvation in Java query engines like Presto and Trino. This PR adds **AsyncScanner** - a non-blocking alternative to `LanceScanner` that uses `CompletableFuture` for true async I/O operations.

## Motivation

Query engines like Presto and Trino rely on non-blocking I/O to efficiently multiplex thousands of concurrent queries on a limited thread pool. The current `LanceScanner` blocks Java threads during Rust I/O operations, causing thread starvation and poor performance in these environments.

## Key Features

✅ **Non-blocking I/O**: Spawns Tokio tasks instead of blocking Java threads
✅ **CompletableFuture API**: Native Java async patterns for seamless integration
✅ **Persistent JNI dispatcher**: Single thread attached to JVM for zero-overhead callbacks
✅ **Task-based architecture**: Uses task IDs instead of JNI refs to prevent memory leaks
✅ **Full feature parity**: Supports filters, projections, vector search, FTS, aggregates
✅ **Clean cancellation**: Proper task cleanup without resource leaks
✅ **Parallel to existing API**: No breaking changes, `LanceScanner` unchanged

## Architecture

The implementation uses the **"Task ID + Dispatcher" pattern**:

1. **Java manages futures**: `ConcurrentHashMap<taskId, CompletableFuture<Long>>` maps task IDs to pending requests
2. **Rust spawns async tasks**: Returns immediately while Tokio handles I/O in background
3. **Lock-free completion channel**: Carries `(taskId, resultPointer)` from Tokio to dispatcher
4. **Persistent dispatcher thread**: Attaches to JVM once, completes Java futures via cached JNI method IDs

### Components

**Rust (`java/lance-jni/src/`):**
- `dispatcher.rs` - Persistent JNI thread with cached method IDs for callbacks
- `task_tracker.rs` - Thread-safe task registry using `RwLock<HashMap>`
- `async_scanner.rs` - AsyncScanner with Tokio task spawning and JNI exports
- `lib.rs` - Modified to add `JNI_OnLoad` hook for dispatcher initialization

**Java (`java/src/main/java/org/lance/ipc/`):**
- `AsyncScanner.java` - CompletableFuture-based async API with task management

**Tests (`java/src/test/java/org/lance/`):**
- `AsyncScannerTest.java` - 6 comprehensive examples demonstrating usage patterns

## Usage Examples

### Basic async scan

```java
ScanOptions options = new ScanOptions.Builder().batchSize(20L).build();
try (AsyncScanner scanner = AsyncScanner.create(dataset, options, allocator)) {
    CompletableFuture<ArrowReader> future = scanner.scanBatchesAsync();
    ArrowReader reader = future.get(10, TimeUnit.SECONDS);
    while (reader.loadNextBatch()) {
        // Process batches without blocking
    }
}
```

### Multiple concurrent scans (key benefit for Presto/Trino)

```java
List<CompletableFuture<Integer>> futures = new ArrayList<>();
for (int i = 0; i < 100; i++) {
    AsyncScanner scanner = AsyncScanner.create(dataset, options, allocator);
    futures.add(scanner.scanBatchesAsync()
        .thenApply(reader -> processInBackground(reader)));
}
// All scans run in parallel without blocking threads!
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
    .get(30, TimeUnit.SECONDS);
```

## Testing

```bash
cd java
./mvnw test -Dtest=AsyncScannerTest
./mvnw compile  # Full build verification
```

All tests pass ✅

## Compatibility

- ✅ No breaking changes to existing APIs
- ✅ `LanceScanner` remains unchanged
- ✅ Uses same `ScanOptions` for consistency
- ✅ Opt-in: users choose blocking or async based on their needs

## Performance Benefits

For query engines running hundreds of concurrent queries:
- **Before**: Thread pool exhaustion as threads block on I/O
- **After**: Threads immediately return, I/O happens in background

## Checklist

- [x] Rust code compiles (`cargo check`)
- [x] Java code compiles (`./mvnw compile`)
- [x] Code formatted (`./mvnw spotless:apply && cargo fmt`)
- [x] Comprehensive tests added (`AsyncScannerTest.java`)
- [x] Documentation in code examples
- [x] No breaking changes

## Related Issues

Addresses the need for non-blocking I/O in Java query engines that integrate with LanceDB.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(lance-format#6300)

`convert_json_arrow_type` only handled scalar types (int/float/utf8/binary), causing deserialization failures for any schema containing list, struct, large_binary, large_utf8, fixed_size_list, map, decimal, or date/time types. This made `arrow_type_to_json` and `convert_json_arrow_type` asymmetric: serialization worked for all types but deserialization rejected most of them with "Unsupported Arrow type". In practice this broke the DuckDB lance extension's fast schema-from-REST path — tables with list/struct columns fell back to opening the S3 dataset for every DESCRIBE, making SHOW ALL TABLES ~20x slower than necessary.

Add support for: float16, large_utf8, large_binary, fixed_size_binary, decimal32/64/128/256, date32, date64, timestamp, duration, list, large_list, fixed_size_list, struct, and map. Add a roundtrip test covering all supported types.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Summary

- add `Dataset::version_id()` in Rust to return the checked-out manifest version without building full version metadata
- route Python `dataset.version` and default checkout refs through the new fast path
- route Java `Dataset.version()` through a new JNI version-id accessor while keeping `getVersion()` unchanged
- extend Rust, Python, and Java tests to cover current, updated, and historical version reads

## Testing

- `cargo test -p lance test_version_id_fast_path`
- `cargo check --manifest-path java/lance-jni/Cargo.toml`
- `cargo check --manifest-path python/Cargo.toml`
- Python pytest target not run in this environment (`pytest` unavailable)
- Java Maven test target not run in this environment (JRE unavailable)
(lance-format#6270)

This change makes the logical-index and physical-segment split explicit in the user-facing index APIs without breaking existing behavior. `describe_indices` remains the logical view, `describe_index_segments` becomes the explicit physical-segment view, and index statistics now expose `num_segments` / `segments` alongside the legacy fields for compatibility. The Rust, Python, and Java bindings now use the same model, so segment-aware callers do not need to infer semantics from raw manifest metadata. I validated the Rust path with `cargo test -p lance test_optimize_delta_indices -- --nocapture` and the Java path with `./mvnw -q -Dtest=DatasetTest#testDescribeIndicesByName test`.
This updates the documented minimum voting period for core stable major releases from 1 week to 3 days. It also updates the RC voting discussion automation so generated vote windows stay aligned with the documented policy. This follows the discussion in lance-format#6147, where feedback converged on narrowing the proposal to core major releases only while keeping patch and subproject stable releases unchanged.
(lance-format#6251)

This tightens the new multi-segment vector index path added in lance-format#6220. It enforces disjoint fragment coverage when committing a segment set, adds regression coverage that grouped segment coverage matches the union of its source shard coverage, and verifies that remap only touches segments covering affected fragments. It also adds cleanup coverage for both replaced committed segments and stale uncommitted `_indices/<uuid>` artifacts, and documents these contracts in the distributed indexing guide.
This PR builds on lance-format#6294 and exposes the remaining pieces needed to construct non-shared centroid vector index builds. It adds fragment-scoped IVF/PQ training in Rust and exports the same training flow to Python, so users can train per-segment artifacts and feed them into the existing distributed build path.
(lance-format#6312)

Closes lance-format#6311

## What

Add an optional `storage_options` keyword argument to `IvfModel.save()`, `IvfModel.load()`, `PqModel.save()`, and `PqModel.load()`, and forward it to the underlying `LanceFileWriter` / `LanceFileReader`.

## Why

These methods accept cloud storage URIs (e.g. `s3://`, `gs://`) but previously had no way to pass credentials or backend-specific configuration. Users were forced to rely on environment variables for authentication, which breaks down when dealing with multiple storage backends that require different credentials. Both `LanceFileWriter` and `LanceFileReader` already support `storage_options` — this change simply threads the parameter through.

## Change Summary

- **`python/python/lance/indices/ivf.py`**: Add `storage_options` parameter to `IvfModel.save()` and `IvfModel.load()`, forward to `LanceFileWriter` / `LanceFileReader`.
- **`python/python/lance/indices/pq.py`**: Same change for `PqModel.save()` and `PqModel.load()`.

## Usage

```python
from lance.indices.ivf import IvfModel
from lance.indices.pq import PqModel

opts = {"aws_access_key_id": "...", "aws_secret_access_key": "...", "region": "us-east-1"}

# Save
ivf_model.save("s3://bucket/ivf.lance", storage_options=opts)
pq_model.save("s3://bucket/pq.lance", storage_options=opts)

# Load
ivf_model = IvfModel.load("s3://bucket/ivf.lance", storage_options=opts)
pq_model = PqModel.load("s3://bucket/pq.lance", storage_options=opts)
```
(lance-format#6310)

## Summary

When `enable_stable_row_id` is enabled, the first `take_rows()` call triggers a full `RowIdIndex` build. On large datasets this cold start was extremely slow (18.9s for 968M rows, 220s for 4.26B rows).

### Root Causes

**1. O(total_rows) segment expansion in `decompose_sequence`**

The original code expanded every `U64Segment` element-by-element, even for `Range` segments with no deletions. For a `Range(0..273711)` with no deletions, this meant 273K iterations, deletion vector checks, temporary allocations, and re-compression — only to produce the same Range back. Across 18,243 fragments averaging 233K rows, this totaled **4.26 billion iterations** with ~32 GB of temporary allocations.

**2. O(N²) fragment lookup in `load_row_id_index`**

The original code called `fragments.iter().find()` (O(N) linear search) for each of N fragments, resulting in O(N²) comparisons. `try_join_all` spawned all N futures at once, overwhelming the async runtime. `get_deletion_vector()` was called unconditionally even for fragments without deletion files.

### Solution

**Fix 1: O(1) fast path for Range segments without deletions.** When a fragment has no deletions and its row_id sequence is a `Range`, construct the index chunk directly without iterating.

**Fix 2: HashMap lookup + conditional deletion vector loading.** Use a `HashMap<u32, &FileFragment>` for O(1) lookup, `buffer_unordered` for controlled concurrency, and skip `get_deletion_vector()` when there's no deletion file.
### Results

| Dataset | Before | After | Speedup |
|---------|--------|-------|---------|
| 968M rows, 3,540 fragments | 18.9s | 150ms | **126x** |
| 4.26B rows, 18,243 fragments | 220s | 89ms | **2,471x** |

## Test plan

- [x] All existing rowids tests pass (45 tests)
- [x] Clippy clean
- [x] New test `test_large_range_segments_no_deletions` validates fast path correctness at boundaries and performance (100 fragments × 250K rows completes < 1s)
- [ ] Verify on real datasets with deletions to ensure slow path correctness

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
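Both fixes described in the PR have a simple shape; a Python sketch with toy types (not the actual Rust `U64Segment` or `FileFragment` APIs):

```python
def decompose_sequence(seq, deleted):
    """Fix 1 (toy model): a contiguous range with no deletions is
    returned as-is in O(1), instead of being expanded, filtered, and
    re-compressed element by element."""
    if not deleted and isinstance(seq, range):
        return seq                              # fast path: reuse the range
    dels = set(deleted)
    return [x for x in seq if x not in dels]    # slow path: expand & filter

def load_row_id_index(fragments, sequences):
    """Fix 2 (toy model): build a hash map once instead of a linear
    fragments.iter().find() per fragment (O(N^2) -> O(N))."""
    by_id = {f["id"]: f for f in fragments}     # O(N) build, O(1) lookups
    return {fid: decompose_sequence(seq, by_id[fid]["deleted"])
            for fid, seq in sequences}
```

The conditional deletion-vector loading falls out naturally: `deleted` is only consulted when non-empty, so fragments without deletion files never pay the expansion cost.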
Closes lance-format#6239 --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

- Add a "Conflict Handling" section to the performance guide explaining how concurrent operation conflicts affect throughput, with common conflict examples and a link to the transaction spec
- Add a "Fragment Reuse Index" subsection to the performance guide describing how the FRI avoids compaction/index conflicts, with a Python API snippet
- Add an "Impacts" section to the FRI spec covering conflict resolution changes, index load cost, and FRI growth/cleanup

## Test plan

- [ ] Verify doc links resolve correctly (conflict resolution anchor, FRI spec link)
- [ ] Review rendered markdown for formatting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- chore: fix `nprobes` method spelling (lance-format/lance#5147)
- refactor!: move all previous code into `previous` mod (lance-format/lance#5217)
- fix!: null handling when using `NOT` with scalar indices (lance-format/lance#5270)
- feat: add `py.typed` marker file (lance-format/lance#5479)
- feat(python): expose the `distance_range` param in the Python scanner `nearest` config (lance-format/lance#5486)
- feat: make geodatafusion/geoarrow optional via `geo` feature flag (lance-format/lance#5934)
- feat: surface ambiguous merge insert error as `InvalidInput` (lance-format/lance#6048)
- `DatasetPreFilter` (#6083)
- `use_scalar_index` param in Java scanner (#5487)
- `indexes` to `indices` for uniformity (#6113)
- `fetch_arrow_table` with `to_arrow_table` (#6146)