feat: add lance_dataset_write for create/append/overwrite from ArrowArrayStream #16
Open
LuciferYang wants to merge 2 commits into lance-format:main
Writes an ArrowArrayStream into a Lance dataset with a committed manifest.
Adds a mode enum (CREATE / APPEND / OVERWRITE) and an optional out_dataset that
returns the open dataset at the new version (so callers don't need to
reopen). Structure follows fragment_writer.rs: schema fail-fast, storage
options pass-through, thread-local errors. C++ gets a scoped WriteMode
enum and a lance::Dataset::write() static method that reads the stream's
schema automatically.
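
For readers unfamiliar with the thread-local error pattern mentioned above, here is a minimal sketch of how it usually looks at an FFI boundary. It is not the code from this PR: `LAST_ERROR`, `set_last_error`, `take_last_error`, and `demo_op` are hypothetical names, and the return codes are assumed for illustration.

```rust
use std::cell::RefCell;

thread_local! {
    // One error slot per thread, so concurrent callers never see each
    // other's failures. (Hypothetical name, not taken from the PR.)
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

/// Record a failure message for the calling thread.
pub fn set_last_error(msg: impl Into<String>) {
    LAST_ERROR.with(|slot| *slot.borrow_mut() = Some(msg.into()));
}

/// Fetch and clear the calling thread's last error, if any.
pub fn take_last_error() -> Option<String> {
    LAST_ERROR.with(|slot| slot.borrow_mut().take())
}

/// Hypothetical C-callable operation: returns 0 on success, -1 on failure.
/// The failure detail is left in the thread-local slot instead of being
/// returned through the C signature.
pub extern "C" fn demo_op(value: i32) -> i32 {
    if value < 0 {
        set_last_error(format!("negative value: {value}"));
        return -1;
    }
    0
}
```

The C side would check the return code and then call an accessor to read the message; because the slot is per thread, an error raised on one thread is not visible from another.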
Two FFI details:
- mode is received as i32 and validated via LanceWriteMode::from_raw;
accepting the enum directly would be UB for out-of-range values.
- The stream is consumed via ArrowArrayStreamReader::from_raw right
after the NULL check so the "consumed on any return" contract holds.
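
The two details above can be sketched together with the standard library only. This is an illustration under stated assumptions, not the PR's implementation: `FakeStream` stands in for the C `ArrowArrayStream` (only its release callback is modelled), `OwnedStream` stands in for the reader that takes ownership, and the `WriteMode` discriminants and the return codes of `demo_write` are made up.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts release calls so the contract can be observed (illustration only).
pub static RELEASED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the C stream struct.
#[repr(C)]
pub struct FakeStream {
    pub release: Option<unsafe extern "C" fn(*mut FakeStream)>,
}

/// Release callback a producer would install on the stream.
pub unsafe extern "C" fn count_release(stream: *mut FakeStream) {
    RELEASED.fetch_add(1, Ordering::SeqCst);
    unsafe { (*stream).release = None };
}

/// Owning wrapper: dropping it releases the stream exactly once.
struct OwnedStream(FakeStream);

impl OwnedStream {
    /// Move the struct out of the caller's memory, leaving a released
    /// (release == None) struct behind.
    unsafe fn from_raw(ptr: *mut FakeStream) -> Self {
        OwnedStream(unsafe { std::ptr::replace(ptr, FakeStream { release: None }) })
    }
}

impl Drop for OwnedStream {
    fn drop(&mut self) {
        if let Some(release) = self.0.release {
            unsafe { release(&mut self.0) };
        }
    }
}

/// Assumed discriminant values; the real enum may differ.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteMode {
    Create = 0,
    Append = 1,
    Overwrite = 2,
}

impl WriteMode {
    /// Validate a raw integer from C instead of trusting it as an enum.
    pub fn from_raw(raw: i32) -> Option<Self> {
        match raw {
            0 => Some(Self::Create),
            1 => Some(Self::Append),
            2 => Some(Self::Overwrite),
            _ => None,
        }
    }
}

/// Hypothetical entry point: 0 = ok, -1 = NULL stream, -2 = invalid mode.
pub unsafe extern "C" fn demo_write(stream: *mut FakeStream, mode: i32) -> i32 {
    if stream.is_null() {
        return -1;
    }
    // Take ownership first: every return below drops `_owned`,
    // which releases the stream.
    let _owned = unsafe { OwnedStream::from_raw(stream) };
    if WriteMode::from_raw(mode).is_none() {
        return -2;
    }
    0 // a real writer would drain the stream here before returning
}
```

Taking ownership before any other validation is the point of the ordering: an invalid mode still returns an error, but the caller's stream has already been moved out and is released when the wrapper is dropped.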
Tests: 11 new Rust unit tests (CREATE/APPEND/OVERWRITE happy paths,
OVERWRITE-on-missing, CREATE-on-existing, schema mismatches, empty
stream, NULL args, invalid mode, out_dataset propagation). The ignored
C and C++ integration tests now do a scan->write round-trip.
Closes lance-format#14.
b3e813c to 8a9325e
The LanceWriteMode doc referenced `LanceWriteMode::from_raw`, which is private. rustdoc -D warnings (the Rustdoc CI job) flags this as `private_intra_doc_links`. Rewording to describe the validation behavior without naming the private function. No API changes.
Summary
- `lance_dataset_write(uri, schema, stream, mode, storage_opts, out_dataset)`: writes an `ArrowArrayStream` into a Lance dataset with a committed manifest
- `LanceWriteMode` covers `CREATE` / `APPEND` / `OVERWRITE`
- `out_dataset` hands back an open `LanceDataset*` at the new version so callers don't need to reopen
- `lance::Dataset::write(...)` static method in `lance.hpp`

Motivation
Until now the C/C++ path only produced uncommitted fragment files (#5). `lance_dataset_write` closes the primary write path and unblocks the rest of Phase 3 (delete, update, merge-insert, schema evolution), which all need a way to create a dataset first.

Notes
- `mode` is received as `int32_t` and validated via `LanceWriteMode::from_raw`. Accepting the enum directly would be UB for out-of-range values from C.
- The stream is consumed via `ArrowArrayStreamReader::from_raw` immediately after the NULL check, so the "consumed on any return" contract holds on every error path.

Test plan
- `cargo test`: 55 integration tests, 11 new (CREATE/APPEND/OVERWRITE happy paths, OVERWRITE on a missing path, CREATE on an existing path, schema mismatches, empty stream, NULL args, invalid mode, `out_dataset` propagation)
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- `cargo test --test compile_and_run_test -- --ignored`: C and C++ scan→write round-trips pass

Closes #14.