
perf: create local writer for efficient local writes #5939

Merged: wjones127 merged 10 commits into lance-format:main from wkalt:task/add-local-writer on Feb 16, 2026


@wkalt (Contributor) commented Feb 11, 2026

This creates a new LocalWriter that wraps tokio::fs::File in a BufWriter for local file writes. ObjectStore::create() now returns one of these when working against local storage, and an ObjectWriter for remote storage.
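A minimal sketch of the local/remote split described above. To keep it dependency-free it uses std's synchronous `File` and `BufWriter` as stand-ins for `tokio::fs::File` and tokio's `BufWriter`. The `Writer` enum, the `create` function, the `file://` prefix check, and `RemoteWriter` are all hypothetical; the PR text does not show how `ObjectStore::create()` chooses between writers or what its return type looks like.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Hypothetical std-only analogue of a local writer: a file wrapped in a
/// BufWriter, so small writes are coalesced in a modest buffer (8 KiB by
/// default) rather than a multipart-sized one.
struct LocalWriter {
    path: PathBuf,
    inner: BufWriter<File>,
}

impl LocalWriter {
    fn create(path: &Path) -> io::Result<Self> {
        let file = File::create(path)?;
        Ok(Self { path: path.to_path_buf(), inner: BufWriter::new(file) })
    }

    fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
        self.inner.write_all(buf)
    }

    /// Flush buffered bytes, sync to disk, and return the final file size.
    /// Flushing explicitly surfaces errors instead of relying on drop.
    fn shutdown(mut self) -> io::Result<u64> {
        self.inner.flush()?;
        self.inner.get_ref().sync_all()?;
        Ok(std::fs::metadata(&self.path)?.len())
    }
}

/// Hypothetical stand-in for the remote path: accumulates bytes in memory,
/// roughly like a multipart uploader filling a part before sending it.
struct RemoteWriter {
    buf: Vec<u8>,
}

/// Hypothetical dispatch: local locations get a LocalWriter, anything else
/// gets the remote writer.
enum Writer {
    Local(LocalWriter),
    Remote(RemoteWriter),
}

fn create(location: &str) -> io::Result<Writer> {
    match location.strip_prefix("file://") {
        Some(path) => Ok(Writer::Local(LocalWriter::create(Path::new(path))?)),
        None => Ok(Writer::Remote(RemoteWriter { buf: Vec::new() })),
    }
}

impl Writer {
    fn write_all(&mut self, buf: &[u8]) -> io::Result<()> {
        match self {
            Writer::Local(w) => w.write_all(buf),
            Writer::Remote(w) => {
                w.buf.extend_from_slice(buf);
                Ok(())
            }
        }
    }

    /// Finish the write and report how many bytes were written.
    fn shutdown(self) -> io::Result<u64> {
        match self {
            Writer::Local(w) => w.shutdown(),
            Writer::Remote(w) => Ok(w.buf.len() as u64),
        }
    }
}
```

The explicit `shutdown` matters more with the async version than with this sketch: tokio's `BufWriter` discards unflushed data when dropped, while std's flushes on drop but ignores any error.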

Prior to this commit, local writes (e.g., for shuffling) went through a local object writer implementation that required a 5MB buffer per writer and also simulated multipart upload machinery. For local writing, this is slower than necessary and uses a lot of memory in situations where many writers are open at once.
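A back-of-the-envelope sketch of why the per-writer buffer size dominates when many writers are open. It treats "5MB" as 5 MiB and assumes the new writer keeps the default `BufWriter::new` capacity of 8 KiB (the default in both std and tokio); the PR does not state the capacity `LocalWriter` actually uses, and the writer count below is purely illustrative.

```rust
/// Per-writer buffer of the old local object writer, reading "5MB" as 5 MiB.
const MULTIPART_BUFFER_BYTES: u64 = 5 * 1024 * 1024;
/// Default capacity of `BufWriter::new` in both std and tokio (8 KiB).
const BUFWRITER_DEFAULT_BYTES: u64 = 8 * 1024;

/// Bytes held in per-writer buffers when `writers` writers are open at once.
fn buffer_footprint(writers: u64, buffer_bytes: u64) -> u64 {
    writers * buffer_bytes
}

/// Convert a byte count to MiB for display.
fn mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

With a hypothetical 1,024 writers open at once, the old scheme pins 5,120 MiB of buffers versus 8 MiB under the assumed default, a 640x difference per writer.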

This change results in a substantial memory reduction and incremental speedup for IVF shuffle.

github-actions (bot) added the enhancement label (New feature or request) Feb 11, 2026
@wkalt (Contributor, Author) commented Feb 11, 2026

This supersedes #5907. It keeps the same memory performance and yields an incremental speedup.
[Image: progress]

@wkalt wkalt changed the title feat: create local writer for efficient local writes perf: create local writer for efficient local writes Feb 11, 2026
Comment thread rust/lance-io/src/object_writer.rs Outdated
Comment on lines +578 to +590
```rust
temp_path.persist(&final_path).map_err(|e| {
    Error::io(
        format!("failed to persist temp file to {}: {}", final_path, e.error),
        location!(),
    )
})?;

let metadata = std::fs::metadata(&final_path).map_err(|e| {
    Error::io(
        format!("failed to read metadata for {}: {}", self.path, e),
        location!(),
    )
})?;
```
Contributor

I think these should be wrapped in a tokio::task::spawn_blocking() since they both make blocking syscalls, right?

Contributor Author

thanks, updated + one more spot
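A sketch of the pattern suggested in review: gather the blocking filesystem calls into one synchronous function so an async caller can run the whole thing on tokio's blocking pool. `finalize_blocking` is a hypothetical name, and `std::fs::rename` stands in for `TempPath::persist` from the tempfile crate so the example needs no dependencies; the actual updated code in the PR is not shown here.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Blocking half of a finalize step: move the temp file into place, then read
// the resulting size. Both calls block on the filesystem, so from async code
// the whole function would run via spawn_blocking, e.g.:
//
//     let size = tokio::task::spawn_blocking(move || finalize_blocking(&tmp, &dst))
//         .await
//         .expect("blocking task panicked")?;
//
// where `tmp` and `dst` are owned PathBufs moved into the closure.
fn finalize_blocking(temp_path: &Path, final_path: &Path) -> io::Result<u64> {
    fs::rename(temp_path, final_path).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("failed to persist temp file to {}: {}", final_path.display(), e),
        )
    })?;
    let metadata = fs::metadata(final_path).map_err(|e| {
        io::Error::new(
            e.kind(),
            format!("failed to read metadata for {}: {}", final_path.display(), e),
        )
    })?;
    Ok(metadata.len())
}
```

Grouping both calls into a single closure costs one hop to the blocking pool instead of two, and keeps the async task from stalling a runtime worker thread on disk I/O.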

Comment thread rust/lance-io/Cargo.toml Outdated
Comment thread rust/lance-io/benches/write.rs Outdated
wjones127 and others added 10 commits February 16, 2026 21:45
@wkalt wkalt force-pushed the task/add-local-writer branch from dca2234 to f47a40e Compare February 16, 2026 21:46
@wkalt wkalt requested a review from wjones127 February 16, 2026 22:25
@wjones127 wjones127 merged commit c878af4 into lance-format:main Feb 16, 2026
30 checks passed

2 participants