
feat: add configurable blob v2 pack file size #6508

Merged
Xuanwo merged 2 commits into lance-format:main from hamersaw:feature/configure-blobv2-file-sizes
Apr 14, 2026

hamersaw (Contributor) commented Apr 14, 2026

Summary

  • Add blob_max_pack_file_bytes to WriteParams, allowing users to override the default 1 GiB maximum pack (.blob) sidecar file size
  • Thread the configuration through the full write path: WriteParams -> WriterGenerator -> WriterOptions -> BlobPreprocessor -> PackWriter
  • Expose the option in Python (write_dataset) and Java (WriteParams.Builder) bindings
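The write path listed above can be pictured as one optional value handed down through each layer until something resolves it to a concrete limit. The following is a minimal Python sketch of that idea, not the Rust code from this PR: the class names mirror the types named in the summary, but their fields, the `effective_limit` helper, and the place where the 1 GiB default is applied are all assumptions made for illustration. The option name is the one used at this point in the conversation; it is renamed to `blob_pack_file_size_threshold` later in review.

```python
from dataclasses import dataclass
from typing import Optional

# Default named in the PR summary: 1 GiB.
DEFAULT_BLOB_MAX_PACK_FILE_BYTES = 1024 ** 3


@dataclass
class WriteParams:
    # None means "not set by the user" (hypothetical stand-in for the real struct).
    blob_max_pack_file_bytes: Optional[int] = None


@dataclass
class WriterOptions:
    # Carries the user's choice unchanged to the next layer.
    blob_max_pack_file_bytes: Optional[int] = None


class PackWriter:
    def __init__(self, max_pack_file_bytes: int) -> None:
        self.max_pack_file_bytes = max_pack_file_bytes


class BlobPreprocessor:
    def __init__(self, options: WriterOptions) -> None:
        # Assumption: the default is resolved here, at the last hop.
        limit = options.blob_max_pack_file_bytes
        if limit is None:
            limit = DEFAULT_BLOB_MAX_PACK_FILE_BYTES
        self.pack_writer = PackWriter(limit)


class WriterGenerator:
    def __init__(self, params: WriteParams) -> None:
        self.options = WriterOptions(params.blob_max_pack_file_bytes)

    def new_preprocessor(self) -> BlobPreprocessor:
        return BlobPreprocessor(self.options)


def effective_limit(params: WriteParams) -> int:
    """Hypothetical helper: follow the chain and report the limit PackWriter sees."""
    return WriterGenerator(params).new_preprocessor().pack_writer.max_pack_file_bytes
```

With nothing set, `effective_limit(WriteParams())` falls back to the 1 GiB default; passing `WriteParams(blob_max_pack_file_bytes=64 * 1024 * 1024)` carries the 64 MiB override all the way down.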

Test plan

  • All 37 existing blob tests pass (cargo test -p lance blob)
  • Clippy clean on lance and lance-jni crates
  • Verify Python binding works end-to-end with blob_max_pack_file_bytes kwarg
  • Verify Java binding compiles with ./mvnw compile

🤖 Generated with Claude Code

Add blob_max_pack_file_bytes to WriteParams, allowing users to override
the default 1 GiB maximum pack (.blob) sidecar file size. Exposed in
Rust, Python, and Java APIs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
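
To make the effect of a maximum pack file size concrete, here is a small Python sketch of size-based rollover: blobs are appended to the current pack until adding the next one would exceed the limit, at which point a new pack is started. This PR does not show how Lance's PackWriter actually decides when to roll over, so the policy below (including letting a single oversized blob occupy a pack on its own) and the function name `assign_blobs_to_packs` are assumptions for illustration only.

```python
from typing import List


def assign_blobs_to_packs(blob_sizes: List[int], max_pack_file_bytes: int) -> List[List[int]]:
    """Group blob sizes into packs, starting a new pack when the limit would be exceeded."""
    packs: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in blob_sizes:
        # Roll over only if the current pack already holds something; an
        # oversized blob still gets a pack of its own in this sketch.
        if current and current_bytes + size > max_pack_file_bytes:
            packs.append(current)
            current = []
            current_bytes = 0
        current.append(size)
        current_bytes += size
    if current:
        packs.append(current)
    return packs
```

For example, with a limit of 100 bytes the sizes `[40, 40, 40, 10]` end up in two packs, `[40, 40]` and `[40, 10]`. A smaller limit produces more, smaller sidecar files; a larger one produces fewer, bigger files.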
github-actions bot added the enhancement, python, and java labels Apr 14, 2026

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 59.09091% with 9 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance/src/dataset/write.rs | 62.50% | 6 Missing ⚠️ |
| rust/lance/src/dataset/blob.rs | 50.00% | 1 Missing and 2 partials ⚠️ |
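
The per-file figures are consistent with the 59.09091% headline, which can be checked with a little arithmetic. The Python sketch below infers each file's changed-line count from its patch percentage and missing-line count, then recomputes the overall figure. It assumes partial lines count as not covered, which matches the report's "9 lines" total (6 + 1 + 2); the inferred totals of 16 and 6 changed lines are derived here and do not appear in the report itself.

```python
from fractions import Fraction

# (missing_or_partial_lines, reported_patch_percent) from the Codecov table.
files = {
    "rust/lance/src/dataset/write.rs": (6, Fraction("62.5")),
    "rust/lance/src/dataset/blob.rs": (3, Fraction(50)),  # 1 missing + 2 partials
}


def inferred_total_lines(missing, patch_percent):
    # missing / total = 1 - patch_percent / 100, solved for total.
    return missing / (1 - patch_percent / 100)


totals = {name: inferred_total_lines(m, p) for name, (m, p) in files.items()}
total_lines = sum(totals.values())                      # inferred changed lines
total_missing = sum(m for m, _ in files.values())       # lines without full coverage
overall_percent = float(100 * (total_lines - total_missing) / total_lines)

print(total_lines, total_missing, overall_percent)
```

Under these assumptions the patch touches 22 coverable lines, 13 of which are covered, and 13/22 matches the reported patch coverage.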

Xuanwo (Collaborator) left a comment

Thank you for working on this!

Comment thread: python/python/lance/dataset.py (outdated)

    "target_bases": target_bases,
    "external_blob_mode": external_blob_mode,
    "allow_external_blob_outside_bases": allow_external_blob_outside_bases,
    "blob_max_pack_file_bytes": blob_max_pack_file_bytes,

Xuanwo (Collaborator):

How about naming the option something like blob_pack_file_size_threshold? That aligns better with the existing naming.

hamersaw (Contributor, Author):

No problem, I updated it!

Comment thread: java/lance-jni/src/blocking_dataset.rs (outdated)

    &initial_bases,
    &target_bases,
    &JObject::null(), // allow_external_blob_outside_bases not used for Dataset.write()
    &JObject::null(), // blob_max_pack_file_bytes not used for Dataset.write()

Xuanwo (Collaborator):

So we never pass it?

hamersaw (Contributor, Author):

It looks like we were not passing the allow_external_blob_outside_bases option either, so I updated this to plumb both options through. They are now available through Dataset.write() and not just through the fragment creation APIs in Java.

…hrough Dataset.write()

Rename blob_max_pack_file_bytes to blob_pack_file_size_threshold across
Rust, Python, and Java. Wire both allow_external_blob_outside_bases and
blob_pack_file_size_threshold through the Java Dataset.write() JNI path,
which previously hardcoded them to null/defaults.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Xuanwo merged commit 9672573 into lance-format:main Apr 14, 2026
33 checks passed