feat: add range to External blob #5765

Merged
Xuanwo merged 4 commits into lance-format:main from wojiaodoubao:external-packed
Feb 7, 2026

Conversation

@wojiaodoubao
Contributor

A common practice for organizing multimodal data is to package a large number of scattered files into several large archives (such as tar or zip files), which can prevent excessive inode consumption, achieve higher read speeds via streaming, and avoid inconsistencies between metadata and the scattered stored files.

To better support such archives, I suggest we extend Lance BlobV2 with an ExternalPacked format. For example, this would allow images stored inside the archives to be linked from a Lance table.
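To make the archive use case concrete, here is a minimal sketch of how a packed member can be addressed as a byte range. It uses only Python's standard `tarfile` module and is not part of this PR; the helper names `build_tar` and `member_ranges` are made up for illustration. It assumes an uncompressed tar, since byte ranges into a compressed archive would not map directly to member payloads.

```python
import io
import tarfile


def build_tar(files):
    """Pack a {name: bytes} mapping into an uncompressed tar held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_ranges(archive_bytes):
    """Map each regular member to the (offset, size) of its payload in the archive."""
    ranges = {}
    # "r:" opens the archive strictly without compression.
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        for info in tar:
            if info.isfile():
                ranges[info.name] = (info.offset_data, info.size)
    return ranges


archive = build_tar({"a.jpg": b"first image bytes", "b.jpg": b"second"})
ranges = member_ranges(archive)

# Slicing the raw archive with the recorded range recovers the member payload.
offset, size = ranges["b.jpg"]
payload = archive[offset:offset + size]
```

Each `(offset, size)` pair, together with the archive's URI, would be enough to reference one image without unpacking the archive.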

@github-actions github-actions Bot added the enhancement and python labels Jan 21, 2026
@wojiaodoubao wojiaodoubao marked this pull request as draft January 21, 2026 03:59
@wojiaodoubao wojiaodoubao marked this pull request as ready for review January 21, 2026 06:03
@wojiaodoubao wojiaodoubao force-pushed the external-packed branch 2 times, most recently from e8038ba to ad3fcb1 Compare January 21, 2026 06:51
@codecov

codecov Bot commented Jan 21, 2026

Codecov Report

❌ Patch coverage is 75.82418% with 22 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/blob.rs 33.33% 16 Missing and 4 partials ⚠️
rust/lance-encoding/src/encodings/logical/blob.rs 96.72% 2 Missing ⚠️

@Xuanwo
Collaborator

Xuanwo commented Jan 21, 2026

This idea is kinda interesting. So your idea focuses more on reading an external packed blob, and Lance won't generate such a blob itself, is that correct?

@wojiaodoubao
Contributor Author

> So your idea focuses more on reading an external packed blob, and Lance won't generate such a blob itself, is that correct?

Yes.

@Xuanwo
Collaborator

Xuanwo commented Jan 22, 2026

> Yes.

Then I think we don't need to add a new type. We can simply enable reading a range from an external blob.

@wojiaodoubao
Contributor Author

> We can simply enable reading a range from an external blob.

Do you mean we could let the External BlobFile handle an arbitrary range (currently the position is always 0), thus eliminating the need to introduce ExternalPacked as an extra component?

That makes sense to me. Let me update the PR.

@Xuanwo
Collaborator

Xuanwo commented Jan 22, 2026

> Do you mean we could enable External BlobFile to handle any arbitrary range (currently the position is always 0), thus eliminating the need to introduce ExternalPacked as an extra component?

Yes!
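The agreed direction, an external blob that carries a range instead of always starting at position 0, can be sketched roughly as follows. This is only an illustration in plain Python: `ExternalBlobRef` and `read_external` are hypothetical names, not Lance's schema or API, and a local file stands in for whatever object store the URI points to.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExternalBlobRef:
    """Hypothetical descriptor for an external blob; not Lance's actual type."""
    uri: str
    position: int = 0            # default matches the old "always 0" behavior
    size: Optional[int] = None   # None means "read to the end of the file"


def read_external(ref):
    """Read the referenced byte range from a local file."""
    with open(ref.uri, "rb") as f:
        f.seek(ref.position)
        return f.read() if ref.size is None else f.read(ref.size)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packed.bin")
    with open(path, "wb") as f:
        f.write(b"AAAABBBBBBCC")

    # Without a range, the whole file is the blob.
    whole = read_external(ExternalBlobRef(path))
    # With a range, only the bytes in [position, position + size) are the blob.
    middle = read_external(ExternalBlobRef(path, position=4, size=6))
```

With the defaults, a reference behaves like a whole-file external blob, so a separate ExternalPacked kind is not needed to express the packed case.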

@wojiaodoubao
Contributor Author

Hi @Xuanwo, this PR is ready for review now. Please take a look when you have time. Thanks very much.

Collaborator

@Xuanwo Xuanwo left a comment

Thank you for working on this! It's a nice addition to blob v2.

Comment thread rust/lance-core/src/datatypes.rs Outdated
let data_field = Arc::new(ArrowField::new("data", DataType::LargeBinary, true));
let uri_field = Arc::new(ArrowField::new("uri", DataType::Utf8, true));
let blob_id_field = Arc::new(ArrowField::new("blob_id", DataType::UInt32, true));
let blob_size_field = Arc::new(ArrowField::new("blob_size", DataType::UInt64, true));
Collaborator

Should this field be size?

Contributor Author

This is blob_size, as in test_blob_v2_dedicated_round_trip. The test checks that BlobV2StructuralEncoder encodes an External blob with position and blob_size.

Comment thread rust/lance/src/dataset/blob.rs Outdated
Comment thread python/python/lance/blob.py Outdated
@wojiaodoubao wojiaodoubao changed the title from "feat: add ExternalPacked blob kind for blob v2" to "feat: add range to External blob" Jan 28, 2026
Collaborator

@Xuanwo Xuanwo left a comment

Nice work, thank you!

@Xuanwo
Collaborator

Xuanwo commented Feb 6, 2026

Test encodings::logical::blob::tests::test_blob_v2_external_with_range_round_trip failed.

@Xuanwo Xuanwo merged commit e1489ab into lance-format:main Feb 7, 2026
28 checks passed
Labels

enhancement, python

2 participants