feat: add range to External blob #5765
This idea is kinda interesting. So your idea focuses on reading an externally packed blob, and Lance won't generate such a blob itself, is that correct?
Yes.
Then I think we don't need to add a new type. We can simply enable reading a range from an external blob.
Do you mean we could enable External BlobFile to handle any arbitrary range (currently the position is always 0), thus eliminating the need to introduce ExternalPacked as an extra component? That makes sense to me; let me update the PR.
Yes!
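Reading an arbitrary range from an External blob boils down to seeking to `position` and reading exactly `size` bytes. The sketch below assumes the blob lives in a local file; `ExternalRange` and `read_external_range` are hypothetical names for illustration, not Lance's API (Lance itself reads through its object store layer rather than `std::fs`). With `position = 0` and `size` equal to the file length, it reduces to the current whole-file behavior.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

/// Hypothetical descriptor for an External blob: a file path plus the
/// byte range [position, position + size) that holds the blob.
#[derive(Debug, Clone)]
pub struct ExternalRange {
    pub path: PathBuf,
    pub position: u64,
    pub size: u64,
}

/// Reads exactly `size` bytes starting at `position` from the external file.
/// Returns an error instead of a short read if the range runs past EOF.
pub fn read_external_range(blob: &ExternalRange) -> io::Result<Vec<u8>> {
    let mut file = File::open(&blob.path)?;
    let file_len = file.metadata()?.len();
    // Reject ranges that overflow u64 or end beyond the file.
    let end = blob
        .position
        .checked_add(blob.size)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "range overflows u64"))?;
    if end > file_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("range ends at {end} but file is {file_len} bytes"),
        ));
    }
    // Seek to the start of the blob and read exactly `size` bytes.
    file.seek(SeekFrom::Start(blob.position))?;
    let mut buf = vec![0u8; blob.size as usize];
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

Validating the range up front keeps a bad `position`/`size` pair from silently returning truncated data.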
Hi @Xuanwo, this PR is ready for review now. Please take a look when you have time, thanks very much.
Xuanwo left a comment
Thank you for working on this! It's a nice addition to the blob v2.
let data_field = Arc::new(ArrowField::new("data", DataType::LargeBinary, true));
let uri_field = Arc::new(ArrowField::new("uri", DataType::Utf8, true));
let blob_id_field = Arc::new(ArrowField::new("blob_id", DataType::UInt32, true));
let blob_size_field = Arc::new(ArrowField::new("blob_size", DataType::UInt64, true));
This field should be size?
This is blob_size, as in test_blob_v2_dedicated_round_trip. The test checks that BlobV2StructuralEncoder encodes an External blob together with its position and blob_size.
Test
A common practice for organizing multimodal data is to package a large number of scattered files into a few large archives (such as tar or zip files). This avoids excessive inode consumption, enables higher read throughput through streaming, and prevents inconsistencies between the metadata and the scattered files it describes.
To better support such archives, I suggest we extend Lance BlobV2 with an ExternalPacked format, for example to link images stored inside these archives to a Lance table.
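Linking a member of an archive to a Lance row requires knowing where that member's bytes sit inside the archive. For an uncompressed tar file this is straightforward, because each member's data follows a 512-byte header and is padded to a 512-byte boundary. The sketch below assumes a plain ustar archive (no compression, no GNU long-name or PAX extension headers, and the ustar `prefix` field ignored); `TarMember` and `index_tar` are hypothetical names, not part of Lance. Compressed archives such as `.tar.gz` cannot be indexed this way, since members do not sit at fixed byte offsets.

```rust
/// One regular file inside a tar archive: its name and the byte range
/// [offset, offset + size) holding its contents.
#[derive(Debug, PartialEq)]
pub struct TarMember {
    pub name: String,
    pub offset: u64,
    pub size: u64,
}

const BLOCK: usize = 512;

/// Parses a NUL- or space-terminated octal ASCII field from a tar header.
fn parse_octal(field: &[u8]) -> Option<u64> {
    let text: String = field
        .iter()
        .take_while(|&&b| b != 0)
        .map(|&b| b as char)
        .collect();
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Some(0);
    }
    u64::from_str_radix(trimmed, 8).ok()
}

/// Walks the 512-byte headers of a plain ustar archive and records where each
/// regular file's data starts and how long it is. Returns None on a malformed
/// size field or a member that runs past the end of the buffer.
pub fn index_tar(archive: &[u8]) -> Option<Vec<TarMember>> {
    let mut members = Vec::new();
    let mut pos = 0usize;
    while pos + BLOCK <= archive.len() {
        let header = &archive[pos..pos + BLOCK];
        // An all-zero block marks the end of the archive.
        if header.iter().all(|&b| b == 0) {
            break;
        }
        // Name: bytes 0..100, NUL-terminated. Size: bytes 124..136, octal.
        let name_len = header[..100].iter().position(|&b| b == 0).unwrap_or(100);
        let name = String::from_utf8_lossy(&header[..name_len]).into_owned();
        let size = parse_octal(&header[124..136])?;
        let typeflag = header[156];
        let data_start = pos + BLOCK;
        if data_start + size as usize > archive.len() {
            return None;
        }
        // '0' and '\0' both denote a regular file; other entry types are skipped.
        if typeflag == b'0' || typeflag == 0 {
            members.push(TarMember { name, offset: data_start as u64, size });
        }
        // File data is padded up to the next 512-byte boundary.
        let padded = (size as usize + BLOCK - 1) / BLOCK * BLOCK;
        pos = data_start + padded;
    }
    Some(members)
}
```

In this design, each `(offset, size)` pair would be stored as the External blob's position and size alongside the archive's URI, so no bytes need to be copied out of the archive.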