feat(blob_v2): add external blob support #5385
Signed-off-by: Xuanwo <github@xuanwo.io>
…review Signed-off-by: Xuanwo <github@xuanwo.io>
```rust
for i in 0..binary_array.len() {
    let is_null_row = match array.data_type() {
        DataType::Struct(_) => array.is_null(i),
        _ => binary_array.is_null(i),
    };
    if is_null_row {
        kind_builder.append_null();
        position_builder.append_null();
        size_builder.append_null();
        blob_id_builder.append_null();
        uri_builder.append_null();
```
Are these changes forwards / backwards compatible?
Yes, I believe so. No blob v2 files have been written so far. It's part of file format 2.2.
Ah, I had forgotten we added a dedicated encoder / decoder for v2.
```rust
let data_is_set = !data_col.is_null(i);
if data_is_set {
    let value = binary_array.value(i);
    kind_builder.append_value(0);
```
Can we use constants for the kinds, or an enum that converts to/from an integer?
Makes sense to me, will update.
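A minimal sketch of the suggested enum, assuming the kind column holds a `u8`. Only the inline value `0` is visible in this diff; the variant names `Packed`, `Dedicated`, and `External` follow the blob kinds mentioned in this thread, and their numeric values here are hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical blob kind enum. Only `Inline = 0` is grounded in the diff;
/// the other discriminants are illustrative assumptions.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlobKind {
    Inline = 0,
    Packed = 1,
    Dedicated = 2,
    External = 3,
}

/// Writer side: convert the enum into the integer stored in the kind column.
impl From<BlobKind> for u8 {
    fn from(kind: BlobKind) -> u8 {
        kind as u8
    }
}

/// Reader side: reject integers that do not name a known kind,
/// returning the offending value as the error.
impl TryFrom<u8> for BlobKind {
    type Error = u8;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(BlobKind::Inline),
            1 => Ok(BlobKind::Packed),
            2 => Ok(BlobKind::Dedicated),
            3 => Ok(BlobKind::External),
            other => Err(other),
        }
    }
}
```

A writer would append `u8::from(BlobKind::Inline)` instead of a bare `0`, and a reader would call `BlobKind::try_from(raw)` so that an unknown value surfaces as an error rather than being silently misread.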
```rust
    uri_builder.append_value("");
} else {
    // external uri
    let uri = uri_col.value(i);
```
Do we need to check if the URI is empty?
Good idea, we can check before writing.
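One way to check for empty URIs before writing, sketched under the assumption that the writer can see each row's URI as an `Option<&str>` (with `None` for rows that carry no URI, such as inline rows). `validate_external_uris` and `BlobWriteError` are hypothetical names, not existing Lance APIs.

```rust
/// Hypothetical error type for rejecting blob rows before they are written.
#[derive(Debug, PartialEq, Eq)]
pub enum BlobWriteError {
    /// The row at `row` declares an external URI but the string is empty.
    EmptyUri { row: usize },
}

/// Scan the URI column before appending anything to the builders.
/// `None` means the row has no URI; `Some("")` is rejected.
pub fn validate_external_uris(uris: &[Option<&str>]) -> Result<(), BlobWriteError> {
    for (row, uri) in uris.iter().enumerate() {
        if let Some(u) = uri {
            if u.is_empty() {
                return Err(BlobWriteError::EmptyUri { row });
            }
        }
    }
    Ok(())
}
```

Validating up front keeps the builders consistent: the write fails before any child column has been partially appended.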
```rust
kind_builder.append_value(0);
position_builder.append_value(0);
size_builder.append_value(0);
blob_id_builder.append_value(0);
```
It looks like blob_id is always set to zero. Can you remind me what this is for again?
blob_id == 0 means it's empty (valid values start from 1).
blob_id represents the id used by packed and dedicated blobs, neither of which is implemented yet.
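The "0 means empty, valid ids start at 1" convention maps naturally onto `Option<NonZeroU32>` in Rust. The sketch below assumes `blob_id` is stored as a `u32`, which this thread does not state; `decode_blob_id` and `encode_blob_id` are hypothetical helpers.

```rust
use std::num::NonZeroU32;

/// Read side: the sentinel 0 becomes `None`; any other value is a real id.
/// (Assumes a u32 blob_id column.)
pub fn decode_blob_id(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}

/// Write side: "no id" is stored as the sentinel 0.
pub fn encode_blob_id(id: Option<NonZeroU32>) -> u32 {
    id.map_or(0, NonZeroU32::get)
}
```

Because `Option<NonZeroU32>` uses the zero value as its `None` niche, it is the same size as a plain `u32`, so the typed representation costs no extra memory while making the "empty" case impossible to mistake for a real id.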
What is the !data_is_set branch then? I thought an inline blob was "data column is not null and uri is null" and a dedicated / packed blob is "data column is null and uri is not null"?
Ah, I see the enum now. So the !data_is_set approach is external.
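The row semantics discussed in this exchange can be summarized as a classification step: a struct-level null row nulls every child column, a set data column means inline bytes, and a null data column means an external URI. This is a sketch with hypothetical names (`RowBlob`, `classify_row`); rejecting an empty URI follows the reviewer's earlier question and is an assumption about the final behavior.

```rust
/// Hypothetical per-row interpretation of the input blob struct.
#[derive(Debug, PartialEq, Eq)]
pub enum RowBlob<'a> {
    /// Struct-level null: every child builder receives a null.
    Null,
    /// Data column is set: the bytes are stored inline.
    Inline(&'a [u8]),
    /// Data column is null: the row points at an external URI.
    External(&'a str),
}

/// Mirror the `if data_is_set { .. } else { .. }` branching from the diff:
/// inline data wins when present; otherwise a non-empty URI is required.
pub fn classify_row<'a>(
    row_is_null: bool,
    data: Option<&'a [u8]>,
    uri: Option<&'a str>,
) -> Result<RowBlob<'a>, String> {
    if row_is_null {
        return Ok(RowBlob::Null);
    }
    match (data, uri) {
        (Some(bytes), _) => Ok(RowBlob::Inline(bytes)),
        (None, Some(u)) if !u.is_empty() => Ok(RowBlob::External(u)),
        (None, _) => Err("row has neither inline data nor a non-empty URI".to_string()),
    }
}
```

Making the three cases explicit keeps the writer from falling through to an external branch with a missing URI, which is the ambiguity the review question above was probing.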
This PR adds external blob support: users can now write an external blob in Lance and read it back.
In this PR, I changed the blob struct's fields to be `nullable: false` because our packed struct doesn't allow nullable fields.
**Parts of this PR were drafted with assistance from Codex (with `gpt-5.1-codex-max`) and fully reviewed and edited by me. I take full responsibility for all changes.**