feat(blob_v2): add Python API for Blob v2#5491

Merged
Xuanwo merged 9 commits into main from xuanwo/blobv2-py-api on Dec 18, 2025

Conversation

@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Dec 16, 2025

This PR exposes blob v2 in the Python API, allowing users to write blob v2 data.

Well, I don't have much experience in designing Python APIs. @westonpace, could you please take a look at this shape?


Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

@github-actions github-actions Bot added enhancement New feature or request python labels Dec 16, 2025

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment thread python/src/dataset.rs Outdated
@Xuanwo Xuanwo requested a review from westonpace December 16, 2025 15:46
@codecov

codecov Bot commented Dec 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Member

@westonpace westonpace left a comment

This seems fine as input. However, I think the weirdest thing about blobs is going to be the fact that the output is not the same as the input. Right now the input is:

{
  "data": large_binary,
  "uri": utf8,
}

However, the output is the descriptors, and the descriptors contain a number of fields (e.g. blob_id) that are internal details and don't make sense to the user. I'm wondering if we can, at some point, unify these two things. For example, when the user reads blob data, we can convert the blob_id, data file, etc. into a URI. So what if we add position and length fields to the blob array (with length=-1 meaning the whole file), so that the input and output are the same? The only difference would be that the input might be a mix of data and uri, while the output would always have data null and uri set.
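
To make the suggestion above concrete, here is a minimal sketch of what a unified blob value could look like. Everything in it is an assumption for illustration: `BlobRef` and `read_blob` are hypothetical names, not part of Lance, and the `uri` is treated as a plain local file path. It only shows how `position` and `length` (with `length=-1` meaning "the rest of the file") might let one shape serve as both input and output.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlobRef:
    """Hypothetical unified blob value; NOT the actual Lance schema."""
    data: Optional[bytes] = None  # inline bytes (would only appear on input)
    uri: Optional[str] = None     # assumed here to be a local file path
    position: int = 0             # byte offset into the file behind `uri`
    length: int = -1              # -1 means "read through the end of the file"


def read_blob(ref: BlobRef) -> bytes:
    """Resolve a BlobRef to bytes, preferring inline data over the uri."""
    if ref.data is not None:
        return ref.data
    if ref.uri is None:
        raise ValueError("blob has neither data nor uri")
    with open(ref.uri, "rb") as f:
        f.seek(ref.position)
        return f.read() if ref.length == -1 else f.read(ref.length)


# Write a small file so the uri-based reads have something to point at.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello blob v2")
    path = tmp.name

whole = read_blob(BlobRef(uri=path))                        # length=-1: whole file
part = read_blob(BlobRef(uri=path, position=6, length=4))   # a 4-byte slice
inline = read_blob(BlobRef(data=b"inline"))                 # inline input value
os.remove(path)
```

With this shape, a reader would only ever see values where `data` is null and `uri`, `position`, and `length` are set, while a writer could pass either form.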

Comment thread rust/lance-core/src/datatypes/field.rs Outdated
Comment on lines +685 to +687
// Blob v2 columns are special: they can have different struct layouts
// (logical input vs. descriptor struct). We treat blob v2 structs as opaque
// during schema set operations (union/subtract).
Member

I'm not sure I'd use the word "opaque" here. Maybe "we treat blob v2 structs as primitive fields (like a binary column) during schema set operations"?
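
As a rough illustration of what "treat as a primitive field" means for a schema union, here is a small sketch. The `Field` class, the `is_blob_v2` flag, and the `union` function are hypothetical simplifications and do not mirror the Rust implementation in `field.rs`. The child names in the descriptor are placeholders apart from `blob_id`, which is mentioned in the review.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical, simplified schema field; NOT Lance's actual Field type."""
    name: str
    children: List["Field"] = field(default_factory=list)
    is_blob_v2: bool = False


def union(a: Field, b: Field) -> Field:
    """Union two same-named fields, merging struct children by name."""
    if a.is_blob_v2 or b.is_blob_v2:
        # Blob v2 structs behave like a primitive column: keep the left
        # side unchanged and never merge the two child layouts.
        return a
    merged = {c.name: c for c in a.children}
    for child in b.children:
        if child.name in merged:
            merged[child.name] = union(merged[child.name], child)
        else:
            merged[child.name] = child
    return Field(a.name, list(merged.values()), a.is_blob_v2)


# Logical input layout vs. a (placeholder) descriptor layout of the same column.
logical = Field("blob", [Field("data"), Field("uri")], is_blob_v2=True)
descriptor = Field("blob", [Field("blob_id")], is_blob_v2=True)
blob_names = [c.name for c in union(logical, descriptor).children]

# An ordinary struct still has its children merged.
plain = union(Field("s", [Field("x")]), Field("s", [Field("y")]))
plain_names = [c.name for c in plain.children]
```

Without the early return, the union of the two blob layouts would produce a struct containing `data`, `uri`, and `blob_id` together, which matches neither the input nor the descriptor form.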

@Xuanwo
Collaborator Author

Xuanwo commented Dec 18, 2025

> So what if we add to the blob array position and length (with length=-1 meaning the whole file) and so the input and output are the same? The only difference is that the input might be a mix of data and uri but the output would always have data be null and uri be set?

Seems like a nice idea to me. I'll create a follow-up issue for it.

@Xuanwo Xuanwo merged commit c7dd850 into main Dec 18, 2025
26 checks passed
@Xuanwo Xuanwo deleted the xuanwo/blobv2-py-api branch December 18, 2025 14:27
wjones127 pushed a commit to wjones127/lance that referenced this pull request Dec 30, 2025
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026