feat: expose blob handling APIs to python #5790

Merged: Xuanwo merged 7 commits into main from luban/found-sick on Jan 23, 2026

Conversation

Collaborator

@Xuanwo Xuanwo commented Jan 22, 2026

This PR exposes the blob handling APIs to Python so that users can scan all blobs as binary.


Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

github-actions bot added the enhancement and python labels on Jan 22, 2026
@github-actions
Contributor

Code Review

This PR exposes the blob_handling scanner option to Python, enabling users to control how blob columns are returned when scanning.

Summary

The implementation correctly mirrors the existing Rust BlobHandling enum API and follows established patterns in the codebase.

Feedback

P1 - Missing variants in Python API: The Rust BlobHandling enum has 5 variants (AllBinary, BlobsDescriptions, AllDescriptions, SomeBlobsBinary, SomeBinary), but only 3 are exposed to Python. Consider documenting this limitation, or adding support for the SomeBlobsBinary and SomeBinary variants (which take field id sets) if there are use cases for them.
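To make the gap concrete, here is a minimal sketch of how all five variants could be modeled on the Python side. Everything in it is hypothetical: `SimpleBlobHandling`, the two dataclasses, and `describe` are illustrative names, not part of this PR or of Lance's API. The only details taken from the review are the five variant names and the fact that the last two carry field id sets.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Union


class SimpleBlobHandling(Enum):
    """The three argument-free variants (hypothetical Python mirror)."""

    ALL_BINARY = "all_binary"
    BLOBS_DESCRIPTIONS = "blobs_descriptions"
    ALL_DESCRIPTIONS = "all_descriptions"


@dataclass(frozen=True)
class SomeBlobsBinary:
    """Hypothetical mirror of the Rust variant that carries field ids."""

    field_ids: FrozenSet[int]


@dataclass(frozen=True)
class SomeBinary:
    """Hypothetical mirror of the Rust variant that carries field ids."""

    field_ids: FrozenSet[int]


# Assumed union type covering all five variants.
BlobHandlingSpec = Union[SimpleBlobHandling, SomeBlobsBinary, SomeBinary]


def describe(spec: BlobHandlingSpec) -> str:
    """Render a spec as a short string, e.g. for logging or error messages."""
    if isinstance(spec, SimpleBlobHandling):
        return spec.value
    # Sort the ids so the rendering is deterministic.
    ids = ",".join(str(i) for i in sorted(spec.field_ids))
    return f"{type(spec).__name__}({ids})"
```

A plain string is enough for the three simple modes, which may be why only those were exposed; the field-id variants would need a richer argument type such as the dataclasses sketched here.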

P1 - Test coverage is minimal: The new test only covers the all_binary case. Consider adding tests for:

  • blobs_descriptions (the default behavior)
  • all_descriptions
  • Invalid input handling (the validation error path)
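The cases listed above could be covered along the following lines. This is a sketch under stated assumptions, not the PR's code: `validate_blob_handling` is a hypothetical stand-in for whatever validation the binding performs, and treating `None` as `blobs_descriptions` is assumed only from the note that it is the default behavior. A real test would pass the value through the scanner and check the returned column types.

```python
from typing import Optional

# The three string modes named in this review.
VALID_BLOB_HANDLING = ("all_binary", "blobs_descriptions", "all_descriptions")


def validate_blob_handling(value: Optional[str] = None) -> str:
    """Hypothetical validator: return a normalized mode or raise ValueError."""
    if value is None:
        # Assumption: an omitted option falls back to the default mode.
        return "blobs_descriptions"
    if not isinstance(value, str) or value not in VALID_BLOB_HANDLING:
        raise ValueError(
            f"blob_handling must be one of {VALID_BLOB_HANDLING}, got {value!r}"
        )
    return value


def check_all_cases() -> None:
    """Exercise every valid mode plus the error path."""
    for mode in VALID_BLOB_HANDLING:
        assert validate_blob_handling(mode) == mode
    assert validate_blob_handling() == "blobs_descriptions"
    for bad in ("ALL_BINARY", "binary", "", 3):
        try:
            validate_blob_handling(bad)  # type: ignore[arg-type]
        except ValueError:
            continue
        raise AssertionError(f"expected ValueError for {bad!r}")


check_all_cases()
```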

The implementation itself looks correct and follows the existing patterns.

Comment thread python/python/lance/dataset.py Outdated
Xuanwo and others added 2 commits January 23, 2026 02:05
Collaborator Author

Xuanwo commented Jan 22, 2026

Thank you @wjones127 for helping with the review!

Collaborator Author

Xuanwo commented Jan 22, 2026

The python lint job should be fixed by #5788

Xuanwo merged commit dd55d07 into main on Jan 23, 2026
12 checks passed
Xuanwo deleted the luban/found-sick branch on January 23, 2026 04:20
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 23, 2026
This PR will expose blob handling APIs to python so that users can just
scan all blobs as binary.

---

**Parts of this PR were drafted with assistance from Codex (with
`gpt-5.2`) and fully reviewed and edited by me. I take full
responsibility for all changes.**

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
majin1102 pushed a commit to majin1102/lance that referenced this pull request Jan 23, 2026
jackye1995 pushed a commit that referenced this pull request Jan 23, 2026
Xuanwo added a commit that referenced this pull request Jan 26, 2026
jackye1995 pushed a commit that referenced this pull request Jan 26, 2026
vivek-bharathan pushed a commit to vivek-bharathan/lance that referenced this pull request Feb 2, 2026