Skip to content

refactor: introduce SchemaAdapter to perform logical/physical transform#5096

Merged
Xuanwo merged 2 commits intomainfrom
refactor-jsonb
Nov 3, 2025
Merged

refactor: introduce SchemaAdapter to perform logical/physical transform#5096
Xuanwo merged 2 commits intomainfrom
refactor-jsonb

Conversation

@Xuanwo
Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo commented Oct 29, 2025

Close #4595

This PR introduces a SchemaAdapter to handle logical and physical transformations.

I’m wondering if this is the right place, or if we need a proper logical type system. I recall @westonpace saying we should remove the logical type concept entirely. So what’s our plan for handling conversions like jsonb to json or large_string to binary?


This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off, I'll fix it quickly.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 85.39326% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.69%. Comparing base (e3cf56a) to head (88e4cc9).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
rust/lance/src/dataset/utils.rs 84.14% 10 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5096      +/-   ##
==========================================
- Coverage   81.72%   81.69%   -0.04%     
==========================================
  Files         340      340              
  Lines      138875   138886      +11     
  Branches   138875   138886      +11     
==========================================
- Hits       113498   113461      -37     
- Misses      21636    21685      +49     
+ Partials     3741     3740       -1     
Flag Coverage Δ
unittests 81.69% <85.39%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. In a perfect world I think maybe we would have something like...

fn to_physical_stream(&self, stream: SendableRecordBatchStream) -> PhysicalRecordBatchStream;

This way the compiler could make sure that we aren't accidentally using the logical stream in places we should be using the physical stream (or vice-versa).

However, we can think about this as a todo for now.

@Xuanwo Xuanwo merged commit b9bfa56 into main Nov 3, 2025
39 of 40 checks passed
@Xuanwo Xuanwo deleted the refactor-jsonb branch November 3, 2025 17:58
Xuanwo added a commit that referenced this pull request Nov 4, 2025
Our users hit a bug where appending data with a casted schema that
includes JSON results in "invalid jsonb". I believe
#5096 is the correct fix, but it
introduces a whole new concept and needs more time to review. So I
created this PR as a hotfix.

This PR updates the schema on the Python side to ensure users see the
correct schema.


---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
…rm (lance-format#5096)

Close lance-format#4595

This PR introduces a SchemaAdapter to handle logical and physical
transformations.

I’m wondering if this is the right place, or if we need a proper logical
type system. I recall @westonpace saying we should remove the logical
type concept entirely. So what’s our plan for handling conversions like
jsonb to json or large_string to binary?

---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

Signed-off-by: Xuanwo <github@xuanwo.io>
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Our users hit a bug where appending data with a casted schema that
includes JSON results in "invalid jsonb". I believe
lance-format#5096 is the correct fix, but it
introduces a whole new concept and needs more time to review. So I
created this PR as a hotfix.

This PR updates the schema on the Python side to ensure users see the
correct schema.


---

**This PR was primarily authored with Codex using GPT-5-Codex and then
hand-reviewed by me. I AM responsible for every change made in this PR.
I aimed to keep it aligned with our goals, though I may have missed
minor issues. Please flag anything that feels off, I'll fix it
quickly.**

---------

Signed-off-by: Xuanwo <github@xuanwo.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor JSONB and JSON conversion logic

3 participants