Skip to content

fix: don't drop field metadata on merge insert path#5927

Merged
westonpace merged 2 commits intolance-format:mainfrom
westonpace:fix/json-merge-insert
Feb 10, 2026
Merged

fix: don't drop field metadata on merge insert path#5927
westonpace merged 2 commits intolance-format:mainfrom
westonpace:fix/json-merge-insert

Conversation

@westonpace
Copy link
Copy Markdown
Member

There are various ways to write data and several of them are failing for JSON data at the moment because it requires a conversion from logical to physical representation. I'd like to rework this more generally but I want to get the current implementation working first. This fixes one of the merge_insert paths which is currently failing because the field metadata is lost as part of the operation (and so the data appears to be a normal string column and is not converted)

@github-actions github-actions Bot added bug Something isn't working python labels Feb 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Code Review

Summary: This PR fixes a bug where field metadata was lost during merge_insert operations by cloning the entire field instead of reconstructing it without metadata. The fix is minimal and correct.

✅ Review Result: Approved

The fix is straightforward and correct:

// Before: Field metadata (including JSON extension type info) was lost
Arc::new(arrow_schema::Field::new(field.name(), field.data_type().clone(), field.is_nullable()))

// After: Full field clone preserves metadata
Arc::new(field.clone())

Test coverage: Comprehensive tests added for JSON data across multiple operations (append, compaction, merge_insert).

No blocking issues found.

@westonpace
Copy link
Copy Markdown
Member Author

Possibly related to #4831 but I don't think this closes that as I don't have all merge insert paths tested yet (and I know some of them use the Updater which also does not properly convert schemas yet)

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@westonpace westonpace merged commit aa449e4 into lance-format:main Feb 10, 2026
29 checks passed
github-actions Bot pushed a commit that referenced this pull request Feb 11, 2026
There are various ways to write data and several of them are failing for
JSON data at the moment because it requires a conversion from logical to
physical representation. I'd like to rework this more generally but I
want to get the current implementation working first. This fixes one of
the merge_insert paths which is currently failing because the field
metadata is lost as part of the operation (and so the data appears to be
a normal string column and is not converted)
jackye1995 pushed a commit that referenced this pull request Feb 12, 2026
There are various ways to write data and several of them are failing for
JSON data at the moment because it requires a conversion from logical to
physical representation. I'd like to rework this more generally but I
want to get the current implementation working first. This fixes one of
the merge_insert paths which is currently failing because the field
metadata is lost as part of the operation (and so the data appears to be
a normal string column and is not converted)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants