fix: apply SchemaAdapter in Updater #5928

Merged
westonpace merged 2 commits into lance-format:main from westonpace:fix/json-add-columns
Feb 11, 2026

Conversation

@westonpace
Member

This will ensure JSON columns are properly converted in paths like add_columns.

@westonpace westonpace marked this pull request as draft February 10, 2026 17:27
@westonpace
Member Author

Leaving as draft until #5927 merges

@github-actions github-actions Bot added the bug and python labels Feb 10, 2026
@github-actions
Contributor

Code Review

Summary: This PR fixes JSON column handling in the add_columns path by applying SchemaAdapter in the Updater and preserving field metadata in merge_insert.

Analysis

The fix is correct. The changes address two places where JSON extension metadata was being lost:

  1. updater.rs: The Updater now uses SchemaAdapter.to_physical_batch() to convert Arrow JSON extension types to Lance's internal JSON format before writing. This ensures JSON columns added via add_columns are properly stored.

  2. merge_insert/exec/write.rs: Changed from Field::new(name, type, nullable) to field.clone(). The original code discarded field metadata (including extension type info), causing JSON columns to lose their type information during merge_insert operations.
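The second point can be illustrated with a minimal, std-only sketch. The `Field` struct below is a simplified, hypothetical stand-in rather than the arrow-rs type, and the metadata key and value are illustrative assumptions. It only shows why rebuilding a field from its name, type, and nullability drops metadata, while cloning keeps it.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical stand-in for an Arrow field (not the arrow-rs type).
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl Field {
    /// Plain constructor: the new field starts with empty metadata.
    fn new(name: &str, data_type: &str, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type: data_type.to_string(),
            nullable,
            metadata: HashMap::new(),
        }
    }

    /// Attaches metadata, such as extension type information.
    fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }
}

/// Rebuilds the field from three attributes only, so metadata is lost.
fn rebuild_field(field: &Field) -> Field {
    Field::new(&field.name, &field.data_type, field.nullable)
}

/// Clones the field, so metadata is carried over unchanged.
fn copy_field(field: &Field) -> Field {
    field.clone()
}

/// Builds an example field tagged as JSON (key and value are illustrative).
fn json_field() -> Field {
    let mut metadata = HashMap::new();
    metadata.insert("ARROW:extension:name".to_string(), "arrow.json".to_string());
    Field::new("payload", "Utf8", true).with_metadata(metadata)
}
```

Calling `rebuild_field(&json_field())` yields a field with empty metadata, whereas `copy_field(&json_field())` compares equal to the original.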

Notes

  • The lazy initialization of schema_adapter (created on first batch) is appropriate since the data schema isn't known until the first batch arrives.
  • The to_physical_batch method is a simple helper that checks requires_physical_conversion() before calling convert_json_columns(), avoiding unnecessary work for non-JSON data.
  • Test coverage is comprehensive, covering append, compaction, add_columns, and merge_insert operations with JSON data.
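The first two notes describe a pattern that can be sketched roughly as follows. This is not the Lance implementation: `Batch`, the string type tags, and the bodies of every method are hypothetical stand-ins, and only the method names come from the review above. The sketch shows an adapter that is created when the first batch arrives and a `to_physical_batch` helper that skips conversion when no JSON column is present.

```rust
/// Hypothetical stand-in for a record batch: (column name, type tag) pairs.
#[derive(Clone, Debug, PartialEq)]
struct Batch {
    columns: Vec<(String, String)>,
}

/// Hypothetical adapter; the real SchemaAdapter in Lance will differ.
struct SchemaAdapter {
    has_json: bool,
}

impl SchemaAdapter {
    /// Inspects the schema of the first batch once.
    fn new(first_batch: &Batch) -> Self {
        let has_json = first_batch.columns.iter().any(|(_, ty)| ty == "arrow.json");
        SchemaAdapter { has_json }
    }

    fn requires_physical_conversion(&self) -> bool {
        self.has_json
    }

    /// Rewrites JSON-tagged columns to an (illustrative) internal tag.
    fn convert_json_columns(&self, batch: Batch) -> Batch {
        let columns = batch
            .columns
            .into_iter()
            .map(|(name, ty)| {
                if ty == "arrow.json" {
                    (name, "lance.json".to_string())
                } else {
                    (name, ty)
                }
            })
            .collect();
        Batch { columns }
    }

    /// Guarded helper: returns the batch untouched when nothing needs converting.
    fn to_physical_batch(&self, batch: Batch) -> Batch {
        if self.requires_physical_conversion() {
            self.convert_json_columns(batch)
        } else {
            batch
        }
    }
}

/// Hypothetical updater that records what it would write.
struct Updater {
    schema_adapter: Option<SchemaAdapter>,
    written: Vec<Batch>,
}

impl Updater {
    fn new() -> Self {
        Updater { schema_adapter: None, written: Vec::new() }
    }

    fn update(&mut self, batch: Batch) {
        // Lazy initialization: the data schema is only known once a batch arrives.
        let adapter = self
            .schema_adapter
            .get_or_insert_with(|| SchemaAdapter::new(&batch));
        let physical = adapter.to_physical_batch(batch);
        self.written.push(physical);
    }
}
```

Storing the adapter in an `Option` keeps construction out of `Updater::new`, where the batch schema is not yet available, and the guard means batches without JSON columns are passed through without being rebuilt.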

No P0/P1 issues identified.

@codecov

codecov Bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Files with missing lines             Patch %   Lines
rust/lance/src/dataset/updater.rs    87.50%    0 Missing and 1 partial ⚠️
rust/lance/src/dataset/utils.rs      80.00%    1 Missing ⚠️

@westonpace westonpace marked this pull request as ready for review February 10, 2026 18:57
@westonpace westonpace merged commit dcb5bd8 into lance-format:main Feb 11, 2026
29 checks passed
jackye1995 pushed a commit that referenced this pull request Feb 11, 2026
This will ensure JSON columns are properly converted in paths like add_columns
jackye1995 pushed a commit that referenced this pull request Feb 12, 2026
This will ensure JSON columns are properly converted in paths like add_columns
Labels

bug, python

2 participants