
iceberg: fix schema column ordering #4373

Merged
josephwoodward merged 4 commits into main from jw/iceberg-schema-column-ordering
May 1, 2026

Conversation

@josephwoodward
Contributor

@josephwoodward josephwoodward commented Apr 30, 2026

Preserve schema registry column order when creating Iceberg tables. Column order was lost because buildSchemaWithResolver iterated over a map[string]any, and Go randomises map iteration order. The fix reads the ordered field list from schema.Common metadata when present, falling back to sorted keys for determinism when it isn't.
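
To make the failure mode and the fallback concrete, here is a minimal sketch of the idea rather than the code in this PR. `columnOrder` and its `declared` parameter are hypothetical names: `declared` stands in for the ordered field list that would be read from the schema metadata, and it is nil when no metadata is attached.

```go
package main

import (
	"fmt"
	"sort"
)

// columnOrder returns top-level column names in a stable order.
// Hypothetical helper: declared stands in for the ordered field list taken
// from schema metadata. When it is empty, the record's keys are sorted so
// the result never depends on Go's randomised map iteration order.
func columnOrder(record map[string]any, declared []string) []string {
	if len(declared) > 0 {
		return declared
	}
	keys := make([]string, 0, len(record))
	for k := range record {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	record := map[string]any{"zip": "90210", "id": 1, "name": "a"}

	// Metadata present: the declared order is preserved.
	fmt.Println(columnOrder(record, []string{"id", "zip", "name"})) // [id zip name]

	// No metadata: deterministic, alphabetical fallback.
	fmt.Println(columnOrder(record, nil)) // [id name zip]
}
```

Ranging over `record` directly would give a different column order from run to run, which is the instability described above.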

@claude

claude Bot commented Apr 30, 2026

Commits
LGTM

Review
Small, focused change that preserves schema-registry-defined column order at table creation by reading schema.Common top-level field names from message metadata, falling back to sorted keys when metadata is absent or unparseable. The new topLevelFieldOrder helper is well-scoped, and the unit test exercises the ordering path with a deliberately non-alphabetical schema.

LGTM

Jeffail added 2 commits April 30, 2026 16:11
…ering

Build-time column ordering now parses the schema metadata once at the
call site and threads the parsed common through both ordering and
per-field type resolution. The merge keeps metadata-declared fields
first, falling back to a case-folded record key lookup so canonical
lowercase metadata still matches arbitrarily-cased record keys, then
appends any record keys that aren't in the metadata in sorted order so
they're never silently dropped from create-table.

Top-level metadata that declares two children differing only in case
under case-insensitive mode is now rejected at create time, matching
the existing guard for nested struct metadata.

In case-insensitive mode the build path now uses the metadata's casing
for any top-level column whose record key only matched by case-folding.
The record key is still used to look up the value, but the iceberg
column lands with the metadata's name — matching the rule already in
effect for nested struct fields supplied via schema metadata.

The schema_metadata field description now spells out that record
presence drives schema shape: metadata-only fields are not added to
the table, and metadata controls ordering, naming, and types only for
fields the record carries.

Adds a CHANGELOG entry covering the column-ordering work on this branch.
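
The merge rules described in these commit messages can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `mergeColumns` and the `column` type are hypothetical, and `strings.ToLower` is assumed as the case-folding step, which may differ from what the real code does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// column pairs the name the table column receives with the record key
// used to look up its value. Hypothetical type for illustration.
type column struct {
	Name      string
	RecordKey string
}

// mergeColumns is a hypothetical helper. Metadata-declared fields come first
// in declared order, metadata-only fields are skipped, and record-only keys
// are appended in sorted order.
func mergeColumns(record map[string]any, declared []string, caseInsensitive bool) ([]column, error) {
	fold := func(s string) string {
		if caseInsensitive {
			return strings.ToLower(s) // assumption: simple lowercasing
		}
		return s
	}

	// Sort record keys once so every later step is deterministic.
	keys := make([]string, 0, len(record))
	for k := range record {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Index record keys by folded form, rejecting case-only duplicates.
	byFolded := make(map[string]string, len(keys))
	for _, k := range keys {
		f := fold(k)
		if prev, ok := byFolded[f]; ok {
			return nil, fmt.Errorf("record keys %q and %q collide", prev, k)
		}
		byFolded[f] = k
	}

	cols := make([]column, 0, len(keys))
	seen := make(map[string]string, len(declared))
	for _, name := range declared {
		f := fold(name)
		if prev, ok := seen[f]; ok {
			return nil, fmt.Errorf("metadata fields %q and %q collide", prev, name)
		}
		seen[f] = name
		// Only fields the record carries become columns; the metadata's
		// casing names the column, the record key reads the value.
		if key, ok := byFolded[f]; ok {
			cols = append(cols, column{Name: name, RecordKey: key})
		}
	}

	// Append record keys the metadata did not declare, in sorted order.
	for _, k := range keys {
		if _, ok := seen[fold(k)]; !ok {
			cols = append(cols, column{Name: k, RecordKey: k})
		}
	}
	return cols, nil
}

func main() {
	record := map[string]any{"User_ID": 7, "email": "a@b.c", "extra": true}
	cols, err := mergeColumns(record, []string{"email", "user_id", "unused"}, true)
	if err != nil {
		panic(err)
	}
	for _, c := range cols {
		fmt.Printf("%s <- %s\n", c.Name, c.RecordKey)
	}
}
```

In this example `unused` is declared but absent from the record, so it produces no column, while `extra` is present but undeclared, so it is appended after the declared fields.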
@claude

claude Bot commented Apr 30, 2026

Commits
LGTM

Review
The PR fixes non-deterministic column ordering in buildSchemaWithResolver by parsing schema.Common metadata once and threading the parsed result through ordering and per-field type resolution. Metadata-declared fields come first in their declared order; record-only fields get appended in sorted order so nothing is silently dropped. In case-insensitive mode, the metadata's casing wins for the column name while the record's key is still used for value lookup, mirroring how nested struct metadata already behaves. The case-only duplicate guard on top-level metadata, plus the existing record-side guard, jointly prevent ambiguous column creation. Tests cover ordering preservation, record-only-field appending, case-insensitive matching, metadata duplicate rejection, type override across case mismatch, and bloblang mapping casing.

LGTM

@claude

claude Bot commented Apr 30, 2026

Commits
LGTM

Review
Fixes a real determinism bug where Go's randomized map iteration in buildSchemaWithResolver caused unstable column ordering at create-table. The refactor parses schema metadata once per build, threads the parsed result through ordering and per-field type resolution, falls back to sorted keys when metadata is absent, and applies metadata casing for matched columns. Test coverage is thorough across the metadata-driven and record-only paths.

LGTM

@josephwoodward josephwoodward merged commit 74f7d8b into main May 1, 2026
6 of 7 checks passed
@josephwoodward josephwoodward deleted the jw/iceberg-schema-column-ordering branch May 1, 2026 07:48