feat(reader): null struct default values in create_column by mbutrovich · Pull Request #1847 · apache/iceberg-rust

mbutrovich · 2025-11-12T16:35:01Z

Fixes TestSparkReaderDeletes.testPosDeletesOnParquetFileWithMultipleRowGroups in Iceberg Java 1.10 with DataFusion Comet.

Which issue does this PR close?

Partially addresses ArrowReader enhancements for Apache DataFusion Comet #1749.

What changes are included in this PR?

RecordBatchTransformer does not have exhaustive nested type support yet. This PR adds logic to create_column for one specific schema-evolution scenario: a newly added struct column whose default value is NULL.
If the column defines a default value other than NULL, it still falls into the existing match arm and is reported as unsupported.
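The dispatch described above can be sketched without any dependencies. This is only an illustration under stated assumptions: `DataType`, `Literal`, and `Column` below are hypothetical, simplified stand-ins rather than the arrow-rs or iceberg-rust types, and this `create_column` signature is invented for the sketch. It does not reproduce the code in this PR.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT the arrow-rs or iceberg-rust types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Binary,
    Struct(Vec<(String, DataType)>),
}

// A non-NULL default value (only one kind is modeled here).
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i32),
}

#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int32(Vec<Option<i32>>),
    Binary(Vec<Option<Vec<u8>>>),
    // Child columns plus a per-row validity mask (false = NULL row).
    Struct {
        children: Vec<(String, Column)>,
        validity: Vec<bool>,
    },
}

// Build a column of `num_rows` rows for a field that is missing from the
// data file. `default == None` models a NULL default.
fn create_column(
    target: &DataType,
    default: Option<&Literal>,
    num_rows: usize,
) -> Result<Column, String> {
    match (target, default) {
        (DataType::Int32, Some(Literal::Int(v))) => {
            Ok(Column::Int32(vec![Some(*v); num_rows]))
        }
        (DataType::Int32, None) => Ok(Column::Int32(vec![None; num_rows])),
        (DataType::Binary, None) => Ok(Column::Binary(vec![None; num_rows])),
        // New struct column with a NULL default: every row is NULL, and each
        // child is an all-NULL column of the same length.
        (DataType::Struct(fields), None) => {
            let mut children = Vec::with_capacity(fields.len());
            for (name, ty) in fields {
                children.push((name.clone(), create_column(ty, None, num_rows)?));
            }
            Ok(Column::Struct {
                children,
                validity: vec![false; num_rows],
            })
        }
        // Any other non-NULL default is still reported as unsupported.
        (other, Some(_)) => Err(format!("unsupported default value for {:?}", other)),
    }
}
```

In this sketch, calling `create_column` with a struct type and `None` returns a struct column whose rows are all NULL, while passing any non-NULL default for a struct returns an error, mirroring the "unsupported" fallback described above.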

Are these changes tested?

A new test reflects what happens in Iceberg Java 1.10's TestSparkReaderDeletes.testPosDeletesOnParquetFileWithMultipleRowGroups. The test name is misleading: I assumed a positional-delete test would only involve a delete vector and be schema agnostic, but [it includes a schema change with binary and struct types](https://github.com/apache/iceberg/blob/53c046efda5d6c6ac67caf7de29849ab7ac6d406/data/src/test/java/org/apache/iceberg/data/DeleteReadTests.java#L65), so we need default NULL values.


liurenjie1024

Thanks @mbutrovich for this fix!


## Which issue does this PR close?

Similar to #1847

- Closes #.

## What changes are included in this PR?

- RecordBatchTransformer does not support timestamp type. This PR adds logic to create_column in the specific scenario for a schema evolution with a new timestamp column.

## Are these changes tested?

2 unit tests test_create_timestamp_microsecond_with_timezone_array_repeated and test_create_timestamp_microsecond_array_repeated are added.

Null struct default values in create_column. Fixes TestSparkReaderDel…

66fed22

…etes.testPosDeletesOnParquetFileWithMultipleRowGroups in Iceberg Java 1.10 with DataFusion Comet.

mbutrovich mentioned this pull request Nov 12, 2025

ArrowReader enhancements for Apache DataFusion Comet #1749

Open

15 tasks

mbutrovich changed the title ~~fix(reader): null struct default values in create_column~~ feat(reader): null struct default values in create_column Nov 12, 2025

Fix clippy.

37b048a

mbutrovich mentioned this pull request Nov 12, 2025

Tracking issues of Iceberg Rust 0.8 Release #1850

Closed

17 tasks

liurenjie1024 approved these changes Nov 13, 2025


liurenjie1024 merged commit 12c4c21 into apache:main Nov 13, 2025
18 checks passed

chenzl25 mentioned this pull request Feb 26, 2026

pick feat(reader): null struct default values in create_column (#1847) risingwavelabs/iceberg-rust#127

Merged

This was referenced Feb 26, 2026

feat(reader): support timestamp type in create_column #2180

Merged

[Cherry-pick Planning] Commit inventory for dev_rebase_main_20251111 against main (43 commits) risingwavelabs/iceberg-rust#129

Open

liurenjie1024 merged 2 commits into apache:main from mbutrovich:null_struct_add_column
