
feat(reader): null struct default values in create_column#1847

Merged
liurenjie1024 merged 2 commits into apache:main from mbutrovich:null_struct_add_column on Nov 13, 2025


@mbutrovich
Collaborator

Fixes `TestSparkReaderDeletes.testPosDeletesOnParquetFileWithMultipleRowGroups` in Iceberg Java 1.10 with DataFusion Comet.

Which issue does this PR close?

Partially addresses #1749.

What changes are included in this PR?

  • While `RecordBatchTransformer` does not yet have exhaustive nested-type support, this adds logic to `create_column` for one specific schema-evolution scenario: a newly added struct column whose default value is NULL.
  • If the column defines a default value other than NULL, it still falls into the existing match arm, which reports it as unsupported.
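The dispatch described above can be sketched in plain Rust. This is a hypothetical, std-only model, not the iceberg-rust or arrow-rs API: `FieldType`, `Literal`, `Column`, and this `create_column` signature are illustrative stand-ins. It shows the shape of the change: a struct column with no default becomes an all-null struct (with all-null children, so the column keeps its full shape), while a struct with a non-NULL default falls through to the catch-all arm and is reported as unsupported.

```rust
// Hypothetical, simplified stand-ins; NOT the iceberg-rust or arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int,
    Struct(Vec<(String, FieldType)>),
}

/// A (hypothetical) non-NULL default value taken from the table schema.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Int(i32),
}

/// A toy columnar array: one entry per row, `None`/`false` meaning NULL.
#[derive(Debug, PartialEq)]
enum Column {
    Int(Vec<Option<i32>>),
    Struct {
        children: Vec<(String, Column)>,
        validity: Vec<bool>,
    },
}

/// Builds a column for a field that is missing from the data file
/// (e.g. added by schema evolution), using its default value.
fn create_column(
    ty: &FieldType,
    default: Option<&Literal>,
    num_rows: usize,
) -> Result<Column, String> {
    match (ty, default) {
        // Primitive arms: repeat the default, or fill with NULLs.
        (FieldType::Int, Some(Literal::Int(v))) => Ok(Column::Int(vec![Some(*v); num_rows])),
        (FieldType::Int, None) => Ok(Column::Int(vec![None; num_rows])),
        // New arm: a struct with a NULL default becomes an all-null struct.
        // Children are built recursively (also all NULL).
        (FieldType::Struct(fields), None) => {
            let children = fields
                .iter()
                .map(|(name, child_ty)| {
                    create_column(child_ty, None, num_rows).map(|col| (name.clone(), col))
                })
                .collect::<Result<Vec<_>, _>>()?;
            Ok(Column::Struct {
                children,
                validity: vec![false; num_rows],
            })
        }
        // Catch-all: anything else, including a struct with a non-NULL default.
        (ty, Some(lit)) => Err(format!("unsupported default {lit:?} for type {ty:?}")),
    }
}
```

In arrow-rs itself, an all-null array of any data type, including a struct type, can be built with `arrow_array::new_null_array`; whether the PR uses that particular call is not stated here.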

Are these changes tested?

Adds a new test that reproduces what happens in Iceberg Java 1.10's `TestSparkReaderDeletes.testPosDeletesOnParquetFileWithMultipleRowGroups`. The test name is misleading: I assumed that testing positional deletes would involve only a delete vector and be schema-agnostic, but the test includes a schema change with binary and struct types, so we need default NULL values.

mbutrovich changed the title from "fix(reader): null struct default values in create_column" to "feat(reader): null struct default values in create_column" on Nov 12, 2025
Contributor

liurenjie1024 left a comment


Thanks @mbutrovich for this fix!

liurenjie1024 merged commit 12c4c21 into apache:main on Nov 13, 2025
18 checks passed
chenzl25 pushed a commit to risingwavelabs/iceberg-rust that referenced this pull request Feb 26, 2026
Fixes
`TestSparkReaderDeletes.testPosDeletesOnParquetFileWithMultipleRowGroups`
in Iceberg Java 1.10 with DataFusion Comet.

## Which issue does this PR close?

- Partially addresses apache#1749.

## What changes are included in this PR?

- While `RecordBatchTransformer` does not have exhaustive nested type
support yet, this adds logic to `create_column` in the specific scenario
for a schema evolution with a new struct column that uses the default
NULL value.
- If the column has a default value other than NULL defined, it will
fall into the existing match arm and say it is unsupported.

## Are these changes tested?

Adds a new test that reproduces what happens in Iceberg Java 1.10's
`TestSparkReaderDeletes.testPosDeletesOnParquetFileWithMultipleRowGroups`.
The test name is misleading: I assumed that testing positional deletes would
involve only a delete vector and be schema-agnostic, but [it includes a schema
change with binary and struct types, so we need default NULL
values](https://github.com/apache/iceberg/blob/53c046efda5d6c6ac67caf7de29849ab7ac6d406/data/src/test/java/org/apache/iceberg/data/DeleteReadTests.java#L65).

(cherry picked from commit 12c4c21)
chenzl25 added a commit to risingwavelabs/iceberg-rust that referenced this pull request Feb 26, 2026
… (#127)

* feat(reader): null struct default values in create_column (apache#1847)


* fix tests

* fmt

* fix ci

* fix ci

* fix ci

* clippy

* fix ci

* fix ci

* fix ci

* fix ci

* fix

---------

Co-authored-by: Matt Butrovich <mbutrovich@users.noreply.github.com>
chenzl25 added a commit to risingwavelabs/iceberg-rust that referenced this pull request Mar 3, 2026
… (#127)

blackmwk pushed a commit that referenced this pull request Mar 10, 2026
## Which issue does this PR close?

Similar to #1847

- Closes #.

## What changes are included in this PR?

- `RecordBatchTransformer` does not support the timestamp type. This PR adds
logic to `create_column` for the specific schema-evolution scenario in which
a new timestamp column is added.
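A similarly hypothetical sketch of the timestamp case, again std-only and not the iceberg-rust API: a newly added timestamp column is filled by repeating the column's default value (or NULL when there is none) once per row, which appears to be what the `..._array_repeated` test names refer to. The `FieldType`/`Literal` names, the `TimestampMicrosColumn` type, and the `"+00:00"` timezone tag for `timestamptz` are assumptions for illustration.

```rust
// Hypothetical, simplified stand-ins; NOT the iceberg-rust or arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    /// Microsecond timestamp without timezone.
    Timestamp,
    /// Microsecond timestamp adjusted to UTC.
    Timestamptz,
}

/// Default values, in microseconds since the Unix epoch.
#[derive(Debug, Clone, PartialEq)]
enum Literal {
    Timestamp(i64),
    Timestamptz(i64),
}

/// Toy stand-in for a microsecond timestamp array plus its timezone tag.
#[derive(Debug, PartialEq)]
struct TimestampMicrosColumn {
    values: Vec<Option<i64>>,
    timezone: Option<String>,
}

/// Fills a newly added timestamp column by repeating its default
/// (or NULL when there is no default) once per row.
fn create_timestamp_column(
    ty: &FieldType,
    default: Option<&Literal>,
    num_rows: usize,
) -> Result<TimestampMicrosColumn, String> {
    // Assumption: timestamptz is tagged with a "+00:00" UTC offset.
    let timezone = match ty {
        FieldType::Timestamp => None,
        FieldType::Timestamptz => Some("+00:00".to_string()),
    };
    let value = match (ty, default) {
        (_, None) => None,
        (FieldType::Timestamp, Some(Literal::Timestamp(us))) => Some(*us),
        (FieldType::Timestamptz, Some(Literal::Timestamptz(us))) => Some(*us),
        // A default whose variant does not match the column type is rejected.
        (ty, Some(lit)) => return Err(format!("default {lit:?} does not match type {ty:?}")),
    };
    Ok(TimestampMicrosColumn {
        values: vec![value; num_rows],
        timezone,
    })
}
```

Keeping the timezone on the column type rather than on each value mirrors how Arrow attaches a timezone to the timestamp data type as a whole.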

## Are these changes tested?


Two unit tests are added:
`test_create_timestamp_microsecond_with_timezone_array_repeated` and
`test_create_timestamp_microsecond_array_repeated`.
gbrgr pushed a commit to RelationalAI/iceberg-rust that referenced this pull request Mar 10, 2026
big-mac-slice pushed a commit to perpetualsystems/iceberg-rust that referenced this pull request Apr 2, 2026

2 participants