feat(utilities): add option to make all schema columns nullable for backwards compatibility #17777

prashantwason · 2026-01-04T07:54:23Z

Describe the issue this Pull Request addresses

This PR adds an option to ensure all columns in the schema are nullable when using HoodieStreamer with row-based sources like SQLSource or SQLFileBasedSource.

When new columns are added via SQL queries, the schema must be backwards compatible. New columns added to a table must be nullable because existing records don't have values for them. This change provides a configuration option to automatically make all columns nullable, ensuring smooth schema evolution.
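To illustrate why a newly added column needs to be nullable, here is a minimal stand-alone sketch. It uses neither Hudi nor Spark: `BackwardsCompatSketch`, `readColumn`, and the column names are hypothetical, and a plain map stands in for a previously written record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of this PR.
final class BackwardsCompatSketch {

    // Reads one column from an existing record under a newer schema.
    static Object readColumn(Map<String, Object> record, String column, boolean nullable) {
        if (record.containsKey(column)) {
            return record.get(column);
        }
        if (nullable) {
            // The record predates the column, so null stands in for the missing value.
            return null;
        }
        // A non-nullable column has no valid value for records written before it existed.
        throw new IllegalStateException("No value for non-nullable column: " + column);
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("id", 1L);

        // "discount" is a made-up column added after oldRecord was written.
        System.out.println(readColumn(oldRecord, "discount", true));
        try {
            readColumn(oldRecord, "discount", false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```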

Summary and Changelog

What users gain: Users can now set hoodie.deltastreamer.transformed.row.nullable=true to automatically make all columns in the incoming schema nullable, preventing schema compatibility issues during schema evolution.

Changes:

- Added new configuration constants in HoodieStreamer.java:
  - ENSURE_ALL_COLUMNS_NULLABLE_KEY = "hoodie.deltastreamer.transformed.row.nullable"
  - ENSURE_ALL_COLUMNS_NULLABLE_DEFAULT = false
- Added extractSchemaFromDataset() method in UtilHelpers.java that optionally converts schema to nullable using Spark's StructType.asNullable()
- Updated RowSource.java to use the new schema extraction method
- Updated StreamSync.java to use the new schema extraction method for transformed datasets
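For reviewers who want a picture of the schema-extraction behavior, the following is a rough sketch in plain Java. It is not the code in this PR: `NullableSchemaSketch`, `Field`, and `extractSchema` are hypothetical stand-ins for Spark's schema types and for extractSchemaFromDataset(), whose real signature is not shown in this description. It also assumes the conversion reaches nested fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; the PR itself relies on Spark's StructType.asNullable().
final class NullableSchemaSketch {

    // Simplified stand-in for a schema field (not Spark's StructField).
    static final class Field {
        final String name;
        final String type;
        final boolean nullable;
        final List<Field> children; // non-empty only for nested struct fields

        Field(String name, String type, boolean nullable, List<Field> children) {
            this.name = name;
            this.type = type;
            this.nullable = nullable;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }

        static Field of(String name, String type, boolean nullable, Field... children) {
            return new Field(name, type, nullable, Arrays.asList(children));
        }
    }

    // Returns a copy of the schema in which every field, including nested ones, is nullable.
    static List<Field> asNullable(List<Field> schema) {
        List<Field> result = new ArrayList<>();
        for (Field f : schema) {
            result.add(new Field(f.name, f.type, true, asNullable(f.children)));
        }
        return result;
    }

    // Converts only when the flag is enabled; otherwise the schema is returned untouched.
    static List<Field> extractSchema(List<Field> datasetSchema, boolean ensureAllColumnsNullable) {
        return ensureAllColumnsNullable ? asNullable(datasetSchema) : datasetSchema;
    }

    public static void main(String[] args) {
        List<Field> schema = Arrays.asList(
            Field.of("id", "long", false),
            Field.of("address", "struct", false, Field.of("city", "string", false)));

        for (Field f : extractSchema(schema, true)) {
            System.out.println(f.name + " nullable=" + f.nullable);
        }
    }
}
```

With the flag off, the sketch hands back the input schema unchanged, which mirrors the "existing behavior is preserved" statement in this description.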

Impact

- New configuration option hoodie.deltastreamer.transformed.row.nullable (default: false)
- No breaking changes - existing behavior is preserved when config is not set
- When enabled, all columns in the dataset schema are converted to nullable before being used

Risk Level

low - The feature is disabled by default and only affects schema handling when explicitly enabled. The implementation uses Spark's built-in asNullable() method, which is well tested.

Documentation Update

The new config hoodie.deltastreamer.transformed.row.nullable should be documented:

- Key: hoodie.deltastreamer.transformed.row.nullable
- Default: false
- Description: When set to true, all columns in the incoming dataset schema are made nullable. This is useful for maintaining backwards compatibility when new columns are added via SQL queries.
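As a rough illustration of the documented key and default, the sketch below resolves the flag from a properties object. It uses java.util.Properties as a stand-in, since this description does not show how the PR reads the config. `NullableConfigSketch` and `ensureAllColumnsNullable` are hypothetical names; only the two constants are taken from the changelog above.

```java
import java.util.Properties;

// Hypothetical illustration only; not the lookup code from this PR.
final class NullableConfigSketch {
    static final String ENSURE_ALL_COLUMNS_NULLABLE_KEY = "hoodie.deltastreamer.transformed.row.nullable";
    static final boolean ENSURE_ALL_COLUMNS_NULLABLE_DEFAULT = false;

    // Falls back to the default (false) when the key is not set.
    static boolean ensureAllColumnsNullable(Properties props) {
        String raw = props.getProperty(ENSURE_ALL_COLUMNS_NULLABLE_KEY);
        return raw == null ? ENSURE_ALL_COLUMNS_NULLABLE_DEFAULT : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(ensureAllColumnsNullable(props)); // key unset: default applies

        props.setProperty(ENSURE_ALL_COLUMNS_NULLABLE_KEY, "true");
        System.out.println(ensureAllColumnsNullable(props)); // key explicitly enabled
    }
}
```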

Contributor's checklist

- Read through contributor's guide
- Enough context is provided in the sections above
- Adequate tests were added if applicable

This is required to keep the schema backwards compatible when new columns are added via SQL queries (e.g. when using SQLSource or SQLFileBasedSource as the source of records written into a table).

hudi-bot · 2026-01-04T09:38:28Z

CI report:

cd331db Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

Ensure that all columns in the schema are nullable

cd331db


github-actions bot added the size:S PR with lines of changes in (10, 100] label Jan 4, 2026
