Refactor schema rewriter: remove lifetimes, extract column/cast helpers, add mismatch coverage #20166
Merged
adriangb merged 3 commits into apache:main on Feb 5, 2026
adriangb (Contributor) approved these changes on Feb 5, 2026:
Thanks @kosiew!
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request on Mar 24, 2026
…rs, add mismatch coverage (apache#20166)

## Which issue does this PR close?

* Closes apache#20161.

## Rationale for this change

This change is a focused refactor of the `PhysicalExprAdapter` schema rewriter to improve readability and maintainability while preserving behavior.

Key motivations:

* Reduce complexity from explicit lifetimes by storing schema references as `SchemaRef`.
* Make column/index/type handling easier to follow by extracting helper functions.
* Strengthen the test suite to ensure refactors do not alter adapter output.

## What changes are included in this PR?

* Refactored `DefaultPhysicalExprAdapterRewriter` to own `SchemaRef` values instead of borrowing `&Schema`.
  * Simplifies construction and avoids lifetime plumbing.
* Simplified column rewrite logic by:
  * Early-exiting when both the physical index and data type already match.
  * Extracting `resolve_column` to handle physical index/name resolution.
  * Extracting `create_cast_column_expr` to validate cast compatibility (including nested structs) and build `CastColumnExpr`.
* Minor cleanups in struct compatibility validation and field selection to ensure the cast checks are performed against the *actual* physical field resolved by the final column index.
* Test updates and additions:
  * Simplified construction of expected struct `Field`s in tests for clarity.
  * Added `test_rewrite_column_index_and_type_mismatch` to validate the combined case where the logical column index differs from the physical schema *and* the data type requires casting.

## Are these changes tested?

Yes.

* Existing unit tests continue to pass.
* Added a new unit test to cover the index-and-type mismatch scenario for column rewriting, asserting:
  * The inner `Column` points to the correct physical index.
  * The resulting expression is a `CastColumnExpr` producing the expected logical type.

## Are there any user-facing changes?

No.

* This is a refactor/cleanup intended to preserve existing behavior.
* No public API changes, no behavioral changes expected in query results.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
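For readers without the diff at hand, the column rewrite flow described in the commit message can be approximated with a small standard-library-only sketch. Everything below is a hypothetical stand-in, not the DataFusion or Arrow code: `DataType`, `Field`, `Schema`, `Expr`, `Rewriter`, and `can_cast` are toy types invented for illustration, and only the names `SchemaRef`, `resolve_column`, and `create_cast_column_expr` are borrowed from the description. The sketch assumes columns are resolved by name, that a missing column is simply an error, and it ignores nested structs entirely.

```rust
use std::sync::Arc;

// Toy stand-ins for the Arrow types (assumption: NOT the real definitions).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

// Shared, reference-counted schema handle, mirroring the idea of `SchemaRef`.
type SchemaRef = Arc<Schema>;

impl Schema {
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f.name == name)
    }
}

// Toy expression tree: a plain column, or a column wrapped in a cast.
#[derive(Debug, PartialEq)]
enum Expr {
    Column { name: String, index: usize },
    CastColumn { input: Box<Expr>, to: DataType },
}

// Owning the schemas means the struct needs no lifetime parameter.
struct Rewriter {
    logical: SchemaRef,
    physical: SchemaRef,
}

impl Rewriter {
    fn new(logical: SchemaRef, physical: SchemaRef) -> Self {
        Self { logical, physical }
    }

    // Keep the given index if it already names the right physical field,
    // otherwise fall back to a lookup by name.
    fn resolve_column(&self, name: &str, index: usize) -> Option<(usize, &Field)> {
        let idx = match self.physical.fields.get(index) {
            Some(f) if f.name == name => index,
            _ => self.physical.index_of(name)?,
        };
        Some((idx, &self.physical.fields[idx]))
    }

    fn rewrite_column(&self, name: &str, index: usize) -> Result<Expr, String> {
        let logical_field = self
            .logical
            .fields
            .iter()
            .find(|f| f.name == name)
            .ok_or_else(|| format!("column {name} not in logical schema"))?;
        let (physical_index, physical_field) = self
            .resolve_column(name, index)
            .ok_or_else(|| format!("column {name} not in physical schema"))?;
        let column = Expr::Column { name: name.to_string(), index: physical_index };
        // No cast needed when the types already agree; if the index also
        // matched, this returns the column unchanged (the early-exit case).
        if physical_field.data_type == logical_field.data_type {
            return Ok(column);
        }
        create_cast_column_expr(column, physical_field, logical_field)
    }
}

// Toy compatibility rule (assumption): identical types and Int32 -> Int64 only.
fn can_cast(from: &DataType, to: &DataType) -> bool {
    from == to || matches!((from, to), (DataType::Int32, DataType::Int64))
}

// Validate the cast against the resolved physical field, then wrap the column.
fn create_cast_column_expr(
    column: Expr,
    physical_field: &Field,
    logical_field: &Field,
) -> Result<Expr, String> {
    if !can_cast(&physical_field.data_type, &logical_field.data_type) {
        return Err(format!(
            "cannot cast column {} from {:?} to {:?}",
            physical_field.name, physical_field.data_type, logical_field.data_type
        ));
    }
    Ok(Expr::CastColumn {
        input: Box::new(column),
        to: logical_field.data_type.clone(),
    })
}
```

With a logical schema `[a: Int64, b: Utf8]` and a physical schema `[b: Utf8, a: Int32]`, calling `rewrite_column("a", 0)` on this toy rewriter exercises the combined index-and-type mismatch: the inner column is re-pointed to physical index 1 and wrapped in a cast to `Int64`. That is the shape of the assertion the new unit test is described as making, though the real test operates on DataFusion's own expression types.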