fix: handle type mismatches in native c2r conversion #3583
Merged
andygrove merged 6 commits into apache:main from Feb 24, 2026
When Spark generates default column values, it can produce Arrow arrays with physical types (e.g. Int32) that differ from the logical schema type (e.g. Date32). The c2r converter's maybe_cast_to_schema_type previously passed these through silently, causing downcast failures. Now the fallback arm attempts an Arrow cast for any type mismatch, fixing the immediate Date32 bug and preventing similar issues for other data types. Closes apache#3482
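The behaviour change described above can be sketched with a toy model. This is not the Comet code and does not use the arrow crate: `ToyType`, `ToyArray`, `toy_cast` and `toy_maybe_cast` are hypothetical stand-ins, and the set of casts `toy_cast` accepts is an assumption made for illustration. The only point is the control flow: matching types pass through, and any mismatch goes to a cast attempt instead of being returned unchanged.

```rust
// Toy model only: these types stand in for Arrow's data types and arrays.
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Date32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum ToyArray {
    Int32(Vec<i32>),
    Date32(Vec<i32>), // days since the Unix epoch, stored as i32
    Utf8(Vec<String>),
}

impl ToyArray {
    fn data_type(&self) -> ToyType {
        match self {
            ToyArray::Int32(_) => ToyType::Int32,
            ToyArray::Date32(_) => ToyType::Date32,
            ToyArray::Utf8(_) => ToyType::Utf8,
        }
    }
}

// Hypothetical stand-in for an Arrow cast: supports only Int32 <-> Date32.
fn toy_cast(array: &ToyArray, to: &ToyType) -> Result<ToyArray, String> {
    match (array, to) {
        (ToyArray::Int32(v), ToyType::Date32) => Ok(ToyArray::Date32(v.clone())),
        (ToyArray::Date32(v), ToyType::Int32) => Ok(ToyArray::Int32(v.clone())),
        (a, t) => Err(format!("cannot cast {:?} to {:?}", a.data_type(), t)),
    }
}

// Pass through when the types already agree; otherwise attempt a cast
// rather than silently returning the mismatched array.
fn toy_maybe_cast(array: ToyArray, schema_type: &ToyType) -> Result<ToyArray, String> {
    if &array.data_type() == schema_type {
        return Ok(array);
    }
    toy_cast(&array, schema_type)
}

fn main() {
    // Physical Int32 values arriving for a column whose schema type is Date32.
    let physical = ToyArray::Int32(vec![0, 19_000]);
    let converted = toy_maybe_cast(physical, &ToyType::Date32);
    println!("{:?}", converted);

    // An unsupported mismatch surfaces as an error instead of a later downcast failure.
    let strings = ToyArray::Utf8(vec!["2026-02-24".to_string()]);
    println!("{:?}", toy_maybe_cast(strings, &ToyType::Date32));
}
```

In this sketch an unsupported mismatch is reported at the cast site, which is easier to diagnose than a failed downcast further along.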
andygrove
commented
Feb 24, 2026
/**
 * Checks if native columnar to row conversion is enabled.
 */
def isEnabled: Boolean = CometConf.COMET_NATIVE_COLUMNAR_TO_ROW_ENABLED.get()
comphead
reviewed
Feb 24, 2026
// This handles cases like Int32 → Date32 (which can happen when Spark
// generates default column values using the physical storage type rather
// than the logical type).
let options = CastOptions::default();
Contributor
This might be expensive to create each time, especially the formatter factory.
Can it be const?
Member
Author
Claude says that this is cheap and that there are no heap allocations.
CastOptions::default() is just setting a bool and a struct of Option::None fields on the stack. There are no heap allocations. The actual cost is entirely in cast_with_options itself (which processes the array data). Creating the options struct is negligible.
Contributor
Btw, DF comes with predefined const
https://github.com/apache/datafusion/blob/387e20cc58ae91da3902b58438a0684998d7b45b/datafusion/common/src/format.rs#L33
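The two positions in this thread (build the options per call, or reuse a predefined const) can be compared with a small stand-alone sketch. `ToyFormatOptions`, `ToyCastOptions` and `DEFAULT_TOY_CAST_OPTIONS` are hypothetical types that only loosely mirror the shape described above (a bool plus a struct of mostly `None` fields); they are not the arrow or DataFusion definitions, and the default values chosen here are assumptions.

```rust
// Hypothetical mirror of an options struct made of a bool and plain fields.
#[derive(Debug, Clone, PartialEq)]
struct ToyFormatOptions {
    null: &'static str,
    date_format: Option<&'static str>,
    timestamp_format: Option<&'static str>,
}

#[derive(Debug, Clone, PartialEq)]
struct ToyCastOptions {
    safe: bool, // assumed default for this sketch
    format_options: ToyFormatOptions,
}

impl ToyCastOptions {
    // A const fn lets the same constructor back both Default and a const item.
    const fn new() -> Self {
        ToyCastOptions {
            safe: true,
            format_options: ToyFormatOptions {
                null: "",
                date_format: None,
                timestamp_format: None,
            },
        }
    }
}

impl Default for ToyCastOptions {
    fn default() -> Self {
        Self::new()
    }
}

// Predefined constant, in the spirit of the const the reviewer points to.
const DEFAULT_TOY_CAST_OPTIONS: ToyCastOptions = ToyCastOptions::new();

fn main() {
    // Building the value per call only writes a few stack fields; nothing is heap-allocated.
    let per_call = ToyCastOptions::default();
    assert_eq!(per_call, DEFAULT_TOY_CAST_OPTIONS);
    println!("options size: {} bytes", std::mem::size_of::<ToyCastOptions>());
}
```

For a struct of this shape both approaches yield the same value, so the choice is mostly about readability and sharing one definition rather than measurable cost.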
comphead
approved these changes
Feb 24, 2026
Contributor
comphead
left a comment
Thanks @andygrove, left a small comment
Member
Author
Thanks @comphead
Closes #3482
Summary
Spark can produce Arrow arrays with physical types (e.g. Int32) that differ from logical schema types (e.g. Date32).
maybe_cast_to_schema_type now attempts an Arrow cast for any type mismatch, rather than silently passing through the mismatched array.