Skip to content

fix: handle type mismatches in native c2r conversion#3583

Merged
andygrove merged 6 commits intoapache:mainfrom
andygrove:fix/c2r-type-mismatch
Feb 24, 2026
Merged

fix: handle type mismatches in native c2r conversion#3583
andygrove merged 6 commits intoapache:mainfrom
andygrove:fix/c2r-type-mismatch

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Feb 24, 2026

Closes #3482

Summary

  • Fixes native columnar-to-row conversion crash when Spark generates default column values with physical types (e.g. Int32) that differ from logical schema types (e.g. Date32)
  • Changes the fallback arm in maybe_cast_to_schema_type to attempt an Arrow cast for any type mismatch, rather than silently passing through the mismatched array
  • This prevents similar issues for other data types in the future
  • Confirmed that the fuzz test suite now passes when native C2R is enabled

When Spark generates default column values, it can produce Arrow arrays
with physical types (e.g. Int32) that differ from the logical schema
type (e.g. Date32). The c2r converter's maybe_cast_to_schema_type
previously passed these through silently, causing downcast failures.

Now the fallback arm attempts an Arrow cast for any type mismatch,
fixing the immediate Date32 bug and preventing similar issues for
other data types.

Closes apache#3482
/**
* Checks if native columnar to row conversion is enabled.
*/
def isEnabled: Boolean = CometConf.COMET_NATIVE_COLUMNAR_TO_ROW_ENABLED.get()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was unused

@andygrove andygrove marked this pull request as ready for review February 24, 2026 19:59
// This handles cases like Int32 → Date32 (which can happen when Spark
// generates default column values using the physical storage type rather
// than the logical type).
let options = CastOptions::default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be expensive to create each time, especially the formatter factory?
can it be const?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude says that this is cheap and that there are no heap allocations.

CastOptions::default() is just setting a bool and a struct of Option::None fields on the stack. There are no heap allocations. The actual cost is entirely in cast_with_options itself (which processes the array data). Creating the options struct is negligible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andygrove left a small comment

@andygrove andygrove merged commit ea5dc31 into apache:main Feb 24, 2026
108 checks passed
@andygrove andygrove deleted the fix/c2r-type-mismatch branch February 24, 2026 23:02
@andygrove
Copy link
Member Author

Thanks @comphead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Native c2r bugs

2 participants