Summary
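When casting a string to TimestampNTZ, Comet incorrectly produces a timestamp with a UTC timezone instead of a timestamp without a timezone. This changes the semantics of the value. An Arrow timestamp with a timezone denotes an absolute instant measured from the UTC epoch. A timestamp without a timezone denotes a wall-clock (local) date-time that is not tied to any zone.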
Root Cause
In native/spark-expr/src/conversion_funcs/cast.rs, the cast_string_to_timestamp function uses the pattern DataType::Timestamp(_, _), which matches both:
- Timestamp(Microsecond, Some("UTC")): timestamp with timezone
- Timestamp(Microsecond, None): TimestampNTZ (no timezone)
The cast_utf8_to_timestamp! macro (lines 416-434) unconditionally builds an array tagged with the UTC timezone, regardless of the requested target type:
```rust
let mut cast_array = PrimitiveArray::<$array_type>::builder(len).with_timezone("UTC");
```
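The effect of the catch-all pattern can be reproduced with a small std-only model. This is a sketch under stated assumptions: `DataType` and `TimeUnit` below are simplified stand-ins for the arrow-rs types (not the real crate), and `current_output_type` is a hypothetical function that only mimics which output type the current dispatch produces.

```rust
use std::sync::Arc;

// Simplified stand-in for arrow's TimeUnit (only the unit used here).
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
}

// Simplified stand-in for arrow's DataType: the Option<Arc<str>> is the timezone.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Timestamp(TimeUnit, Option<Arc<str>>),
}

/// Hypothetical model of the current behavior: `Timestamp(_, _)` matches both
/// the timezone-aware type and TimestampNTZ, and the output is always tagged "UTC".
fn current_output_type(to_type: &DataType) -> Option<DataType> {
    match to_type {
        DataType::Timestamp(_, _) => Some(DataType::Timestamp(
            TimeUnit::Microsecond,
            Some(Arc::from("UTC")),
        )),
        _ => None,
    }
}
```

Requesting `Timestamp(Microsecond, None)` from this model yields `Timestamp(Microsecond, Some("UTC"))`, which is the mismatch described above.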
Expected Behavior
For TimestampNTZ (timestamp without timezone):
- The result type should be Timestamp(Microsecond, None), with no timezone.
- Values should be stored as-is, without timezone conversion.
For Timestamp (with timezone):
- The result type should be Timestamp(Microsecond, Some("UTC")), with the UTC timezone.
- Values should be converted to UTC.
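The difference in stored values can be worked through with plain arithmetic. The sketch below assumes the string has already been parsed into its date and time fields, and models the session time zone as a hypothetical fixed offset in seconds east of UTC (real Spark resolves a zone ID, including DST rules). All function names are illustrative, not Comet APIs. `days_from_civil` is the standard proleptic-Gregorian day-count algorithm.

```rust
const MICROS_PER_SEC: i64 = 1_000_000;

/// Days since 1970-01-01 for a proleptic Gregorian date (days-from-civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Shift the year so it starts in March; February then ends the year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of (March-based) year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

/// Wall-clock reading encoded as microseconds, as if the fields were UTC.
fn wall_clock_micros(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> i64 {
    (days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s) * MICROS_PER_SEC
}

/// TimestampNTZ: keep the wall-clock reading unchanged.
fn to_timestamp_ntz(wall: i64) -> i64 {
    wall
}

/// Timestamp with timezone: treat the wall clock as local time at
/// `offset_secs` (hypothetical fixed session offset) and convert to a UTC instant.
fn to_timestamp_utc(wall: i64, offset_secs: i64) -> i64 {
    wall - offset_secs * MICROS_PER_SEC
}
```

For the string `2020-01-01 12:00:00` with an assumed session offset of +08:00, `to_timestamp_ntz` stores 1577880000000000 (12:00 on that date) while `to_timestamp_utc` stores 1577851200000000 (04:00 UTC), so the two target types must not share one code path.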
Suggested Fix
Separate the handling for Timestamp and TimestampNTZ:
```rust
match to_type {
    DataType::Timestamp(unit, Some(tz)) => {
        // Timestamp with timezone - apply timezone conversion
        cast_utf8_to_timestamp!(... with_timezone(tz) ...)
    }
    DataType::Timestamp(unit, None) => {
        // TimestampNTZ - no timezone, store as-is
        cast_utf8_to_timestamp_ntz!(... no timezone ...)
    }
}
```
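A self-contained sketch of the split dispatch is shown here with std-only stand-ins. The assumptions: `DataType`/`TimeUnit` are simplified models of the arrow-rs types, string parsing is elided (inputs are already wall-clock microseconds), the session time zone is a hypothetical fixed offset, and `cast_string_to_timestamp_sketch` and `CastResult` are illustrative names rather than Comet code.

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
}

// Simplified stand-in for arrow's DataType; Option<Arc<str>> is the timezone.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Utf8,
    Timestamp(TimeUnit, Option<Arc<str>>),
}

/// Hypothetical cast output: the result type plus the stored microsecond values.
#[derive(Debug, PartialEq)]
struct CastResult {
    data_type: DataType,
    values: Vec<i64>,
}

/// Dispatches on the timezone instead of matching `Timestamp(_, _)`.
/// `wall_micros` are already-parsed wall-clock readings; `session_offset_secs`
/// is an assumed fixed offset (seconds east of UTC) for the session time zone.
fn cast_string_to_timestamp_sketch(
    wall_micros: &[i64],
    to_type: &DataType,
    session_offset_secs: i64,
) -> Result<CastResult, String> {
    match to_type {
        // Timestamp with timezone: convert local wall clock to a UTC instant
        // and keep the requested timezone on the output type.
        DataType::Timestamp(TimeUnit::Microsecond, Some(tz)) => Ok(CastResult {
            data_type: DataType::Timestamp(TimeUnit::Microsecond, Some(Arc::clone(tz))),
            values: wall_micros
                .iter()
                .map(|w| w - session_offset_secs * 1_000_000)
                .collect(),
        }),
        // TimestampNTZ: no timezone on the output type, values stored as-is.
        DataType::Timestamp(TimeUnit::Microsecond, None) => Ok(CastResult {
            data_type: DataType::Timestamp(TimeUnit::Microsecond, None),
            values: wall_micros.to_vec(),
        }),
        other => Err(format!("unsupported target type {other:?}")),
    }
}
```

Matching on `Some(tz)` and `None` separately lets the compiler enforce that each target type gets its own output type and value handling.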
Impact
This bug affects any operation that involves casting strings to TimestampNTZ in Comet. Currently, TimestampNTZ casts are marked as Incompatible and fall back to Spark, so this bug is not exposed in production. However, it would be a blocker for enabling full TimestampNTZ support.
Related
- CometCast.scala (supportedTypes)
Note: This issue was generated with AI assistance.