feat: Cast string to timestamp_ntz #4034
Conversation
    "2024-11-03 01:30:00",
    "not a timestamp",
    "",
    "T12:34:56",
how come this test value passes, but the same test value expects an error in sql tests? 🤔
This test is running in legacy mode, which returns null. The SQL test is specifically for ANSI mode, which returns an error instead of null (as expected).
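The legacy-null vs. ANSI-error split can be sketched in isolation. This is a hypothetical standalone example (the `EvalMode` enum name matches the PR's code, but `cast_string_to_ts` and `try_parse` are stand-ins, not Comet's actual functions):

```rust
// Minimal sketch of how eval mode changes invalid-input handling:
// Legacy and Try map a parse failure to NULL, while ANSI raises an error.
#[derive(Debug, PartialEq, Clone, Copy)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

fn cast_string_to_ts(value: &str, mode: EvalMode) -> Result<Option<i64>, String> {
    match try_parse(value) {
        Some(micros) => Ok(Some(micros)),
        None if mode == EvalMode::Ansi => Err(format!(
            "[CAST_INVALID_INPUT] cannot cast '{value}' to TIMESTAMP_NTZ"
        )),
        // Legacy and Try both fall through to NULL.
        None => Ok(None),
    }
}

// Stand-in parser: only accepts one hard-coded value for the demo.
fn try_parse(value: &str) -> Option<i64> {
    if value == "1970-01-01 00:00:00" {
        Some(0)
    } else {
        None
    }
}

fn main() {
    // Same bad input, different outcomes per mode.
    assert_eq!(cast_string_to_ts("T12:34:56", EvalMode::Legacy), Ok(None));
    assert_eq!(cast_string_to_ts("T12:34:56", EvalMode::Try), Ok(None));
    assert!(cast_string_to_ts("T12:34:56", EvalMode::Ansi).is_err());
}
```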
Thanks for this @parthchandra!
One thing I noticed: the ANSI mode path for invalid-but-parseable dates like '2023-02-29' may return null instead of throwing CAST_INVALID_INPUT. In Spark, LocalDate.of(2023, 2, 29) throws, which propagates through getOrElse to the ANSI error. In Comet, local_datetime_to_micros returns Ok(None) for this case, and the macro treats Ok(None) as null regardless of eval mode.
The existing castTimestampTest does exercise ANSI mode, but the test values include "not a timestamp" and "" earlier in the list, which throw before "2023-02-29" is reached, so the invalid-date case is masked.
A test like this would isolate it:
    -- in cast_timestamp_ntz_ansi.sql
    query expect_error(CAST_INVALID_INPUT)
    SELECT cast('2023-02-29' as timestamp_ntz)

To fix the code, timestamp_ntz_parser_inner could check for Ok(None) after a successful regex match and convert it to an error in ANSI mode:
    for (re, ts_type) in patterns {
        if re.is_match(value) {
            return match parse_to_timestamp_info(value, ts_type)? {
                Some(info) => match local_datetime_to_micros(&info)? {
                    some @ Some(_) => Ok(some),
                    None if eval_mode == EvalMode::Ansi => {
                        Err(SparkError::InvalidInputInCastToDatetime {
                            value: value.to_string(),
                            from_type: "STRING".to_string(),
                            to_type: "TIMESTAMP_NTZ".to_string(),
                        })
                    }
                    None => Ok(None),
                },
                None => Ok(None),
            };
        }
    }

(The same pattern exists in parse_timestamp_to_micros for regular timestamps, but that's pre-existing and out of scope for this PR.)
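For context on why '2023-02-29' matches the timestamp regex yet still fails, here is a standalone sketch (not Comet's code; the helper names are hypothetical) of the Gregorian calendar rule that makes the date invalid even though every field is in range syntactically:

```rust
// A date like 2023-02-29 is structurally parseable but calendar-invalid:
// 2023 is not a leap year, so February has only 28 days.
fn is_leap_year(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i32, m: u32) -> u32 {
    match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => {
            if is_leap_year(y) {
                29
            } else {
                28
            }
        }
        _ => 0, // invalid month
    }
}

fn main() {
    // 2023-02-29 exceeds February's length, so a calendar-aware
    // constructor (e.g. Java's LocalDate.of) rejects it.
    assert!(29 > days_in_month(2023, 2));
    // 2024 is a leap year, so 2024-02-29 is a real date.
    assert!(29 <= days_in_month(2024, 2));
}
```

This is the distinction the fix above relies on: the regex check sees a plausible shape, while the date-construction step applies the calendar rule.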
Superb catch! Addressed (and thank you for the recommended fix!)
mbutrovich left a comment
Thanks @parthchandra for the quick revision!
Merged. Thank you @comphead @mbutrovich !
Which issue does this PR close?
Part of #286
Part of #378
Rationale for this change
Adds the last remaining cast for the timestamp_ntz type.
What changes are included in this PR?
- Cast(StringType -> TimestampNTZType) in all three eval modes (Legacy, ANSI, Try)
- A dedicated parser (timestamp_ntz_parser) that differs from the existing timestamp parser: it rejects time-only strings, silently discards timezone suffixes, and converts to local-epoch microseconds via pure arithmetic (no DST handling)
- String -> TimestampNTZ marked as Compatible in CometCast, removing the previous Incompatible status

Key semantic differences from String -> Timestamp
- Time-only strings (e.g. T12:34, 12:34) are rejected

How are these changes tested?
- CometCastSuite.scala — Enabled test with expanded inputs covering TZ suffixes, DST transitions, time-only rejection, invalid inputs, and leap day edge cases.
- cast_timestamp_ntz.sql — SQL file tests for String -> NTZ
- cast_timestamp_ntz_ansi.sql — ANSI mode SQL file tests (expect_error(CAST_INVALID_INPUT) for invalid inputs, try_cast returning null)
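The "pure arithmetic" conversion mentioned in the change list can be sketched as follows. This is a hypothetical standalone version (the function names here are illustrative, not Comet's actual implementation), using Howard Hinnant's well-known days-from-civil algorithm to turn a wall-clock datetime into microseconds since the epoch with no timezone or DST handling:

```rust
// Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = if y >= 0 { y } else { y - 399 } / 400;
    let yoe = y - era * 400; // year of era [0, 399]
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146097 + doe - 719468
}

// Wall-clock datetime -> microseconds since the local epoch, by arithmetic
// alone: no timezone lookup, no DST adjustment, matching NTZ semantics.
fn local_datetime_to_micros(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> i64 {
    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s;
    secs * 1_000_000
}

fn main() {
    // The epoch itself maps to zero.
    assert_eq!(local_datetime_to_micros(1970, 1, 1, 0, 0, 0), 0);
    // One day later: 86_400 seconds, in microseconds.
    assert_eq!(local_datetime_to_micros(1970, 1, 2, 0, 0, 0), 86_400_000_000);
    // 2000-01-01 00:00:00 is the well-known Unix second 946_684_800.
    assert_eq!(
        local_datetime_to_micros(2000, 1, 1, 0, 0, 0),
        946_684_800_000_000
    );
}
```

Because the result never consults a timezone database, a wall-clock time inside a DST gap (e.g. during the 2024-11-03 transition in the test inputs) still converts deterministically, which is the key behavioral difference from the String -> Timestamp path.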