feat: Cast string to timestamp_ntz#4034

Merged
parthchandra merged 5 commits into apache:main from parthchandra:cast_ntz_to_string
Apr 24, 2026
Contributor

@parthchandra parthchandra commented Apr 22, 2026

Which issue does this PR close?

Part of #286
Part of #378

Rationale for this change

Adds the last remaining cast for the timestamp_ntz type.

What changes are included in this PR?

  • Implements native Cast(StringType -> TimestampNTZType) in all three eval modes (Legacy, ANSI, Try)
  • Adds a dedicated NTZ parser (timestamp_ntz_parser) that differs from the existing timestamp parser: rejects time-only strings, silently discards timezone suffixes, and converts to local-epoch microseconds via pure arithmetic (no DST handling)
  • Marks String -> TimestampNTZ as Compatible in CometCast, removing the previous Incompatible status
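As a rough illustration of what "local-epoch microseconds via pure arithmetic" can mean, here is a minimal sketch that turns wall-clock fields into microseconds since 1970-01-01 00:00:00 without consulting any timezone database. The function names (`days_from_civil`, `local_fields_to_micros`) are hypothetical and are not the names used in this PR; the day count uses the well-known "days from civil" formula for the proleptic Gregorian calendar, and the sketch assumes years small enough that `i64` cannot overflow.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
/// Uses the "days from civil" formula; no timezone or DST data is involved.
fn days_from_civil(year: i64, month: u32, day: u32) -> i64 {
    // Treat January and February as months 10 and 11 of the previous year.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, in [0, 399]
    let mp = (month as i64 + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day as i64 - 1; // day of year, in [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

fn is_leap_year(year: i64) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

/// Number of days in the month, or None if the month is out of range.
fn days_in_month(year: i64, month: u32) -> Option<u32> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if is_leap_year(year) { 29 } else { 28 }),
        _ => None,
    }
}

/// Hypothetical helper: wall-clock fields to local-epoch microseconds.
/// Returns None for impossible values such as 2023-02-29.
fn local_fields_to_micros(
    year: i64,
    month: u32,
    day: u32,
    hour: u32,
    minute: u32,
    second: u32,
    micros: u32,
) -> Option<i64> {
    let dim = days_in_month(year, month)?;
    if day == 0 || day > dim || hour > 23 || minute > 59 || second > 59 || micros > 999_999 {
        return None;
    }
    let days = days_from_civil(year, month, day);
    let secs = days * 86_400 + hour as i64 * 3_600 + minute as i64 * 60 + second as i64;
    Some(secs * 1_000_000 + micros as i64)
}
```

Because no zone rules are applied, a wall-clock value such as 2024-11-03 01:30:00 (which is ambiguous in US zones at the end of DST) maps to exactly one number, and an impossible date such as 2023-02-29 yields `None`.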

Key semantic differences from String -> Timestamp

| Aspect | String -> Timestamp | String -> TimestampNTZ |
| --- | --- | --- |
| TZ in string | Used for UTC conversion | Silently discarded |
| Time-only (T12:34, 12:34) | Accepted | Rejected (null) |
| DST handling | Yes | None (pure arithmetic) |
| Session TZ role | Fallback when no TZ in string | Irrelevant |
| Result semantics | UTC epoch micros | Local-epoch micros |
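The "TZ in string", "Session TZ role", and "Result semantics" rows can be made concrete with a small sketch. Everything here is hypothetical: `ParsedTimestamp` stands in for the output of an already-successful parse, and the session timezone is simplified to a fixed offset in seconds, which ignores the DST handling that the real String -> Timestamp path performs.

```rust
/// Hypothetical result of parsing a timestamp string: the wall-clock value as
/// microseconds since 1970-01-01 00:00:00, plus the UTC offset found in the
/// string, if any (for example "+05:00" would be Some(18_000)).
struct ParsedTimestamp {
    local_micros: i64,
    offset_seconds: Option<i32>,
}

/// String -> Timestamp: the offset in the string (or, failing that, the
/// session offset) is used to shift the wall-clock value to UTC.
/// Simplification: the session zone is a fixed offset, so no DST rules apply.
fn to_timestamp_micros(p: &ParsedTimestamp, session_offset_seconds: i32) -> i64 {
    let offset = p.offset_seconds.unwrap_or(session_offset_seconds);
    p.local_micros - offset as i64 * 1_000_000
}

/// String -> TimestampNTZ: any offset in the string is ignored and the
/// session zone plays no role; the wall-clock value is returned unchanged.
fn to_timestamp_ntz_micros(p: &ParsedTimestamp) -> i64 {
    p.local_micros
}
```

Under these assumptions, a string such as `1970-01-01 00:00:00+05:00` becomes -18,000,000,000 as a Timestamp (five hours before the UTC epoch) but 0 as a TimestampNTZ.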

How are these changes tested?

CometCastSuite.scala: enabled the test with expanded inputs covering TZ suffixes, DST transitions, time-only rejection, invalid inputs, and leap-day edge cases.
cast_timestamp_ntz.sql: SQL filetests for String -> NTZ.
cast_timestamp_ntz_ansi.sql: ANSI mode SQL filetests (expect_error(CAST_INVALID_INPUT) for invalid inputs, try_cast returning null).

@parthchandra parthchandra marked this pull request as draft April 22, 2026 16:53
@parthchandra parthchandra force-pushed the cast_ntz_to_string branch 2 times, most recently from 7a43dac to 61d2cd0 Compare April 23, 2026 21:55
@parthchandra parthchandra marked this pull request as ready for review April 23, 2026 22:01
@parthchandra parthchandra requested a review from andygrove April 23, 2026 22:01
"2024-11-03 01:30:00",
"not a timestamp",
"",
"T12:34:56",
Contributor

@comphead comphead Apr 24, 2026

How come this test value passes here, but the same test value expects an error in the SQL tests? 🤔

Contributor Author

This test runs in legacy mode, which returns null. The SQL test is specifically for ANSI mode, which returns an error instead of null (as expected).
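The difference between the modes can be sketched as follows. This is a simplified illustration, not the code in this PR: `EvalMode` mirrors the three modes named in the description, while `finish_cast` and its `String` error are hypothetical stand-ins for the real result handling and error type.

```rust
/// The three evaluation modes named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy,
    Ansi,
    Try,
}

/// Hypothetical final step of a cast: `parsed` is None when the input string
/// could not be turned into a timestamp.
/// - Legacy and Try: invalid input becomes SQL NULL (Ok(None)).
/// - Ansi: invalid input becomes an error.
fn finish_cast(
    parsed: Option<i64>,
    input: &str,
    mode: EvalMode,
) -> Result<Option<i64>, String> {
    match (parsed, mode) {
        (Some(micros), _) => Ok(Some(micros)),
        (None, EvalMode::Ansi) => Err(format!("CAST_INVALID_INPUT: '{}'", input)),
        (None, EvalMode::Legacy) | (None, EvalMode::Try) => Ok(None),
    }
}
```

With this shape, the same input `"T12:34:56"` yields `Ok(None)` in the legacy-mode Scala test and an error in the ANSI SQL test, which is the behavior described in the reply above.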

Contributor

@mbutrovich mbutrovich left a comment

Thanks for this @parthchandra!

One thing I noticed: the ANSI mode path for invalid-but-parseable dates like '2023-02-29' may return null instead of throwing CAST_INVALID_INPUT. In Spark, LocalDate.of(2023, 2, 29) throws, which propagates through getOrElse to the ANSI error. In Comet, local_datetime_to_micros returns Ok(None) for this case, and the macro treats Ok(None) as null regardless of eval mode.

The existing castTimestampTest does exercise ANSI mode, but the test values include "not a timestamp" and "" earlier in the list, which throw before "2023-02-29" is reached, so the invalid-date case is masked.

A test like this would isolate it:

-- in cast_timestamp_ntz_ansi.sql
query expect_error(CAST_INVALID_INPUT)
SELECT cast('2023-02-29' as timestamp_ntz)

To fix the code, timestamp_ntz_parser_inner could check for Ok(None) after a successful regex match and convert it to an error in ANSI mode:

for (re, ts_type) in patterns {
    if re.is_match(value) {
        // A regex match only confirms the shape; the date itself may still be invalid.
        return match parse_to_timestamp_info(value, ts_type)? {
            Some(info) => match local_datetime_to_micros(&info)? {
                some @ Some(_) => Ok(some),
                // ANSI mode: an impossible date such as 2023-02-29 must raise, not yield null.
                None if eval_mode == EvalMode::Ansi => {
                    Err(SparkError::InvalidInputInCastToDatetime {
                        value: value.to_string(),
                        from_type: "STRING".to_string(),
                        to_type: "TIMESTAMP_NTZ".to_string(),
                    })
                }
                // Legacy and Try modes keep returning null.
                None => Ok(None),
            },
            None => Ok(None),
        };
    }
}

(The same pattern exists in parse_timestamp_to_micros for regular timestamps, but that's pre-existing and out of scope for this PR.)
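The masking effect described in this review (an earlier failing value hiding the '2023-02-29' case) can be reproduced in a few lines. This is a toy model under stated assumptions: `buggy_ansi_cast` is a hypothetical stand-in for an ANSI cast that has the bug, and `cast_batch` assumes the whole batch fails at the first erroring value, which is what collecting into a `Result` does.

```rust
/// Hypothetical ANSI-mode cast with the bug under discussion: it errors on
/// unparseable input, but returns null for a well-shaped, impossible date.
fn buggy_ansi_cast(s: &str) -> Result<Option<i64>, String> {
    match s {
        // The bug: ANSI mode should raise CAST_INVALID_INPUT here.
        "2023-02-29" => Ok(None),
        "not a timestamp" | "" => Err(format!("CAST_INVALID_INPUT: '{}'", s)),
        // Placeholder value for any other input; real parsing is out of scope.
        _ => Ok(Some(0)),
    }
}

/// Casts every value, stopping at the first error. Collecting an iterator of
/// Result into Result<Vec<_>, _> short-circuits on the first Err.
fn cast_batch(values: &[&str]) -> Result<Vec<Option<i64>>, String> {
    values.iter().copied().map(buggy_ansi_cast).collect()
}
```

A batch of `["not a timestamp", "", "2023-02-29"]` fails on its first element, so the test sees an error and looks correct. Only a batch containing `"2023-02-29"` alone returns `Ok(vec![None])` and exposes the missing error, which is why the isolated SQL test is useful.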

@parthchandra
Contributor Author

Superb catch! Addressed (and thank you for the recommended fix!)

Contributor

@mbutrovich mbutrovich left a comment

Thanks @parthchandra for the quick revision!

Contributor

@comphead comphead left a comment

Thanks @parthchandra

@parthchandra parthchandra merged commit beecc2d into apache:main Apr 24, 2026
190 of 192 checks passed
@parthchandra
Contributor Author

Merged. Thank you @comphead @mbutrovich !
