fix: hour/minute/second handling for TimestampNTZ#3265
fix: hour/minute/second handling for TimestampNTZ#3265vigneshsiva11 wants to merge 5 commits intoapache:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Aligns Comet’s hour/minute/second behavior with Spark semantics by avoiding session-timezone conversion for TimestampNTZ (timestamp without timezone), while preserving the existing conversion behavior for timezone-aware timestamps.
Changes:
- Bypass timezone conversion for
DataType::Timestamp(_, None)(TimestampNTZ) inputs. - Preserve timezone conversion for
DataType::Timestamp(_, Some(_))inputs viaarray_with_timezone. - Add an explicit execution error for unsupported (non-timestamp) input types.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3265 +/- ##
============================================
+ Coverage 56.12% 60.14% +4.02%
- Complexity 976 1451 +475
============================================
Files 119 175 +56
Lines 11743 16067 +4324
Branches 2251 2663 +412
============================================
+ Hits 6591 9664 +3073
- Misses 4012 5059 +1047
- Partials 1140 1344 +204 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
86ba72d to
c5cf290
Compare
andygrove
left a comment
There was a problem hiding this comment.
Thanks for fixing this! The TimestampNTZ handling looks correct - skipping timezone conversion for TimestampNTZ while keeping it for regular timestamps makes sense.
One thing I wanted to check: the new match pattern handles Timestamp(_, None) and Timestamp(_, Some(_)) directly, but I'm wondering about dictionary-encoded timestamps. Looking at array_with_timezone in utils.rs, it has explicit handling for DataType::Dictionary(_, value_type) where the value type is a timestamp. With the new code, dictionary-encoded timestamps would fall into the error branch. Could this be a problem for Iceberg tables where timestamp columns are often dictionary-encoded?
Also, it might be worth adding a Scala test that exercises the TimestampNTZ path specifically, maybe something that verifies hour(timestamp_ntz_column) returns the correct value in a non-UTC session timezone. The bug this fixes is subtle and having a regression test would help catch it if anything changes in the future.
This review was generated with AI assistance.
|
I’ve added the regression test for |
@vigneshsiva11 I don't see the test |
|
@vigneshsiva11 I have moved this to draft for now. Please mark as ready for review once feedback has been addressed. Thank you. |
|
i have been marked as ready for review. |
…sion test - Add support for dictionary-encoded timestamps in extract_date_part - Add comprehensive test for hour/minute/second with TimestampNTZ in non-UTC timezones - Addresses reviewer feedback on PR apache#3265 for issue apache#3180
0b498d2 to
4979e81
Compare
|
@andygrove Thank you for the detailed review and feedback! I've updated the PR to address all your concerns: Changes Made:
Code Changes:
The PR now focuses solely on fixing issue #3180. Ready for another review when you have time! |
andygrove
left a comment
There was a problem hiding this comment.
LGTM pending CI. Thanks @vigneshsiva11
|
Hi @andygrove, Thank you for approving the changes! 🙏 I notice 94 tests are failing in CI. I checked a few logs, and they don't appear to be related to the TimestampNTZ/hour/minute/second changes in this PR. Could you help me understand if these failures are:
Please let me know if there's anything specific I need to address. Thank you! |
|
You need to run |
|
Hi @parthchandra, Thanks for pointing this out. I ran |
|
Tests are failing |
andygrove
left a comment
There was a problem hiding this comment.
Tests are currently failing
Enable native planning for Hour/Minute/Second after TimestampNTZ handling fix in spark-expr. This removes fallback to Spark caused by incompatible gating and unblocks expression CI checks.
|
@andygrove Thanks for the heads-up. I found the failing case and pushed a fix. Root cause: What I changed:
Could you please take another look once CI completes? If anything still fails, I will address it immediately. |
Which issue does this PR close?
Closes #3180
Rationale for this change
Spark
TimestampNTZvalues represent local time without any timezoneinformation. However, the current Comet implementation applies timezone
conversion when evaluating
hour,minute, andsecond, which leads toincorrect results for
TimestampNTZinputs in non-UTC session timezones.This change aligns Comet behavior with Spark semantics by ensuring that
timezone conversion is only applied to
Timestampvalues with an explicittimezone, and not to
TimestampNTZ.What changes are included in this PR?
extract_date_partused byhour,minute, andsecondto:TimestampNTZinputsTimestampinputs with timezoneHow are these changes tested?
cargo test -p datafusion-comet-spark-expr