Temporal datatype support for interval arithmetic#5971
Temporal datatype support for interval arithmetic#5971mustafasrepo merged 67 commits intoapache:mainfrom synnada-ai:feature/sym-join-temporal-columns
Conversation
add tests macro will replace matches inside evaluate ready for review
You can add a millisecond array as well, but I used Nano.
|
I plan to merge this PR, If there is no additional review from the community. if there is anyone want to review, please let me know. |
alamb
left a comment
There was a problem hiding this comment.
Thank you @berkaysynnada and @mustafasrepo -- I didn't review this PR fully but I think the structure looks good to me.
My biggest concern in the entire exercise is that the arithmetic that is supported for ScalarValue and what is supported for Arrays (and what is supported upstream in arrow-rs) will diverge. However I think we are all agreed on the desired end state (kernels in arrow-rs and the same math / support for scalars in Datafusion)
| let secs = ts_ms / 1000; | ||
| let nsecs = ((ts_ms % 1000) * 1_000_000) as u32; | ||
| do_date_time_math(secs, nsecs, scalar, sign).map(|dt| dt.timestamp_millis()) | ||
| let secs = ts_ms.div_euclid(1000); |
Thank you for the review. Once you review and approve the structure, I will move the other arithmetic codes that are for Arrays, and finally, I believe we can unify all arithmetics in kernels_arrow.rs |
Which issue does this PR close?
Closes #5844.
Rationale for this change
Currently, interval arithmetic operations are not supported for temporal datatypes. It is especially necessary to be able to apply symmetric hash join over ordered timestamp columns, probably with some interval differences.
With this PR, we can make use of interval arithmetic with Interval-Interval and Timestamp-Timestamp intervals. Also, evalute_array function (where array vs scalar operations go to in datetime.rs) had a limited evaluation ability. It has a similar behavior now with that feature .
What changes are included in this PR?
evaluate_array is extended to make timestamp vs timestamp, timestamp vs interval, and interval vs interval calculations, and dangerous unwraps are removed. min/max of different interval types can be found now.
evaluate_boundsandpropagate_constraintsare implemented forDateTimeIntervalExpr.Are these changes tested?
Yes, symmetric hash join tests are extended with temporal datatypes.
Are there any user-facing changes?