feat(spark): add trunc, date_trunc and time_trunc functions #19829

Jefffrey merged 10 commits into apache:main
Conversation
```rust
Ok(ExprSimplifyResult::Simplified(Expr::ScalarFunction(
    ScalarFunction::new_udf(
        datafusion_functions::datetime::date_trunc(),
```
Just concerned about whether matching the return field nullability here is something we should watch for?
Yep, DF's date_trunc return field will be nullable. If #19511 goes through, it should fix this issue.
I don't see #19511 landing anytime soon, so we might need to fix this in the DF date_trunc to ensure consistency here.
```rust
    args: Vec<Expr>,
    _info: &SimplifyContext,
) -> Result<ExprSimplifyResult> {
    let fmt_expr = &args[0];
```
Suggested change:

```diff
-let fmt_expr = &args[0];
+let [fmt_expr, time_expr] = take_function_args(self.name(), args)?;
```
```rust
Ok(ExprSimplifyResult::Simplified(Expr::ScalarFunction(
    ScalarFunction::new_udf(datafusion_functions::datetime::date_trunc(), args),
```
fmt is normalized (lowercased) above and validated, but here you pass the original args (non-normalized). Maybe it would be better to pass the normalized fmt:

```rust
let fmt_expr = Expr::Literal(ScalarValue::new_utf8(fmt.as_str()), None);
...
ScalarFunction::new_udf(
    datafusion_functions::datetime::date_trunc(),
    vec![fmt_expr, time_expr],
),
```

Shouldn't matter; DF will handle the original argument as well and lowercase it.
@cht42 could you add tests for DST handling with different time zones? I am not convinced that the current PR handles this correctly. Here is an AI-generated test that highlights the issue.

Spark repro:

```shell
$ /opt/spark-3.5.7-bin-hadoop3/bin/spark-sql --conf spark.sql.session.timeZone=America/New_York -e "SELECT date_trunc('DAY', timestamp'2024-07-15T03:30:00Z');"
...
Spark master: local[*], Application Id: local-1768499563499
2024-07-14 00:00:00
```
Good catch @andygrove, I added the tests and some logic to handle those cases. Let me know what you think.
```rust
    &self.signature
}
...
// keep return_type implementation for information schema generation
```
```rust
// Timestamp without timezone: use as-is
_ => ts_expr,
```
The arm above catches all timestamps though, since it doesn't match on Some(tz); so when would this arm take effect?
Yes, updated to throw an error if we reach this arm; it should not be happening.

Looks good. Thanks for addressing that.
Looks like some conflicts to fix
```rust
}
...
pub fn functions() -> Vec<Arc<ScalarUDF>> {
    vec![
```
Changes are because I sorted alphabetically.

Fixed
Thanks @cht42, @andygrove & @martin-g
Which issue does this PR close?

- Closes apache#19828.
- Part of apache#15914 (datafusion-spark: Spark Compatible Functions)

Rationale for this change

Implement Spark:

- https://spark.apache.org/docs/latest/api/sql/index.html#trunc
- https://spark.apache.org/docs/latest/api/sql/index.html#date_trunc
- https://spark.apache.org/docs/latest/api/sql/index.html#time_trunc

What changes are included in this PR?

Add Spark-compatible wrappers around the DataFusion date_trunc function to handle Spark specificities.

Are these changes tested?

Yes, in SLT.

Are there any user-facing changes?

Yes.