feat(spark): add trunc, date_trunc and time_trunc functions #19829

Merged
Jefffrey merged 10 commits into apache:main from cht42:spark-trunc on Jan 19, 2026
cht42 (Contributor) commented Jan 15, 2026

Which issue does this PR close?

- Closes #19828.
- Part of #15914

Rationale for this change

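Implement the following Spark functions:
- https://spark.apache.org/docs/latest/api/sql/index.html#trunc
- https://spark.apache.org/docs/latest/api/sql/index.html#date_trunc
- https://spark.apache.org/docs/latest/api/sql/index.html#time_trunc
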
What changes are included in this PR?

Add Spark-compatible wrappers around DataFusion's date_trunc function to handle Spark-specific behavior.

Are these changes tested?

Yes, in sqllogictest (SLT) files.

Are there any user-facing changes?

Yes

github-actions bot added the sqllogictest (SQL Logic Tests, .slt) and spark labels Jan 15, 2026

Ok(ExprSimplifyResult::Simplified(Expr::ScalarFunction(
    ScalarFunction::new_udf(
        datafusion_functions::datetime::date_trunc(),

Contributor

I'm a bit concerned about return field nullability here: is matching it something we should watch for?

Contributor Author

Yep, DF's date_trunc return field will be nullable. If #19511 goes through, it should fix this issue.

Contributor

I don't see #19511 landing anytime soon, so we might need to fix this in DF's date_trunc to ensure consistency here.

Contributor Author

done

Comment thread datafusion/spark/src/function/datetime/date_trunc.rs Outdated
Comment thread datafusion/spark/src/function/datetime/time_trunc.rs Outdated
Comment thread datafusion/sqllogictest/test_files/spark/datetime/date_trunc.slt Outdated
Comment thread datafusion/spark/src/function/datetime/trunc.rs Outdated
Comment thread datafusion/sqllogictest/test_files/spark/datetime/time_trunc.slt Outdated
Comment thread datafusion/spark/src/function/datetime/trunc.rs Outdated
    args: Vec<Expr>,
    _info: &SimplifyContext,
) -> Result<ExprSimplifyResult> {
    let fmt_expr = &args[0];

Member

Suggested change
- let fmt_expr = &args[0];
+ let [fmt_expr, time_expr] = take_function_args(self.name(), args)?;

}

Ok(ExprSimplifyResult::Simplified(Expr::ScalarFunction(
    ScalarFunction::new_udf(datafusion_functions::datetime::date_trunc(), args),

Member

fmt is normalized (lowercased) and validated above, but here you pass the original, non-normalized args.
It might be better to pass the normalized fmt instead:

let fmt_expr = Expr::Literal(ScalarValue::new_utf8(fmt.as_str()), None);
...
ScalarFunction::new_udf(
    datafusion_functions::datetime::date_trunc(),
    vec![fmt_expr, time_expr],
),

Contributor Author

It shouldn't matter: DF will handle the original argument as well and lowercase it.
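For readers following this thread, here is a minimal, std-only sketch of what "normalize and validate" could look like. The function name `normalize_fmt` and the alias list are assumptions made for illustration; they are not taken from the PR's code, and whether DataFusion's date_trunc accepts the same aliases is not shown in this conversation.

```rust
/// Hypothetical helper (not from the PR): lowercases a Spark-style truncation
/// format and maps a few illustrative aliases to one canonical unit name.
fn normalize_fmt(fmt: &str) -> Result<&'static str, String> {
    match fmt.to_ascii_lowercase().as_str() {
        "year" | "yyyy" | "yy" => Ok("year"),
        "quarter" => Ok("quarter"),
        "month" | "mon" | "mm" => Ok("month"),
        "week" => Ok("week"),
        "day" | "dd" => Ok("day"),
        "hour" => Ok("hour"),
        "minute" => Ok("minute"),
        "second" => Ok("second"),
        // Anything else is rejected during validation.
        other => Err(format!("unsupported truncation format: {other}")),
    }
}
```

If the wrapper only lowercases, passing the original argument downstream is equivalent as long as the downstream function lowercases too. If the wrapper also rewrites aliases, as this sketch does, passing the normalized value is the safer choice.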

Comment thread datafusion/sqllogictest/test_files/spark/datetime/trunc.slt Outdated
Comment thread datafusion/spark/src/function/datetime/date_trunc.rs
andygrove (Member) commented Jan 15, 2026

@cht42 could you add tests for DST handling with different time zones? I am not convinced that the current PR handles this correctly.

Here is an AI-generated test that highlights the issue:

# Test with explicit timezone in timestamp
statement ok
SET datafusion.execution.time_zone = 'America/New_York';

# Cross-day boundary: 03:30 UTC on July 15 = 23:30 EDT on July 14
# TODO: Spark returns 2024-07-14 (converts to session tz first)
# DataFusion returns 2024-07-15 (truncates in UTC)
query P
SELECT date_trunc('DAY', '2024-07-15T03:30:00Z'::timestamp);
----
2024-07-15T00:00:00

# Reset timezone for other tests
statement ok
RESET datafusion.execution.time_zone

Spark repro:

$ /opt/spark-3.5.7-bin-hadoop3/bin/spark-sql --conf spark.sql.session.timeZone=America/New_York -e "SELECT date_trunc('DAY', timestamp'2024-07-15T03:30:00Z');"
...
Spark master: local[*], Application Id: local-1768499563499
2024-07-14 00:00:00
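The difference between the two results comes down to which clock the day boundary is taken on. The sketch below reproduces only the arithmetic, under a loud assumption: America/New_York is modeled as a fixed UTC-4 offset (EDT in July), whereas a real implementation would look the offset up in a time zone database. All names here are hypothetical.

```rust
const SECS_PER_DAY: i64 = 86_400;

// 2024-07-15 is 19_919 days after 1970-01-01, so this is 2024-07-15T03:30:00Z.
const TS_UTC: i64 = 19_919 * SECS_PER_DAY + 3 * 3_600 + 30 * 60;

// Assumption: a fixed UTC-4 offset standing in for America/New_York in July.
const EDT_OFFSET_SECS: i64 = -4 * 3_600;

/// Day number (days since the epoch) when truncating on the UTC clock.
fn day_in_utc(epoch_secs: i64) -> i64 {
    epoch_secs.div_euclid(SECS_PER_DAY)
}

/// Day number when truncating on the local wall clock at a fixed offset.
fn day_in_local(epoch_secs: i64, offset_secs: i64) -> i64 {
    (epoch_secs + offset_secs).div_euclid(SECS_PER_DAY)
}
```

With these inputs, `day_in_utc(TS_UTC)` is 19919 (2024-07-15) and `day_in_local(TS_UTC, EDT_OFFSET_SECS)` is 19918 (2024-07-14), which are the two dates reported in the test and the Spark repro above.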

cht42 (Contributor Author) commented Jan 15, 2026

Good catch @andygrove, I added the tests and some logic to handle those cases. Let me know what you think.

github-actions bot added the functions (Changes to functions implementation) label Jan 16, 2026
&self.signature
}

// keep return_type implementation for information schema generation

Contributor

This seems like a bug

Contributor Author

// only handle the function which implemented [`ScalarUDFImpl::return_type`] method

Contributor

Comment on lines +155 to +156
// Timestamp without timezone: use as-is
_ => ts_expr,

Contributor

The arm above catches all timestamps though, since it doesn't match on Some(tz). So when would this arm take effect?

Contributor Author

Yes, updated to throw an error if we reach this arm, which should not happen.
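The point raised in this thread can be illustrated with a small, self-contained sketch. `ColType` and `plan_for` are hypothetical stand-ins, not the PR's code and not DataFusion's `DataType`. They show how an arm that binds the time zone without matching on `Some(..)` already covers every timestamp, leaving the fallback arm for non-timestamp inputs only.

```rust
/// Hypothetical stand-in for a column data type; not DataFusion's `DataType`.
#[derive(Debug)]
enum ColType {
    Timestamp(Option<String>),
    Date,
}

/// Hypothetical planner step describing how the input would be handled.
fn plan_for(t: &ColType) -> Result<String, String> {
    match t {
        // Binds `tz` for both `Some(..)` and `None`, so every timestamp lands here.
        ColType::Timestamp(tz) => Ok(match tz {
            Some(name) => format!("convert from {name}"),
            None => "use as-is".to_string(),
        }),
        // Reached only for non-timestamp inputs, so it is reported as an error
        // rather than silently passing the expression through.
        other => Err(format!("unexpected type: {other:?}")),
    }
}
```

Turning the fallback into an error, as described in the reply above, makes an unexpected input visible instead of letting it pass through unchanged.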

andygrove (Member)

> Good catch @andygrove, I added the tests and some logic to handle those cases. Let me know what you think.

Looks good. Thanks for addressing that.

Jefffrey (Contributor)

Looks like some conflicts to fix

}

pub fn functions() -> Vec<Arc<ScalarUDF>> {
    vec![

cht42 (Contributor Author) Jan 18, 2026

The changes here are because I sorted the list alphabetically.

cht42 (Contributor Author) commented Jan 18, 2026

> Looks like some conflicts to fix

fixed

Jefffrey added this pull request to the merge queue Jan 19, 2026
Merged via the queue into apache:main with commit 4ed808a Jan 19, 2026
28 checks passed
Jefffrey (Contributor)

Thanks @cht42, @andygrove & @martin-g

de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request Mar 24, 2026
…9829)

## Which issue does this PR close?

- Closes apache#19828.
- Part of apache#15914 

## Rationale for this change

implement spark:
- https://spark.apache.org/docs/latest/api/sql/index.html#trunc
- https://spark.apache.org/docs/latest/api/sql/index.html#date_trunc
- https://spark.apache.org/docs/latest/api/sql/index.html#time_trunc

## What changes are included in this PR?

Add spark compatible wrappers around datafusion date_trunc function to
handle spark specificities.

## Are these changes tested?

Yes in SLT

## Are there any user-facing changes?

Yes
Labels

functions (Changes to functions implementation), spark, sqllogictest (SQL Logic Tests, .slt)

Development

Successfully merging this pull request may close these issues.

[datafusion-spark] Add date_trunc, time_trunc and trunc functions

4 participants