feat: add support for date_from_unix_date expression #3144

Merged
andygrove merged 19 commits into apache:main from andygrove:feature/date-from-unix-date
Apr 15, 2026

Conversation

@andygrove
Member

Summary

  • Adds native Comet support for Spark's date_from_unix_date function
  • The function converts an integer (days since Unix epoch 1970-01-01) to a Date32 value
  • Implementation is straightforward since Arrow's Date32 already stores days since epoch
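
For illustration only, here is a minimal standalone sketch of that identity-style conversion. It is not the PR's actual code; the function name and signature are hypothetical, and it assumes the arrow crate's Int32Array and Date32Array, both of which store i32 values as days since 1970-01-01:

use arrow::array::{Array, Date32Array, Int32Array};

// Hypothetical helper: map an Int32 "unix date" column to Date32.
fn date_from_unix_date(days: &Int32Array) -> Date32Array {
    // Copy values and nulls across unchanged; only the logical type changes
    // from Int32 to Date32 (days since the Unix epoch).
    days.iter().collect()
}

fn main() {
    let input = Int32Array::from(vec![Some(0), Some(18262), None, Some(-1)]);
    let dates = date_from_unix_date(&input);
    assert_eq!(dates.value(0), 0);      // 1970-01-01
    assert_eq!(dates.value(1), 18262);  // 2020-01-01
    assert!(dates.is_null(2));          // nulls propagate
    assert_eq!(dates.value(3), -1);     // 1969-12-31
}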

Test Plan

  • Added unit tests in CometTemporalExpressionSuite
  • Tests cover: epoch date, positive/negative days, null handling
  • All existing tests pass

Note: This PR was generated with AI assistance.

Closes #3089

andygrove and others added 2 commits January 14, 2026 15:31
Adds native Comet support for Spark's last_day function, which returns
the last day of the month for a given date.

Uses the SparkLastDay implementation from datafusion-spark crate.

Closes apache#3090

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds native Comet support for Spark's date_from_unix_date function,
which converts an integer representing days since Unix epoch (1970-01-01)
to a Date32 value.

Closes apache#3089

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as draft January 14, 2026 23:49
@codecov-commenter

codecov-commenter commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.96%. Comparing base (f09f8af) to head (b2cdd60).
⚠️ Report is 907 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3144      +/-   ##
============================================
+ Coverage     56.12%   59.96%   +3.83%     
- Complexity      976     1473     +497     
============================================
  Files           119      175      +56     
  Lines         11743    16169    +4426     
  Branches       2251     2682     +431     
============================================
+ Hits           6591     9695    +3104     
- Misses         4012     5126    +1114     
- Partials       1140     1348     +208     

@andygrove andygrove marked this pull request as ready for review January 15, 2026 02:42
@andygrove andygrove marked this pull request as draft January 30, 2026 01:48
…x-date

# Conflicts:
#	docs/source/user-guide/latest/configs.md
#	native/spark-expr/src/comet_scalar_funcs.rs
#	native/spark-expr/src/datetime_funcs/mod.rs
#	native/spark-expr/src/lib.rs
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
#	spark/src/test/scala/org/apache/comet/CometTemporalExpressionSuite.scala
@andygrove
Member Author

Moving this to draft until #3328 is merged

@andygrove andygrove marked this pull request as ready for review March 7, 2026 21:00
…x-date

# Conflicts:
#	native/spark-expr/src/comet_scalar_funcs.rs
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
@mbutrovich
Contributor

mbutrovich commented Apr 14, 2026

Thanks @andygrove! Claude summarized my notes for me. Hopefully it didn't transcribe anything wrong or hallucinate :)

PR #3144: date_from_unix_date

Nice straightforward expression. A few things I noticed:

Scalar path returns Array instead of Scalar

The scalar branch in date_from_unix_date.rs converts the input to a 1-element array and returns ColumnarValue::Array. This breaks the scalar-in/scalar-out contract that DataFusion expects for proper broadcast semantics. When date_from_unix_date(0) is used as a literal, returning a 1-element array instead of a scalar could cause issues with downstream columnar operations.

The fix would be to extract the ScalarValue::Int32 directly and return ScalarValue::Date32:

ColumnarValue::Scalar(scalar) => {
    match scalar {
        // Scalar in, scalar out: map the day count straight to a Date32 scalar.
        ScalarValue::Int32(Some(days)) => {
            Ok(ColumnarValue::Scalar(ScalarValue::Date32(Some(days))))
        }
        // Propagate null input as a null Date32 scalar.
        ScalarValue::Int32(None) | ScalarValue::Null => {
            Ok(ColumnarValue::Scalar(ScalarValue::Date32(None)))
        }
        _ => Err(DataFusionError::Execution(
            "date_from_unix_date expects Int32 scalar input".to_string(),
        )),
    }
}

Docs

docs/spark_expressions_support.md still has date_from_unix_date marked as [ ]. Should be [x].

Tests

It might be worth adding Int32 boundary values (2147483647, -2147483648) to the test INSERT to match Spark's own testIntegralInput coverage. The implementation is identity so it can't overflow, but boundary tests document that the behavior is intentional.

- Return ScalarValue::Date32 directly in the scalar path instead of
  converting to a 1-element array, preserving scalar-in/scalar-out
  contract for proper broadcast semantics
- Mark date_from_unix_date as supported in spark_expressions_support.md
- Add Int32 boundary values (2147483647, -2147483648) to test coverage
-- specific language governing permissions and limitations
-- under the License.

-- ConfigMatrix: parquet.enable.dictionary=false,true

nit: I think parquet.enable.dictionary=false,true is removed everywhere recently

Contributor

@kazuyukitanimura left a comment


Pending CI (it looks like there is a format issue)

@andygrove
Member Author

@mbutrovich regarding "It might be worth adding Int32 boundary values (2147483647, -2147483648)": Spark itself cannot handle those cases.

Error executing SQL 'SELECT date_from_unix_date(i) FROM test_date_from_unix_date'
[EXPRESSION_DECODING_FAILED] Failed to decode a row to a value of the expressions:
createexternalrow(static_invoke(DateTimeUtils.toJavaDate(input[0, date, true])), StructField(date_from_unix_date(i),DateType,true)). SQLSTATE: 42846

Replace INT_MAX (2147483647) and INT_MIN (-2147483648) with Spark's
actual date boundaries (-719162 for 0001-01-01 and 2932896 for
9999-12-31) to fix EXPRESSION_DECODING_FAILED error when Spark
tries to convert out-of-range dates to Java Date objects.
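
As a quick sanity check on those boundary numbers (not part of the PR; a hedged sketch assuming the chrono crate is available), the day offsets of Spark's minimum and maximum supported dates from the Unix epoch work out to exactly the values in the commit message:

use chrono::NaiveDate;

fn main() {
    let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap();
    let min_date = NaiveDate::from_ymd_opt(1, 1, 1).unwrap();
    let max_date = NaiveDate::from_ymd_opt(9999, 12, 31).unwrap();
    // Days from the Unix epoch to Spark's supported date range boundaries.
    assert_eq!((min_date - epoch).num_days(), -719_162);  // 0001-01-01
    assert_eq!((max_date - epoch).num_days(), 2_932_896); // 9999-12-31
}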
@andygrove andygrove merged commit 9372a5e into apache:main Apr 15, 2026
132 checks passed
@andygrove
Member Author

Merged. Thanks @kazuyukitanimura @mbutrovich

Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: date_from_unix_date

4 participants