feat: add support for date_from_unix_date expression #3144

Merged
andygrove merged 19 commits into apache:main from andygrove:feature/date-from-unix-date
Apr 15, 2026

Conversation

@andygrove
Member

Summary

  • Adds native Comet support for Spark's date_from_unix_date function
  • The function converts an integer (days since Unix epoch 1970-01-01) to a Date32 value
  • Implementation is straightforward since Arrow's Date32 already stores days since epoch
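
For illustration only, here is a minimal standalone sketch of that identity-style conversion. It is not the PR's actual code; the function name and signature are hypothetical, and it assumes the arrow crate's Int32Array and Date32Array, both of which store i32 values as days since 1970-01-01:

use arrow::array::{Array, Date32Array, Int32Array};

// Hypothetical helper: map an Int32 "unix date" column to Date32.
fn date_from_unix_date(days: &Int32Array) -> Date32Array {
    // Copy values and nulls across unchanged; only the logical type changes
    // from Int32 to Date32 (days since the Unix epoch).
    days.iter().collect()
}

fn main() {
    let input = Int32Array::from(vec![Some(0), Some(18262), None, Some(-1)]);
    let dates = date_from_unix_date(&input);
    assert_eq!(dates.value(0), 0);      // 1970-01-01
    assert_eq!(dates.value(1), 18262);  // 2020-01-01
    assert!(dates.is_null(2));          // nulls propagate
    assert_eq!(dates.value(3), -1);     // 1969-12-31
}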

Test Plan

  • Added unit tests in CometTemporalExpressionSuite
  • Tests cover: epoch date, positive/negative days, null handling
  • All existing tests pass

Note: This PR was generated with AI assistance.

Closes #3089

andygrove and others added 2 commits January 14, 2026 15:31
Adds native Comet support for Spark's last_day function, which returns
the last day of the month for a given date.

Uses the SparkLastDay implementation from datafusion-spark crate.

Closes apache#3090

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds native Comet support for Spark's date_from_unix_date function,
which converts an integer representing days since Unix epoch (1970-01-01)
to a Date32 value.

Closes apache#3089

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@andygrove andygrove marked this pull request as draft January 14, 2026 23:49
@codecov-commenter

codecov-commenter commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.96%. Comparing base (f09f8af) to head (b2cdd60).
⚠️ Report is 907 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3144      +/-   ##
============================================
+ Coverage     56.12%   59.96%   +3.83%     
- Complexity      976     1473     +497     
============================================
  Files           119      175      +56     
  Lines         11743    16169    +4426     
  Branches       2251     2682     +431     
============================================
+ Hits           6591     9695    +3104     
- Misses         4012     5126    +1114     
- Partials       1140     1348     +208     

@andygrove andygrove marked this pull request as ready for review January 15, 2026 02:42
@andygrove andygrove marked this pull request as draft January 30, 2026 01:48
…x-date

# Conflicts:
#	docs/source/user-guide/latest/configs.md
#	native/spark-expr/src/comet_scalar_funcs.rs
#	native/spark-expr/src/datetime_funcs/mod.rs
#	native/spark-expr/src/lib.rs
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
#	spark/src/test/scala/org/apache/comet/CometTemporalExpressionSuite.scala
@andygrove
Member Author

Moving this to draft until #3328 is merged

@andygrove andygrove marked this pull request as ready for review March 7, 2026 21:00
…x-date

# Conflicts:
#	native/spark-expr/src/comet_scalar_funcs.rs
#	spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
#	spark/src/main/scala/org/apache/comet/serde/datetime.scala
@mbutrovich
Contributor

mbutrovich commented Apr 14, 2026

Thanks @andygrove! Claude summarized my notes for me. Hopefully it didn't transcribe anything wrong or hallucinate :)

PR #3144: date_from_unix_date

Nice straightforward expression. A few things I noticed:

Scalar path returns Array instead of Scalar

The scalar branch in date_from_unix_date.rs converts the input to a 1-element array and returns ColumnarValue::Array. This breaks the scalar-in/scalar-out contract that DataFusion expects for proper broadcast semantics. When date_from_unix_date(0) is used as a literal, returning a 1-element array instead of a scalar could cause issues with downstream columnar operations.

The fix would be to extract the ScalarValue::Int32 directly and return ScalarValue::Date32:

ColumnarValue::Scalar(scalar) => {
    match scalar {
        // Scalar in, scalar out: map the day count straight to a Date32 scalar.
        ScalarValue::Int32(Some(days)) => {
            Ok(ColumnarValue::Scalar(ScalarValue::Date32(Some(days))))
        }
        // Propagate null input as a null Date32 scalar.
        ScalarValue::Int32(None) | ScalarValue::Null => {
            Ok(ColumnarValue::Scalar(ScalarValue::Date32(None)))
        }
        _ => Err(DataFusionError::Execution(
            "date_from_unix_date expects Int32 scalar input".to_string(),
        )),
    }
}

Docs

docs/spark_expressions_support.md still has date_from_unix_date marked as [ ]. Should be [x].

Tests

It might be worth adding Int32 boundary values (2147483647, -2147483648) to the test INSERT to match Spark's own testIntegralInput coverage. The implementation is identity so it can't overflow, but boundary tests document that the behavior is intentional.

- Return ScalarValue::Date32 directly in the scalar path instead of
  converting to a 1-element array, preserving scalar-in/scalar-out
  contract for proper broadcast semantics
- Mark date_from_unix_date as supported in spark_expressions_support.md
- Add Int32 boundary values (2147483647, -2147483648) to test coverage
-- specific language governing permissions and limitations
-- under the License.

-- ConfigMatrix: parquet.enable.dictionary=false,true

nit: I think parquet.enable.dictionary=false,true is removed everywhere recently

Contributor

@kazuyukitanimura left a comment


Pending CI (it looks like there is a format issue)

@andygrove
Member Author

@mbutrovich regarding "It might be worth adding Int32 boundary values (2147483647, -2147483648)": Spark itself cannot handle those cases.

Error executing SQL 'SELECT date_from_unix_date(i) FROM test_date_from_unix_date'
[EXPRESSION_DECODING_FAILED] Failed to decode a row to a value of the expressions:
createexternalrow(static_invoke(DateTimeUtils.toJavaDate(input[0, date, true])), StructField(date_from_unix_date(i),DateType,true)). SQLSTATE: 42846

Replace INT_MAX (2147483647) and INT_MIN (-2147483648) with Spark's
actual date boundaries (-719162 for 0001-01-01 and 2932896 for
9999-12-31) to fix EXPRESSION_DECODING_FAILED error when Spark
tries to convert out-of-range dates to Java Date objects.
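
As a quick sanity check on those boundary numbers (not part of the PR; a hedged sketch assuming the chrono crate is available), the day offsets of Spark's minimum and maximum supported dates from the Unix epoch work out to exactly the values in the commit message:

use chrono::NaiveDate;

fn main() {
    let epoch = NaiveDate::from_ymd_opt(1970, 1, 1).unwrap();
    let min_date = NaiveDate::from_ymd_opt(1, 1, 1).unwrap();
    let max_date = NaiveDate::from_ymd_opt(9999, 12, 31).unwrap();
    // Days from the Unix epoch to Spark's supported date range boundaries.
    assert_eq!((min_date - epoch).num_days(), -719_162);  // 0001-01-01
    assert_eq!((max_date - epoch).num_days(), 2_932_896); // 9999-12-31
}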
@andygrove andygrove merged commit 9372a5e into apache:main Apr 15, 2026
132 checks passed
@andygrove
Member Author

Merged. Thanks @kazuyukitanimura @mbutrovich

Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: date_from_unix_date

4 participants