feat: support Spark expression slice by andygrove · Pull Request #4149 · apache/datafusion-comet

andygrove · 2026-04-29T23:25:10Z

Which issue does this PR close?

Closes #.

Rationale for this change

Add native support for Spark's slice(array, start, length) expression so it runs on Comet instead of falling back to Spark.

The datafusion-spark crate already ships a SparkSlice, but it is not Spark-compatible: when a negative start lies before the beginning of the array (e.g. slice([a], -2, 2)), it returns the first element instead of an empty array. We can upstream the fix later; for now this PR ships a Comet-local implementation.

What changes are included in this PR?

native/spark-expr/src/array_funcs/array_slice.rs: new SparkArraySlice UDF (spark_array_slice) implementing Spark's slice semantics, including 1-based indexing, negative-start-from-end, error on start = 0 or length < 0, and clamping length to the array end. Supports both List and LargeList element storage.
native/spark-expr/src/comet_scalar_funcs.rs: register the new UDF.
spark/src/main/scala/org/apache/comet/serde/arrays.scala: CometSlice serde casts the start/length args to Long and serialises a call to spark_array_slice, promising containsNull = true to match DataFusion's list nullability.
spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: register Slice in arrayExpressions.

How are these changes tested?

12 native unit tests in array_slice.rs covering positive / negative / zero / overflowing start, length 0, length past end, null inputs, empty arrays, and the error cases.
New SQL test file spark/src/test/resources/sql-tests/expressions/array/slice.sql covering all-literal, column + literal, and column-only argument combinations across boolean, tinyint, smallint, int, bigint, float, double, decimal, date, timestamp, timestamp_ntz, string, and nested array element types, plus the negative-start-overflow case that exposed the upstream bug.

andygrove added 2 commits April 29, 2026 17:24

feat: support Spark expression slice

1f6133a

refactor: tighten slice_list inner loop

4a46ed4

andygrove marked this pull request as ready for review April 30, 2026 01:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support Spark expression slice#4149

feat: support Spark expression slice#4149
andygrove wants to merge 2 commits intoapache:mainfrom
andygrove:feat/array-slice

andygrove commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

andygrove commented Apr 29, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant