What is the problem the feature request solves?
Note: This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark exists array function (the ArrayExists expression), causing queries that use it to fall back to Spark's JVM execution instead of running natively on DataFusion.
The ArrayExists expression checks whether any element in an array satisfies a given predicate function. It applies a lambda function to each element of the array and returns true if at least one element makes the predicate evaluate to true.
Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
Describe the potential solution
Spark Specification
Syntax:
EXISTS(array, lambda_function)
Arguments:
| Argument | Type | Description |
|---|---|---|
| argument | ArrayType | The input array to evaluate |
| function | LambdaFunction | The predicate function to apply to each array element |
| followThreeValuedLogic | Boolean | Controls null handling behavior (internal parameter) |
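Return Type: BooleanType - Returns true if any element satisfies the predicate and false if none do. Returns null if the input array is null or, under three-valued logic, if no element satisfies the predicate and at least one predicate result is null.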
Supported Data Types:
- Input: Any ArrayType with elements of any data type
- Lambda function must return a boolean result
- Supports arrays with nullable elements
Edge Cases:
- Null array input: Returns null if the input array itself is null
- Empty array: Returns false for empty arrays
- Null lambda results: When followThreeValuedLogic is true and the lambda returns null for at least one element while no element returns true, the overall result is null
- Legacy mode: When followThreeValuedLogic is false, null lambda results are treated as false, so the result is false whenever no element returns true
- Nullable elements: Properly handles null elements within the array by passing them to the lambda function
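The edge cases above can be summarized in a small, dependency-free Rust sketch. It is illustrative only and is not Comet or DataFusion code: the function name array_exists and its signature are hypothetical, Option stands in for SQL null, and the logic follows the rules listed in this issue, which may still need checking against Spark's ArrayExists implementation.

```rust
/// Illustrative model of the ArrayExists null semantics (hypothetical helper).
/// `array` is `None` for a null array, each element may itself be null, and
/// `predicate` returns `None` when the lambda evaluates to null.
fn array_exists<T>(
    array: Option<&[Option<T>]>,
    predicate: impl Fn(&Option<T>) -> Option<bool>,
    follow_three_valued_logic: bool,
) -> Option<bool> {
    // A null input array produces a null result.
    let elements = array?;
    let mut found_null = false;
    for element in elements {
        match predicate(element) {
            // One true result is enough, so stop scanning.
            Some(true) => return Some(true),
            Some(false) => {}
            // Remember that the lambda returned null for some element.
            None => found_null = true,
        }
    }
    if follow_three_valued_logic && found_null {
        // No true result, but at least one unknown: the answer is unknown.
        None
    } else {
        // Empty array, all false, or legacy mode treating null as false.
        Some(false)
    }
}
```

With a predicate such as x > 2, the row [1, null] evaluates to null under three-valued logic and to false in legacy mode, while an empty array is false in both modes.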
Examples:
-- Check if any element is null
SELECT EXISTS(array(1, 2, 3), x -> x IS NULL);
-- Returns: false
-- Check if any element is greater than 2
SELECT EXISTS(array(1, 2, 3), x -> x > 2);
-- Returns: true
-- Check with null elements
SELECT EXISTS(array(1, null, 3), x -> x IS NULL);
-- Returns: true
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(exists(col("array_column"), x => x > lit(10)))
Implementation Approach
See the Comet guide on adding new expressions for detailed instructions.
- Scala Serde: Add expression handler in spark/src/main/scala/org/apache/comet/serde/
- Register: Add to appropriate map in QueryPlanSerde.scala
- Protobuf: Add message type in native/proto/src/proto/expr.proto if needed
- Rust: Implement in native/spark-expr/src/ (check if DataFusion has built-in support first)
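For the Rust step, one possible shape of the native kernel is a per-row reduction over the lambda's output. The sketch below is a std-only illustration under stated assumptions, not an existing Comet or DataFusion API. It assumes the lambda has already been evaluated over the flattened child values into predicate_results, that offsets delimits each row's elements in the style of an Arrow list array, and that row_valid marks non-null rows. The function name exists_per_row and all parameter names are hypothetical.

```rust
/// Hypothetical columnar reduction for an exists-style expression.
/// Row `i` owns `predicate_results[offsets[i]..offsets[i + 1]]`, and
/// `row_valid[i]` is false when the array in row `i` is null.
fn exists_per_row(
    offsets: &[usize],
    row_valid: &[bool],
    predicate_results: &[Option<bool>],
    follow_three_valued_logic: bool,
) -> Vec<Option<bool>> {
    offsets
        .windows(2)
        .zip(row_valid)
        .map(|(window, &valid)| {
            if !valid {
                // Null array in this row: the result is null.
                return None;
            }
            let row = &predicate_results[window[0]..window[1]];
            if row.contains(&Some(true)) {
                Some(true)
            } else if follow_three_valued_logic && row.contains(&None) {
                // No true value, but some predicate result was null.
                None
            } else {
                Some(false)
            }
        })
        .collect()
}
```

Evaluating the lambda once over the flattened values and reducing per row afterwards keeps the predicate vectorized, at the cost of losing per-row short-circuiting. Whether that trade-off suits Comet is a design decision this sketch does not settle.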
Additional context
Difficulty: Medium
Spark Expression Class: org.apache.spark.sql.catalyst.expressions.ArrayExists
Related:
- ArrayForAll - Checks if all elements satisfy a predicate
- ArrayFilter - Filters array elements based on a predicate
- ArrayTransform - Transforms array elements using a lambda function
This issue was auto-generated from Spark reference documentation.