Summary
When array_remove is called with a null value as the element to remove, Comet returns null for the whole result instead of removing all null elements from the array, as Spark does.
Spark Specification
According to Spark's array_remove behavior:
- Null removal element: Removes all null elements from the array
- Null array input: Returns null (null intolerant behavior)
Example:
SELECT array_remove(array(1, null, 2, null, 3), null);
-- Spark returns: [1, 2, 3]
Current Comet Behavior
The current implementation in CometArrayRemove uses a CASE WHEN pattern:
val caseWhenExpr = ExprOuterClass.CaseWhen
  .newBuilder()
  .addWhen(isNotNullExpr.get)          // if element is not null
  .addThen(arrayRemoveScalarExpr.get)  // call array_remove_all
  .setElseExpr(nullLiteralProto.get)   // else return null <-- BUG
  .build()
When the element to remove is null, instead of removing null elements, Comet returns null for the entire result.
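To make the effect of that CASE WHEN concrete, here is a minimal Rust model of its semantics. It is a sketch, not Comet code: `None` stands in for SQL NULL, the array is a plain `Vec<Option<i32>>`, and the function name `case_when_array_remove` is hypothetical.

```rust
/// Models the CASE WHEN wrapper described above with plain Rust types.
/// `None` represents SQL NULL. Illustration only; not part of Comet.
fn case_when_array_remove(
    arr: Option<Vec<Option<i32>>>,
    elem: Option<i32>,
) -> Option<Vec<Option<i32>>> {
    match elem {
        // WHEN element IS NOT NULL THEN array_remove_all(arr, element):
        // drop every entry equal to the element, keep null entries.
        Some(e) => arr.map(|a| a.into_iter().filter(|x| *x != Some(e)).collect()),
        // ELSE NULL: the whole result becomes null, whatever the array holds.
        None => None,
    }
}
```

In this model, calling the function with the array [1, null, 2, null, 3] and a null element takes the ELSE branch and yields null rather than an array.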
Expected Behavior
When array_remove(arr, null) is called, Comet should remove all null elements from the array (same as Spark).
Suggested Fix
Modify the implementation to:
- Detect when the element is null at evaluation time
- Call a function that removes null values from the array (similar to array_compact)
- Or create a custom Rust implementation that handles null elements correctly
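The options above could be sketched roughly as follows. This is a simplified illustration of the behavior this issue asks for, written against plain Rust vectors instead of Arrow arrays. The function name `array_remove_null_aware` is hypothetical, and a real implementation would operate on list arrays in Comet's native layer.

```rust
/// Hypothetical null-aware array_remove over plain Rust types.
/// `None` represents SQL NULL. A sketch of the requested semantics only.
fn array_remove_null_aware(
    arr: Option<Vec<Option<i32>>>,
    elem: Option<i32>,
) -> Option<Vec<Option<i32>>> {
    // A null array input still produces a null result.
    let values = arr?;
    let kept: Vec<Option<i32>> = match elem {
        // Null element: drop the null entries (array_compact-like behavior).
        None => values.into_iter().filter(|x| x.is_some()).collect(),
        // Non-null element: drop matching entries, keep null entries.
        Some(e) => values.into_iter().filter(|x| *x != Some(e)).collect(),
    };
    Some(kept)
}
```

With this sketch, the array [1, null, 2, null, 3] and a null element produce [1, 2, 3], while a null array input still produces null.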
Impact
This could cause data correctness issues for users who rely on array_remove to filter out null values from arrays.
Note: This issue was generated with AI assistance.