[Bug] array_remove returns null when element is null instead of removing null elements #3173

@andygrove

Summary

When array_remove is called with a null element to remove, Comet returns null for the whole expression instead of removing all null elements from the array, as Spark does.

Spark Specification

According to Spark's array_remove behavior:

  • Null removal element: Removes all null elements from the array
  • Null array input: Returns null (null intolerant behavior)

Example:

SELECT array_remove(array(1, null, 2, null, 3), null);
-- Spark returns: [1, 2, 3]

Current Comet Behavior

The current implementation in CometArrayRemove uses a CASE WHEN pattern:

val caseWhenExpr = ExprOuterClass.CaseWhen
  .newBuilder()
  .addWhen(isNotNullExpr.get)    // if element is not null
  .addThen(arrayRemoveScalarExpr.get)  // call array_remove_all
  .setElseExpr(nullLiteralProto.get)    // else return null  <-- BUG
  .build()

When the element to remove is null, the ELSE branch is taken, so Comet returns null for the entire result instead of removing the null elements.
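
To make the effect of that plan concrete, here is a minimal sketch that models the CASE WHEN logic over plain Rust values. It is not Comet code: `array_remove_current` is a hypothetical name, arrays are modeled as slices of `Option<i32>`, and a `None` return value stands in for a SQL null result.

```rust
/// Hypothetical model of the CASE WHEN plan described above.
/// `None` in the slice is a null element; a `None` return is a null result.
fn array_remove_current(arr: &[Option<i32>], element: Option<i32>) -> Option<Vec<Option<i32>>> {
    match element {
        // WHEN element IS NOT NULL THEN array_remove_all(arr, element):
        // entries equal to the element are dropped, null entries are kept.
        Some(e) => Some(arr.iter().copied().filter(|x| *x != Some(e)).collect()),
        // ELSE NULL: the whole result is null, the array is never inspected.
        None => None,
    }
}
```

In this model, calling the function with `None` as the element returns `None` regardless of the array contents, which corresponds to the result reported in this issue.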

Expected Behavior

When array_remove(arr, null) is called, Comet should remove all null elements from the array (same as Spark).

Suggested Fix

Modify the implementation to:

  1. Detect when the element is null at evaluation time
  2. In that case, call a function that removes null values from the array (similar to array_compact)
  3. Alternatively, create a custom Rust implementation that handles null elements correctly
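
The semantics this issue asks for can be sketched as a small standalone Rust function. This is an illustration under stated assumptions, not the Comet or DataFusion API: `array_remove_null_aware` is a hypothetical name, arrays are modeled as `Option<&[Option<i32>]>` rather than Arrow arrays, and the null-element branch encodes the behavior expected in this issue, which is worth confirming against the target Spark version before implementing.

```rust
/// Hypothetical sketch of the behavior requested in this issue.
/// `None` for `arr` is a null array; `None` inside the slice is a null element.
fn array_remove_null_aware(
    arr: Option<&[Option<i32>]>,
    element: Option<i32>,
) -> Option<Vec<Option<i32>>> {
    // A null array input still produces a null result.
    let arr = arr?;
    let kept: Vec<Option<i32>> = match element {
        // Null element: drop every null entry (array_compact-like behavior).
        None => arr.iter().copied().filter(|x| x.is_some()).collect(),
        // Non-null element: drop entries equal to it and keep null entries.
        Some(e) => arr.iter().copied().filter(|x| *x != Some(e)).collect(),
    };
    Some(kept)
}
```

A real fix would operate on Arrow list arrays on the native side; the sketch only pins down the three cases (null array, null element, non-null element) that such an implementation would need to distinguish.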

Impact

This could cause data correctness issues for users who rely on array_remove to filter out null values from arrays.


Note: This issue was generated with AI assistance.
