Implement method to apply scalar or aggregate function to Array elements #15882

@timsaucer

Description

Is your feature request related to a problem or challenge?

Suppose I have a DataFrame in which one column contains arrays. I wish to be able to apply any scalar expr to each value of that array and get an array back. For example, I would like to be able to apply an abs() function and convert data such as this:

DataFrame()
+--------------+-------------+
| a            | abs(a)      |
+--------------+-------------+
| [-10, 5, 13] | [10, 5, 13] |
| [2]          | [2]         |
| [-3, 1]      | [3, 1]      |
+--------------+-------------+
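To pin down the semantics being requested, here is a minimal pure-Python sketch. The name `array_transform` is hypothetical and not an existing DataFusion API; the column is modelled as a plain list of lists, and a null array is assumed to map to null.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def array_transform(
    column: List[Optional[List[T]]], fn: Callable[[T], U]
) -> List[Optional[List[U]]]:
    """Apply a scalar function to every element of every array in a column.

    Hypothetical helper for illustration only. A null (None) array is
    passed through as None; this is an assumption, not specified behavior.
    """
    return [None if arr is None else [fn(v) for v in arr] for arr in column]


a = [[-10, 5, 13], [2], [-3, 1]]
abs_a = array_transform(a, abs)
print(abs_a)
```

Each output array has the same length as its input array, so no row is split or merged.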

Additionally, it would be amazing to be able to apply any aggregate function to the elements of each array, producing one value per row.

DataFrame()
+--------------+--------+
| a            | sum(a) |
+--------------+--------+
| [-10, 5, 13] | 8      |
| [2]          | 2      |
| [-3, 1]      | -2     |
+--------------+--------+
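The aggregate case can be sketched the same way. Again, `array_aggregate` is a hypothetical name used only to illustrate the requested behavior: each array collapses to a single value, one per row.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def array_aggregate(
    column: List[Optional[Sequence[T]]], fn: Callable[[Sequence[T]], R]
) -> List[Optional[R]]:
    """Reduce each array in a column to one value using an aggregate function.

    Hypothetical helper for illustration only. A null (None) array yields
    None; this is an assumption, not specified behavior.
    """
    return [None if arr is None else fn(arr) for arr in column]


a = [[-10, 5, 13], [2], [-3, 1]]
sum_a = array_aggregate(a, sum)
print(sum_a)
```

Any reduction that accepts a sequence (for example `min`, `max`, or `len`) could be passed in place of `sum`.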

Describe the solution you'd like

This is similar to Spark's transform operation, which is very powerful for highly structured data. I don't know the best form these functions would take, but it would be even more powerful if we could do element-by-element operations across more than one column in the DataFrame. There are many use cases where you will have columns of arrays of the same length.
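One possible shape for the multi-column variant, sketched in pure Python. The name `array_zip_with` and the choice to raise an error on mismatched lengths are assumptions made for illustration, not a proposed API.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
R = TypeVar("R")


def array_zip_with(
    left: Sequence[Sequence[A]],
    right: Sequence[Sequence[B]],
    fn: Callable[[A, B], R],
) -> List[List[R]]:
    """Combine two array columns element by element, row by row.

    Hypothetical helper for illustration only. Arrays in the same row
    must have equal length; otherwise a ValueError is raised.
    """
    if len(left) != len(right):
        raise ValueError("columns must have the same number of rows")
    out: List[List[R]] = []
    for row, (xs, ys) in enumerate(zip(left, right)):
        if len(xs) != len(ys):
            raise ValueError(f"row {row}: array lengths differ")
        out.append([fn(x, y) for x, y in zip(xs, ys)])
    return out


a = [[-10, 5, 13], [2], [-3, 1]]
b = [[1, 1, 1], [10], [3, 4]]
a_plus_b = array_zip_with(a, b, lambda x, y: x + y)
print(a_plus_b)
```

Other policies for mismatched lengths, such as padding with nulls, are equally plausible; the error is simply the most conservative choice for a sketch.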

Describe alternatives you've considered

The current status quo is to either write a UDF to handle these on a case-by-case basis or to do an unnest followed by a group by. The unnest and group by can be an expensive operation.
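To make the cost of the unnest and group by workaround concrete, here is a pure-Python model of it. The helper names are hypothetical and this is not how any engine implements it; it only shows that every element becomes its own row before being regrouped.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def unnest(column: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Explode an array column into one (row_id, value) pair per element."""
    return [(row_id, v) for row_id, arr in enumerate(column) for v in arr]


def regroup(pairs: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Group (row_id, value) pairs back into arrays, ordered by row_id."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for row_id, v in pairs:
        groups[row_id].append(v)
    return [groups[row_id] for row_id in sorted(groups)]


a = [[-10, 5, 13], [2], [-3, 1]]
pairs = unnest(a)                               # 6 rows instead of 3
transformed = [(row_id, abs(v)) for row_id, v in pairs]
abs_a = regroup(transformed)
print(abs_a)
```

The intermediate result has one row per array element, and in this sketch a row whose array is empty produces no pairs and therefore disappears after regrouping. A direct element-wise function would avoid both problems.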

Additional context

No response

Labels

enhancement (New feature or request)