Is your feature request related to a problem or challenge?
Suppose I have a DataFrame in which one column contains arrays. I would like to apply any scalar expression to each element of those arrays and get an array back. For example, I would like to apply an abs() function and convert data such as this:
DataFrame()
+--------------+-------------+
| a            | abs(a)      |
+--------------+-------------+
| [-10, 5, 13] | [10, 5, 13] |
| [2]          | [2]         |
| [-3, 1]      | [3, 1]      |
+--------------+-------------+
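To make the intended semantics concrete, here is a minimal pure-Python sketch. The helper name `array_transform` is hypothetical and is not an existing DataFusion function; it only models the per-element behaviour on plain lists, and the null handling (a null array stays null) is an assumption.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def array_transform(
    column: List[Optional[List[T]]], fn: Callable[[T], U]
) -> List[Optional[List[U]]]:
    """Apply a scalar function to every element of every array in a column.

    Hypothetical helper: it models the requested semantics on plain lists.
    """
    # Assumption: a null array stays null; otherwise map fn over the elements.
    return [None if arr is None else [fn(x) for x in arr] for arr in column]


a = [[-10, 5, 13], [2], [-3, 1]]
result = array_transform(a, abs)
print(result)  # [[10, 5, 13], [2], [3, 1]]
```

The output has the same number of rows as the input, and each output array has the same length as its input array.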
Additionally, it would be amazing to be able to apply any aggregate function to the elements of each array, producing one value per row:
DataFrame()
+--------------+--------+
| a            | sum(a) |
+--------------+--------+
| [-10, 5, 13] | 8      |
| [2]          | 2      |
| [-3, 1]      | -2     |
+--------------+--------+
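The aggregate case can be sketched the same way in plain Python. The name `array_aggregate` is hypothetical and used only for illustration; the point is that each array collapses to a single value and the row count is unchanged.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def array_aggregate(
    column: List[Optional[List[T]]], agg: Callable[[List[T]], R]
) -> List[Optional[R]]:
    """Reduce each array in a column to one value with an aggregate function.

    Hypothetical helper: it models the requested semantics on plain lists.
    """
    # Assumption: a null array yields a null result.
    return [None if arr is None else agg(arr) for arr in column]


a = [[-10, 5, 13], [2], [-3, 1]]
sums = array_aggregate(a, sum)
print(sums)  # [8, 2, -2]
```

Any function that maps a list to a single value (for example `min`, `max`, or `len`) can be passed in place of `sum`.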
Describe the solution you'd like
This is similar to the Spark transform operation, which is very powerful for highly structured data. I don't know the best form these functions would take, but it would be even more powerful if we could do element-by-element operations across more than one column in the DataFrame. There are many use cases where you have several columns of arrays of the same length.
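As a rough illustration of the multi-column case, the sketch below combines two array columns element by element. The name `array_zip_with` is hypothetical, and raising an error on mismatched lengths is an assumption rather than a proposed behaviour.

```python
from operator import add
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def array_zip_with(
    left: List[List[A]], right: List[List[B]], fn: Callable[[A, B], C]
) -> List[List[C]]:
    """Combine two array columns element by element, row by row.

    Hypothetical helper: it models the requested semantics on plain lists.
    """
    if len(left) != len(right):
        raise ValueError("columns must have the same number of rows")
    out = []
    for row, (xs, ys) in enumerate(zip(left, right)):
        # Assumption: arrays in the same row must have equal length.
        if len(xs) != len(ys):
            raise ValueError(f"array lengths differ in row {row}")
        out.append([fn(x, y) for x, y in zip(xs, ys)])
    return out


a = [[1, 2, 3], [4]]
b = [[10, 20, 30], [40]]
combined = array_zip_with(a, b, add)
print(combined)  # [[11, 22, 33], [44]]
```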
Describe alternatives you've considered
The current status quo is to either write a UDF to handle these on a case-by-case basis or to do an unnest and group by. The unnest and group by can be an expensive operation.
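The unnest and group by workaround can be modelled in plain Python to show where the extra work goes. This is only a sketch of the logical steps, not how any engine executes them; the `row_id` key is an assumption, standing in for whatever column uniquely identifies the original row.

```python
from collections import defaultdict

a = [[-10, 5, 13], [2], [-3, 1]]

# Step 1 (unnest): emit one (row_id, value) pair per array element.
unnested = [(row_id, value) for row_id, arr in enumerate(a) for value in arr]

# Step 2 (group by): regroup the flattened values by their original row.
groups = defaultdict(list)
for row_id, value in unnested:
    groups[row_id].append(value)

# Step 3 (aggregate): reduce each group back to one value per row.
sums = [sum(groups[row_id]) for row_id in range(len(a))]

print(len(unnested))  # 6 intermediate rows for 3 input rows
print(sums)           # [8, 2, -2]
```

The intermediate result has one row per array element rather than one per input row, and the values then have to be regrouped by key, which is the overhead a direct per-array function would avoid.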
Additional context
No response