Description
From #13687 (comment), where it came up while adding documentation on how to use UDFs in Python. When you just want to invoke a UDF with arrays, you can do pc.call_function("my_udf", [arr]), where arr is an actual array.
But if you want to use your UDF in a context that needs an expression (e.g. a dataset projection), you need to be able to call the UDF with expressions as arguments. Currently, pc.call_function doesn't work that way: it expects actual, materialized arrays/scalars as arguments. As a workaround, you can use the private Expression._call:
# doesn't work with expressions
>>> pc.call_function("my_udf", [pc.field("col")])
...
TypeError: Got unexpected argument type <class 'pyarrow._compute.Expression'> for compute function
# workaround
>>> pc.Expression._call("my_udf", [pc.field("col")])
<pyarrow.compute.Expression my_udf(col)>
So we should try to improve the usability here. Some options:
- See if we can change pc.call_function to also accept Expressions as arguments
- Make the _call public, so one can do pc.Expression.call("my_udf", [..])
Reporter: Joris Van den Bossche / @jorisvandenbossche
Note: This issue was originally created as ARROW-17827. Please see the migration documentation for further details.