[Python] Allow calling UDF kernels with field/scalar expressions #33048

@asfimport

This came up in #13687 (comment) while adding documentation on how to use UDFs in Python. To invoke a UDF on arrays, you can call pc.call_function("my_udf", [arr]), where arr is a materialized array.

But if you want to use your UDF in a context that needs an expression (e.g. a dataset projection), you need to be able to call the UDF with expressions as arguments. Currently, pc.call_function doesn't work that way: it expects actual, materialized arrays/scalars as arguments. As a workaround, you can use the private Expression._call:

# doesn't work with expressions
>>> pc.call_function("my_udf", [pc.field("col")])
...
TypeError: Got unexpected argument type <class 'pyarrow._compute.Expression'> for compute function
# workaround
>>> pc.Expression._call("my_udf", [pc.field("col")])
<pyarrow.compute.Expression my_udf(col)>

So we should try to improve the usability here. Some options:

  • See if we can change pc.call_function to also accept Expressions as arguments

  • Make Expression._call public, so one can do pc.Expression.call("my_udf", [..])

cc @westonpace @vibhatha

Reporter: Joris Van den Bossche / @jorisvandenbossche

Note: This issue was originally created as ARROW-17827. Please see the migration documentation for further details.
