ARROW-17613: [C++] Add function execution API for a preconfigured kernel #14043
Conversation
@lidavidm, are you a good person to review this? Or can you suggest someone?
lidavidm
left a comment
The idea sounds reasonable to me. I'd like to see unit tests here.
Possibly expression evaluation can take advantage of this? Right now it manually resolves a kernel and evaluates it; being able to replace that would help prove out the API/implementation here.
Will do.
Perhaps. My motivation for this change is related to UDFs, which would have their kernel bound once then executed multiple times over a stream of batches.
In principle, the new code gets covered through existing unit tests, because the original function API now goes through the new code for kernel binding and execution. I'll add unit tests specific to the new code.
Adding these APIs is a welcome improvement. My comments above tend to mirror @lidavidm's own observations.
bkietz
left a comment
I'm not sure what purpose this serves; when will we be able to use this? When executing an ExecPlan for example we already resolve all the kernels and types in Expression::Bind, then re-use those kernels for each input batch. Maybe this would be better as adding ExecuteScalarOrVectorExpression?
The focus here is on the function execution API, i.e., a lower-level API than the plan execution API. Currently, the function execution API goes through kernel selection on each invocation, because the types of the passed arguments may change each time. This PR adds a fast path for executing a preconfigured kernel when the argument types are known to be fixed across invocations. Note that, in general, these invocations need not be over a stream of batches, as in an execution plan, but could be dynamically driven.

Regarding when to use this: First, a user may use this directly where they would use the original function execution API. Second, as noted above, my motivation for this is related to UDFs, whose kernel would be preconfigured once and then executed multiple times over a stream of batches (the kernel state ends up holding Python state). It's possible this kernel preconfiguration can be integrated into expression binding too; I haven't looked into this.
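To illustrate the distinction being discussed, here is a minimal plain-Python sketch of the two paths. This is not Arrow's actual API; `KERNELS`, `call_function`, and `FunctionExecutor` are toy stand-ins for kernel dispatch and for the preconfigured-kernel fast path this PR proposes.

```python
# Toy "registry" mapping (function name, argument types) to a kernel.
KERNELS = {
    ("add", (int, int)): lambda a, b: a + b,
    ("add", (float, float)): lambda a, b: a + b,
}

def call_function(name, args):
    # Original path: resolve the kernel on every invocation,
    # because the argument types may differ each time.
    kernel = KERNELS[(name, tuple(type(a) for a in args))]
    return kernel(*args)

class FunctionExecutor:
    # Fast path: resolve the kernel once for fixed argument types,
    # then execute it repeatedly with no further lookups.
    def __init__(self, name, arg_types):
        self.kernel = KERNELS[(name, tuple(arg_types))]

    def execute(self, args):
        return self.kernel(*args)

# Repeated execution over a stream of batches reuses the bound kernel.
executor = FunctionExecutor("add", [int, int])
results = [executor.execute(batch) for batch in [(1, 2), (3, 4)]]
```

The point of the sketch is only the shape of the API: the lookup cost moves out of the per-batch loop and into a one-time bind step.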
Will do.
I understand, but when do users touch the function execution API? I think that would primarily be through the Python or R bindings, to handle ad-hoc cases like adding two arrays together, and in that case constructing a FunctionExecutor would not be useful, since the user input time delay will greatly outweigh kernel lookup. A FunctionExecutor would only be useful when executing the same function multiple times, for example when applied to multiple batches from a stream of data. What I'd like to hear is when that's beneficial and isn't served by construction of an ExecPlan.
Kernel preconfiguration is precisely the function of Expression::Bind, among other things:
In short, it seems that we won't be able to use FunctionExecutor where it seems to me we'd most like to see UDF capabilities: in filter and project expressions in ExecPlans. Since that will eventually require refactoring/extension of the Expression utilities, I'd prefer we start there so that we can have a better picture of the ways ExecPlan etc. will need to change to accommodate UDFs. Building parallel streaming execution functionality which will ultimately need to be accommodated or assimilated by ExecPlans seems like much more churn.
@bkietz, before I answer your points, I should note that the code here is extracted from a working end-to-end (Ibis/Substrait/PyArrow) prototype I developed for UDFs and UDTs (UDFs that provide a stream of tabular data). While this doesn't mean its design would be accepted as is (and I do welcome feedback on it, or on parts of it that I extract), there is currently no alternative working design. I'd expect to see a comparable alternative put forward, so I could evaluate pros and cons in the context of end-to-end support for UDFs and UDTs. In my mind, just evolving expression binding is not a comparable alternative.
I said "user" but didn't mean "end-user" necessarily; I should have said "caller" for clarity. Still, the pre-PR function execution API is public, so we should assume it is used by end-users and the burden is actually in claiming the opposite (e.g., for deprecation purposes). The fact that there exists a higher-level API, which may be convenient for a lot of use cases (like streaming), does not change this. Granted, there is also a burden of showing the proposed API is useful. I could point you to how the prototype uses this new function execution API, if that would be helpful. The general idea is that the end-user is driving from PyArrow and registers a UDT. The UDT is a Python-implemented function that may be invoked multiple times, each at a different source node in the execution plan. Each such invocation returns a stream object implemented in Python that is managed in a kernel state. Invoking the kernel returns tabular data that is part of the dynamically generated stream. The new function execution API is designed to enable this setup.
Yes, on arguments of the same types.
AFAICS, the above described UDT functionality cannot be served by an ExecPlan nor by expression binding.
IIUC, C++ code (for expression binding) is the driver here. In the prototype's design, it is end-user code, via PyArrow, that is the driver. Also, IIUC, you have a call-expression in mind, and it is designed to be stateless, whereas the stream generator in the prototype is stateful. It's not clear how to reconcile these differences in a proposal based on expression binding. At least this will need to be explained in the context of a more complete description of an alternative.
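To make the statefulness point concrete, here is a minimal plain-Python sketch of the stateful-kernel idea described above. This is not the prototype's actual code; `KernelState`, `udt_kernel`, and `user_table_generator` are hypothetical stand-ins for a kernel whose state holds a Python-implemented stream object across invocations.

```python
def user_table_generator():
    # Hypothetical user-defined table function: yields "batches" of tabular data.
    yield {"x": [1, 2]}
    yield {"x": [3, 4]}

class KernelState:
    # Bound once when the kernel is preconfigured; survives across invocations.
    def __init__(self, stream):
        self.stream = stream

def udt_kernel(state):
    # Each kernel invocation pulls the next batch from the managed stream,
    # returning None when the stream is exhausted.
    return next(state.stream, None)

# Drive the kernel repeatedly against the same state, as a caller would
# over a dynamically generated stream.
state = KernelState(user_table_generator())
batches = []
while (batch := udt_kernel(state)) is not None:
    batches.append(batch)
```

A stateless call-expression has no natural place to keep `state.stream` between calls, which is the reconciliation difficulty referred to above.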
A few thoughts:
I'm pretty sure there are existing cases where users have interacted directly with functions and not via an execution plan (I think @marsupialtail does this with Take, and I seem to recall @drin using the compute API directly as well). I'm not sure those cases couldn't be converted to an execution plan, but they do exist. IIRC these are cases where the user already has an engine / execution plan of their own and is simply trying to integrate Arrow compute. That being said, if we were going to go down this road, I think it would be more valuable to have an "expression executor" and not merely a "function executor". Also, removing the lookup / argument resolution time is nice, but the biggest win would be removing allocations for temporaries / outputs; that can be deferred to a future PR :).
I think the alternative (and this may be a misunderstanding of your goal) is that a UDT not be put into the function registry, even if it looks like a UDF elsewhere (e.g. Substrait). As an example, consider an embedded Python Substrait UDT (which does very much look like a UDF). When we consume that plan we would convert that embedded UDT into a function. Let's say it is a Python function that returns an iterator of tabular data. Instead of creating a stateful function to poll that iterator, we could put that iterator into a source node, probably one of the source nodes you just created in #14207. The …
While I don't know enough about expression use cases, this sounds right to me in the wider context, i.e., outside of just UDFs/UDTs. Does this mean the expression executor should be built on top of the function executor proposed here?
I agree that removing allocations etc. would be a significant win. What do you think is missing from this PR to do so? At least at the function executor API level, I believe repeated invocations of …
This is an interesting alternative to compare to. I'll try to explain below the differences between this and the one proposed in this PR. I think each of the two alternatives has its merits, and we'll just need to choose whether we want one or both.

The source-node (Weston's) proposal for UDTs has several pros that I can see. It requires fewer changes to Arrow in an end-to-end solution, probably just in the Substrait engine component. It bypasses the need to manage nested registries for UDTs (though these are still needed for UDFs). And its PyArrow part builds on fewer Arrow APIs, probably just the source-node-related APIs. OTOH, it also has some cons. It requires a separate source node per UDT. It does not directly support composing UDFs (say, from a library) with a UDT. And it does not directly support ordering of UDTs within one execution.

The function-executor (my) proposal for UDTs, besides supporting the expression executor feature Weston mentioned, has pretty much the reverse pros and cons. I'll elaborate on the less trivial points about UDT composition and ordering. In the prototype, from Arrow's point of view, a UDT is defined as a function that returns a generator of tabular data. However, from Substrait's point of view, a UDT is modeled like a monad, which abstracts out side effects. This enables composition of UDTs and UDFs in a single expression (rather than placing each UDT in a separate node), much as functions and monads can be composed in a functional programming language. For example, one can consider an expression like …
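To make the composition point concrete, here is a minimal plain-Python sketch of applying a scalar UDF over the stream produced by a UDT within one composed expression, rather than isolating the UDT in its own source node. This is neither Arrow's nor the prototype's actual API; `udt` and `udf` are hypothetical stand-ins.

```python
def udt():
    # Toy UDT: yields batches (here, plain columns of values).
    yield [1, 2]
    yield [3, 4]

def udf(col):
    # Toy scalar UDF applied per batch, e.g. doubling each value.
    return [2 * v for v in col]

# Composition "udf(udt())" evaluated batch-by-batch over the generated stream,
# analogous to composing a function with a monadic value in one expression.
composed = [udf(batch) for batch in udt()]
```

In the source-node alternative, by contrast, the `udt` stream would feed a dedicated source node, and the `udf` would have to be applied in a separate downstream node.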
Any idea how to deal with the error in this job: …
@rtpsw It probably means that the …
Fix doc Co-authored-by: Antoine Pitrou <pitrou@free.fr>
OK, I believe I have addressed your remaining comments. Let me know if you see anything more.
pitrou
left a comment
Thanks a lot @rtpsw. The API looks OK to me; I wrote some comments on the implementation and tests.
```cpp
for (size_t i = 0; i < 2; i++) {
  DCHECK(args.values[i].is_array());
  const ArraySpan& array = args.values[i].array;
  DCHECK_EQ(*int32(), *array.type);
```
Use AssertTypeEqual or check the type id.
```diff
- DCHECK_EQ(*int32(), *array.type);
+ AssertTypeEqual(int32(), array.type);
```
or
```diff
- DCHECK_EQ(*int32(), *array.type);
+ EXPECT_EQ(array.type->id(), Type::INT32);
```
@pitrou, the CI job's failure copied below seems to need attention. Let me know if you have an idea about how to fix it.
The reason for the error is that the Arrow function …
This is probably because we were returning …
@pitrou, I think I addressed all issues. Let me know if you see anything more.
pitrou
left a comment
Thanks for the update. LGTM, let's wait for CI.
CI failures are unrelated.
Thanks a lot @rtpsw! By the way, we have a long PR queue. If you're interested, you might want to start reviewing some of them.
After [ARROW-17613](https://issues.apache.org/jira/browse/ARROW-17613) (#14043), which made the error message better, we see an error in our tests because the message changed. Authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
Benchmark runs are scheduled for baseline = 4e99f59 and contender = 18326f9. 18326f9 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
See https://issues.apache.org/jira/browse/ARROW-17613