Describe the enhancement requested
Description
In issue #37753, Gandiva added support for registering external functions, so that developers can register third-party functions for use in Gandiva expressions. However, the supported external functions must be compiled to LLVM IR before they can be registered and used. This limitation is sometimes troublesome, in particular when the third-party function has a non-trivial dependency such as an HTTP library, because it requires compiling all dependent libraries into LLVM IR and then compiling all of that IR at runtime, which is slow.
Proposal
To address this limitation, I propose allowing external C functions to be registered with Gandiva, so that Gandiva expressions can use these functions without compiling third-party functions into LLVM IR. The Gandiva project already contains such functions, which are called stub functions internally, but this capability is not yet exposed to external functions.
The following APIs are proposed to be added to the FunctionRegistry API for this purpose:
- arrow::Status Register(NativeFunction func, void* c_function_ptr, std::optional<FunctionHolderMaker> function_holder_maker = std::nullopt)
  - register a C function into the function registry
  - @param func the registered function's metadata
  - @param c_function_ptr the function pointer to the registered function's implementation
  - @param function_holder_maker optional, this will be used as the function holder if the function requires a function holder, where using FunctionHolderMaker = std::function<arrow::Result<std::shared_ptr<gandiva::FunctionHolder>>(const FunctionNode& function_node)>
- const std::vector<std::pair<NativeFunction, void*>>& GetCFunctions() const
  - get a list of C functions saved in the registry
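To make the shape of the proposed API concrete, here is a minimal, self-contained sketch. It does not use the real Gandiva or Arrow headers: `NativeFunction`, `Status`, and `FunctionRegistry` below are simplified stand-ins written for illustration only, the optional `function_holder_maker` parameter is left out, and `multiply_by_two` and `RegisterAndCall` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for gandiva::NativeFunction (metadata only).
struct NativeFunction {
  std::string name;     // name used in expressions
  std::string pc_name;  // symbol name that generated code would call
};

// Hypothetical stand-in for arrow::Status.
struct Status {
  bool ok_;
  std::string message;
  bool ok() const { return ok_; }
  static Status OK() { return {true, ""}; }
  static Status Invalid(std::string msg) { return {false, std::move(msg)}; }
};

// Simplified registry that mirrors the two proposed methods.
class FunctionRegistry {
 public:
  // Stores the metadata together with the raw function pointer.
  Status Register(NativeFunction func, void* c_function_ptr) {
    if (c_function_ptr == nullptr) {
      return Status::Invalid("c_function_ptr must not be null");
    }
    c_functions_.emplace_back(std::move(func), c_function_ptr);
    return Status::OK();
  }

  // Returns every C function saved in the registry.
  const std::vector<std::pair<NativeFunction, void*>>& GetCFunctions() const {
    return c_functions_;
  }

 private:
  std::vector<std::pair<NativeFunction, void*>> c_functions_;
};

// An external function with C linkage, compiled natively rather than to LLVM IR.
extern "C" int64_t multiply_by_two(int64_t x) { return x * 2; }

// Registers the function, reads it back, and calls it through the stored pointer.
int64_t RegisterAndCall(int64_t input) {
  FunctionRegistry registry;
  Status st = registry.Register({"multiply_by_two", "multiply_by_two_int64"},
                                reinterpret_cast<void*>(&multiply_by_two));
  assert(st.ok());
  void* raw = registry.GetCFunctions().front().second;
  auto fn = reinterpret_cast<int64_t (*)(int64_t)>(raw);
  return fn(input);
}
```

In this sketch the registry only stores and returns the pointer. In the actual proposal, the engine would presumably make the registered symbol resolvable from JIT-compiled code instead of calling it directly like this.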
Benefits
- Complex functions that require dependent libraries can be used without a performance penalty. Previously, LLVM IR based functions were slow to construct at runtime if the generated LLVM IR was big (more than several MB). Since constructing an LLVM module requires copying all LLVM bitcode into the module, the more functions are implemented in LLVM IR, the slower constructing the LLVM module becomes (unless selective IR loading is supported).
- LLVM IR does allow users to develop a third-party function in different languages. However, complex external functions may use APIs from a language's standard library, which makes it necessary to compile that standard library into LLVM IR as well. This may not be possible in many languages. Additionally, the generated LLVM IR will be too big (dead code elimination doesn't help much with this, as far as I can tell). If we allow using C functions, we can overcome this issue, since the standard library is typically already part of Gandiva's caller program (statically linked or dynamically loaded).
- Certain capabilities, like using thread local variables, are not available in Gandiva's current JIT engine (the MCJIT engine) when JIT-compiling LLVM IR. We have to upgrade the MCJIT engine to the Orc v2 engine ([C++][Gandiva] Migration JIT engine from MCJIT to LLJIT #37848) to support this. Some libraries use thread local variables, such as Rust's std::collections::HashMap, which internally uses a thread local variable, and this makes it easy to run into this restriction when authoring a third-party function in Rust. If we allow C functions, there won't be such a limitation.
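As an illustration of the last point, the sketch below shows a natively compiled function with C linkage that keeps thread-local state. It is plain C++ with no Gandiva dependency, and the names `count_calls_on_this_thread` and `CallsSeenByFreshThread` are hypothetical. It only demonstrates that such a function works when built by a regular compiler; it does not exercise the JIT engine itself.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical external function: counts how often it has been called on the
// current thread, using thread-local storage.
extern "C" int64_t count_calls_on_this_thread() {
  thread_local int64_t calls = 0;
  return ++calls;
}

// Calls the function n times on a newly started thread and returns the last
// counter value that thread observed.
int64_t CallsSeenByFreshThread(int n) {
  int64_t last = 0;
  std::thread worker([&last, n] {
    for (int i = 0; i < n; ++i) {
      last = count_calls_on_this_thread();
    }
  });
  worker.join();
  return last;
}
```

Each new thread starts with its own counter at zero, so the value returned depends only on how many calls that thread made.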
Notes
Gandiva internally calls this capability stub function. I was not sure whether that is a proper name externally, so I originally called it "C interface function". Updated to call it C function according to PR comments (GH-38589: [C++][Gandiva] Support registering external C functions #38632 (review)).
Component(s)
C++ - Gandiva