-
Notifications
You must be signed in to change notification settings - Fork 4k
Description
Describe the enhancement requested
Description
Currently, Gandiva has some internal stub functions, which are registered via two steps:
- function metadata are registered in multiple internal registry classes, such as:
1.1)GetStringFunctionRegistryinfunction_registry_string.cc
1.2)GetMathOpsFunctionRegistryinfunction_registry_math_ops.cc
1.3) etc - The stub functions' implementation are mapped to LLVM engine in:
2.1)ExportedStubFunctions::AddMappings
2.2)ExportedHashFunctions::AddMappings
2.3)ExportedStringFunctions::AddMappings
There are some issues with this organizing approach:
- When adding/removing a stub function, developers need to look for and change two places, which is not convenient. For example, when adding a new string function, both
GetStringFunctionRegistryinfunction_registry_string.ccandExportedStringFunctions::AddMappingsingdv_string_function_stubs.ccneed to be modified - The LLVM type information provided in the
AddMappingsAPI is similar as the function signature metadata provided inGetXXXFunctionRegistryAPI, which cost more time and effort for developers to maintain.
Proposal
In PR #38632, we added the capability to programmatically map function signature NativeFunction into LLVM-typed args. So the LLVM args for each function in AddMappings could be mapped directly from its NativeFunction.
This proposal plans to use FunctionRegistry's Register C function API to internally register the existing stub functions, and this will leverage the above mapping capability, and for stub functions, we could combine the metadata registration and implementation mapping into one step, so that:
- stub function metadata and implementation are associated and registered in one place, and developers don't have to look for two places for maintainance
- when adding/updating a stub function's signature, there is no need for developers to manually map arrow data type signature into LLVM-typed args, which makes it easier to maintain and it is less error prone. And this will simplify the code a lot as well, it is expected to reduce 500~1000 lines of code via this change.
- After this refactoring, the code like below is not necessary and is expected to be removed:
arrow/cpp/src/gandiva/gdv_function_stubs.cc
Lines 1197 to 1212 in 47dadb0
// mask-show-n mask_args = { types->i64_type(), // context types->i8_ptr_type(), // data types->i32_type(), // data_length types->i32_type(), // n_to_show types->i32_ptr_type() // out_length }; engine->AddGlobalMappingForFunc( "gdv_mask_show_first_n_utf8_int32", types->i8_ptr_type() /*return_type*/, mask_args, reinterpret_cast<void*>(gdv_mask_show_first_n_utf8_int32)); engine->AddGlobalMappingForFunc( "gdv_mask_show_last_n_utf8_int32", types->i8_ptr_type() /*return_type*/, mask_args, reinterpret_cast<void*>(gdv_mask_show_last_n_utf8_int32)); - some of the stub functions, are not exposed to end users in expressions, and they are expected to be used only internally, such as the several context helper functions, some
inrelated functions, and thegdv_fn_time_with_zonetime casting function. For these functions, we may have to leave them as they are since they don't even have the corresponding metadata. There are probably some more functions like them, it seems not easy to find them all, and we have to go through all the stub functions to figure out.
- After this refactoring, the code like below is not necessary and is expected to be removed:
Component(s)
C++ - Gandiva