Description
Is your feature request related to a problem or challenge?
As part of making DataFusion even more customizable (#8045), it is valuable to let system designers mix and match different packages of functions to get the precise behavior they want (e.g. postgres style to_date or spark style to_date).
To support this functionality as well as to ensure the ScalarUDF API exposes the full power of DataFusion, we are in the process of extracting the "built in" functions out of the core and into separate crates.
This epic tracks the work to actually move the functions out of the core datafusion crate (where they are spread through datafusion_expr and datafusion-physical-expr) and into the new datafusion-functions / datafusion-functions-array crates.
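To illustrate the "mix and match" idea, here is a minimal std-only Rust sketch. It does not use DataFusion's actual `ScalarUDF` or registry APIs: `Registry`, `ScalarFn`, `base_package`, `lenient_package`, and both `to_date_*` behaviors are hypothetical names invented for this example. The only point it makes is that registering a second package can override a function of the same name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar function (NOT DataFusion's ScalarUDF).
type ScalarFn = fn(&[&str]) -> Result<String, String>;

/// Hypothetical registry keyed by function name.
#[derive(Default)]
struct Registry {
    functions: HashMap<String, ScalarFn>,
}

impl Registry {
    /// Register every function in a package; later packages override earlier ones.
    fn register_package(&mut self, package: &[(&str, ScalarFn)]) {
        for (name, f) in package {
            self.functions.insert(name.to_string(), *f);
        }
    }

    /// Look up a function by name and invoke it.
    fn call(&self, name: &str, args: &[&str]) -> Result<String, String> {
        let f = self
            .functions
            .get(name)
            .ok_or_else(|| format!("unknown function: {name}"))?;
        f(args)
    }
}

/// Very rough shape check (YYYY-MM-DD), for illustration only.
fn looks_like_date(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10 && b[4] == b'-' && b[7] == b'-'
}

fn upper(args: &[&str]) -> Result<String, String> {
    match args {
        [s] => Ok(s.to_uppercase()),
        _ => Err("upper expects 1 argument".to_string()),
    }
}

/// Invented "strict" flavor: malformed input is an error.
fn to_date_strict(args: &[&str]) -> Result<String, String> {
    match args {
        [s] if looks_like_date(s) => Ok(s.to_string()),
        [s] => Err(format!("invalid date: {s}")),
        _ => Err("to_date expects 1 argument".to_string()),
    }
}

/// Invented "lenient" flavor: malformed input becomes NULL.
fn to_date_lenient(args: &[&str]) -> Result<String, String> {
    match args {
        [s] if looks_like_date(s) => Ok(s.to_string()),
        [_] => Ok("NULL".to_string()),
        _ => Err("to_date expects 1 argument".to_string()),
    }
}

/// A default package of functions.
fn base_package() -> Vec<(&'static str, ScalarFn)> {
    vec![
        ("upper", upper as ScalarFn),
        ("to_date", to_date_strict as ScalarFn),
    ]
}

/// An optional package that replaces only `to_date`.
fn lenient_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", to_date_lenient as ScalarFn)]
}
```

Registering `base_package()` and then `lenient_package()` leaves `upper` untouched but changes which `to_date` is resolved, so a system designer would choose the packages (and their order) to get the dialect they want.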
Tasks:
Here is a list of many of the items necessary to complete this transition. Eventually there should be tickets for all tasks, and there are tickets for some already, but I don't want to make 100s of tickets all at once. I plan to make more as we make it through more of this project.
Anyone should feel free to make other tickets if they want to help with items below.
math_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/math/mod.rs
- Move `nullif` and `isnan` to datafusion-functions #9216
- Move `abs` to `datafusion_functions` #9286
- refactor: move acos() to function crate #9297
- Abs, Asin,
- Atan, Atan2, Acosh, Asinh, Atanh,
- Cbrt, Ceil, Cos, Cosh, Degrees
- Exp, Factorial: Move `ceil`, `exp`, `factorial` to `datafusion-functions` crate #9939
- Floor, Gcd, Lcm, Ln, Log, Log10, Log2, Pi, Power,
- Radians, Signum, Sin, Sinh, Sqrt,
- Tan, Tanh, Trunc, Cot, Round, iszero
array_expressions
Note that, given their size and specialization, these functions are put in their own subcrate, datafusion-functions-array.
- ArrayToString: Create `datafusion-functions-array` crate and move `ArrayToString` function into it #9113
- ArrayDims, ArrayNdims, Cardinality: move ArrayDims, ArrayNdims and Cardinality to datafusion-function-crate #9425
- ArrayHas, ArrayHasAll, ArrayHasAny: Port ArrayHas family to `functions-array` #9496
- MakeArray, ArrayAppend, ArrayPrepend, ArrayConcat: move make_array array_append array_prepend array_concat function to datafusion-functions-array crate #9343; Move `make_array` to datafusion-functions #9288; move make_array array_append array_prepend array_concat function to datafusion-functions-array crate #9504
- Range, GenSeries: port range function and change gen_series logic #9352
- ArrayEmpty, ArrayLength: port array_empty and array_length to datafusion-function-array crate #9510
- Flatten: port flatten to datafusion-function-array #9523
- StringToArray: Port `StringToArray` to `function-arrays` #9497
- ArraySort: Port `ArraySort` to `function-arrays` subcrate #9551
- ArrayDistinct: Port `ArrayDistinct` to `functions-array` subcrate #9549
- ArrayRepeat: Port `ArrayRepeat` to `functions-array` subcrate #9565
- ArrayResize: Port `ArrayResize` to `functions-array` subcrate #9570
- ArrayElement, ArraySlice, ArrayPopFront, ArrayPopBack: Port ArrayElem/Slice/PopFront/Back into `functions-array` #9615
- ArrayPosition, ArrayPositions: Port `ArrayPosition` and `ArrayPositions` to `functions-array` subcrate #9617
- ArrayReverse: Add `array_reverse` function to datafusion-function-* crate #9630
- ArrayIntersect, ArrayUnion: Port Array Union and Intersect to `functions-array` #9629
- ArrayExcept: Port `ArrayExcept` to `functions-array` subcrate #9634
- ArrayRemove, ArrayRemoveN, ArrayRemoveAll: Port `ArrayRemove`, `ArrayRemoveN`, `ArrayRemoveAll` to `functions-array` subcrate #9635
- ArrayReplace, ArrayReplaceN, ArrayReplaceAll: move array_replace family functions to datafusion-function-array crate #9651
- `MakeArray`: construct an array from columns (union/except depends on this)
- Move `datafusion_array_function` specific rewrite rules like to `datafusion_functions_array` crate #9519
Core functions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/core/mod.rs
- Create `core` module, extract `nullif`: Move `nullif` and `isnan` to datafusion-functions #9216
- Move `arrow_cast` to `datafusion-functions` crate #9287
- Move `ArrowTypeOf`: return the arrow type of a value. Port `arrow_typeof` to datafusion-function #9524
- `Coalesce`: return the first non-null value
- `Struct`: Create a struct
- `NullIf`: return null if the two values are equal
- `Random`: return a random number
- `Nanvl`: return the first non-NaN value
crypto_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/crypto/mod.rs
- Create `crypto` module in `datafusion/functions/src/crypto` and `crypto_expressions` feature flag, move `digest` function
- Digest, MD5, SHA224, SHA256, SHA384, SHA512
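As a rough sketch of what gating a module behind a feature flag such as `crypto_expressions` can look like, the following std-only Rust example compiles the crypto function names in only when the feature is enabled. The module layout and the `available_functions` helper are assumptions made for illustration, not the actual structure of the datafusion-functions crate.

```rust
/// Compiled only when the `crypto_expressions` feature is enabled.
#[cfg(feature = "crypto_expressions")]
mod crypto {
    pub fn functions() -> Vec<&'static str> {
        vec!["digest", "md5", "sha256"]
    }
}

/// Fallback when the feature is disabled: the module contributes nothing.
#[cfg(not(feature = "crypto_expressions"))]
mod crypto {
    pub fn functions() -> Vec<&'static str> {
        Vec::new()
    }
}

/// Hypothetical helper: names of the functions available in this build.
fn available_functions() -> Vec<&'static str> {
    // "Core" functions are always present in this sketch.
    let mut names = vec!["nullif", "arrow_cast"];
    names.extend(crypto::functions());
    names
}
```

With Cargo, the feature would be declared in the crate's `[features]` table and turned on by downstream users who need those functions; users who do not need them avoid compiling the extra code and its dependencies.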
string_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/string/mod.rs
- concat, concat_ws, ends_with, initcap: Move take concat, concat_ws, ends_with, initcap, to datafusion-functions #9540
- Create `string` module in `datafusion/functions/src/string` and `string_expressions` feature flag, move `ascii` function
- ascii, bit_length, btrim, chr,
- instr, lower, ltrim, octet_length,
- repeat, replace, rtrim, split_part,
- starts_with, to_hex, trim, upper,
- levenshtein, uuid, overlay
unicode_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/unicode/mod.rs
- Create `unicode` module in `datafusion/functions/src/unicode` and `unicode_expressions` feature flag, move `charlength` function
- CharLength,
- Left, Lpad, Reverse, Right, Rpad,
- Strpos, Substr,
- Translate, SubstrIndex, FindInSet
regex_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/regexp/mod.rs
- port reg_related function #9328
- RegexpMatch, RegexpReplace
- RegexpLike
datetime_expressions
These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/datetime/mod.rs
- Create `datetime` module in `datafusion/functions/src/datetime` and `datetime_expressions` feature flag, move `date_part`
- Move the to_timestamp* functions to datafusion-functions #9291
- port benchmarks to datafusion-functions crate
- date_part, date_trunc, date_bin,
- to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds,
- from_unixtime, now, current_date, current_time
Infrastructure
Describe alternatives you've considered
No response
Additional context
The organization was discussed in #9100