[Epic] Port BuiltInFunctions to datafusion-functions-* crates #9285

@alamb

Is your feature request related to a problem or challenge?

As part of making DataFusion even more customizable (#8045), it is valuable to let system designers mix and match different packages of functions to get the precise behavior they want (e.g. postgres style to_date or spark style to_date).

To support this functionality as well as to ensure the ScalarUDF API exposes the full power of DataFusion, we are in the process of extracting the "built in" functions out of the core and into separate crates.

This epic tracks the work to actually move the functions out of the core datafusion crates (where they are spread through datafusion_expr and datafusion-physical-expr) and into the new datafusion-functions / datafusion-functions-array crates.
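To make the "mix and match packages of functions" idea concrete, here is a minimal, self-contained sketch. It does not use the real DataFusion or ScalarUDF API: `FunctionRegistry`, `ScalarFn`, `default_package`, `spark_package`, and the two `to_date_*` functions are all hypothetical names invented for illustration, and the "dates" are just tagged strings. It only shows the intended behavior, where a later package overrides a same-named function from an earlier one.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a scalar function: one string in, one string out.
type ScalarFn = fn(&str) -> String;

/// Hypothetical registry mapping SQL function names to implementations.
#[derive(Default)]
struct FunctionRegistry {
    functions: HashMap<String, ScalarFn>,
}

impl FunctionRegistry {
    /// Register every function in a package.
    /// A later package replaces earlier functions with the same name.
    fn register_package(&mut self, package: &[(&str, ScalarFn)]) {
        for (name, f) in package {
            self.functions.insert(name.to_string(), *f);
        }
    }

    /// Look up a function by name and call it, if it is registered.
    fn call(&self, name: &str, arg: &str) -> Option<String> {
        self.functions.get(name).map(|f| f(arg))
    }
}

// Two illustrative "dialects" of the same function name (not real parsers).
fn to_date_postgres_style(input: &str) -> String {
    format!("postgres:{input}")
}

fn to_date_spark_style(input: &str) -> String {
    format!("spark:{input}")
}

fn upper(input: &str) -> String {
    input.to_uppercase()
}

/// Hypothetical default package of functions.
fn default_package() -> Vec<(&'static str, ScalarFn)> {
    vec![
        ("to_date", to_date_postgres_style as ScalarFn),
        ("upper", upper as ScalarFn),
    ]
}

/// Hypothetical package that only overrides `to_date`.
fn spark_package() -> Vec<(&'static str, ScalarFn)> {
    vec![("to_date", to_date_spark_style as ScalarFn)]
}

fn main() {
    let mut registry = FunctionRegistry::default();
    registry.register_package(&default_package());
    registry.register_package(&spark_package());
    // `to_date` now resolves to the spark-style version; `upper` is unchanged.
    println!("{:?}", registry.call("to_date", "2024-01-01"));
    println!("{:?}", registry.call("upper", "abc"));
}
```

In this sketch the order of `register_package` calls decides which dialect wins, which is one simple way a system designer could pick the exact behavior they want.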

Tasks:

Here is a list of many of the items necessary to complete this transition. Eventually there should be tickets for all tasks, and there are tickets for some already, but I don't want to make 100s of tickets all at once. I plan to make more as we make it through more of this project.

Anyone should feel free to make other tickets if they want to help with items below.

math_expressions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/math/mod.rs

array_expressions

Note that, given their size and specialization, these functions are put in their own subcrate, datafusion-functions-array.

Core functions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/core/mod.rs

crypto_expressions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/crypto/mod.rs

  • Create crypto module in datafusion/functions/src/crypto and crypto_expressions feature flag, move digest function
  • Digest, MD5, SHA224, SHA256, SHA384, SHA512
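The "module plus feature flag" pattern named in the task above can be sketched as follows. This is an assumption about the general shape only, not the actual crate layout: the `crypto` module body, `function_names`, and `all_function_names` are hypothetical, and the lowercase function names are assumed spellings of the items listed above. Only the `crypto_expressions` feature name comes from the task list.

```rust
// Hypothetical layout: one module per function family, each behind a Cargo feature.
#[cfg(feature = "crypto_expressions")]
mod crypto {
    /// Names this module would contribute (assumed spellings of the listed functions).
    pub fn function_names() -> Vec<&'static str> {
        vec!["digest", "md5", "sha224", "sha256", "sha384", "sha512"]
    }
}

/// Collect the names of all functions compiled into this build.
fn all_function_names() -> Vec<&'static str> {
    // `mut` is unused when no feature-gated module is compiled in.
    #[allow(unused_mut)]
    let mut names: Vec<&'static str> = Vec::new();

    // This statement only exists when the feature is enabled.
    #[cfg(feature = "crypto_expressions")]
    names.extend(crypto::function_names());

    names
}

fn main() {
    // Empty unless built with the `crypto_expressions` feature enabled.
    println!("{:?}", all_function_names());
}
```

With this shape, a build that disables the feature compiles neither the module nor its registration, so users who do not need the crypto functions do not pay for them.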

string_expressions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/string/mod.rs

  • concat, concat_ws, ends_with, initcap: Move concat, concat_ws, ends_with, initcap to datafusion-functions #9540
  • Create string module in datafusion/functions/src/string and string_expressions feature flag, move ascii function
  • ascii, bit_length, btrim, chr,
  • instr, lower, ltrim, octet_length,
  • repeat, replace, rtrim, split_part,
  • starts_with, to_hex, trim, upper,
  • levenshtein, uuid, overlay

unicode_expressions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/unicode/mod.rs

  • Create unicode module in datafusion/functions/src/unicode and unicode_expressions feature flag, move charlength function
  • CharLength,
  • Left, Lpad, Reverse, Right, Rpad,
  • Strpos, Substr,
  • Translate, SubstrIndex, FindInSet

regex_expressions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/regexp/mod.rs

datetime_expressions

These should be located in the datafusion-functions crate (source link)
Code location: https://github.com/apache/arrow-datafusion/blob/main/datafusion/functions/src/datetime/mod.rs

  • Create datetime module in datafusion/functions/src/datetime and datetime_expressions feature flag, move date_part
  • Move the to_timestamp* functions to datafusion-functions #9291
  • port benchmarks to datafusion-functions crate
  • date_part, date_trunc, date_bin,
  • to_timestamp, to_timestamp_millis, to_timestamp_micros, to_timestamp_nanos, to_timestamp_seconds,
  • from_unixtime, now, current_date, current_time

Infrastructure

Describe alternatives you've considered

No response

Additional context

The organization was discussed in #9100
