Skip to content

Conversation

@JasonLi-cn
Copy link
Contributor

Which issue does this PR close?

Closes #8182

Rationale for this change

In some cases, I need to determine the output type of the function based on the input arguments of the function, but I cannot do this at present. This is because the ReturnTypeFunction provides only the data types of the input parameters.

What changes are included in this PR?

  • Add ReturnTypeFactory trait
  • ReturnTypeFunction impl ReturnTypeFactory
  • Change UDF, UDAF, UDWF's return_type from ReturnTypeFunction into ReturnTypeFactory

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates labels Nov 15, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @JasonLi-cn - this is a great start. THe complex_udf is especially nice as an example

CC @2010YOUY01 as I think you have been thinking about this area too

}

/// Factory that returns the functions's return type given the input argument types
pub type ReturnTypeFunction =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if you considered changing the ReturnTypeFunction to something more like

    Arc<dyn Fn(&[Expr], &dyn ExprSchemable) -> Result<Arc<DataType>> + Send + Sync>;

Now that we have made the fields of ScalarUDF non pub (in #8079) I think we have much greater leeway to improve the API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could introduce the FunctionImplementation trait as proposed here: https://github.com/apache/arrow-datafusion/pull/8046/files#diff-8a327db2db945bcf6ca2b4229885532feae127e94a450600d3fac6ecdc0eeb3fR141

with the more general return types. Something like this perhaps:

/// Convenience trait for implementing ScalarUDF. See [`ScalarUDF::new_from_impl()`]
pub trait FunctionImplementation {
    /// Returns this function's name
    fn name(&self) -> &str;

    /// Returns this function's signature
    fn signature(&self) -> &Signature;

    /// return the return type of this function given the types of the arguments
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;

    /// return the return type of this function given the actual arguments. This is
    /// used to implement functions where the return type is a function of the actual
    /// arguments
    fn return_type_from_args(&self, args: &[Expr], schemabe: &dyn ExprSchemable) -> Result<DataType> {
      // default impl would call `Self::return_type`
      todo!()
    }


    /// Invoke the function on `args`, returning the appropriate result
    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
}

@alamb
Copy link
Contributor

alamb commented Nov 15, 2023

Another idea @JasonLi-cn might be to pursue this idea #8051 to "specialize" the function based on its arguments (and in this case the specialization could potentially also include the constants 🤔 )

@alamb alamb marked this pull request as draft November 17, 2023 19:08
@alamb
Copy link
Contributor

alamb commented Nov 17, 2023

marking as draft as I think @JasonLi-cn is working on feedback now

@JasonLi-cn
Copy link
Contributor Author

marking as draft as I think @JasonLi-cn is working on feedback now

Ok, I will try to complete this task

@JasonLi-cn JasonLi-cn closed this Apr 12, 2024
@alamb
Copy link
Contributor

alamb commented Apr 12, 2024

FOr anyone following along, this was implemented in a different way, see #8624

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UDF/UDAF/UDWF: refactor ReturnType

2 participants