Is your feature request related to a problem or challenge?
I'm trying to update Comet to latest DataFusion: apache/datafusion-comet#403. One issue I found is about AggregateUDF.
For some aggregate expressions, their implementation moves to AggregateUDF, e.g., FirstValue and LastValue.
To create a AggregateFunctionExpr for them, we need to call create_aggregate_expr by providing some arguments like input_phy_exprs. input_phy_exprs is used to determine the UDF's return return type (i.e., AggregateUDF.return_type).
To get the input physical expressions for each aggregate expression, we need to get slice of all input expressions which is the state fields of the aggregate expression. But the current design of AggregateUDF doesn't provide its state fields, but it relies on AggregateFunctionExpr to call AggregateUDF.state_fields with StateFieldsArgs.
So it is a circular relationship for someone to create a AggregateFunctionExpr for a AggregateUDF:
- Create a
AggregateUDF
- Call
create_aggregate_expr to create AggregateFunctionExpr, by providing input_phy_exprs
- In order to get
input_phy_exprs, we need to know how to slice input expressions, i.e., how many state fields for the UDF. Call AggregateUDF.state_fields. AggregateUDF.state_fields requires StateFieldsArgs from AggregateFunctionExpr, so we need to create AggregateFunctionExpr first (step 2).
I think this is an issue in the current design. AggregateUDF should determine its state fields by itself instead relying on AggregateFunctionExpr which is a wrapper of it.
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response
Is your feature request related to a problem or challenge?
I'm trying to update Comet to latest DataFusion: apache/datafusion-comet#403. One issue I found is about AggregateUDF.
For some aggregate expressions, their implementation moves to
AggregateUDF, e.g.,FirstValueandLastValue.To create a
AggregateFunctionExprfor them, we need to callcreate_aggregate_exprby providing some arguments likeinput_phy_exprs.input_phy_exprsis used to determine the UDF's return return type (i.e.,AggregateUDF.return_type).To get the input physical expressions for each aggregate expression, we need to get slice of all input expressions which is the state fields of the aggregate expression. But the current design of
AggregateUDFdoesn't provide its state fields, but it relies onAggregateFunctionExprto callAggregateUDF.state_fieldswithStateFieldsArgs.So it is a circular relationship for someone to create a
AggregateFunctionExprfor aAggregateUDF:AggregateUDFcreate_aggregate_exprto createAggregateFunctionExpr, by providinginput_phy_exprsinput_phy_exprs, we need to know how to slice input expressions, i.e., how many state fields for the UDF. CallAggregateUDF.state_fields.AggregateUDF.state_fieldsrequiresStateFieldsArgsfromAggregateFunctionExpr, so we need to createAggregateFunctionExprfirst (step 2).I think this is an issue in the current design.
AggregateUDFshould determine its state fields by itself instead relying onAggregateFunctionExprwhich is a wrapper of it.Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response