Rename input_type --> input_types on AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs #11666
This confused me too the first time I looked at the code for user-defined aggregates. I assumed it was written like that because, for all existing user-defined aggregates, knowing just the type of the first argument is enough.
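To make the distinction concrete, here is a minimal sketch using hypothetical, simplified stand-ins (`DataType`, `OldArgs`, `NewArgs`, `to_old_args`, and `to_new_args` are illustrative only, not DataFusion's actual definitions). The old shape keeps just the first argument's type, while the renamed `input_types` carries one entry per argument.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
}

/// Hypothetical args shape before the rename: only one type is stored.
pub struct OldArgs<'a> {
    pub input_type: &'a DataType,
}

/// Hypothetical args shape after the rename: one type per argument, in order.
pub struct NewArgs<'a> {
    pub input_types: &'a [DataType],
}

/// Builds old-style args, keeping only the first argument's type.
/// Panics if `types` is empty.
pub fn to_old_args(types: &[DataType]) -> OldArgs<'_> {
    OldArgs { input_type: &types[0] }
}

/// Builds new-style args that expose every argument's type.
pub fn to_new_args(types: &[DataType]) -> NewArgs<'_> {
    NewArgs { input_types: types }
}
```

With the old shape, an accumulator for a two-argument aggregate has no way to see the second argument's type; with the new shape it can simply index into `input_types`.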
@lewiszlw It would be very helpful if you could also update this example. I checked your branch and this file doesn't exist there, so you will need to merge from main to get it.
Thanks for pointing that out. I'll update the PR in a few days.
I happened to have this PR open locally (I get anxious about PRs that stay open too long 😅), so I took the liberty of updating the docs as well in 1a3c5ca while I was merging up from main.
I don't think we even need a vector of input types, since we only use the first one. 🤔
Wouldn't we need a vector of input types if the aggregate had more than one argument? For example
The second argument has a fixed type (f64), so we don't yet have an actual function that expects multiple different input types. Therefore, I suggest we remove
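The fixed-type argument point can be sketched as a signature check for a hypothetical two-argument aggregate whose second argument must always be `Float64`. Because only the first argument's type can vary, it alone determines the result type in this sketch, which is why storing just the first type has sufficed so far. `DataType` and `check_args` are simplified, hypothetical names, not DataFusion APIs.

```rust
/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical signature check for a two-argument aggregate whose second
/// argument always has the fixed type Float64. Only the first argument's
/// type varies, so it is the one that determines the result type.
pub fn check_args(input_types: &[DataType]) -> Result<DataType, String> {
    match input_types {
        // Second argument matches the fixed type; validate the first.
        [first, DataType::Float64] => match first {
            DataType::Int64 | DataType::Float64 => Ok(first.clone()),
            other => Err(format!("unsupported first argument type: {other:?}")),
        },
        // Right arity, but the fixed second argument has the wrong type.
        [_, second] => Err(format!("second argument must be Float64, got {second:?}")),
        // Wrong number of arguments.
        _ => Err(format!("expected 2 arguments, got {}", input_types.len())),
    }
}
```

A vector of types only becomes necessary once some aggregate lets more than one argument's type vary independently.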
That certainly sounds cleaner. I feel like we have churned this API a bunch recently, so maybe we can take a step back and ensure that whatever we come up with supports the use cases we know of now (and in the future), so we don't keep changing it 🤔
I plan to change
I think it would be nice to have something like
The end state in my mind now:

```rust
use arrow::datatypes::{DataType, Schema};
use datafusion_common::DFSchema;
use datafusion_expr::Expr;

pub struct AccumulatorArgs<'a> {
    /// Keep; this is the return type, so the name might be confusing.
    pub data_type: &'a DataType,
    /// We might only need one of `schema` or `dfschema`. It is likely we keep `dfschema`, since we can get `schema` from it.
    pub schema: &'a Schema,
    pub dfschema: &'a DFSchema,
    /// Keep
    pub ignore_nulls: bool,
    /// Convert to physical sort exprs instead
    pub sort_exprs: &'a [Expr],
    /// Keep
    pub is_reversed: bool,
    /// We might be able to get the name from the expressions
    pub name: &'a str,
    /// Keep
    pub is_distinct: bool,
    /// Get the type from the schema and expressions
    pub input_type: &'a DataType,
    /// Convert to physical expressions
    pub input_exprs: &'a [Expr],
}
```
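The struct's comment on `input_type` suggests dropping the stored type and getting it from the schema and expressions instead. A minimal sketch of that idea follows, using hypothetical, simplified `DataType`, `Schema`, and `Expr` types rather than the real Arrow/DataFusion ones (whose APIs differ); `input_types` here is an illustrative free function, not an existing DataFusion API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
}

/// Hypothetical schema: maps column names to their types.
pub struct Schema {
    fields: HashMap<String, DataType>,
}

impl Schema {
    pub fn new(fields: &[(&str, DataType)]) -> Self {
        Self {
            fields: fields.iter().map(|(n, t)| (n.to_string(), t.clone())).collect(),
        }
    }
}

/// Hypothetical expression: either a column reference or a typed literal.
pub enum Expr {
    Column(String),
    Literal(DataType),
}

impl Expr {
    /// Resolves this expression's type against the given schema.
    pub fn data_type(&self, schema: &Schema) -> Result<DataType, String> {
        match self {
            Expr::Column(name) => schema
                .fields
                .get(name)
                .cloned()
                .ok_or_else(|| format!("column not found: {name}")),
            Expr::Literal(t) => Ok(t.clone()),
        }
    }
}

/// Computes all argument types on demand instead of storing them;
/// fails if any expression cannot be resolved.
pub fn input_types(exprs: &[Expr], schema: &Schema) -> Result<Vec<DataType>, String> {
    exprs.iter().map(|e| e.data_type(schema)).collect()
}
```

The trade-off in this sketch is that the argument types can never drift out of sync with the expressions, at the cost of resolving them each time they are needed.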
I agree
I think
Further evidence that we don't need dfschema is that you can get a
Agree
Makes sense. Thanks @jayzhan211
So what should we do with this PR? Merge it and then revamp it in a follow-on PR?
Sure, but it is better to remove
@xinlifoobar I don't think it is clear whether we should keep
Which issue does this PR close?
Closes #.
Rationale for this change
It confused me when reading this code that AggregateFunctionExpr / AccumulatorArgs / StateFieldsArgs only contain one input type.
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?