Skip to content

Conversation

@jayzhan211
Copy link
Contributor

Which issue does this PR close?

Part of #8708

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate labels Apr 5, 2024
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor Author

I think the next step is to make last value udaf, so we can replace builtin first/last.

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 marked this pull request as ready for review April 5, 2024 14:49
@jayzhan211 jayzhan211 requested review from alamb and mustafasrepo April 5, 2024 15:00
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this is looking sweet and close

Can we add a re-export in of this crate in datafusion (like https://github.com/apache/arrow-datafusion/blob/c8584557cdfa7c138ab9039ceac31323f48a44d3/datafusion/core/src/lib.rs#L530-L533)?

Otherwise I think this is basically ready to go. 🚀

Cargo.toml Outdated
ctor = "0.2.0"
dashmap = "5.4.0"
datafusion = { path = "datafusion/core", version = "37.0.0", default-features = false }
datafusion-aggregate-functions = { path = "datafusion/aggregate-functions", version = "37.0.0" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about calling the new crate datafusion-functions-aggregates (to mirror the naming of datafusion-functions and datafusion-functions-array)?

panic!("Failed to register UDAF: {}", err);
}
}
datafusion_aggregate_functions::register_all(&mut new_self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

lit(1),
),
array_replace_all(make_array(vec![lit(1), lit(2), lit(3)]), lit(2), lit(4)),
// TODO: Add first value after built-in functions are deprecated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we would wait until the built in functions are deprecated 🤔

Copy link
Contributor Author

@jayzhan211 jayzhan211 Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Sorry for leaving this confusing comment. I should have left it in the review comment, not in the code 😆

I plan to support the expr API with arguments like distinct, and ordering after I deprecated all the built-in functions.

However, I added a simple example with lit(1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upd: It seems I need to support expr with arguments before deprecated builtin

acc_args: AccumulatorArgs,
) -> Result<Box<dyn Accumulator>> {
let mut all_sort_orders = vec![];
/// TO BE DEPRECATED: Builtin FIRST_VALUE physical aggregate expression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment -- what does it mean to be deprecated? What is the alternative?

Is the idea it will be replaced with a UDF version ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it may be replaced with udaf

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211 jayzhan211 requested a review from alamb April 6, 2024 02:24
@jayzhan211 jayzhan211 marked this pull request as draft April 6, 2024 05:42
@jayzhan211 jayzhan211 marked this pull request as ready for review April 6, 2024 05:43
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Apr 6, 2024

Besides addressing comments, I also move AggregateFunctionExpr and create_aggregate_expr to functions-aggregate from physical-expr-common because I think they are specific to aggregate.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- I think this looks really cool and could be merged.

I think it would be better to leave AggregateFunctionExpr and create_aggregate_expr in the physical-expr-core crate for reasons I mentioned but if you disagree or want to put them back as a follow on PR I think that would be ok too.

let mut all_sort_orders = vec![];

// Construct PhysicalSortExpr objects from Expr objects:
let mut sort_exprs = vec![];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

/// Any implementation of this trait also needs to implement the
/// `PartialEq<dyn Any>` to allows comparing equality between the
/// trait objects.
pub trait AggregateExpr: Send + Sync + Debug + PartialEq<dyn Any> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also move AggregateFunctionExpr and create_aggregate_expr to functions-aggregate from physical-expr-common

While I agree that these two structures are aggregate specific, I still think they should be in datafusion-physical-expr-common because that would allow people to use DataFusion and create their own aggregate functions without having to take DataFusion's built in aggregate functions

I see real value in keeping the APIs required to use datafusion separate from the actual implementations

Copy link
Contributor Author

@jayzhan211 jayzhan211 Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should avoid the user pulling out unnecessary things for them. I thought they would need builtin functions, but they probably don't. I will keep them in common

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor Author

Thanks for your review @alamb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants