chore: Upgrade to latest DataFusion revision #909
/// A utility function from DataFusion. It is not exposed by DataFusion.
pub fn down_cast_any_ref(any: &dyn Any) -> &dyn Any {
    if any.is::<Arc<dyn PhysicalExpr>>() {
        any.downcast_ref::<Arc<dyn PhysicalExpr>>()
            .unwrap()
            .as_any()
    } else if any.is::<Box<dyn PhysicalExpr>>() {
        any.downcast_ref::<Box<dyn PhysicalExpr>>()
            .unwrap()
            .as_any()
    } else {
        any
    }
}
This function is now public in DataFusion, so we use that version now
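For anyone unfamiliar with the helper, here is a minimal std-only sketch of what it does: it unwraps an `Arc<dyn …>` or `Box<dyn …>` hidden behind `&dyn Any` so that a later `downcast_ref` can reach the concrete type. `Expr`, `Column`, and `column_index` are hypothetical stand-ins for illustration, not DataFusion types, and the `if let` form is just an equivalent way of writing the `is`/`unwrap` pairs in the diff.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-in for DataFusion's `PhysicalExpr` trait.
trait Expr: Any {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical concrete expression type.
struct Column {
    index: usize,
}

impl Expr for Column {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// If `any` is really an `Arc<dyn Expr>` or `Box<dyn Expr>`, return the
// `&dyn Any` of the expression inside it; otherwise return `any` unchanged.
fn down_cast_any_ref(any: &dyn Any) -> &dyn Any {
    if let Some(expr) = any.downcast_ref::<Arc<dyn Expr>>() {
        expr.as_any()
    } else if let Some(expr) = any.downcast_ref::<Box<dyn Expr>>() {
        expr.as_any()
    } else {
        any
    }
}

// Typical use: recover the concrete type regardless of how it was wrapped.
fn column_index(other: &dyn Any) -> Option<usize> {
    down_cast_any_ref(other)
        .downcast_ref::<Column>()
        .map(|column| column.index)
}
```

Without the unwrapping step, `downcast_ref::<Column>()` on an `Arc<dyn Expr>` passed as `&dyn Any` would return `None`, because the dynamic type is the `Arc`, not the `Column`.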
@huaxingao There are quite a few changes to aggregates in this PR due to upstream API changes. Could you review when you get a chance?
kazuyukitanimura left a comment:
LGTM pending CI
Ok(Arc::new(SumDecimal::new("sum", child, datatype)))
let func = AggregateUDF::new_from_impl(SumDecimal::new(
    "sum",
    Arc::clone(&child),
Just for me to understand, what would happen if we do not do Arc::clone() here?
We need to clone because we reference child again in the next statement. If I remove the clone, the code fails to compile:
error[E0382]: use of moved value: `child`
--> core/src/execution/datafusion/planner.rs:1357:72
|
1347 | let child = self.create_expr(expr.child.as_ref().unwrap(), Arc::clone(&schema))?;
| ----- move occurs because `child` has type `Arc<dyn datafusion_physical_expr::PhysicalExpr>`, which does not implement the `Copy` trait
...
1354 | child,
| ----- value moved here
...
1357 | AggregateExprBuilder::new(Arc::new(func), vec![child])
| ^^^^^ value used here after move
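A reduced sketch of the same situation, in case it helps: `Func`, `make_func`, and `make_builder` are hypothetical stand-ins for `SumDecimal::new` and `AggregateExprBuilder::new`, and `Arc<str>` stands in for `Arc<dyn PhysicalExpr>`. Only the ownership pattern is meant to match the real code.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the aggregate function that stores its child.
struct Func {
    child: Arc<str>,
}

// Takes the Arc by value, so the caller's handle is moved into the function.
fn make_func(child: Arc<str>) -> Func {
    Func { child }
}

// Also takes ownership of the argument list.
fn make_builder(func: Func, args: Vec<Arc<str>>) -> (Func, Vec<Arc<str>>) {
    (func, args)
}

fn build(child: Arc<str>) -> (Func, Vec<Arc<str>>) {
    // Passing `child` itself here would move it, and the `vec![child]`
    // below would then fail with E0382 (use of moved value).
    let func = make_func(Arc::clone(&child));
    // The original handle is still owned here, so it can be moved now.
    make_builder(func, vec![child])
}
```

Both handles point at the same allocation; the clone only increments the reference count.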
Hmm, I think the second parameter is Arc<dyn PhysicalExpr>. If it is not changed, it should be child?
Oh, I see. It creates Arc<T> actually.
https://doc.rust-lang.org/std/sync/struct.Arc.html#impl-Clone-for-Arc%3CT,+A%3E
Yes, we recently started using Arc::clone(foo) instead of foo.clone() to make it easy to see when we are just cloning an Arc (cheap) vs a more expensive clone operation. There is a clippy lint that checks that we are using this style.
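To illustrate the style point, a small sketch under the assumption that the lint in question is `clippy::clone_on_ref_ptr` (the thread does not name it, so treat that as a guess about the project's configuration). `share_and_copy` is a hypothetical function written only for this example.

```rust
use std::sync::Arc;

// Assumed lint name; it flags `.clone()` called directly on Arc/Rc handles.
#[warn(clippy::clone_on_ref_ptr)]
fn share_and_copy(data: &Arc<Vec<i32>>) -> (Arc<Vec<i32>>, Vec<i32>) {
    // Cheap: only increments the reference count. Writing `Arc::clone`
    // makes that obvious at the call site.
    let shared = Arc::clone(data);

    // Expensive: dereferences down to the Vec and copies every element.
    let copied = (**data).clone();

    (shared, copied)
}
```

With `data.clone()` a reader has to know the type of `data` to tell which of the two costs applies; the explicit form removes that ambiguity.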
Oops, some test failures
failure: I rolled back implementing the group accumulators.
* update dependency version
* update avg
* update avg_decimal
* update sum_decimal
* variance
* stddev
* covariance
* correlation
* save progress
* code compiles
* clippy
* remove duplicate of down_cast_any_ref function
* remove duplicate of down_cast_any_ref function
* machete
* bump DF version again and use StatsType from DataFusion
* implement groups accumulator for stddev and variance
* refactor
* fmt
* revert group accumulator

(cherry picked from commit 00eaa8e)
Which issue does this PR close?
N/A
Rationale for this change
DataFusion 42 will be released soon, so we need to make sure no upstream changes cause regressions in Comet before it is released.
What changes are included in this PR?
* Remove duplicate StatsType and use DataFusion's version
* Remove duplicate down_cast_any_ref and use DataFusion's version
* Implement groups accumulator for stddev and variance (or file follow-on issue)
How are these changes tested?
Existing tests.