chore: dependency update Arrow to 58 and DataFusion to 53 #6496
timsaucer wants to merge 9 commits into lance-format:main
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
timsaucer
left a comment
OK, there are a few things we need to take care of, but otherwise this looks to be in decent shape.
None,
PartitionMode::CollectLeft,
NullEquality::NullEqualsNull,
false,
Need to investigate if this is a null-aware join.
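For context, here is a rough sketch of the difference the new `null_aware` flag is presumably about, assuming it selects SQL `NOT IN`-style anti-join semantics (my reading, not something confirmed in this thread). The two functions are hypothetical plain-Rust models over `Option<i32>` keys, not DataFusion APIs:

```rust
/// Plain anti join where NULL never matches anything: keep every left row
/// that has no equal right row. A NULL (`None`) left key is always kept.
pub fn plain_anti_join(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    left.iter()
        .copied()
        .filter(|l| l.is_none() || !right.contains(l))
        .collect()
}

/// Null-aware anti join, modelling `l NOT IN (right)` under SQL
/// three-valued logic (assumed meaning of the flag).
pub fn null_aware_anti_join(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    // `NOT IN` over an empty set is TRUE for every row, NULL keys included.
    if right.is_empty() {
        return left.to_vec();
    }
    // A NULL on the right makes every non-matching comparison UNKNOWN,
    // so no left row can pass the predicate.
    if right.contains(&None) {
        return Vec::new();
    }
    // Otherwise keep only non-NULL left keys that have no match.
    left.iter()
        .copied()
        .filter(|l| l.is_some() && !right.contains(l))
        .collect()
}
```

If these call sites never back a `NOT IN`, passing `false` would presumably keep the previous behavior, but that is exactly what needs checking.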
use datafusion::functions_aggregate;
use datafusion::logical_expr::{Expr, ScalarUDF, col, lit};
use datafusion::physical_expr::PhysicalSortExpr;
#[allow(deprecated)]
Replacing CoalesceBatchesExec should probably be a new PR.
None,
PartitionMode::CollectLeft,
NullEquality::NullEqualsNull,
false,
Also check here whether this is a null-aware join.
}

/// Take row indices produced by input plan from the dataset (with projection)
#[allow(deprecated)]
-#[allow(deprecated)]
+#[expect(deprecated)]
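The reason for preferring `expect` over `allow` can be shown with a minimal standalone sketch. `old_api`, `new_api`, and `call_during_migration` are hypothetical names used only for illustration; `#[expect]` itself requires Rust 1.81 or later:

```rust
// Hypothetical stand-in for an upstream item that has been deprecated.
#[deprecated(note = "use `new_api` instead")]
pub fn old_api(x: u32) -> u32 {
    x + 1
}

// Hypothetical replacement for `old_api`.
pub fn new_api(x: u32) -> u32 {
    x + 1
}

// `#[expect(deprecated)]` silences the warning just like
// `#[allow(deprecated)]` would. The difference: once the deprecated call
// below is removed, the compiler reports `unfulfilled_lint_expectations`,
// so the attribute cannot be left behind by accident.
#[expect(deprecated)]
pub fn call_during_migration(x: u32) -> u32 {
    old_api(x)
}
```

With `allow`, the attribute would stay silently in place after the follow-up PR removes the deprecated usage.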
fn statistics(&self) -> DataFusionResult<Statistics> {
    Ok(Statistics {
        num_rows: Precision::Inexact(
            self.query.k * self.query.refine_factor.unwrap_or(1) as usize,
        ),
        ..Statistics::new_unknown(self.schema().as_ref())
    })
}
We probably need to implement partition statistics here, not just drop it.
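A rough sketch of what carrying the estimate over could look like. `Precision`, `Statistics`, `KnnQuery`, and `KnnNode` here are simplified local stand-ins (the real DataFusion types carry more fields and variants, and the real method returns a `Result`), and returning `Absent` for a single partition is an assumption on my part, not something the upgrade guide prescribes:

```rust
// Simplified stand-in for DataFusion's `Precision`; the real enum is generic
// and has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Inexact(usize),
    Absent,
}

// Simplified stand-in for DataFusion's `Statistics`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Statistics {
    pub num_rows: Precision,
}

// Hypothetical query parameters mirroring `self.query` in the removed code.
pub struct KnnQuery {
    pub k: usize,
    pub refine_factor: Option<u32>,
}

// Hypothetical plan node holding the query.
pub struct KnnNode {
    pub query: KnnQuery,
}

impl KnnNode {
    /// `None` asks for statistics over the whole plan; `Some(i)` asks for
    /// partition `i` only.
    pub fn partition_statistics(&self, partition: Option<usize>) -> Statistics {
        match partition {
            // Same estimate the removed `statistics()` reported.
            None => Statistics {
                num_rows: Precision::Inexact(
                    self.query.k * self.query.refine_factor.unwrap_or(1) as usize,
                ),
            },
            // Assumption: the per-partition split is unknown, so report nothing.
            Some(_) => Statistics {
                num_rows: Precision::Absent,
            },
        }
    }
}
```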

use super::TakeExec;
use arrow_schema::Schema as ArrowSchema;
#[allow(deprecated)]
-#[allow(deprecated)]
+#[expect(deprecated)]
}

impl PhysicalOptimizerRule for CoalesceTake {
    #[allow(deprecated)]
-#[allow(deprecated)]
+#[expect(deprecated)]
"futures",
"itertools 0.14.0",
"log",
"object_store",
We cannot get past having two versions of object_store until opendal 0.56 is released.
Upgrade arrow 57 -> 58, datafusion 52.1 -> 53.0, parquet 57 -> 58, geoarrow 0.7 -> 0.8, geodatafusion 0.3 -> 0.4, and datafusion-python >=52,<53 -> >=53,<54.

Adapt to DataFusion 53 breaking changes:
- PlanProperties now wrapped in Arc (ExecutionPlan::properties returns &Arc<PlanProperties>)
- ExecutionPlan::statistics() removed from trait
- HashJoinExec::try_new gains null_aware: bool parameter
- SimplifyContext::new replaced with SimplifyContext::default() builder
- PGBitwiseNot renamed to BitwiseNot (sqlparser update)
- ScalarValue::RunEndEncoded new variant handled
- MemoryReservation methods no longer require &mut self

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…3 upgrade

The DataFusion 53 upgrade guide specifies that statistics() should be replaced with partition_statistics(), not just removed. Three ExecutionPlan implementations had meaningful row-count statistics that were lost in the initial upgrade:
- LanceScanExec: computes row counts from fragment metadata
- ANNIvfPartitionExec: reports minimum_nprobes as exact row count
- ScalarIndexExec: reports Precision::Exact(2) for index expression results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update Python bindings for pyo3 0.26→0.28 API changes:
- FromPyObject now takes two lifetime params and uses `extract` instead of `extract_bound`
- Replace deprecated `downcast()` with `cast()` across all files
- Add `from_py_object` / `skip_from_py_object` to all Clone pyclass types
- Replace deprecated `PyCapsule::reference` with `pointer_checked`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DataFusion 53 changed the EnsureCooperative optimizer to skip wrapping leaf nodes when a cooperative ancestor (CoalescePartitionsExec) already exists, so CooperativeExec no longer appears around LanceRead in these plans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4c70c57 to 5140c22
It looks like opendal should be releasing soon, so we may end up holding off for a little while: apache/opendal#7283
OK, we're in the green! I've identified a couple of things to follow up on or check, but otherwise this is looking good IMO. @wjones127 / @westonpace, do you have thoughts on merging now versus waiting for the opendal update before we move forward?
This PR is a dependency update only for Arrow and DataFusion. It includes the minimal changes necessary for the upgrade and is not designed to have any additional impact.
Closes #6137