
chore: dependency update Arrow to 58 and DataFusion to 53 #6496

Draft
timsaucer wants to merge 9 commits into lance-format:main from rerun-io:tsaucer/arrow57-datafusion53

@timsaucer
Contributor

This PR is a dependency update for Arrow and DataFusion only. It includes the minimal changes necessary for the upgrade and is not intended to have any other impact.

Closes #6137

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

Contributor Author

timsaucer left a comment

Ok, there are a few things we need to take care of, but otherwise this looks in decent shape.

None,
PartitionMode::CollectLeft,
NullEquality::NullEqualsNull,
false,
Contributor Author

Need to investigate whether this is a null-aware join.
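For context on the null-aware question: in the call above, the trailing `false` is presumably the new `null_aware` argument of `HashJoinExec::try_new`. A null-aware anti join implements SQL `NOT IN` semantics, where NULLs make the predicate unknown rather than false. The sketch below illustrates those semantics over `Option<i64>` keys using only the standard library. The function names are hypothetical, and this is not DataFusion's join implementation.

```rust
use std::collections::HashSet;

/// Plain left anti join (hypothetical helper): keep a left row when no right
/// row has an equal key. Here NULL keys never match anything, so NULL left
/// rows are kept. This differs from the `NullEqualsNull` setting above.
pub fn anti_join(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    let keys: HashSet<i64> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|l| match l {
            Some(v) => !keys.contains(v),
            None => true,
        })
        .collect()
}

/// Null-aware left anti join (hypothetical helper), i.e. `l NOT IN (right)`:
/// - if the right side contains any NULL, every row is FALSE or UNKNOWN;
/// - a NULL left key qualifies only when the right side is empty.
pub fn null_aware_anti_join(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    if right.iter().any(|r| r.is_none()) {
        return Vec::new();
    }
    let keys: HashSet<i64> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|l| match l {
            Some(v) => !keys.contains(v),
            None => right.is_empty(),
        })
        .collect()
}
```

With left `[1, 2, NULL]` and right `[2]`, the plain anti join keeps `[1, NULL]`, while the null-aware version keeps only `[1]`. Passing the wrong flag would therefore change results for `NOT IN` subqueries.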

use datafusion::functions_aggregate;
use datafusion::logical_expr::{Expr, ScalarUDF, col, lit};
use datafusion::physical_expr::PhysicalSortExpr;
#[allow(deprecated)]
Contributor Author

Replacing CoalesceBatchesExec should probably be done in a separate PR.

None,
PartitionMode::CollectLeft,
NullEquality::NullEqualsNull,
false,
Contributor Author

Also check here whether this is a null-aware join.

}

/// Take row indices produced by input plan from the dataset (with projection)
#[allow(deprecated)]
Contributor Author

Suggested change: #[allow(deprecated)] → #[expect(deprecated)]
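The suggestion relies on `#[expect(lint)]`, which has been stable since Rust 1.81. It silences the lint just like `#[allow]`, but the compiler also emits an `unfulfilled_lint_expectations` warning once the lint stops firing. A stale suppression therefore gets flagged after the deprecated call is removed. A minimal illustration, using hypothetical `old_api`/`new_api` functions in place of the deprecated DataFusion items:

```rust
/// Hypothetical legacy function standing in for a deprecated upstream item.
#[deprecated(note = "use `new_api` instead")]
pub fn old_api(x: u32) -> u32 {
    x + 1
}

/// Hypothetical replacement.
pub fn new_api(x: u32) -> u32 {
    x + 1
}

/// Still calls the deprecated function, so the `deprecated` lint fires and
/// the expectation is fulfilled. If this body is later switched to `new_api`,
/// the compiler warns that the expectation is unfulfilled instead of
/// silently keeping a useless attribute.
#[expect(deprecated)]
pub fn caller(x: u32) -> u32 {
    old_api(x)
}
```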

Comment on lines -1330 to -1337
fn statistics(&self) -> DataFusionResult<Statistics> {
Ok(Statistics {
num_rows: Precision::Inexact(
self.query.k * self.query.refine_factor.unwrap_or(1) as usize,
),
..Statistics::new_unknown(self.schema().as_ref())
})
}
Contributor Author

We probably need to implement partition statistics here rather than just dropping them.
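One way to keep the removed estimate is to move it into a per-partition method, in the spirit of DataFusion's `partition_statistics(partition: Option<usize>)`, where `None` asks for whole-plan statistics. The sketch below uses standard-library stand-ins (`Precision`, `Statistics`, `KnnQuery`, `KnnNode`), not DataFusion's types. It also assumes the node has a single output partition, which is an assumption rather than something this PR establishes.

```rust
/// Stand-in for DataFusion's `Precision`: how trustworthy a statistic is.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Stand-in for DataFusion's `Statistics`, reduced to the row count.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Statistics {
    pub num_rows: Precision,
}

/// Hypothetical subset of the query fields used by the removed `statistics()`.
pub struct KnnQuery {
    pub k: usize,
    pub refine_factor: Option<u32>,
}

/// Hypothetical plan node, assumed to have exactly one output partition.
pub struct KnnNode {
    pub query: KnnQuery,
}

impl KnnNode {
    /// `None` = whole plan, `Some(i)` = output partition `i`.
    /// Returns an error for a partition index the node does not have.
    pub fn partition_statistics(&self, partition: Option<usize>) -> Result<Statistics, String> {
        match partition {
            None | Some(0) => Ok(Statistics {
                // Same estimate as the removed method: k * refine_factor rows, inexact.
                num_rows: Precision::Inexact(
                    self.query.k * self.query.refine_factor.unwrap_or(1) as usize,
                ),
            }),
            Some(i) => Err(format!("partition {i} out of range (node has 1 partition)")),
        }
    }
}
```

A node with several output partitions would instead need to decide how the estimate splits across them. This is the open question raised in the review.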


use super::TakeExec;
use arrow_schema::Schema as ArrowSchema;
#[allow(deprecated)]
Contributor Author

Suggested change: #[allow(deprecated)] → #[expect(deprecated)]

}

impl PhysicalOptimizerRule for CoalesceTake {
#[allow(deprecated)]
Contributor Author

Suggested change: #[allow(deprecated)] → #[expect(deprecated)]

Comment thread Cargo.lock
"futures",
"itertools 0.14.0",
"log",
"object_store",
Contributor Author

We cannot get past having two versions of object_store until opendal 0.56 is released.

timsaucer changed the title from "Update Arrow to 58 and DataFusion to 53" to "chore: dependency update Arrow to 58 and DataFusion to 53" on Apr 14, 2026
github-actions bot added the chore label on Apr 14, 2026
timsaucer and others added 8 commits April 15, 2026 07:21
Upgrade arrow 57 -> 58, datafusion 52.1 -> 53.0, parquet 57 -> 58,
geoarrow 0.7 -> 0.8, geodatafusion 0.3 -> 0.4, and datafusion-python
>=52,<53 -> >=53,<54. Adapt to DataFusion 53 breaking changes:

- PlanProperties now wrapped in Arc (ExecutionPlan::properties returns &Arc<PlanProperties>)
- ExecutionPlan::statistics() removed from trait
- HashJoinExec::try_new gains null_aware: bool parameter
- SimplifyContext::new replaced with SimplifyContext::default() builder
- PGBitwiseNot renamed to BitwiseNot (sqlparser update)
- ScalarValue::RunEndEncoded new variant handled
- MemoryReservation methods no longer require &mut self

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
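The first adaptation in this commit means `properties()` now hands out a shared `Arc<PlanProperties>`. A node whose output properties equal its input's can then reuse them with a reference-count bump instead of rebuilding or deep-copying them. The standard-library stand-ins below (`PlanProperties`, `Plan`, `Leaf`, `PassThrough`) are hypothetical. They only mirror the shape of the signature quoted in the commit message, not DataFusion's actual types.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for DataFusion's `PlanProperties`.
#[derive(Debug, PartialEq, Eq)]
pub struct PlanProperties {
    pub output_partitions: usize,
}

/// Mirrors the new shape: `properties` returns `&Arc<PlanProperties>`.
pub trait Plan {
    fn properties(&self) -> &Arc<PlanProperties>;
}

/// Leaf node: computes its properties once at construction and caches them.
pub struct Leaf {
    props: Arc<PlanProperties>,
}

impl Leaf {
    pub fn new(output_partitions: usize) -> Self {
        Self { props: Arc::new(PlanProperties { output_partitions }) }
    }
}

impl Plan for Leaf {
    fn properties(&self) -> &Arc<PlanProperties> {
        &self.props
    }
}

/// Wrapper node whose output properties are identical to its input's.
pub struct PassThrough {
    input: Arc<dyn Plan>,
    props: Arc<PlanProperties>,
}

impl PassThrough {
    pub fn new(input: Arc<dyn Plan>) -> Self {
        // Cloning the Arc shares the child's properties; nothing is deep-copied.
        let props = Arc::clone(input.properties());
        Self { input, props }
    }

    pub fn input(&self) -> &Arc<dyn Plan> {
        &self.input
    }
}

impl Plan for PassThrough {
    fn properties(&self) -> &Arc<PlanProperties> {
        &self.props
    }
}
```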
…3 upgrade

The DataFusion 53 upgrade guide specifies that statistics() should be
replaced with partition_statistics(), not just removed. Three
ExecutionPlan implementations had meaningful row-count statistics that
were lost in the initial upgrade:

- LanceScanExec: computes row counts from fragment metadata
- ANNIvfPartitionExec: reports minimum_nprobes as exact row count
- ScalarIndexExec: reports Precision::Exact(2) for index expression results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
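To illustrate what "computes row counts from fragment metadata" might look like per partition, the sketch below sums the row counts of the fragments assigned to a partition. It reports an exact count only when every contributing fragment records one. `Fragment`, `ScanNode`, the round-robin fragment-to-partition assignment, and the exact/absent rule are all hypothetical assumptions, not `LanceScanExec`'s actual logic. `Precision` is a standard-library stand-in for DataFusion's type.

```rust
/// Stand-in for DataFusion's `Precision`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    Exact(usize),
    Absent,
}

/// Hypothetical fragment metadata; the row count may be missing.
pub struct Fragment {
    pub num_rows: Option<usize>,
}

/// Hypothetical scan whose fragments are dealt round-robin across partitions.
pub struct ScanNode {
    pub fragments: Vec<Fragment>,
    /// Number of output partitions; assumed to be at least 1.
    pub partitions: usize,
}

impl ScanNode {
    /// Row count for one output partition (`Some(i)`) or the whole scan (`None`).
    pub fn partition_num_rows(&self, partition: Option<usize>) -> Precision {
        let mut total = 0usize;
        for (idx, frag) in self.fragments.iter().enumerate() {
            // Skip fragments that belong to a different partition.
            if let Some(p) = partition {
                if idx % self.partitions != p {
                    continue;
                }
            }
            match frag.num_rows {
                Some(n) => total += n,
                // A single fragment without a count makes the sum unknown.
                None => return Precision::Absent,
            }
        }
        Precision::Exact(total)
    }
}
```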
Update Python bindings for pyo3 0.26→0.28 API changes:
- FromPyObject now takes two lifetime params and uses `extract` instead of `extract_bound`
- Replace deprecated `downcast()` with `cast()` across all files
- Add `from_py_object` / `skip_from_py_object` to all Clone pyclass types
- Replace deprecated `PyCapsule::reference` with `pointer_checked`

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DataFusion 53 changed the EnsureCooperative optimizer to skip wrapping
leaf nodes when a cooperative ancestor (CoalescePartitionsExec) already
exists, so CooperativeExec no longer appears around LanceRead in these
plans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
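The behavior change described in this commit can be pictured as a tree rewrite. Leaf nodes get wrapped in a cooperative wrapper only when no ancestor already yields cooperatively. The standard-library `Node` tree and the `wrap_leaves` function below are a hypothetical simplification for illustration, not DataFusion's `EnsureCooperative` rule. Node names are plain string labels taken from the commit message.

```rust
/// Hypothetical simplified plan tree.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub name: String,
    pub cooperative: bool,
    pub children: Vec<Node>,
}

impl Node {
    pub fn new(name: &str, cooperative: bool, children: Vec<Node>) -> Self {
        Node { name: name.to_string(), cooperative, children }
    }
}

/// Wrap each leaf in a "CooperativeExec" node unless an ancestor is already
/// cooperative (e.g. "CoalescePartitionsExec" in the commit message).
pub fn wrap_leaves(node: Node, has_coop_ancestor: bool) -> Node {
    if node.children.is_empty() {
        if has_coop_ancestor {
            return node;
        }
        return Node::new("CooperativeExec", true, vec![node]);
    }
    // Descendants are covered if this node or any ancestor is cooperative.
    let covered = has_coop_ancestor || node.cooperative;
    let children = node
        .children
        .into_iter()
        .map(|c| wrap_leaves(c, covered))
        .collect();
    Node { children, ..node }
}
```

Under this model, a plan rooted at a cooperative `CoalescePartitionsExec` comes back unchanged. This matches the updated expected plans where `CooperativeExec` no longer appears around `LanceRead`.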
timsaucer force-pushed the tsaucer/arrow57-datafusion53 branch from 4c70c57 to 5140c22 on April 15, 2026 11:23
@timsaucer
Contributor Author

It looks like opendal should be releasing soon, so maybe we end up holding this for a little while: apache/opendal#7283

@timsaucer
Contributor Author

Ok, we're in the green! I've identified a couple of things to follow up on or check, but otherwise it's looking good IMO. @wjones127 / @westonpace, do you have thoughts about merging now or waiting for the opendal update before we move forward?


Successfully merging this pull request may close these issues.

Update to Arrow 58