feat!: upgrade DataFusion dependency to 52.1.0#6015

Merged: wjones127 merged 9 commits into lance-format:main from wjones127:fix/df52-avg-coercion on Feb 25, 2026

@wjones127
Contributor

  • Bump datafusion requirement to 52
  • ruff format
  • fix: use fields_with_udf for aggregate type coercion (DF52)
  • fix: use OutputBatches metric variant for DF52 compatibility

timsaucer and others added 4 commits February 25, 2026 10:20
DataFusion 52 changed AVG's type signature from `UserDefined` to
`Coercible`, so the old `UserDefined`-only guard skipped coercion and
`AVG(Int64)` failed at execution time. Use `fields_with_udf` to resolve
coerced types from the function signature, which handles all signature
variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
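The coercion failure described in the commit above can be illustrated with a small, self-contained Rust model. This is a hypothetical sketch: `DataType`, `TypeSignature`, `coerce_old`, `coerce_new`, and `avg_accumulator` are simplified stand-ins invented for illustration, not DataFusion's real types. The assumption that AVG widens `Int64` to `Float64` is chosen only to make the example concrete, and `coerce_new` merely plays the role that `fields_with_udf` plays in the actual fix.

```rust
// Hypothetical, simplified model of the DF52 AVG coercion bug.
// None of these types are DataFusion's real API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

#[derive(Debug, Clone, PartialEq)]
enum TypeSignature {
    /// The function performs its own coercion (AVG before DF52, in this model).
    UserDefined,
    /// The signature lists the type each argument is coerced to (AVG in DF52, in this model).
    Coercible(Vec<DataType>),
}

/// Coercion a `UserDefined` function would apply itself.
/// Assumption for illustration: integers are widened to Float64.
fn user_defined_coercion(args: &[DataType]) -> Vec<DataType> {
    args.iter()
        .map(|t| match t {
            DataType::Int64 => DataType::Float64,
            other => *other,
        })
        .collect()
}

/// Old logic: only `UserDefined` signatures are coerced; every other variant returns early.
fn coerce_old(sig: &TypeSignature, args: &[DataType]) -> Vec<DataType> {
    match sig {
        TypeSignature::UserDefined => user_defined_coercion(args),
        _ => args.to_vec(), // early return: arguments are left uncoerced
    }
}

/// New logic: resolve coerced types from whatever the signature declares
/// (a stand-in for `fields_with_udf`, which handles all signature variants).
fn coerce_new(sig: &TypeSignature, args: &[DataType]) -> Vec<DataType> {
    match sig {
        TypeSignature::UserDefined => user_defined_coercion(args),
        TypeSignature::Coercible(targets) => targets.clone(),
    }
}

/// Hypothetical AVG accumulator that only accepts Float64 input.
fn avg_accumulator(input: DataType) -> Result<(), String> {
    match input {
        DataType::Float64 => Ok(()),
        other => Err(format!("AVG accumulator received uncoerced {:?}", other)),
    }
}

fn main() {
    let df52_avg = TypeSignature::Coercible(vec![DataType::Float64]);
    let args = [DataType::Int64];

    // Old guard: the Coercible signature is skipped, so Int64 reaches execution.
    let old = coerce_old(&df52_avg, &args);
    println!("old: {:?} -> {:?}", old, avg_accumulator(old[0]));

    // New path: the signature drives coercion, so execution succeeds.
    let new = coerce_new(&df52_avg, &args);
    println!("new: {:?} -> {:?}", new, avg_accumulator(new[0]));
}
```

The point of the model is that a guard keyed on one signature variant silently becomes a no-op when an upstream function switches variants, whereas resolving types from the signature itself keeps working regardless of which variant is used.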
DF52 introduced a dedicated `MetricValue::OutputBatches` variant.
Using the generic `Count` variant with name "output_batches" causes
a panic in `aggregate_by_name()` due to mismatched enum variants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
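Why mixing the two variants under the same name panics can be shown with a simplified model. This is a hypothetical sketch: the `MetricValue` enum, its `aggregate` method, and `aggregate_by_name` below are stand-ins written for illustration and are not DataFusion's actual definitions. The model only assumes, as the commit message states, that aggregation matches on the enum variant and panics when two same-named metrics use different variants.

```rust
// Hypothetical, simplified model; not DataFusion's real MetricValue.

#[derive(Debug, Clone, PartialEq)]
enum MetricValue {
    /// Generic named counter.
    Count { name: String, count: usize },
    /// Dedicated variant, always named "output_batches" in this model.
    OutputBatches(usize),
}

impl MetricValue {
    fn name(&self) -> &str {
        match self {
            MetricValue::Count { name, .. } => name.as_str(),
            MetricValue::OutputBatches(_) => "output_batches",
        }
    }

    /// Merge `other` into `self`. Both values must share a variant;
    /// a mismatch panics, mirroring the failure described in the commit.
    fn aggregate(&mut self, other: &MetricValue) {
        match (self, other) {
            (MetricValue::Count { count, .. }, MetricValue::Count { count: other_count, .. }) => {
                *count += *other_count
            }
            (MetricValue::OutputBatches(a), MetricValue::OutputBatches(b)) => *a += *b,
            (a, b) => panic!("mismatched metric variants: {:?} vs {:?}", a, b),
        }
    }
}

/// Sum every metric with the given name, starting from the first match.
fn aggregate_by_name(metrics: &[MetricValue], name: &str) -> Option<MetricValue> {
    let mut matching = metrics.iter().filter(|m| m.name() == name);
    let mut total = matching.next()?.clone();
    for m in matching {
        total.aggregate(m);
    }
    Some(total)
}

fn main() {
    // In this model, one operator reports the dedicated variant while another
    // reports a generic Count that happens to use the same name.
    let mixed = vec![
        MetricValue::OutputBatches(3),
        MetricValue::Count { name: "output_batches".to_string(), count: 2 },
    ];
    let result = std::panic::catch_unwind(|| aggregate_by_name(&mixed, "output_batches"));
    println!("mixed variants panicked: {}", result.is_err());

    // With both sides on the dedicated variant, aggregation succeeds.
    let fixed = vec![MetricValue::OutputBatches(3), MetricValue::OutputBatches(2)];
    println!("{:?}", aggregate_by_name(&fixed, "output_batches"));
}
```

Because lookup is by name but merging is by variant, two metrics can be "the same" for lookup yet incompatible for merging; emitting the dedicated variant everywhere removes that mismatch.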
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@wjones127 changed the title from "feat\!: upgrade DataFusion dependency to 52.1.0" to "feat!: upgrade DataFusion dependency to 52.1.0" on Feb 25, 2026
@github-actions (bot) added the enhancement (New feature or request) label on Feb 25, 2026
@github-actions
Contributor

PR Review: DataFusion 52 Upgrade

Overall this is a straightforward dependency upgrade with necessary API adaptations. A few items to address:

P1 Issues

  1. Missing trailing newline in python/src/lib.rs - The file ends without a newline character after the ffi_logical_codec_from_pycapsule function.

  2. Minimum Python version raised from 3.9 to 3.10 - The uv.lock shows requires-python changed from >=3.9 to >=3.10. If intentional, this should be documented in the PR description as a breaking change for users still on Python 3.9.

Questions

  1. Aggregate coercion behavior change - The removal of the early-return check for TypeSignature::UserDefined in coerce_aggregate_expr_impl means coercion now applies to all aggregate functions, not just UDFs. Is this the intended behavior for DF52, or was this guard still needed?

  2. Physical optimizer after aggregate - The new code runs the physical optimizer rules after apply_aggregate. Could you confirm this is required for DF52 correctness and doesn't introduce a performance regression for aggregate queries?


Automated review by Claude Code

@wjones127 force-pushed the fix/df52-avg-coercion branch from 22eabf9 to 54cda99 on February 25, 2026 20:09
@timsaucer self-requested a review on February 25, 2026 20:40
wjones127 and others added 2 commits February 25, 2026 12:42
CoalesceBatchesExec was removed from the query plan in DataFusion 52,
causing explain_plan and analyze_plan doctests to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov (bot) commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/scanner.rs | 77.77% | 0 Missing and 4 partials ⚠️ |


@wjones127 marked this pull request as ready for review on February 25, 2026 21:54
Review comment on python/pyproject.toml:

        "tensorflow; sys_platform == 'linux'",
        "tqdm",
    -   "datafusion>=50.1,<52",
    +   "datafusion>=52,<53; python_version >= '3.10'",
Contributor


Does that mean we will officially drop 3.9? If so, we should remove "Programming Language :: Python :: 3.9" above.

Contributor Author


No, datafusion dropped support for Python 3.9, so we can only test the integration with Python 3.10 and above. We still support 3.9 for now, though I think we might drop it in a future PR.

@wjones127 merged commit 8f1a099 into lance-format:main on Feb 25, 2026
33 checks passed
@wjones127 deleted the fix/df52-avg-coercion branch on February 25, 2026 22:36
wjones127 added a commit to wjones127/lance that referenced this pull request Feb 25, 2026
- **Bump datafusion requirement to 52**
- **ruff format**
- **fix: use fields_with_udf for aggregate type coercion (DF52)**
- **fix: use OutputBatches metric variant for DF52 compatibility**

---------

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
wjones127 added a commit that referenced this pull request Feb 26, 2026
- **Bump datafusion requirement to 52**
- **ruff format**
- **fix: use fields_with_udf for aggregate type coercion (DF52)**
- **fix: use OutputBatches metric variant for DF52 compatibility**

---------

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
3 participants