feat!: upgrade DataFusion dependency to 52.1.0 #6015
wjones127 merged 9 commits into lance-format:main
wjones127 commented Feb 25, 2026
- Bump datafusion requirement to 52
- ruff format
- fix: use fields_with_udf for aggregate type coercion (DF52)
- fix: use OutputBatches metric variant for DF52 compatibility
DataFusion 52 changed AVG's type signature from UserDefined to Coercible, so the old UserDefined-only guard skipped coercion and AVG(Int64) failed at execution time. Use fields_with_udf to resolve coerced types from the function signature, which handles all signature variants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
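The failure mode in this commit message can be illustrated with a toy model. The sketch below is plain Python, not DataFusion code: `Signature`, `coerce_guarded`, and `coerce_from_signature` are hypothetical names, and the Int64 to Float64 rule for AVG is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Signature:
    """Hypothetical stand-in for a function signature with a coercion rule."""
    kind: str  # "UserDefined" or "Coercible" in this toy model
    coerce: Callable[[List[str]], List[str]]


def avg_coerce(arg_types: List[str]) -> List[str]:
    # Assumed rule for illustration: AVG wants Float64 input.
    return ["Float64" if t == "Int64" else t for t in arg_types]


def coerce_guarded(sig: Signature, arg_types: List[str]) -> List[str]:
    """Old-style guard: only UserDefined signatures get coerced."""
    if sig.kind == "UserDefined":
        return sig.coerce(arg_types)
    return list(arg_types)  # every other variant silently skips coercion


def coerce_from_signature(sig: Signature, arg_types: List[str]) -> List[str]:
    """Resolve coerced types from the signature, whatever its variant."""
    return sig.coerce(arg_types)


avg_before = Signature("UserDefined", avg_coerce)  # signature kind before the upgrade
avg_after = Signature("Coercible", avg_coerce)     # signature kind after the upgrade

print(coerce_guarded(avg_before, ["Int64"]))
print(coerce_guarded(avg_after, ["Int64"]))
print(coerce_from_signature(avg_after, ["Int64"]))
```

In this model the guard keeps working until the signature kind changes, at which point the argument type passes through uncoerced. Resolving from the signature does not depend on the kind.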
DF52 introduced a dedicated `MetricValue::OutputBatches` variant. Using the generic `Count` variant with name "output_batches" causes a panic in `aggregate_by_name()` due to mismatched enum variants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
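A rough sketch of why a variant mismatch breaks aggregation, written as a toy Python model rather than DataFusion's actual Rust types. The classes `Count` and `OutputBatches` and the function `aggregate` are hypothetical stand-ins, and a `TypeError` stands in for the panic described above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Count:
    """Generic named counter (toy stand-in for a generic metric variant)."""
    name: str
    value: int


@dataclass
class OutputBatches:
    """Dedicated variant for the output-batches metric (toy stand-in)."""
    value: int


Metric = Union[Count, OutputBatches]


def aggregate(a: Metric, b: Metric) -> Metric:
    """Sum two metrics; refuses to combine different variants."""
    if type(a) is not type(b):
        # Same logical metric, different variants: cannot be merged.
        raise TypeError(
            f"mismatched metric variants: {type(a).__name__} vs {type(b).__name__}"
        )
    if isinstance(a, Count):
        return Count(a.name, a.value + b.value)
    return OutputBatches(a.value + b.value)


# Both sides use the dedicated variant: aggregation succeeds.
total = aggregate(OutputBatches(2), OutputBatches(3))

# One side still reports a generic counter named "output_batches": it fails.
try:
    aggregate(Count("output_batches", 2), OutputBatches(3))
    mismatch_failed = False
except TypeError:
    mismatch_failed = True
```

The name alone does not make the two values compatible; every producer of the metric has to emit the same variant.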
ACTION NEEDED: The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
PR Review: DataFusion 52 Upgrade
Overall this is a straightforward dependency upgrade with necessary API adaptations. A few items to address:
P1 Issues
Questions
Automated review by Claude Code
22eabf9 to 54cda99
CoalesceBatchesExec was removed from the query plan in DataFusion 52, causing explain_plan and analyze_plan doctests to fail. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
      "tensorflow; sys_platform == 'linux'",
      "tqdm",
    - "datafusion>=50.1,<52",
    + "datafusion>=52,<53; python_version >= '3.10'",
Does that mean we will officially drop 3.9? If so, we should remove "Programming Language :: Python :: 3.9" above.
No, datafusion dropped support for 3.9, so we can only test the integration with Python 3.10 and above. We support 3.9 for now, though I think we might drop support in a future PR.
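As a minimal sketch of what the `python_version >= '3.10'` marker implies for test code: the package still installs on 3.9, but the datafusion integration can only be exercised on 3.10 and above. The helper names below are hypothetical and are not taken from the repository.

```python
import importlib.util
import sys


def datafusion_integration_supported(version_info=sys.version_info) -> bool:
    """Mirror the dependency marker: datafusion is only required on 3.10+."""
    return tuple(version_info[:2]) >= (3, 10)


def datafusion_available(version_info=sys.version_info) -> bool:
    """True only if the interpreter qualifies and the package is importable."""
    if not datafusion_integration_supported(version_info):
        return False
    return importlib.util.find_spec("datafusion") is not None


if __name__ == "__main__":
    if datafusion_available():
        print("running datafusion integration tests")
    else:
        print("skipping datafusion integration tests")
```

A test suite could use a check like this to skip the integration tests on 3.9 instead of failing at import time.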
- **Bump datafusion requirement to 52**
- **ruff format**
- **fix: use fields_with_udf for aggregate type coercion (DF52)**
- **fix: use OutputBatches metric variant for DF52 compatibility**

---------

Co-authored-by: Tim Saucer <timsaucer@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>