chore: migrate SchemaAdapter to PhysicalExprAdapter for DataFusion 52 compatibility #3047
base: main
This PR prepares for DataFusion 52.0 by migrating from the deprecated `SchemaAdapter` approach to the new `PhysicalExprAdapter` approach.

Changes:
- Add `SparkPhysicalExprAdapterFactory` and `SparkPhysicalExprAdapter`, which work at planning time (expression rewriting) instead of at runtime (batch transformation)
- Replace `CastColumnExpr` with Spark-compatible `Cast` expressions
- Update `parquet_exec.rs` to use `with_expr_adapter()` instead of `with_schema_adapter_factory()`
- Update the Iceberg scan to use `adapt_batch_with_expressions()`
- Mark the old `SparkSchemaAdapterFactory` as deprecated

The new approach:
1. `PhysicalExprAdapterFactory.create()` returns a `PhysicalExprAdapter`
2. `PhysicalExprAdapter.rewrite()` transforms expressions at planning time
3. Casts are injected as expressions that execute when the plan runs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
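To make the planning-time model concrete, here is a minimal, std-only Rust sketch. Every type and trait in it (`DataType`, `Expr`, `ExprAdapter`, `ExprAdapterFactory`, `SketchAdapter`, `SketchAdapterFactory`) is a hypothetical stand-in, not the DataFusion or Comet API. A factory creates one adapter per pair of table (logical) and file (physical) schemas, and `rewrite` walks the expression tree once, re-pointing each column at its position in the file and wrapping it in a cast when the stored type differs. Replacing columns that are absent from the file with typed nulls is an assumption of the sketch.

```rust
/// Simplified stand-ins for Arrow data types (hypothetical, illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

/// An ordered list of (field name, data type) pairs.
type Schema = Vec<(String, DataType)>;

/// A tiny expression tree standing in for `Arc<dyn PhysicalExpr>`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    /// Column reference; after rewriting, `index` points into the file schema.
    Column { name: String, index: usize },
    /// Cast injected by the adapter (Comet would use a Spark-compatible cast).
    Cast { expr: Box<Expr>, to: DataType },
    /// Typed null used when the file lacks a column (an assumption of this sketch).
    NullLiteral(DataType),
    /// Addition, included to show that rewriting recurses into children.
    Add(Box<Expr>, Box<Expr>),
}

/// Hypothetical adapter: rewrites expressions once, at planning time.
trait ExprAdapter {
    fn rewrite(&self, expr: Expr) -> Result<Expr, String>;
}

/// Hypothetical factory: one adapter per (table schema, file schema) pair.
trait ExprAdapterFactory {
    fn create(&self, logical: Schema, physical: Schema) -> Box<dyn ExprAdapter>;
}

struct SketchAdapterFactory;

struct SketchAdapter {
    logical: Schema,  // schema the query expects
    physical: Schema, // schema actually stored in this file
}

impl ExprAdapterFactory for SketchAdapterFactory {
    fn create(&self, logical: Schema, physical: Schema) -> Box<dyn ExprAdapter> {
        Box::new(SketchAdapter { logical, physical })
    }
}

impl ExprAdapter for SketchAdapter {
    fn rewrite(&self, expr: Expr) -> Result<Expr, String> {
        match expr {
            Expr::Column { name, .. } => {
                // The expected type comes from the logical (table) schema.
                let logical_ty = self
                    .logical
                    .iter()
                    .find(|(n, _)| n == &name)
                    .map(|(_, ty)| *ty)
                    .ok_or_else(|| format!("unknown column {name}"))?;
                // Resolve the column against this file's schema by name.
                let found = self.physical.iter().position(|(n, _)| n == &name);
                match found {
                    None => Ok(Expr::NullLiteral(logical_ty)),
                    Some(index) => {
                        let physical_ty = self.physical[index].1;
                        let col = Expr::Column { name, index };
                        if physical_ty == logical_ty {
                            Ok(col)
                        } else {
                            // The cast is now part of the plan and runs per batch at execution.
                            Ok(Expr::Cast { expr: Box::new(col), to: logical_ty })
                        }
                    }
                }
            }
            Expr::Cast { expr, to } => Ok(Expr::Cast { expr: Box::new(self.rewrite(*expr)?), to }),
            Expr::Add(l, r) => Ok(Expr::Add(Box::new(self.rewrite(*l)?), Box::new(self.rewrite(*r)?))),
            lit @ Expr::NullLiteral(_) => Ok(lit),
        }
    }
}
```

Compared with the deprecated runtime approach, where a schema mapper transformed every batch after it was read, the adaptation decision here is made once per file while the plan is built; only the injected cast nodes do work per batch.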
```rust
.map(|(i, _field)| {
    let col_expr: Arc<dyn PhysicalExpr> = Arc::new(Column::new_with_schema(
        target_schema.field(i).name(),
        target_schema.as_ref(),
    )?);
    adapter.rewrite(col_expr)
})
```
Suggested change:

```rust
.map(|(i, field)| {
    let col_expr: Arc<dyn PhysicalExpr> = Arc::new(Column::new(field.name(), i));
    adapter.rewrite(col_expr)
})
```
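The reviewer doesn't state a reason for the suggestion. One likely benefit is that `Column::new(name, index)` is infallible and reuses the index that `enumerate()` already provides, whereas `Column::new_with_schema` resolves the name through a schema lookup that returns a `Result` and, as we understand it, picks the first field with a matching name. The std-only sketch below models the two patterns with a hypothetical `Field` type (not Arrow's) to show where they diverge: a schema that contains the same name twice.

```rust
/// Hypothetical stand-in for a schema field; only the name matters here.
struct Field {
    name: &'static str,
}

/// Name-based resolution: returns the index of the FIRST field with this name,
/// which is how we understand a schema lookup by name to behave.
fn resolve_by_name(fields: &[Field], name: &str) -> Option<usize> {
    fields.iter().position(|f| f.name == name)
}

/// Original pattern: take each field's name, then look the index up again by name.
fn column_refs_by_name(fields: &[Field]) -> Vec<(&'static str, Option<usize>)> {
    fields
        .iter()
        .map(|f| (f.name, resolve_by_name(fields, f.name)))
        .collect()
}

/// Suggested pattern: reuse the index from `enumerate()`; no lookup, nothing can fail.
fn column_refs_by_index(fields: &[Field]) -> Vec<(&'static str, usize)> {
    fields.iter().enumerate().map(|(i, f)| (f.name, i)).collect()
}
```

With fields `id`, `value`, `id`, the index-based version yields indices 0, 1, 2, while the name-based version resolves the second `id` back to index 0.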
| } | ||
|
|
||
| // ============================================================================ | ||
| // Legacy SchemaAdapter Implementation (Deprecated) |
Maybe just delete it?

(I expect a lot of CI failures; I did not run this locally because I don't work with Java / Spark and did not want to set up the stack.)

@comphead I've looked through the diff and the approach seems solid to me. I tried to get it running locally but am running into what are, to me, exotic Java errors, so unless it would be very helpful to you, how about I hand this off? Note that although the dependency is still DF 51, this was written against the DF 52 APIs.

It's okay, thanks @adriangb for the pointers; we are already tracking this issue in #2058. I'll check whether it would be straightforward to migrate SchemaAdapter into Comet just for now, with this approach serving as the basis for a later migration, or a sooner one if SchemaAdapter is not easily transferable.

I think you should have no issues migrating to
Summary
See the DataFusion upgrading guide for more context on this migration.
Test plan
🤖 Generated with Claude Code