adriangb commented Jan 6, 2026

Summary

This PR prepares for DataFusion 52.0 by migrating from the deprecated SchemaAdapter approach to the new PhysicalExprAdapter approach.

Key changes:

  • Add SparkPhysicalExprAdapterFactory and SparkPhysicalExprAdapter that work at planning time (expression rewriting) instead of runtime (batch transformation)
  • Replace CastColumnExpr with Spark-compatible Cast expressions
  • Update parquet_exec.rs to use with_expr_adapter() instead of with_schema_adapter_factory()
  • Update Iceberg scan to use adapt_batch_with_expressions()
  • Mark old SparkSchemaAdapterFactory as deprecated

The new approach:

  1. PhysicalExprAdapterFactory.create() returns PhysicalExprAdapter
  2. PhysicalExprAdapter.rewrite() transforms expressions at planning time
  3. Casts are injected as expressions that execute when the plan runs
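
To make the three steps concrete, here is a minimal, dependency-free sketch of the pattern. It does not use DataFusion's traits: `DataType`, `Expr`, `ToyAdapterFactory`, and `ToyAdapter` are hypothetical stand-ins, and schemas are simplified to name-to-type maps. The real adapter in this PR works on `Arc<dyn PhysicalExpr>` and Arrow schemas.

```rust
use std::collections::HashMap;

// Toy stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Reference to a column by name.
    Column(String),
    /// Cast the inner expression to the given type when the plan runs.
    Cast(Box<Expr>, DataType),
    /// A comparison, to show that rewriting recurses through the tree.
    Gt(Box<Expr>, Box<Expr>),
    Literal(i64),
}

/// Step 1: a factory builds an adapter for one (table schema, file schema) pair.
struct ToyAdapterFactory;

impl ToyAdapterFactory {
    fn create(
        &self,
        table_schema: HashMap<String, DataType>,
        file_schema: HashMap<String, DataType>,
    ) -> ToyAdapter {
        ToyAdapter { table_schema, file_schema }
    }
}

struct ToyAdapter {
    table_schema: HashMap<String, DataType>,
    file_schema: HashMap<String, DataType>,
}

impl ToyAdapter {
    /// Step 2: rewrite the expression at planning time; no data is touched here.
    fn rewrite(&self, expr: Expr) -> Expr {
        match expr {
            Expr::Column(name) => {
                let wanted = self.table_schema.get(&name);
                let actual = self.file_schema.get(&name);
                match (wanted, actual) {
                    // Step 3: inject a cast that is evaluated during execution.
                    (Some(w), Some(a)) if w != a => {
                        Expr::Cast(Box::new(Expr::Column(name)), w.clone())
                    }
                    // Types already agree (or the column is unknown): leave as-is.
                    _ => Expr::Column(name),
                }
            }
            Expr::Gt(l, r) => Expr::Gt(Box::new(self.rewrite(*l)), Box::new(self.rewrite(*r))),
            Expr::Cast(inner, ty) => Expr::Cast(Box::new(self.rewrite(*inner)), ty),
            lit @ Expr::Literal(_) => lit,
        }
    }
}

fn main() {
    let table = HashMap::from([
        ("id".to_string(), DataType::Int64),
        ("name".to_string(), DataType::Utf8),
    ]);
    let file = HashMap::from([
        ("id".to_string(), DataType::Int32),
        ("name".to_string(), DataType::Utf8),
    ]);
    let adapter = ToyAdapterFactory.create(table, file);

    // `id > 10` against a file that stores `id` as Int32.
    let predicate = Expr::Gt(
        Box::new(Expr::Column("id".to_string())),
        Box::new(Expr::Literal(10)),
    );
    println!("{:?}", adapter.rewrite(predicate));
}
```

The point of the sketch is that the adapter only changes the expression tree; the cast itself is ordinary plan work that happens later, which is the difference from the old batch-transforming approach.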

See the DataFusion upgrading guide for more context on this migration.

Test plan

  • Verify build compiles successfully
  • Run existing schema adapter tests
  • Test Parquet roundtrip with type casting
  • Test Iceberg scan with schema evolution

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comment on lines +217 to +223
    .map(|(i, _field)| {
        let col_expr: Arc<dyn PhysicalExpr> = Arc::new(Column::new_with_schema(
            target_schema.field(i).name(),
            target_schema.as_ref(),
        )?);
        adapter.rewrite(col_expr)
    })
Author

Suggested change
-    .map(|(i, _field)| {
-        let col_expr: Arc<dyn PhysicalExpr> = Arc::new(Column::new_with_schema(
-            target_schema.field(i).name(),
-            target_schema.as_ref(),
-        )?);
-        adapter.rewrite(col_expr)
-    })
+    .map(|(i, field)| {
+        let col_expr: Arc<dyn PhysicalExpr> = Arc::new(Column::new(
+            field.name(),
+            i,
+        ));
+        adapter.rewrite(col_expr)
+    })
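
One reading of this suggestion: the index from `enumerate()` is presumably already the column's position in `target_schema`, so the column can be built directly instead of being looked up by name through a fallible constructor. A toy sketch of that difference follows. `ToyColumn` and `columns_for` are hypothetical and only mirror the shape of `Column::new(name, index)` and `Column::new_with_schema(name, schema)`; the schema is simplified to a slice of names.

```rust
// Hypothetical stand-in for a column reference; NOT the DataFusion `Column` type.
#[derive(Debug, Clone, PartialEq)]
struct ToyColumn {
    name: String,
    index: usize,
}

impl ToyColumn {
    /// Infallible: the caller already knows the column's position.
    fn new(name: &str, index: usize) -> Self {
        ToyColumn { name: name.to_string(), index }
    }

    /// Fallible: searches the schema for the name and errors if it is absent.
    fn new_with_schema(name: &str, schema: &[&str]) -> Result<Self, String> {
        schema
            .iter()
            .position(|n| *n == name)
            .map(|index| Self::new(name, index))
            .ok_or_else(|| format!("no column named {name}"))
    }
}

/// Build one column reference per schema field using the enumerate index,
/// so no name lookup (and no error path) is needed.
fn columns_for(schema: &[&str]) -> Vec<ToyColumn> {
    schema
        .iter()
        .enumerate()
        .map(|(i, name)| ToyColumn::new(name, i))
        .collect()
}

fn main() {
    let schema = ["id", "name"];
    for col in columns_for(&schema) {
        // Both constructors agree whenever the name exists in the schema.
        let looked_up = ToyColumn::new_with_schema(&col.name, &schema);
        println!("{:?} {:?}", col, looked_up);
    }
}
```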

}

// ============================================================================
// Legacy SchemaAdapter Implementation (Deprecated)
Author

Maybe just delete it?

Author

adriangb commented Jan 6, 2026

(I expect a lot of CI failures. I did not run this locally, as I don't work with Java / Spark and did not want to set up the stack.)

adriangb changed the title from "Migrate SchemaAdapter to PhysicalExprAdapter" to "chore: migrate SchemaAdapter to PhysicalExprAdapter for DataFusion 52 compatibility" on Jan 6, 2026
Author

adriangb commented Jan 6, 2026

@comphead I've looked through the diff and the approach seems solid to me. I tried to get it running locally but am running into what are, to me, exotic Java errors, so unless it would be very helpful to you, how about I hand this off? I'll note that although the dependency is still DF 51, this was written against the DF 52 APIs.

Contributor

comphead commented Jan 6, 2026

> @comphead I've looked through the diff and the approach seems solid to me. I tried to get it running locally but am running into what are, to me, exotic Java errors, so unless it would be very helpful to you, how about I hand this off? I'll note that although the dependency is still DF 51, this was written against the DF 52 APIs.

It's okay, thanks @adriangb for the pointers. We're already tracking this issue in #2058. I'll check whether it would be straightforward to move SchemaAdapter into Comet just for now, with this approach as the base for migrating later. Or sooner, if SchemaAdapter is not easily transferable.

Author

adriangb commented Jan 6, 2026

I think you should have no issues migrating to PhysicalExprAdapter, it covers all of the use cases of SchemaAdapter (and more) with a simpler API to write against. Forward porting SchemaAdapter may be difficult considering it's not supported in ParquetOpener, etc. anymore.


2 participants