Is your feature request related to a problem or challenge?
The SparkFunctionPlanner was introduced recently, but there is currently no convenient way to register both the Spark UDFs and the Spark expression planner together.
Additionally, combining the default DataFusion features with Spark features is awkward because:
- Expression planners must be registered before calling with_default_features().build() to take precedence (planners are tried in order, first match wins)
- UDFs must be registered after the state is built (if using the register_all helper)
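To illustrate why registration order matters, here is a minimal, self-contained sketch of "first match wins" dispatch. It is a toy model only: `Planner`, `plan_first_match`, and the function names `shared_fn` and `default_only_fn` are hypothetical stand-ins and are not part of DataFusion or datafusion_spark.

```rust
// Toy model only: these types are hypothetical and do not mirror the
// real DataFusion ExprPlanner trait.

/// A planner that either handles a function name or declines it.
struct Planner {
    name: &'static str,
    handles: &'static [&'static str],
}

impl Planner {
    /// Returns Some(description) if this planner handles `func`, else None.
    fn plan(&self, func: &str) -> Option<String> {
        if self.handles.iter().any(|h| *h == func) {
            Some(format!("{}::{}", self.name, func))
        } else {
            None
        }
    }
}

/// Tries planners in registration order; the first one that matches wins.
fn plan_first_match(planners: &[Planner], func: &str) -> Option<String> {
    planners.iter().find_map(|p| p.plan(func))
}

/// Hypothetical Spark planner that overrides one shared function.
fn spark() -> Planner {
    Planner { name: "spark", handles: &["shared_fn"] }
}

/// Hypothetical default planner that also handles the shared function.
fn default_planner() -> Planner {
    Planner { name: "default", handles: &["shared_fn", "default_only_fn"] }
}

fn main() {
    // Spark registered first: its implementation takes precedence.
    let spark_first = vec![spark(), default_planner()];
    println!("{:?}", plan_first_match(&spark_first, "shared_fn"));

    // Default registered first: the Spark override is never reached.
    let default_first = vec![default_planner(), spark()];
    println!("{:?}", plan_first_match(&default_first, "shared_fn"));
}
```

In this model, a planner registered later can only handle functions that every earlier planner declined, which is the constraint that forces the two-phase setup.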
Here's the current code required in sqllogictests to properly register Spark features:
let runtime = Arc::new(RuntimeEnv::default());
let mut state_builder = SessionStateBuilder::new()
    .with_config(config)
    .with_runtime_env(runtime);

// Phase 1: Register planner BEFORE build (so it takes precedence)
if is_spark_path(relative_path) {
    state_builder = state_builder.with_expr_planners(vec![Arc::new(
        datafusion_spark::planner::SparkFunctionPlanner,
    )]);
}

let mut state = state_builder.with_default_features().build();

// Phase 2: Register UDFs AFTER build
if is_spark_path(relative_path) {
    info!("Registering Spark functions");
    datafusion_spark::register_all(&mut state)
        .expect("Can not register Spark functions");
}
Describe the solution you'd like
Provide a with_spark_features() method on SessionStateBuilder that registers both the Spark expression planner and UDFs in one call, ensuring proper precedence.
let state = SessionStateBuilder::new()
    .with_config(config)
    .with_runtime_env(runtime)
    .with_default_features()
    .with_spark_features() // Registers planner (with precedence) + UDFs
    .build();
Describe alternatives you've considered
- A standalone function datafusion_spark::register_all_features(&mut SessionState) that handles both planner and UDF registration post-build (though this may not solve the planner precedence issue cleanly)
- Expose planner priority control: allow specifying priority when registering planners, rather than relying on insertion order
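As a rough sketch of what the priority-control alternative could look like, the toy registry below orders planners by an explicit priority instead of insertion order. Everything here is an assumption made for illustration: `PrioritizedPlanner`, `Registry`, the `priority` field, and the function names are hypothetical and do not exist in DataFusion.

```rust
// Toy model only: hypothetical types, not a DataFusion API.

/// A planner tagged with an explicit priority (higher is tried earlier).
struct PrioritizedPlanner {
    name: &'static str,
    priority: i32,
    handles: &'static [&'static str],
}

/// Keeps planners sorted by descending priority.
struct Registry {
    planners: Vec<PrioritizedPlanner>,
}

impl Registry {
    fn new() -> Self {
        Registry { planners: Vec::new() }
    }

    /// Registers a planner. The sort is stable, so planners with equal
    /// priority keep their registration order.
    fn register(&mut self, planner: PrioritizedPlanner) {
        self.planners.push(planner);
        self.planners.sort_by(|a, b| b.priority.cmp(&a.priority));
    }

    /// Returns the first match in priority order, or None.
    fn plan(&self, func: &str) -> Option<String> {
        self.planners
            .iter()
            .find(|p| p.handles.iter().any(|h| *h == func))
            .map(|p| format!("{}::{}", p.name, func))
    }
}

/// Registers the default planner first and the Spark planner second,
/// but gives the Spark planner a higher priority.
fn demo_registry() -> Registry {
    let mut registry = Registry::new();
    registry.register(PrioritizedPlanner {
        name: "default",
        priority: 0,
        handles: &["shared_fn", "default_only_fn"],
    });
    registry.register(PrioritizedPlanner {
        name: "spark",
        priority: 10,
        handles: &["shared_fn"],
    });
    registry
}

fn main() {
    let registry = demo_registry();
    // The Spark planner wins for the shared function despite being
    // registered after the default planner.
    println!("{:?}", registry.plan("shared_fn"));
    println!("{:?}", registry.plan("default_only_fn"));
}
```

With explicit priorities, the Spark planner could be registered after the state is built and still take precedence, which would remove the need for the two-phase setup.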
Additional context
No response