[datafusion-spark] Add method to register udf and expr planner in one go #19843

@cht42

Is your feature request related to a problem or challenge?

The SparkFunctionPlanner was introduced recently, but there is currently no convenient way to register both the Spark UDFs and the Spark expression planner together.

Additionally, combining the default DataFusion features with Spark features is awkward because:

  1. Expression planners must be registered before calling with_default_features().build() so that they take precedence (planners are tried in order, and the first match wins).
  2. UDFs must be registered after the state is built (when using the register_all helper).
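
To illustrate why registration order matters, here is a minimal, self-contained sketch of a "first match wins" planner chain. The `Planner` trait, `SparkLike`, `DefaultLike`, and `plan_with` are hypothetical stand-ins invented for this example, not DataFusion's actual `ExprPlanner` API, and `my_func` is a made-up function name.

```rust
// Toy model of an ordered planner chain. These types are illustrative
// stand-ins and are NOT DataFusion's ExprPlanner API.
trait Planner {
    /// Returns Some(plan) if this planner handles `func`, or None to pass.
    fn plan(&self, func: &str) -> Option<String>;
}

struct SparkLike;
struct DefaultLike;

impl Planner for SparkLike {
    fn plan(&self, func: &str) -> Option<String> {
        // Only claims the single (hypothetical) function it overrides.
        (func == "my_func").then(|| "spark::my_func".to_string())
    }
}

impl Planner for DefaultLike {
    fn plan(&self, func: &str) -> Option<String> {
        // Claims every function it is asked about.
        Some(format!("default::{func}"))
    }
}

/// Tries each planner in registration order and returns the first match.
fn plan_with(planners: &[Box<dyn Planner>], func: &str) -> Option<String> {
    planners.iter().find_map(|p| p.plan(func))
}
```

With `SparkLike` registered first, `plan_with` resolves `my_func` to the Spark-style plan. With `DefaultLike` first, the default planner claims it and the Spark-style planner is never consulted.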

Here's the current code required in sqllogictests to properly register Spark features:

let runtime = Arc::new(RuntimeEnv::default());

let mut state_builder = SessionStateBuilder::new()
    .with_config(config)
    .with_runtime_env(runtime);

// Phase 1: Register planner BEFORE build (so it takes precedence)
if is_spark_path(relative_path) {
    state_builder = state_builder.with_expr_planners(vec![Arc::new(
        datafusion_spark::planner::SparkFunctionPlanner,
    )]);
}

let mut state = state_builder.with_default_features().build();

// Phase 2: Register UDFs AFTER build
if is_spark_path(relative_path) {
    info!("Registering Spark functions");
    datafusion_spark::register_all(&mut state)
        .expect("Can not register Spark functions");
}

Describe the solution you'd like

Provide a with_spark_features() method on SessionStateBuilder that registers both the Spark expression planner and UDFs in one call, ensuring proper precedence.

let state = SessionStateBuilder::new()
    .with_config(config)
    .with_runtime_env(runtime)
    .with_default_features()
    .with_spark_features()  // Registers planner (with precedence) + UDFs
    .build();
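
One way such a method could guarantee precedence regardless of call order is to place the Spark planner at the front of the planner list instead of appending it. The sketch below models that idea with a toy builder. `ToyStateBuilder` and its string-named planners and UDFs are assumptions made for illustration; it does not reflect how `SessionStateBuilder` is actually implemented.

```rust
// Toy stand-in for a session-state builder. This is NOT the real
// SessionStateBuilder; planners and UDFs are modeled as plain names.
#[derive(Default, Debug)]
struct ToyStateBuilder {
    planners: Vec<&'static str>, // tried in order, first match wins
    udfs: Vec<&'static str>,
}

impl ToyStateBuilder {
    /// Appends the default planner and UDF at the end of each list.
    fn with_default_features(mut self) -> Self {
        self.planners.push("default_planner");
        self.udfs.push("default_udf");
        self
    }

    /// Hypothetical: moves the Spark planner to the front and adds the
    /// Spark UDFs, so the outcome does not depend on call order.
    fn with_spark_features(mut self) -> Self {
        // Drop any earlier registration, then insert at the front.
        self.planners.retain(|p| *p != "spark_planner");
        self.planners.insert(0, "spark_planner");
        if !self.udfs.contains(&"spark_udf") {
            self.udfs.push("spark_udf");
        }
        self
    }
}
```

In this model, calling `with_default_features()` before or after `with_spark_features()` yields the same planner order, which is the property the proposed API is meant to provide.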

Describe alternatives you've considered

  • A standalone function datafusion_spark::register_all_features(&mut SessionState) that handles both planner and UDF registration post-build (though this may not solve the planner precedence issue cleanly)

  • Expose planner priority control: allow specifying a priority when registering planners, rather than relying on insertion order
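
As a rough sketch of the priority-control alternative, a registry could store an explicit priority with each planner and keep its entries sorted. `PlannerRegistry`, its methods, and the integer priority scheme are hypothetical and exist only to illustrate the idea; nothing like this is confirmed to exist in DataFusion.

```rust
// Toy registry where each planner carries an explicit priority.
// Hypothetical design; planners are modeled as plain names.
struct PlannerRegistry {
    entries: Vec<(i32, &'static str)>, // (priority, planner name)
}

impl PlannerRegistry {
    fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// Registers a planner; a higher priority is tried earlier.
    fn register(&mut self, priority: i32, name: &'static str) {
        self.entries.push((priority, name));
        // sort_by_key is stable, so equal priorities keep insertion order.
        self.entries.sort_by_key(|&(p, _)| std::cmp::Reverse(p));
    }

    /// Returns planner names in the order they would be tried.
    fn order(&self) -> Vec<&'static str> {
        self.entries.iter().map(|&(_, n)| n).collect()
    }
}
```

With this design, a Spark planner registered after the defaults but with a higher priority would still be tried first, which removes the need to register it before the state is built.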

Labels: enhancement