feat(spark): Add SessionStateBuilderSpark to datafusion-spark#19865
Merged
Jefffrey merged 10 commits into apache:main on Jan 27, 2026
Jefffrey (Contributor) reviewed on Jan 17, 2026 and left a comment:
Maybe it's better to introduce a new trait (e.g. SessionStateBuilderSparkExt, though with a better name) in datafusion-spark containing the new with_spark_features method, and implement it for SessionStateBuilder, so that core does not need to depend on datafusion-spark.
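The extension-trait pattern suggested here can be sketched in isolation. The block below is a minimal, self-contained illustration that does not use DataFusion at all: the `SessionStateBuilder` struct, its `with_function` method, and the placeholder function names are simplified stand-ins invented for this example, and `SessionStateBuilderSparkExt` is only the placeholder name floated in the comment above. The point is that the trait and its impl can live in the downstream crate, so the crate that owns the builder never needs to know about Spark.

```rust
/// Hypothetical stand-in for the builder owned by the core crate.
/// It has no knowledge of Spark.
#[derive(Default)]
struct SessionStateBuilder {
    functions: Vec<String>,
}

impl SessionStateBuilder {
    fn new() -> Self {
        Self::default()
    }

    /// Simplified registration hook (an assumption for this sketch).
    fn with_function(mut self, name: &str) -> Self {
        self.functions.push(name.to_string());
        self
    }

    fn build(self) -> Vec<String> {
        self.functions
    }
}

/// Extension trait as it would be declared in the Spark crate.
/// Only the Spark crate depends on the builder, not the other way round.
trait SessionStateBuilderSparkExt {
    fn with_spark_features(self) -> Self;
}

impl SessionStateBuilderSparkExt for SessionStateBuilder {
    fn with_spark_features(self) -> Self {
        // Placeholder for "register everything Spark-specific".
        self.with_function("spark_placeholder_fn")
    }
}

/// Builds a state using both the builder's own API and the extension method.
fn build_functions() -> Vec<String> {
    SessionStateBuilder::new()
        .with_function("builtin_placeholder_fn")
        .with_spark_features()
        .build()
}

fn main() {
    println!("{:?}", build_functions());
}
```

Callers only see the extra method once the trait is in scope, which is what keeps the dependency one-directional.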
Contributor
Author
Sounds good, updated the code to use that approach.
Title changed from "… with_spark_features to SessionStateBuilder" to "… SessionStateBuilderSpark to datafusion-spark"
Jefffrey approved these changes on Jan 19, 2026
//! ```
//!
//! Then use the extension trait:
//! ```ignore
Contributor
Would prefer to avoid ignore here if possible
Contributor
Thanks @cht42
de-bgunter pushed a commit to de-bgunter/datafusion that referenced this pull request on Mar 24, 2026
…he#19865)

## Which issue does this PR close?

- Closes apache#19843

## Rationale for this change

Currently, combining DataFusion's default features with Spark features is awkward because:

1. Expression planners must be registered **before** calling `with_default_features().build()` to take precedence
2. UDFs must be registered **after** the state is built (if using `register_all`)

This requires splitting the setup into multiple phases, which is verbose and error-prone.

## What changes are included in this PR?

- Added `SessionStateBuilderSpark` extension trait in `datafusion-spark` that provides `with_spark_features()` method to register both the Spark expression planner (with correct precedence) and all Spark UDFs in one call
- Added `core` feature flag to `datafusion-spark` with `datafusion` as an optional dependency (this avoids having `datafusion-core` depend on `datafusion-spark`)
- Updated `datafusion-spark` crate documentation with usage example
- Simplified test context setup in `datafusion-sqllogictest` to use the new extension trait

## Are these changes tested?

Yes, there is a unit test in `datafusion-spark/src/session_state.rs` plus the existing Spark SQLLogicTest suite validates that all Spark functions work correctly. The test context in datafusion-sqllogictest now uses the `SessionStateBuilderSpark` extension trait, serving as both a usage example and integration test.

## Are there any user-facing changes?

Yes, this adds a new public API: `SessionStateBuilderSpark` extension trait (behind the `core` feature flag in `datafusion-spark`).
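The ordering problem described in the rationale can be illustrated with a toy model. The sketch below is not the DataFusion implementation: `ToyStateBuilder`, `ToySparkExt`, and the string labels are hypothetical names made up for this example, and the rule "the planner at index 0 is consulted first" is an assumption chosen to mirror the "registered before ... to take precedence" wording above. It shows how a single extension method could place the Spark planner ahead of the defaults and add the UDFs in the same call, regardless of where it appears in the builder chain.

```rust
/// Toy model of a session-state builder (hypothetical; not the DataFusion type).
#[derive(Default)]
struct ToyStateBuilder {
    /// Assumption: planners are consulted in order, so index 0 wins.
    planners: Vec<&'static str>,
    udfs: Vec<&'static str>,
}

impl ToyStateBuilder {
    fn new() -> Self {
        Self::default()
    }

    /// Appends the default planner and a default UDF.
    fn with_default_features(mut self) -> Self {
        self.planners.push("default_planner");
        self.udfs.push("default_udf");
        self
    }
}

/// Toy counterpart of the extension trait added by this PR.
trait ToySparkExt {
    fn with_spark_features(self) -> Self;
}

impl ToySparkExt for ToyStateBuilder {
    fn with_spark_features(mut self) -> Self {
        // Insert at the front so the Spark planner takes precedence
        // even when the defaults were registered earlier in the chain.
        self.planners.insert(0, "spark_planner");
        // Register the Spark UDFs in the same call.
        self.udfs.push("spark_udf");
        self
    }
}

/// One-phase setup: defaults first, then Spark features, then read the result.
fn build_toy_state() -> (Vec<&'static str>, Vec<&'static str>) {
    let builder = ToyStateBuilder::new()
        .with_default_features()
        .with_spark_features();
    (builder.planners, builder.udfs)
}

fn main() {
    let (planners, udfs) = build_toy_state();
    println!("planners: {:?}", planners);
    println!("udfs: {:?}", udfs);
}
```

In the real crate the equivalent call would be `with_spark_features()` from the `SessionStateBuilderSpark` trait, available when the `core` feature of `datafusion-spark` is enabled; how it achieves planner precedence internally is not stated in this description.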