[CORE] Correctly handle driver configurations when spark.sql.extensions is explicitly set for GlutenSessionExtensions #9312
You can use SPI to load this extension instead of grabbing it with end-users. FYI, apache/spark@5181543

@yaooqinn Thank you for the input. That said, the static service file doesn't feel straightforward to work with.

If we need to load to BTW, |
We load The PR is just to address a corner case where a user accidentally specifies
`spark.sql.extensions=org.apache.gluten.extension.GlutenSessionExtensions` should be set automatically when the Spark driver starts with Gluten. However, if the user also sets it explicitly, unexpected errors occur. In that case, because `spark.sql.extensions=org.apache.gluten.extension.GlutenSessionExtensions` is actually set twice (once by `GlutenPlugin` and once by the user), the rule `ColumnarCollapseTransformStages` is unexpectedly executed twice and produces two consecutive `WholeStageTransformer`s in the query plan, which is illegal and triggers an error.

By the way, this case also implies that our columnar rule execution lacks idempotence, which is not ideal with regard to Catalyst's design.

The patch enhances the relevant extension-setting logic to make sure Gluten's extension is set only once, so this issue is avoided.
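
The "set only once" idea can be illustrated with a minimal Python sketch. This is not the patch's actual code: the function `with_gluten_extension` and the plain dict standing in for the Spark configuration are hypothetical, and appending Gluten's class at the end of the list is an assumption about ordering. The only facts relied on are that `spark.sql.extensions` holds a comma-separated list of class names and that Gluten's extension class is `org.apache.gluten.extension.GlutenSessionExtensions`.

```python
GLUTEN_EXTENSION = "org.apache.gluten.extension.GlutenSessionExtensions"
EXTENSIONS_KEY = "spark.sql.extensions"


def with_gluten_extension(conf: dict) -> dict:
    """Return a copy of conf whose extension list names Gluten exactly once.

    Hypothetical helper for illustration; `conf` is a plain dict standing in
    for the driver-side Spark configuration.
    """
    raw = conf.get(EXTENSIONS_KEY, "")
    # Split the comma-separated list and drop blanks left by stray commas.
    names = [name.strip() for name in raw.split(",") if name.strip()]
    # Keep only the first occurrence of each name, preserving the user's order.
    unique = list(dict.fromkeys(names))
    # Assumption: add Gluten's extension at the end when it is missing.
    if GLUTEN_EXTENSION not in unique:
        unique.append(GLUTEN_EXTENSION)
    updated = dict(conf)
    updated[EXTENSIONS_KEY] = ",".join(unique)
    return updated
```

Because the helper checks for the class name before adding it, calling it on a configuration where the user already listed Gluten's extension leaves a single entry, and calling it repeatedly yields the same result each time.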