[CORE] Correctly handle driver configurations when spark.sql.extensions is explicitly set for GlutenSessionExtensions #9312

Merged
zhztheplayer merged 2 commits into apache:main from
zhztheplayer:wip-fix-ext-conf
Apr 14, 2025

@zhztheplayer
Member

@zhztheplayer zhztheplayer commented Apr 14, 2025

spark.sql.extensions=org.apache.gluten.extension.GlutenSessionExtensions is expected to be set automatically when the Spark driver starts with Gluten. However, if the user also sets it explicitly, unexpected errors occur. For example:

java.lang.UnsupportedOperationException: This operator doesn't support doTransform with SubstraitContext.
	at org.apache.gluten.execution.TransformSupport.doTransform(WholeStageTransformer.scala:192)
	at org.apache.gluten.execution.TransformSupport.doTransform$(WholeStageTransformer.scala:190)
	at org.apache.gluten.execution.WholeStageTransformer.doTransform(WholeStageTransformer.scala:222)
	at org.apache.gluten.execution.TransformSupport.$anonfun$transform$1(WholeStageTransformer.scala:185)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)

In this case, because spark.sql.extensions=org.apache.gluten.extension.GlutenSessionExtensions is effectively set twice (once by GlutenPlugin and once by the user), the rule ColumnarCollapseTransformStages is unexpectedly executed twice and produces two consecutive WholeStageTransformers in the query plan, which is illegal. Hence, the error above is triggered.

By the way, this case also implies that our columnar rule execution lacks idempotence, which is not ideal with regard to Catalyst's design.

The patch improves the relevant extension-setting logic to make sure Gluten's extension is set only once, so that this issue is avoided.
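
The idea of registering the extension only once can be sketched roughly as below. This is a minimal illustration and not the code from the patch: the class ExtensionConfSketch and the method withGlutenExtension are hypothetical names, and the sketch assumes the value of spark.sql.extensions is a comma-separated list of class names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, for illustration only; not part of Gluten.
final class ExtensionConfSketch {
    static final String GLUTEN_EXT =
        "org.apache.gluten.extension.GlutenSessionExtensions";

    // Returns a value for spark.sql.extensions that contains GLUTEN_EXT
    // exactly once, keeping any other extensions in their original order.
    static String withGlutenExtension(String current) {
        // LinkedHashSet drops duplicates while preserving insertion order.
        Set<String> extensions = new LinkedHashSet<>();
        if (current != null) {
            for (String entry : current.split(",")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty()) {
                    extensions.add(trimmed);
                }
            }
        }
        // No-op if the user already listed the Gluten extension.
        extensions.add(GLUTEN_EXT);
        return String.join(",", extensions);
    }
}
```

With this approach, a user-supplied value that already names GlutenSessionExtensions is returned without a second copy, so the columnar rules would be injected only once.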

@github-actions github-actions bot added the CORE works for Gluten Core label Apr 14, 2025
@github-actions

Run Gluten Clickhouse CI on x86

@apache apache deleted a comment from github-actions bot Apr 14, 2025
@zhztheplayer zhztheplayer changed the title from "[CORE] Correct handle driver configurations when spark.sql.extensions is explicitly set for GlutenSessionExtensions" to "[CORE] Correctly handle driver configurations when spark.sql.extensions is explicitly set for GlutenSessionExtensions" Apr 14, 2025
@github-actions

Run Gluten Clickhouse CI on x86

@yaooqinn
Member

You can use SPI to load this extension instead of grabbing it with end-users

FYI, apache/spark@5181543

@zhztheplayer
Member Author

@yaooqinn Thank you for the inputs.

However, the static service file doesn't seem straightforward to combine with spark.plugins, which is the recommended way to enable Gluten for Spark.

@yaooqinn
Member

If we need to load GlutenSessionExtensions by default and we've provided a dynamic config to disable it, using SPI might be the best way. Besides, AFAIK, on some cloud vendor platforms that provide Spark services, it is not that easy to load jars/extensions through configs.

BTW, spark.plugins doesn't support SPI registration yet

@zhztheplayer
Member Author

> If we need to load GlutenSessionExtensions by default and we've provided a dynamic config to disable it

We load GlutenSessionExtensions only when spark.plugins=org.apache.gluten.GlutenPlugin is set. We have a dynamic config, spark.gluten.enabled, to disable Gluten, but that option can take effect even after the extensions are injected, so it's implemented in a different way.

The PR just addresses a corner case in which the user accidentally specifies spark.sql.extensions=org.apache.gluten.extension.GlutenSessionExtensions manually, which is not recommended. We want Gluten to work normally in that case.

@zhztheplayer zhztheplayer merged commit d78825a into apache:main Apr 14, 2025
47 checks passed
@GlutenPerfBot
Contributor

===== Performance report for TPCDS SF2000 with Velox backend, for reference only ====

query log/native_master_04_14_2025_time.csv log/native_master_04_13_2025_cb35706408_time.csv difference percentage
q1 11.38 10.92 -0.455 96.00%
q2 11.11 11.23 0.117 101.05%
q3 3.11 2.75 -0.366 88.24%
q4 52.09 50.99 -1.105 97.88%
q5 10.10 8.32 -1.785 82.33%
q6 4.86 4.53 -0.328 93.26%
q7 5.52 4.33 -1.184 78.55%
q8 4.95 6.27 1.327 126.83%
q9 16.09 14.24 -1.848 88.51%
q10 12.40 11.93 -0.474 96.18%
q11 28.14 28.70 0.559 101.98%
q12 2.06 2.77 0.712 134.64%
q13 6.59 5.25 -1.341 79.66%
q14a 43.28 44.32 1.047 102.42%
q14b 39.83 38.67 -1.162 97.08%
q15 2.22 2.70 0.476 121.44%
q16 5.06 5.66 0.594 111.74%
q17 7.83 6.22 -1.612 79.41%
q18 8.06 7.58 -0.476 94.10%
q19 4.57 3.39 -1.173 74.30%
q20 2.09 1.69 -0.400 80.86%
q21 0.77 1.12 0.346 144.87%
q22 3.47 2.88 -0.593 82.92%
q23a 60.51 60.93 0.413 100.68%
q23b 71.34 72.71 1.370 101.92%
q24a 72.17 69.58 -2.591 96.41%
q24b 68.27 68.38 0.110 100.16%
q25 5.07 5.79 0.722 114.25%
q26 3.40 2.26 -1.144 66.35%
q27 2.94 2.76 -0.179 93.91%
q28 17.03 17.17 0.142 100.83%
q29 8.07 7.70 -0.376 95.34%
q30 5.59 5.78 0.194 103.47%
q31 7.68 8.43 0.749 109.75%
q32 1.65 1.67 0.017 101.00%
q33 3.56 3.26 -0.301 91.55%
q34 4.53 3.91 -0.626 86.19%
q35 7.89 7.78 -0.104 98.68%
q36 2.31 3.27 0.953 141.18%
q37 3.26 2.98 -0.284 91.29%
q38 12.02 11.14 -0.877 92.70%
q39a 4.87 4.34 -0.532 89.07%
q39b 3.43 3.82 0.396 111.55%
q40 3.67 3.56 -0.110 97.01%
q41 0.62 0.61 -0.013 97.93%
q42 1.44 1.57 0.133 109.23%
q43 2.02 2.08 0.060 102.95%
q44 6.14 5.86 -0.284 95.38%
q45 4.47 3.28 -1.184 73.51%
q46 3.97 4.28 0.313 107.87%
q47 9.75 9.87 0.118 101.21%
q48 3.80 3.30 -0.500 86.83%
q49 5.59 5.77 0.182 103.26%
q50 17.48 17.03 -0.447 97.44%
q51 7.11 7.30 0.193 102.72%
q52 1.28 1.31 0.029 102.27%
q53 1.69 1.88 0.190 111.27%
q54 5.54 6.46 0.920 116.61%
q55 0.58 1.63 1.045 278.97%
q56 4.02 4.10 0.081 102.01%
q57 6.66 6.58 -0.083 98.76%
q58 3.06 2.85 -0.205 93.28%
q59 4.44 3.93 -0.513 88.44%
q60 5.51 4.34 -1.168 78.78%
q61 4.69 4.04 -0.653 86.09%
q62 2.74 2.65 -0.090 96.72%
q63 1.50 1.35 -0.159 89.44%
q64 35.57 35.01 -0.563 98.42%
q65 11.09 11.68 0.596 105.38%
q66 3.39 2.64 -0.753 77.80%
q67 56.76 57.83 1.069 101.88%
q68 3.35 2.94 -0.411 87.71%
q69 4.89 5.34 0.450 109.20%
q70 6.03 5.54 -0.490 91.86%
q71 4.81 4.64 -0.173 96.40%
q72 20.12 20.89 0.770 103.83%
q73 2.59 2.49 -0.105 95.94%
q74 17.49 17.53 0.042 100.24%
q75 22.65 22.64 -0.012 99.95%
q76 6.68 6.90 0.223 103.34%
q77 2.56 2.38 -0.179 93.00%
q78 32.67 33.48 0.812 102.48%
q79 3.61 3.40 -0.213 94.10%
q80 10.00 10.11 0.107 101.07%
q81 6.42 7.07 0.648 110.08%
q82 5.83 5.23 -0.596 89.77%
q83 1.34 1.48 0.140 110.50%
q84 2.99 2.50 -0.485 83.75%
q85 6.72 6.18 -0.542 91.94%
q86 1.94 2.17 0.228 111.76%
q87 11.72 11.55 -0.172 98.54%
q88 15.37 15.04 -0.330 97.85%
q89 2.84 2.19 -0.647 77.19%
q90 2.42 2.05 -0.367 84.82%
q91 3.33 3.50 0.169 105.07%
q92 1.87 1.70 -0.166 91.10%
q93 23.05 23.27 0.220 100.95%
q94 8.65 8.57 -0.079 99.09%
q9 55.73 55.63 -0.091 99.84%
q5 2.10 2.16 0.058 102.76%
q96 10.93 10.50 -0.434 96.03%
q97 2.13 2.91 0.775 136.36%
q98 5.44 5.09 -0.353 93.51%
q99 0.36 0.29 -0.064 81.96%
total 1174.41 1160.27 -14.140 98.80%

@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_master_04_14_2025_time.csv log/native_master_04_13_2025_cb35706408_time.csv difference percentage
q1 25.48 25.59 0.106 100.42%
q2 26.55 26.94 0.396 101.49%
q3 32.77 32.60 -0.166 99.49%
q4 27.43 27.97 0.538 101.96%
q5 59.57 60.55 0.978 101.64%
q6 7.01 7.96 0.952 113.58%
q7 40.71 40.34 -0.372 99.09%
q8 61.75 64.26 2.505 104.06%
q9 95.49 99.16 3.671 103.84%
q10 43.82 42.56 -1.260 97.13%
q11 15.98 16.49 0.505 103.16%
q12 15.81 16.54 0.731 104.63%
q13 24.33 24.75 0.421 101.73%
q14 11.82 11.54 -0.280 97.63%
q15 25.45 25.78 0.329 101.29%
q16 12.62 13.41 0.792 106.27%
q17 74.47 73.07 -1.397 98.12%
q18 110.46 114.34 3.875 103.51%
q19 17.20 20.67 3.476 120.21%
q20 24.81 23.17 -1.642 93.38%
q21 170.71 174.55 3.843 102.25%
q22 12.46 10.20 -2.253 81.91%
total 936.70 952.45 15.748 101.68%
