== Physical Plan ==
*(3) Project [lead(key1, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3854, lead(value, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3855]
+- Window [lead(key1#3848, 1, null) windowspecdefinition(key1#3848, key2#3849, value#3850 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead(key1, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3854, lead(value#3850, 1, null) windowspecdefinition(key1#3848, key2#3849, value#3850 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead(value, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3855], [key1#3848, key2#3849], [value#3850 ASC NULLS FIRST]
   +- *(2) ColumnarToRow
      +- CometSort [key1#3848, key2#3849, value#3850], [key1#3848 ASC NULLS FIRST, key2#3849 ASC NULLS FIRST, value#3850 ASC NULLS FIRST]
         +- CometColumnarExchange hashpartitioning(key1#3848, key2#3849, 5), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=8988]
            +- CometColumnarExchange hashpartitioning(key1#3848, 5), REPARTITION_BY_COL, CometColumnarShuffle, [plan_id=8987]
               +- RowToColumnar
                  +- *(1) Project [_1#3841 AS key1#3848, _2#3842 AS key2#3849, _3#3843 AS value#3850]
                     +- *(1) LocalTableScan [_1#3841, _2#3842, _3#3843]
There are repeated shuffle operators in this query plan: two back-to-back CometColumnarExchange nodes. Currently it fails with:
[info] java.lang.UnsupportedOperationException: CometShuffleExchangeExec.doExecute should not be executed.
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.doExecute(CometShuffleExchangeExec.scala:169)
[info] at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
[info] at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
[info] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[info] at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
[info] at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD$lzycompute(CometShuffleExchangeExec.scala:98)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD(CometShuffleExchangeExec.scala:91)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.shuffleDependency$lzycompute(CometShuffleExchangeExec.scala:150)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.shuffleDependency(CometShuffleExchangeExec.scala:133)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.doExecuteColumnar(CometShuffleExchangeExec.scala:188)
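For context, the throwing frame suggests the operator is columnar-only, and its row-based execute path is just a guard. A minimal sketch of that shape, inferred from the error message rather than copied from Comet's source:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.SparkPlan

// Sketch (not Comet's actual code) of a columnar-only operator: any parent
// that calls child.execute() (the row path) instead of child.executeColumnar()
// fails exactly like the trace above.
abstract class ColumnarOnlyExec extends SparkPlan {
  override def supportsColumnar: Boolean = true
  override protected def doExecute(): RDD[InternalRow] =
    throw new UnsupportedOperationException(
      s"$nodeName.doExecute should not be executed.")
}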
Describe the bug
I noticed a special case while debugging test failures in #250.
The failing test is org.apache.spark.sql.DataFrameWindowFunctionsSuite, "SPARK-38237: require all cluster keys for child required distribution for window query". Its plan contains the repeated shuffle operators shown at the top, and it currently fails with the UnsupportedOperationException above.
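Roughly, the query shape that produces the plan at the top looks like the sketch below. The column names come from the plan; the exact data and assertions in the Spark test may differ, and the config key is my reading of the one added by SPARK-38237:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lead

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Force window operators to require all partition keys in the child
// distribution, so the upper ENSURE_REQUIREMENTS exchange is inserted even
// though the data is already hash-partitioned by a subset of the keys.
spark.conf.set("spark.sql.requireAllClusterKeysForDistribution", "true")

val df = Seq(("a", "x", 1), ("a", "x", 2), ("b", "y", 3)).toDF("key1", "key2", "value")
val w = Window.partitionBy("key1", "key2").orderBy("value")

// Repartitioning by key1 yields the REPARTITION_BY_COL exchange; the window's
// requirement on (key1, key2) then stacks a second exchange directly on top.
df.repartition($"key1")
  .select(lead($"key1", 1).over(w), lead($"value", 1).over(w))
  .explain()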
This happens because the upper CometShuffleExchangeExec calls the bottom CometShuffleExchangeExec.doExecute, since CometShuffleExchangeExec takes row inputs. We could fix it by adding a ColumnarToRow on top of the bottom CometShuffleExchangeExec, but I don't think that is efficient: as the plan snippet above shows, the chain of shuffles would then have too many row-to-column/column-to-row conversions. I think it would be more reasonable to skip such cases for Comet shuffle.
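A hypothetical sketch of that skip, assuming it lives in the rule that swaps Spark's ShuffleExchangeExec for the Comet variant (the object and method names are illustrative, not Comet's actual API):

import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
import org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec

// Decline the conversion when the shuffle's direct child is already a Comet
// shuffle, so the upper exchange never reaches the columnar-only child via
// the row-based child.execute() path.
object SkipNestedCometShuffle {
  def canConvert(shuffle: ShuffleExchangeExec): Boolean =
    shuffle.child match {
      case _: CometShuffleExchangeExec => false // back-to-back shuffles: keep the Spark shuffle here
      case _ => true
    }
}

Falling back to Spark's shuffle for the upper exchange would avoid both the failure and the extra RowToColumnar/ColumnarToRow pair that the alternative fix introduces.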
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response