== Physical Plan ==
*(3) Project [lead(key1, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3854, lead(value, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3855]
+- Window [lead(key1#3848, 1, null) windowspecdefinition(key1#3848, key2#3849, value#3850 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead(key1, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3854, lead(value#3850, 1, null) windowspecdefinition(key1#3848, key2#3849, value#3850 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS lead(value, 1, NULL) OVER (PARTITION BY key1, key2 ORDER BY value ASC NULLS FIRST ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING)#3855], [key1#3848, key2#3849], [value#3850 ASC NULLS FIRST]
   +- *(2) ColumnarToRow
      +- CometSort [key1#3848, key2#3849, value#3850], [key1#3848 ASC NULLS FIRST, key2#3849 ASC NULLS FIRST, value#3850 ASC NULLS FIRST]
         +- CometColumnarExchange hashpartitioning(key1#3848, key2#3849, 5), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=8988]
            +- CometColumnarExchange hashpartitioning(key1#3848, 5), REPARTITION_BY_COL, CometColumnarShuffle, [plan_id=8987]
               +- RowToColumnar
                  +- *(1) Project [_1#3841 AS key1#3848, _2#3842 AS key2#3849, _3#3843 AS value#3850]
                     +- *(1) LocalTableScan [_1#3841, _2#3842, _3#3843]
There are repeated shuffle operators in this query plan: two back-to-back CometColumnarExchange nodes. Currently it fails with:
[info] java.lang.UnsupportedOperationException: CometShuffleExchangeExec.doExecute should not be executed.
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.doExecute(CometShuffleExchangeExec.scala:169)
[info] at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:195)
[info] at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
[info] at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
[info] at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
[info] at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:191)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD$lzycompute(CometShuffleExchangeExec.scala:98)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.inputRDD(CometShuffleExchangeExec.scala:91)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.shuffleDependency$lzycompute(CometShuffleExchangeExec.scala:150)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.shuffleDependency(CometShuffleExchangeExec.scala:133)
[info] at org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec.doExecuteColumnar(CometShuffleExchangeExec.scala:188)
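For context, the throwing frame suggests the operator is columnar-only, and its row-based execute path is just a guard. A minimal sketch of that shape, inferred from the error message rather than copied from Comet's source:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.SparkPlan

// Sketch (not Comet's actual code) of a columnar-only operator: any parent
// that calls child.execute() (the row path) instead of child.executeColumnar()
// fails exactly like the trace above.
abstract class ColumnarOnlyExec extends SparkPlan {
  override def supportsColumnar: Boolean = true
  override protected def doExecute(): RDD[InternalRow] =
    throw new UnsupportedOperationException(
      s"$nodeName.doExecute should not be executed.")
}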
Describe the bug
I noticed a special case while debugging test failures in #250.
The failing test is org.apache.spark.sql.DataFrameWindowFunctionsSuite, "SPARK-38237: require all cluster keys for child required distribution for window query". Its plan contains the repeated shuffle operators shown at the top, and it currently fails with the UnsupportedOperationException above.
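Roughly, the query shape that produces the plan at the top looks like the sketch below. The column names come from the plan; the exact data and assertions in the Spark test may differ, and the config key is my reading of the one added by SPARK-38237:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lead

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Force window operators to require all partition keys in the child
// distribution, so the upper ENSURE_REQUIREMENTS exchange is inserted even
// though the data is already hash-partitioned by a subset of the keys.
spark.conf.set("spark.sql.requireAllClusterKeysForDistribution", "true")

val df = Seq(("a", "x", 1), ("a", "x", 2), ("b", "y", 3)).toDF("key1", "key2", "value")
val w = Window.partitionBy("key1", "key2").orderBy("value")

// Repartitioning by key1 yields the REPARTITION_BY_COL exchange; the window's
// requirement on (key1, key2) then stacks a second exchange directly on top.
df.repartition($"key1")
  .select(lead($"key1", 1).over(w), lead($"value", 1).over(w))
  .explain()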
This happens because the upper CometShuffleExchangeExec calls the bottom CometShuffleExchangeExec.doExecute, since CometShuffleExchangeExec takes row inputs. We could fix it by adding a ColumnarToRow on top of the bottom CometShuffleExchangeExec, but I don't think that is efficient: as the plan snippet above shows, the chain of shuffles would then have too many row-to-column/column-to-row conversions. I think it would be more reasonable to skip such cases for Comet shuffle.
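A hypothetical sketch of that skip, assuming it lives in the rule that swaps Spark's ShuffleExchangeExec for the Comet variant (the object and method names are illustrative, not Comet's actual API):

import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
import org.apache.spark.sql.comet.execution.shuffle.CometShuffleExchangeExec

// Decline the conversion when the shuffle's direct child is already a Comet
// shuffle, so the upper exchange never reaches the columnar-only child via
// the row-based child.execute() path.
object SkipNestedCometShuffle {
  def canConvert(shuffle: ShuffleExchangeExec): Boolean =
    shuffle.child match {
      case _: CometShuffleExchangeExec => false // back-to-back shuffles: keep the Spark shuffle here
      case _ => true
    }
}

Falling back to Spark's shuffle for the upper exchange would avoid both the failure and the extra RowToColumnar/ColumnarToRow pair that the alternative fix introduces.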
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response