Skip to content

[GLUTEN-10544] Remove unnecessary method separateScanRDD#10545

Merged
zml1206 merged 2 commits intoapache:mainfrom
beliefer:10544
Sep 9, 2025
Merged

[GLUTEN-10544] Remove unnecessary method separateScanRDD#10545
zml1206 merged 2 commits intoapache:mainfrom
beliefer:10544

Conversation

@beliefer
Copy link
Copy Markdown
Contributor

What changes are proposed in this pull request?

This PR proposes to remove unnecessary method separateScanRDD.
Fixes #10544

How was this patch tested?

GA tests.

@github-actions github-actions bot added the CORE works for Gluten Core label Aug 27, 2025
@github-actions
Copy link
Copy Markdown

#10544

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

* 2. test case where query plan is constructed from simple dataframes (e.g.
* GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
* result, genFinalStageIterator rather than genFirstStageIterator will be invoked
* 1. SCAN with clickhouse backend (check
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want replace ColumnarCollapseTransformStages#separateScanRDD() with BackendsApiManager.getSettings.excludeScanExecFromCollapsedStage(), but the spotless make this change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer, I suggest to refine the comments for better readability as follows. Thanks.

diff --git a/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala b/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
index 0c5e1b58b..588ba4567 100644
--- a/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
+++ b/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
@@ -438,11 +438,14 @@ case class WholeStageTransformer(child: SparkPlan, materializeInput: Boolean = f
     } else {
 
       /**
-       * the whole stage contains NO [[LeafTransformSupport]]. this the default case for:
-       *   1. SCAN with clickhouse backend (check ColumnarCollapseTransformStages#separateScanRDD())
-       *      2. test case where query plan is constructed from simple dataframes (e.g.
-       *      GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
-       *      result, genFinalStageIterator rather than genFirstStageIterator will be invoked
+       * The whole stage contains NO [[LeafTransformSupport]]. This is the default case for:
+       *   - SCAN of clickhouse backend. See
+       *     BackendsApiManager.getSettings.excludeScanExecFromCollapsedStage.
+       *   - Test case where query plan is constructed from simple DataFrames, e.g.
+       *     GlutenDataFrameAggregateSuite.
+       *
+       * In these cases, separate RDDs take care of SCAN. As a result, genFinalStageIterator rather
+       * than genFirstStageIterator will be invoked.
        */
       new WholeStageZippedPartitionsRDD(
         sparkContext,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@beliefer
Copy link
Copy Markdown
Contributor Author

beliefer commented Sep 1, 2025

ping @zzcclp @zml1206 @FelixYBW

@beliefer
Copy link
Copy Markdown
Contributor Author

beliefer commented Sep 9, 2025

cc @philo-he

Copy link
Copy Markdown
Member

@philo-he philo-he left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One minor comment. Thanks.

* 2. test case where query plan is constructed from simple dataframes (e.g.
* GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
* result, genFinalStageIterator rather than genFirstStageIterator will be invoked
* 1. SCAN with clickhouse backend (check
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beliefer, I suggest to refine the comments for better readability as follows. Thanks.

diff --git a/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala b/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
index 0c5e1b58b..588ba4567 100644
--- a/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
+++ b/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
@@ -438,11 +438,14 @@ case class WholeStageTransformer(child: SparkPlan, materializeInput: Boolean = f
     } else {
 
       /**
-       * the whole stage contains NO [[LeafTransformSupport]]. this the default case for:
-       *   1. SCAN with clickhouse backend (check ColumnarCollapseTransformStages#separateScanRDD())
-       *      2. test case where query plan is constructed from simple dataframes (e.g.
-       *      GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
-       *      result, genFinalStageIterator rather than genFirstStageIterator will be invoked
+       * The whole stage contains NO [[LeafTransformSupport]]. This is the default case for:
+       *   - SCAN of clickhouse backend. See
+       *     BackendsApiManager.getSettings.excludeScanExecFromCollapsedStage.
+       *   - Test case where query plan is constructed from simple DataFrames, e.g.
+       *     GlutenDataFrameAggregateSuite.
+       *
+       * In these cases, separate RDDs take care of SCAN. As a result, genFinalStageIterator rather
+       * than genFirstStageIterator will be invoked.
        */
       new WholeStageZippedPartitionsRDD(
         sparkContext,

@github-actions
Copy link
Copy Markdown

github-actions bot commented Sep 9, 2025

Run Gluten Clickhouse CI on x86

@zml1206 zml1206 merged commit 11ae9a7 into apache:main Sep 9, 2025
56 checks passed
@beliefer
Copy link
Copy Markdown
Contributor Author

@zml1206 @philo-he @jinchengchenghh Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove unnecessary method separateScanRDD

4 participants