[GLUTEN-10544] Remove unnecessary method separateScanRDD by beliefer · Pull Request #10545 · apache/gluten

beliefer · 2025-08-27T03:05:46Z

What changes are proposed in this pull request?

This PR proposes to remove the unnecessary method separateScanRDD.
Fixes #10544

How was this patch tested?

GA tests.

github-actions · 2025-08-27T03:06:00Z

#10544

github-actions · 2025-08-27T03:06:15Z

Run Gluten Clickhouse CI on x86

jinchengchenghh · 2025-08-27T16:44:19Z

gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala

-       *      2. test case where query plan is constructed from simple dataframes (e.g.
-       *      GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
-       *      result, genFinalStageIterator rather than genFirstStageIterator will be invoked
+       *   1. SCAN with clickhouse backend (check


What's the change?

I just wanted to replace ColumnarCollapseTransformStages#separateScanRDD() with BackendsApiManager.getSettings.excludeScanExecFromCollapsedStage(), but Spotless made this change.
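As an illustration of the kind of replacement described in this reply, here is a minimal sketch. It assumes the removed method was a thin wrapper that only forwarded to the backend setting; the thread does not show its body. `BackendSettings`, `DefaultSettings`, `ScanExcludingSettings`, `CollapseBefore`, and `CollapseAfter` are hypothetical stand-ins, not Gluten's actual classes. Only the names `separateScanRDD` and `excludeScanExecFromCollapsedStage` come from the discussion.

```scala
// Hypothetical stand-in for the backend settings API; not Gluten's real trait.
trait BackendSettings {
  // Method name taken from the discussion; the default value is an assumption.
  def excludeScanExecFromCollapsedStage(): Boolean = false
}

// A backend that keeps the scan inside the collapsed stage (assumed default).
object DefaultSettings extends BackendSettings

// A backend that asks for the scan to be handled outside the collapsed stage.
object ScanExcludingSettings extends BackendSettings {
  override def excludeScanExecFromCollapsedStage(): Boolean = true
}

// Before: a private wrapper that only forwards to the setting (assumed shape).
class CollapseBefore(settings: BackendSettings) {
  private def separateScanRDD(): Boolean = settings.excludeScanExecFromCollapsedStage()
  def scanIsSeparate: Boolean = separateScanRDD()
}

// After: the call site reads the setting directly, so the wrapper can be removed.
class CollapseAfter(settings: BackendSettings) {
  def scanIsSeparate: Boolean = settings.excludeScanExecFromCollapsedStage()
}

object WrapperRemovalDemo {
  def main(args: Array[String]): Unit = {
    val all = Seq[BackendSettings](DefaultSettings, ScanExcludingSettings)
    // Both versions must agree for the removal to be behavior-preserving.
    all.foreach { s =>
      assert(new CollapseBefore(s).scanIsSeparate == new CollapseAfter(s).scanIsSeparate)
    }
    println("wrapper and direct call agree")
  }
}
```

If the wrapper really was a pure forwarder, inlining it changes no behavior, which would be consistent with the PR relying only on the existing GA tests.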

beliefer · 2025-09-01T03:15:08Z

ping @zzcclp @zml1206 @FelixYBW

beliefer · 2025-09-09T03:07:52Z

cc @philo-he

philo-he

Looks good. One minor comment. Thanks.

philo-he · 2025-09-09T09:39:18Z

gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala

-       *      2. test case where query plan is constructed from simple dataframes (e.g.
-       *      GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
-       *      result, genFinalStageIterator rather than genFirstStageIterator will be invoked
+       *   1. SCAN with clickhouse backend (check


@beliefer, I suggest refining the comments for better readability as follows. Thanks.

diff --git a/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala b/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
index 0c5e1b58b..588ba4567 100644
--- a/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
+++ b/gluten-substrait/src/main/scala/org/apache/gluten/execution/WholeStageTransformer.scala
@@ -438,11 +438,14 @@ case class WholeStageTransformer(child: SparkPlan, materializeInput: Boolean = f
     } else {
       /**
-       * the whole stage contains NO [[LeafTransformSupport]]. this the default case for:
-       *   1. SCAN with clickhouse backend (check ColumnarCollapseTransformStages#separateScanRDD())
-       *      2. test case where query plan is constructed from simple dataframes (e.g.
-       *      GlutenDataFrameAggregateSuite) in these cases, separate RDDs takes care of SCAN as a
-       *      result, genFinalStageIterator rather than genFirstStageIterator will be invoked
+       * The whole stage contains NO [[LeafTransformSupport]]. This is the default case for:
+       *   - SCAN of clickhouse backend. See
+       *     BackendsApiManager.getSettings.excludeScanExecFromCollapsedStage.
+       *   - Test case where query plan is constructed from simple DataFrames, e.g.
+       *     GlutenDataFrameAggregateSuite.
+       *
+       * In these cases, separate RDDs take care of SCAN. As a result, genFinalStageIterator rather
+       * than genFirstStageIterator will be invoked.
        */
       new WholeStageZippedPartitionsRDD(
         sparkContext,
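The branch that this code comment documents can be pictured with a toy model. This is a sketch under stated assumptions, not Gluten's implementation: `Node`, `LeafScan`, `InputAdapter`, `Op`, and `StageIteratorChoice` are hypothetical names invented for illustration, and the two result strings merely echo the method names mentioned in the comment.

```scala
// Toy model of a collapsed stage. All types are hypothetical.
sealed trait Node
// Stands in for a scan that lives inside the stage (a leaf transformer).
case object LeafScan extends Node
// Stands in for input supplied by a separate RDD outside the stage.
final case class InputAdapter(rddId: Int) extends Node
final case class Op(name: String, children: Seq[Node]) extends Node

object StageIteratorChoice {
  // True when the stage itself contains a leaf scan.
  def containsLeaf(node: Node): Boolean = node match {
    case LeafScan        => true
    case InputAdapter(_) => false
    case Op(_, children) => children.exists(containsLeaf)
  }

  // Leaf inside the stage: the stage reads its own input (first-stage iterator).
  // No leaf: separate RDDs feed the stage (final-stage iterator).
  def iteratorKind(stage: Node): String =
    if (containsLeaf(stage)) "genFirstStageIterator" else "genFinalStageIterator"

  def main(args: Array[String]): Unit = {
    val withScan = Op("project", Seq(Op("filter", Seq(LeafScan))))
    val scanExcluded = Op("project", Seq(Op("filter", Seq(InputAdapter(0)))))
    println(iteratorKind(withScan))
    println(iteratorKind(scanExcluded))
  }
}
```

In this model, excluding the scan from the collapsed stage corresponds to replacing `LeafScan` with an `InputAdapter`, which moves the stage into the second branch.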

github-actions · 2025-09-09T10:55:51Z

Run Gluten Clickhouse CI on x86

beliefer · 2025-09-10T02:10:48Z

@zml1206 @philo-he @jinchengchenghh Thank you!

[GLUTEN-10544] Remove unnecessary method separateScanRDD

9177724

github-actions bot added the CORE (works for Gluten Core) label Aug 27, 2025

jinchengchenghh reviewed Aug 27, 2025

beliefer requested a review from jinchengchenghh August 31, 2025 04:45

philo-he approved these changes Sep 9, 2025

Refine comments

a2f6c71

zml1206 approved these changes Sep 9, 2025

zml1206 merged commit 11ae9a7 into apache:main Sep 9, 2025
56 checks passed

zml1206 merged 2 commits into apache:main from beliefer:10544
