Skip to content

[GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle#9035

Merged
jackylee-ch merged 9 commits intoapache:mainfrom
WangGuangxin:feat_partial_project_opt
Apr 22, 2025
Merged

[GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle#9035
jackylee-ch merged 9 commits intoapache:mainfrom
WangGuangxin:feat_partial_project_opt

Conversation

@WangGuangxin
Copy link
Copy Markdown
Contributor

@WangGuangxin WangGuangxin commented Mar 18, 2025

What changes were proposed in this pull request?

Shuffle read may generate small batch with few rows, which may hurt performance a lot.

A example in our production case is
image

So in this PR proposed to add VeloxResizeBatchesExec right after shuffle read, the plan is changed to

image

As we can see, the average batch size increased from 9 to 1000, and the total hour reduced from 2066h to 766h.

(Fixes: #9034)

How was this patch tested?

manually and UT

@github-actions github-actions bot added the VELOX label Mar 18, 2025
@github-actions
Copy link
Copy Markdown

#9034

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

cc @jinchengchenghh @jackylee-ch

@jackylee-ch
Copy link
Copy Markdown
Contributor

Can you provide the complete dag diagram? Maybe it can be solved by adjusting the number of input partitions, such as maxPartitionSize?

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

without VeloxResizeBatchesExec

image

with VeloxResizeBatchesExec

image

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

Can you provide the complete dag diagram? Maybe it can be solved by adjusting the number of input partitions, such as maxPartitionSize?

@jackylee-ch In some cases, it can be controled by maxPartitionSize. But if it's under Join/Filter etc, it's not easy to make sure the batch size in a reasonable range

@jackylee-ch
Copy link
Copy Markdown
Contributor

image
It seems that the anomal number of vectors occurs after the shuffle read. Maybe consider adding a resize operation right after the shuffle read so that all subsequent operators can benefit?

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

image It seems that the anomal number of vectors occurs after the shuffle read. Maybe consider adding a resize operation right after the shuffle read so that all subsequent operators can benefit?

@jackylee-ch yeah, agree with that. I'll update it later

@github-actions github-actions bot added the CORE works for Gluten Core label Mar 22, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@WangGuangxin WangGuangxin force-pushed the feat_partial_project_opt branch from 70d695b to 25afa8e Compare March 22, 2025 02:29
@WangGuangxin WangGuangxin changed the title [GLUTEN-9034][VL] Add VeloxResizeBatchesExec before ColumnarPartialProject [GLUTEN-9034][VL] Add VeloxResizeBatchesExec right after ShuffleRead Mar 22, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@WangGuangxin WangGuangxin force-pushed the feat_partial_project_opt branch from 25afa8e to 2991053 Compare March 22, 2025 04:54
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@WangGuangxin WangGuangxin force-pushed the feat_partial_project_opt branch from 497d69b to 438a101 Compare March 24, 2025 04:18
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

image It seems that the anomal number of vectors occurs after the shuffle read. Maybe consider adding a resize operation right after the shuffle read so that all subsequent operators can benefit?

@jackylee-ch yeah, agree with that. I'll update it later

@jackylee-ch @jinchengchenghh @zhztheplayer updated, and also update the title and description. please take a look when you are convenient

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@marin-ma
Copy link
Copy Markdown
Contributor

marin-ma commented Apr 4, 2025

@WangGuangxin Any update on this patch? Could you do a rebase? Thanks!

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

@WangGuangxin Any update on this patch? Could you do a rebase? Thanks!

thinks, will rebase today

@WangGuangxin WangGuangxin force-pushed the feat_partial_project_opt branch from cf4ff8b to 6b21ba9 Compare April 11, 2025 12:21
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot removed the CORE works for Gluten Core label Apr 11, 2025
.intConf
.createOptional

val COLUMNAR_VELOX_RESIZE_BATCHES_SHUFFLE_INPUT_OUTPUT_MIN_SIZE =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/SHUFFLE_INPUT_OUTPUT/SHUFFLE_OUTPUT ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used both for the BatchResizing before shuffle input and after shuffle output, so that we can reduce some config. Usually there is need to do too much customized min size config for these two scenario. What do you think?

}

def veloxResizeBatchesShuffleInputRange: ResizeRange = {
def veloxResizeBatchesShuffleInputOutputRange: ResizeRange = {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment thread backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala Outdated
import org.apache.spark.sql.execution.{ColumnarShuffleExchangeExec, SparkPlan}
import org.apache.spark.sql.execution.adaptive.{AQEShuffleReadExec, ShuffleQueryStageExec}

case class AppendBatchResizeAfterShuffleRead() extends Rule[SparkPlan] {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a rule for the resizing on the input side of shuffle. Can we make the ways of optimizations more consistent? Either both via rules, or both not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, resizing on the output side of shuffle followings the way for shuffle read, that's do it when converting to transformer. But after the DummpyLeafExec is introduced, it doesn't work.
So I'll refactor the way to add resizing on the input side of shuffle, to make it enabled by rule

Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. Let's keep it as an individual rule.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

WangGuangxin commented Apr 14, 2025

@jackylee-ch @zhztheplayer Please take another look.
The tpcds 1T in our env shows about 1% ~ 3% performance improvements.

@marin-ma
Copy link
Copy Markdown
Contributor

@jackylee-ch @zhztheplayer Do you have any further comments?

Comment on lines -348 to -376
def maybeAddAppendBatchesExec(plan: SparkPlan): SparkPlan = {
plan match {
case shuffle: ColumnarShuffleExchangeExec
if !shuffle.useSortBasedShuffle &&
VeloxConfig.get.veloxResizeBatchesShuffleInput =>
val range = VeloxConfig.get.veloxResizeBatchesShuffleInputRange
val appendBatches =
VeloxResizeBatchesExec(shuffle.child, range.min, range.max)
shuffle.withNewChildren(Seq(appendBatches))
case _ => plan
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for factoring this out!

@WangGuangxin
Copy link
Copy Markdown
Contributor Author

@jackylee-ch @zhztheplayer Can we merge this?

Copy link
Copy Markdown
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late response. Great work!

@jackylee-ch jackylee-ch changed the title [GLUTEN-9034][VL] Add VeloxResizeBatchesExec right after ShuffleRead [GLUTEN-9034][VL] Add VeloxResizeBatchesExec for Shuffle Apr 22, 2025
@jackylee-ch jackylee-ch merged commit 9a7d5fc into apache:main Apr 22, 2025
46 checks passed
val range = VeloxConfig.get.veloxResizeBatchesShuffleInputOutputRange
plan.transformUp {
case shuffle: ColumnarShuffleExchangeExec
if !shuffle.useSortBasedShuffle &&
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like this will be only enabled on hash based shuffle?
Cc @marin-ma

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We don't need to resize input batches for sort-based shuffle.

@jinchengchenghh
Copy link
Copy Markdown
Contributor

Do you notice in some queries, the plan cannot be fully replaced? Like TPCDS Q95, there are 4 AQEShuffleRead, but only one add the VeloxResizeBatch node. @WangGuangxin
Image 11-11-2025 at 14 46

@jinchengchenghh
Copy link
Copy Markdown
Contributor

Fixed in #11069 @WangGuangxin @jackylee-ch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[VL] Add VeloxResizeBatchesExec right after ShuffleRead

6 participants