Skip to content

[CORE] Optimize JoinSelectionOverrides when disable ras#6048

Closed
zml1206 wants to merge 3 commits intoapache:mainfrom
zml1206:optimize_JoinSelectionOverrides
Closed

[CORE] Optimize JoinSelectionOverrides when disable ras#6048
zml1206 wants to merge 3 commits intoapache:mainfrom
zml1206:optimize_JoinSelectionOverrides

Conversation

@zml1206
Copy link
Copy Markdown
Contributor

@zml1206 zml1206 commented Jun 11, 2024

What changes were proposed in this pull request?

  1. creates "more vanilla" plan when the join operators are falling back
  2. fix use the smaller table to build hashmap in shuffled hash join when AQE enabled
  3. clickhouse backend already support AQE, delete relevant code.

How was this patch tested?

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as draft June 11, 2024 16:15
[CORE] Optimize JoinSelectionOverrides

[CORE] Optimize JoinSelectionOverrides
@zml1206 zml1206 force-pushed the optimize_JoinSelectionOverrides branch from edbc020 to 1852c95 Compare June 12, 2024 11:30
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@zml1206 zml1206 changed the title [CORE] Optimize JoinSelectionOverrides [CORE] Optimize JoinSelectionOverrides when disable ras Jun 12, 2024
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@zml1206 zml1206 marked this pull request as ready for review June 12, 2024 23:48
@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

cc @zhztheplayer

@zhztheplayer
Copy link
Copy Markdown
Member

Thank you for continuing working on this. @zml1206

And would you like to summarize a simple note for the responsibility of each of the following two parts of rules, after this refactor:

  1. Strategy (namely JoinSelectionOverrides)
  2. Columnar rule (OffloadJoin)

? It will help one understand the code change. Thanks.

@zhztheplayer
Copy link
Copy Markdown
Member

zhztheplayer commented Jun 13, 2024

IIUC, is the only thing ShuffledHashJoinExecTemp tends to do is to represent that its build side information is "temporary" that waits for columnar rules to correct?

@ulysses-you
Copy link
Copy Markdown
Contributor

What is ShuffledHashJoinExecTemp ? plese do not introduce extra concept in final plan..

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

IIUC, are the only thing ShuffledHashJoinExecTemp tends to do is to represent that its build side information is "temporary" that waits for columnar rules to correct?

No, the buildSide generated by ShuffledHashJoinExecTemp is final, and the column rules correction is the buildside of ShuffledHashJoinExec when ras is enabled.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

What is ShuffledHashJoinExecTemp ? plese do not introduce extra concept in final plan..

ShuffledHashJoinExecTemp

What is ShuffledHashJoinExecTemp ? plese do not introduce extra concept in final plan..

ShuffledHashJoinExecTemp is a temporary node and will eventually be eliminated, mainly to select a smaller table as buildSide in custom strategy for ShuffledHashJoinExecTransformer.
In AQE, After reOptimize, , only when the cost becomes smaller or the physical plan changes, then AQE will take effect, and then the sizeByte of the logical plan will be updated (https://github.com/apache/spark/blob /53d65fd12dd9231139188227ef9040d40d759021/sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala#L383-L392)

@zhztheplayer
Copy link
Copy Markdown
Member

zhztheplayer commented Jun 13, 2024

No, the buildSide generated by ShuffledHashJoinExecTemp is final
ShuffledHashJoinExecTemp is a temporary node and will eventually be eliminated

According to the comments and code, the reason ShuffledHashJoinExecTemp is temporary is it's not executable by vanilla Spark because of its different build side, am I understanding it correctly?

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

No, the buildSide generated by ShuffledHashJoinExecTemp is final
ShuffledHashJoinExecTemp is a temporary node and will eventually be eliminated

According to the comments and code, the reason ShuffledHashJoinExecTemp is temporary is it's not executable by vanilla Spark because of its different build side, am I understanding it correctly?

Yes.

@ulysses-you
Copy link
Copy Markdown
Contributor

ulysses-you commented Jun 13, 2024

@zml1206 why not pass prefer build side to ShuffledHashJoinExecTransformer directly ? BTW, if it will eventually be eliminated, why the golden files changed ?

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

@zml1206 why not pass prefer build side to ShuffledHashJoinExecTransformer directly ? BTW, if it will eventually be eliminated, why the golden files changed ?

What I hope is that both creates "more vanilla" plan when the join operators are falling back and builds buildsideide with small tables. Prefer build side to ShuffledHashJoinExecTransformer directly is not always possible to select a smaller table. For example, before AQE is SortMergeJoin, the sizeByte on the left is small. During AQE after reOptimize, it is still SortMergeJoin and the sizeByte on the right is smaller. However, because the plan has not changed, the plan will not be replaced. In the end, the left table is selected.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

BTW, if it will eventually be eliminated, why the golden files changed ?

Initial Plan will contain ShuffledHashJoinExecTemp.

@ulysses-you
Copy link
Copy Markdown
Contributor

However, because the plan has not changed, the plan will not be replaced

it's not sure, AQE will use the new plan if currentPhysicalPlan != newPhysicalPlan. So I think ShuffledHashJoin should always choose the small table. SortMergeJoin is a special case that Spark always build right side for inner join, etc.

If your goal is to optimize the vanilla Spark SortMergeJoin. I think it's better to push to Spark community first. For gluten, we can just optimize SortMergeJoin when do transform.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

However, because the plan has not changed, the plan will not be replaced

it's not sure, AQE will use the new plan if currentPhysicalPlan != newPhysicalPlan. So I think ShuffledHashJoin should always choose the small table. SortMergeJoin is a special case that Spark always build right side for inner join, etc.

If your goal is to optimize the vanilla Spark SortMergeJoin. I think it's better to push to Spark community first. For gluten, we can just optimize SortMergeJoin when do transform.

First of all, it is primary to ensure that the plan after fallback is consistent with vanilla spark, so we should not force the generation of shuffledHashJoinExec. We should convert ShuffledHashJoinExec/SortMergeJoinExec into ShuffledHashJoinExecTransformer in offload.
Secondly, vanilla spark supports using smaller table as buildSide starting from version 3.5. It is not supported before 3.5, so ShuffledHashJoinExec before 3.5 and SortMergeJoinExec cannot necessarily choose smaller table.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

You mean ensure plan of Join fallback is consistent with vanilla spark, abandon some scenarios for use smaller table? @ulysses-you

@ulysses-you
Copy link
Copy Markdown
Contributor

IIUC vanilla Spark shuffle hash join should choose small table since 3.0, can you point out to me which pr in Spark3.5 work on this ? thank you.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

IIUC vanilla Spark shuffle hash join should choose small table since 3.0, can you point out to me which pr in Spark3.5 work on this ? thank you.

apache/spark#41398

@ulysses-you
Copy link
Copy Markdown
Contributor

ulysses-you commented Jun 13, 2024

It seems that pr only allows more build side for shuffled hash join ? Vanilla Spark would choose a small table without gluten, right ?

If we want to fallback a smj to vanilla Spark, I think pass originalJoin to ShuffledHashJoinExecTransformer is enough. Add a tag ? To be clear, I was asking if ShuffledHashJoinExecTemp is required.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 13, 2024

It seems that pr only allows more build side for shuffled hash join ? Vanilla Spark would choose a small table without gluten, right ?

If we want to fallback a smj to vanilla Spark, I think pass originalJoin to ShuffledHashJoinExecTransformer is enough. Add a tag ? To be clear, I was asking if ShuffledHashJoinExecTemp is required.

ShuffledHashJoinExecTemp allows ShuffledHashJoinExecTransformer to use smaller table in all versions of spark, including the original plan is SortMergeJoin. Use smaller table for buildSide is more memory safe.

@zml1206
Copy link
Copy Markdown
Contributor Author

zml1206 commented Jun 14, 2024

Discussed with @ulysses-you that adding new physical operators is risky, use custom cost evaluator to force update the physical plan is a good way. close it, new PR #6093 cc @zhztheplayer

@zml1206 zml1206 closed this Jun 14, 2024
@zml1206 zml1206 deleted the optimize_JoinSelectionOverrides branch December 9, 2025 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants