[HUDI-6863] Revert auto-tuning of dedup parallelism #9722
Conversation
Let's revisit the problems #6802 was tackling. The main issue it was addressing is making our shuffle parallelism dynamic and relative to the incoming df's number of partitions. So, if someone is running 1000s of pipelines, they don't need to statically set the right value for shuffle parallelism for each of the 1000 pipelines. Can you help me understand what issue we are hitting that warrants reverting it?
This PR does not revert the dynamic determination of the shuffle parallelism. The decided target shuffle parallelism is passed in with "…".
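For context, here is a minimal sketch of the idea being discussed, assuming a hypothetical helper name (this is not the actual Hudi code path): derive the dedup/shuffle parallelism from the incoming DataFrame's partition count only when no explicit value is configured.

```scala
// Illustrative sketch only -- not the actual Hudi implementation.
// The idea behind #6802: when the user has not configured a shuffle
// parallelism, fall back to the incoming DataFrame's partition count
// instead of a hard-coded default. `resolveDedupParallelism` is a
// hypothetical helper name used for illustration.
import org.apache.spark.sql.DataFrame

def resolveDedupParallelism(df: DataFrame, configured: Option[Int]): Int =
  configured.filter(_ > 0).getOrElse(df.rdd.getNumPartitions)
```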
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestHoodieSparkSqlWriter.scala
nsivabalan left a comment
1 minor comment; source code changes look good.

Change Logs
Before this PR, the auto-tuning logic for dedup parallelism dictates the write parallelism, so the user-configured `hoodie.upsert.shuffle.parallelism` is ignored. This PR reverts #6802 to fix the issue.
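As a usage illustration (a hedged sketch, not code from this PR): this is how a user would explicitly set `hoodie.upsert.shuffle.parallelism` on a Hudi datasource write. With the reverted auto-tuning in place, this value was ignored; after this PR it takes effect again. The table name, path, and field names below are placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical example: table name, base path, and field names are placeholders.
def upsertWithExplicitParallelism(df: DataFrame, basePath: String): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "example_table")
    .option("hoodie.datasource.write.operation", "upsert")
    .option("hoodie.datasource.write.recordkey.field", "id")
    .option("hoodie.datasource.write.precombine.field", "ts")
    // Explicit shuffle parallelism for the upsert/dedup stage,
    // respected again once the auto-tuning is reverted.
    .option("hoodie.upsert.shuffle.parallelism", "200")
    .mode(SaveMode.Append)
    .save(basePath)
}
```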
Impact
Performance fix
Risk level
low
Documentation Update
N/A
Contributor's checklist