
Conversation


@jto jto commented Aug 30, 2024

Context

Flink will drop support for the DataSet API in 2.0, which should be released by the end of the year, so it is quite important for Beam to support the DataStream API well.

The PR

This PR improves the performance of batch jobs executed with --useDatastreamForBatch by porting performance optimizations that are already present in FlinkBatchTransformTranslators but missing from FlinkStreamingTransformTranslators.

It also implements the following optimizations:

  • Use a "lazy" split enumerator that distributes splits dynamically rather than eagerly. This new enumerator greatly reduces skew, as each slot pulls a new split only when it has finished its current work (a minimal sketch of the idea follows after this list).
  • Set the default maxParallelism to the parallelism, since the total number of splits is equal to maxParallelism. Again, this reduces skew.
  • Make ToKeyedWorkItem part of DoFnOperator, which reduces the size of the job graph and avoids unnecessary inter-task communication.
  • Force a common slot-sharing group on all bounded IOs. This emulates the behavior of the DataSet API and again improves performance, especially when data is shuffled several times while the partitioning keys stay unchanged (for example when the job does GBK -> map -> CombinePerKey). A flag controls this feature and defaults to active; a short illustration of the mechanism also follows below.
  • Other minor optimizations removing repeated serde work.
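To make the first item concrete, here is a minimal sketch of a pull-based ("lazy") enumerator written against Flink's SplitEnumerator interface. It is an illustration only, not the code added by this PR: FileSplit and LazySplitEnumerator are placeholder names, and checkpointing details, locality hints, and error handling are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;

/** Placeholder split type for this sketch; a real source would carry paths, offsets, etc. */
class FileSplit implements SourceSplit {
  private final String id;

  FileSplit(String id) {
    this.id = id;
  }

  @Override
  public String splitId() {
    return id;
  }
}

/** Pull-based enumerator: a split is handed out only when an idle reader asks for one. */
class LazySplitEnumerator implements SplitEnumerator<FileSplit, List<FileSplit>> {

  private final SplitEnumeratorContext<FileSplit> context;
  private final Deque<FileSplit> pendingSplits;

  LazySplitEnumerator(SplitEnumeratorContext<FileSplit> context, List<FileSplit> splits) {
    this.context = context;
    this.pendingSplits = new ArrayDeque<>(splits);
  }

  @Override
  public void start() {
    // Nothing is assigned eagerly; readers request splits when they run out of work.
  }

  @Override
  public void handleSplitRequest(int subtaskId, String requesterHostname) {
    FileSplit next = pendingSplits.poll();
    if (next != null) {
      // Give the requesting reader exactly one more split to process.
      context.assignSplit(next, subtaskId);
    } else {
      // No work left: let this reader finish instead of waiting for more splits.
      context.signalNoMoreSplits(subtaskId);
    }
  }

  @Override
  public void addSplitsBack(List<FileSplit> splits, int subtaskId) {
    // Splits from a failed reader go back into the queue and are re-assigned on request.
    pendingSplits.addAll(splits);
  }

  @Override
  public void addReader(int subtaskId) {
    // Intentionally empty: no eager assignment when a reader registers.
  }

  @Override
  public List<FileSplit> snapshotState(long checkpointId) {
    return new ArrayList<>(pendingSplits);
  }

  @Override
  public void close() {}
}
```

Compared with an eager enumerator that partitions all splits across readers in start(), this one only reacts to handleSplitRequest, so a fast reader naturally picks up more splits than a slow one and tail skew shrinks.

The slot-sharing item relies on Flink's standard slotSharingGroup setting. A hedged fragment showing the general mechanism (the group name and the example source are made up, and fromElements merely stands in for a bounded IO):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placing every bounded input in the same named group lets their tasks share slots,
    // mimicking how the DataSet API co-schedules the stages of a batch job.
    DataStream<String> boundedInput =
        env.fromElements("a", "b", "c") // stands in for any bounded IO
            .slotSharingGroup("beam-bounded-sources");

    boundedInput.print();
    env.execute("slot-sharing-sketch");
  }
}
```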

Benchmarks

The patched version was tested against a few of Spotify's production batch workflows. All settings were left unchanged except for the following:

  • passed --useDatastreamForBatch=true
  • set jobmanager.scheduler: default (otherwise the DataStream API defaults to the adaptive scheduler).
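For reference, a hedged sketch of how a Beam job opts into the DataStream batch path from Java. It assumes the flag above maps to FlinkPipelineOptions#setUseDataStreamForBatch; the class name and pipeline contents are made up, and the jobmanager.scheduler change was applied in the cluster's Flink configuration rather than through Beam.

```java
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DataStreamBatchJob {
  public static void main(String[] args) {
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);
    // Assumed setter for the --useDatastreamForBatch flag mentioned above.
    options.setUseDataStreamForBatch(true);
    // Cluster side, the benchmarks also set in flink-conf.yaml:
    //   jobmanager.scheduler: default

    Pipeline pipeline = Pipeline.create(options);
    // ... build the bounded pipeline as usual ...
    pipeline.run().waitUntilFinish();
  }
}
```

The results were as follows.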
(All runs on Beam 2.56; % diff is relative to the dataset execution time.)

| Job | # workers | Dataset execution time | DataStream execution time | % diff | DataStream patched execution time | % diff |
|---|---|---|---|---|---|---|
| Job 1 | 350 | 2:19:00 | fails after 4h29min | - | 1:43:00 | -25.90% |
| Job 2 | 160 | 0:23:00 | 0:35:00 | 52.17% | 0:22:36 | -1.74% |
| Job 3 | 200 | 0:53:08 | 1:34:39 | 78.14% | failed | - |
| Job 4 | 160 | 2:31:20 | 4:27:00 | 76.43% | 2:19:35 | -7.76% |
| Job 5 | 1 | 0:43:00 | not tested | - | 0:38:00 | -11.63% |
| Job 6 | 300 | 2:58:51 | not tested | - | running | |

Note

Job 3 fails with a StackOverflowError caused by a bug in an old version of Kryo. I believe this happens because the job uses taskmanager.runtime.large-record-handler: true; it should be fixed in Flink 2.0, where Kryo is upgraded to a more recent version.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

  • Build python source distribution and wheels
  • Python tests
  • Java tests
  • Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@jto jto force-pushed the julient/patched-2.56-clean branch 2 times, most recently from 6d7e1e8 to 32a311e on September 10, 2024 at 09:33
@jto jto force-pushed the julient/patched-2.56-clean branch from 32a311e to b672504 on September 10, 2024 at 11:39
@jto jto closed this Sep 12, 2024