Skip to content

Conversation

@nielm
Copy link
Contributor

@nielm nielm commented Apr 27, 2020

Grouping adds significant latency and memory use.
When streaming, end-to-end pipeline latency is important, and many worker threads are executed, meaning that OOM's can frequently occur.

This PR disables grouping by default in streaming mode, ensuring lower memory use and faster end-end latency.

Note, this PR is dependent on PR #11528

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@nielm nielm changed the title Disable grouping streaming []BEAM-9822] Disable grouping streaming Apr 27, 2020
@nielm nielm force-pushed the disableGroupingStreaming branch 2 times, most recently from 8b56f44 to 865a11b Compare April 27, 2020 11:14
@nielm nielm changed the title []BEAM-9822] Disable grouping streaming [BEAM-9822] Disable grouping when streaming Apr 27, 2020
@nielm nielm force-pushed the disableGroupingStreaming branch from 865a11b to 573bfb9 Compare May 1, 2020 11:22
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if this condition needs to be based on the input passed to this stage or based on some parameter from the user?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's kinda both!
If the source is unbounded (streaming) - and the groupingFactor has not been specified by the user, then default to no grouping.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any chance that someone using SpannerIO in a streaming pipeline is relying on the default grouping factor being 1000? I'm concerned this backwards-incompatible change could break them. Would it be sufficient to just give users the option to disable batching by setting the grouping factor to 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They already can set groupingFactorb to 1 if they want...
Breaking backward compatibility: unlikely.

The default of 1000 causes OOMs when using streaming, with wide windows, and high throughput... When this happens, it is not always obvious that grouping is the issue...

With smaller windows/less throughput, it is much less likely that a group will be filled, (groups are bounded by bundles, which are bounded by windows)., So it is unlikely that anyone ever got to fill the group with 1000 batches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They already can set groupingFactorb to 1 if they want...

Ha yeah sorry that was unclear. At the time I thought that groupingFactor = 1 enabled the optimization in #11529, so I was wondering if this was really necessary since users could just enable them by setting grouping factor manually. But I see now that grouping is separate from batching. And its disabling batching that enables your other PR.

@nielm nielm force-pushed the disableGroupingStreaming branch from 573bfb9 to cecb959 Compare May 3, 2020 10:14
@allenpradeep
Copy link
Contributor

LGTM.

@nielm nielm force-pushed the disableGroupingStreaming branch 2 times, most recently from 9e56c08 to 3cc214b Compare May 18, 2020 23:42
@nielm
Copy link
Contributor Author

nielm commented May 18, 2020

Retest this please

1 similar comment
@TheNeuralBit
Copy link
Member

Retest this please

@TheNeuralBit
Copy link
Member

Looks like you need to run spotless to auto-format. You can use ./gradlew spotlessApply to do that locally (may need to do it on the other PRs as well)

Grouping adds significant latency and memory use, and when streaming
this causes both OOMs and high pipeline latencies.
@nielm nielm force-pushed the disableGroupingStreaming branch from 3cc214b to d64df6a Compare May 19, 2020 11:07
@nielm
Copy link
Contributor Author

nielm commented May 19, 2020

Retest this please

@nielm
Copy link
Contributor Author

nielm commented May 19, 2020

Looks like you need to run spotless to auto-format.
Done - sorry!

@TheNeuralBit
Copy link
Member

Retest this please

1 similar comment
@TheNeuralBit
Copy link
Member

Retest this please

.orElse(
input.isBounded() == IsBounded.BOUNDED
? DEFAULT_GROUPING_FACTOR
: 1),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if this were another constant. We could have DEFAULT_GROUPING_FACTOR_BOUNDED and DEFAULT_GROUPING_FACTOR_UNBOUNDED. It doesn't need to be done here, could be in a follow-up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do in a separate pr

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! I'll go ahead and merge this, but can you add a note about this in CHANGES.md in another PR?

@TheNeuralBit TheNeuralBit merged commit 3f2d648 into apache:master May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants