[BEAM-9822] Simplify pipeline when batching is disabled. #11529
allenpradeep
left a comment
LGTM.
Retest this please
Grouping adds significant latency and memory use; in streaming pipelines this causes both OOMs and high pipeline latency.
When batching is disabled, SpannerIO has no need to read the schema or to group, sort, batch, and write batches, so the pipeline is simplified to just write the mutation.
TheNeuralBit
left a comment
Mostly LGTM, just one minor request
...ogle-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIOWriteTest.java
LGTM. I'll merge when CI is green
Jenkins seems to be very sleepy today...
Run Java PostCommit
Run Java PreCommit
* Disable grouping by default when streaming.
  Grouping adds significant latency and memory use, and when streaming this causes both OOMs and high pipeline latencies.
* Simplify pipeline when batching is disabled.
  When batching is disabled, there is no need for SpannerIO to read the schema, group, sort, batch and write batches, so simplify the pipeline to just write the mutation.
* Fix noBatching test
Batching is often disabled in streaming pipelines to improve latency. In that case there is no point in adding several transforms to group, sort, and batch the elements, so when batching is disabled a simpler pipeline is used.
Note: this PR depends on PR #11528 and PR #11532.
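To illustrate the choice described above, here is a minimal plain-Java sketch of how a writer might pick between the two paths. It is not the SpannerIO code from this PR: the class `SpannerWritePlanSketch`, the methods `batchingDisabled` and `planStages`, the stage names, and the rule that a limit of 1 or less means "batching disabled" are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the path selection; not the actual SpannerIO implementation. */
final class SpannerWritePlanSketch {

  /**
   * Assumption for this sketch: batching counts as disabled when either
   * limit is so small (at most 1) that a batch could hold only one mutation.
   */
  static boolean batchingDisabled(long batchSizeBytes, long maxNumMutations) {
    return batchSizeBytes <= 1 || maxNumMutations <= 1;
  }

  /** Returns illustrative stage names for the write path that would be built. */
  static List<String> planStages(long batchSizeBytes, long maxNumMutations) {
    if (batchingDisabled(batchSizeBytes, maxNumMutations)) {
      // Simple path: skip the schema read, grouping, sorting, and batching.
      return Arrays.asList("write mutation");
    }
    // Full path: every stage the PR description lists for batched writes.
    return Arrays.asList("read schema", "group", "sort", "batch", "write batches");
  }

  public static void main(String[] args) {
    // Batching disabled (batch size of 0 bytes): only the write stage remains.
    System.out.println(planStages(0, 5000));
    // Batching enabled: the full group/sort/batch path is planned.
    System.out.println(planStages(1024 * 1024, 5000));
  }
}
```

The point of the sketch is only that the decision is made once, when the pipeline is constructed, so a streaming job with batching turned off never pays for the grouping stages at all.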