
Conversation

@nielm (Contributor) commented Apr 26, 2020

Batching is often disabled in streaming pipelines to improve latency. When it is disabled, there is no point in adding several transforms to group, sort and batch the elements, so a simpler pipeline that just writes the mutations is used instead.

Note: this PR depends on PR #11528 and PR #11532.
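
As a minimal sketch of what disabling batching looks like from the user side (assuming withBatchSizeBytes(0) is the switch that turns batching off; the instance/database IDs and the mutation source below are placeholders, not taken from this PR):

```java
import com.google.cloud.spanner.Mutation;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.values.PCollection;

public class UnbatchedSpannerWrite {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
    options.setStreaming(true);
    Pipeline pipeline = Pipeline.create(options);

    // Application-specific: however the job produces its stream of mutations.
    PCollection<Mutation> mutations = readMutations(pipeline);

    mutations.apply(
        "WriteToSpanner",
        SpannerIO.write()
            .withInstanceId("my-instance")   // placeholder
            .withDatabaseId("my-database")   // placeholder
            .withBatchSizeBytes(0));         // assumption: 0 disables batching

    pipeline.run();
  }

  // Placeholder for an application-specific source of mutations.
  private static PCollection<Mutation> readMutations(Pipeline pipeline) {
    throw new UnsupportedOperationException("replace with a real source");
  }
}
```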

Post-Commit Tests Status (on master branch)

[Build-status badge matrix omitted: Go, Java, Python and XLang post-commits across the Apex, Dataflow, Flink, Gearpump, Samza and Spark runners.]

Pre-Commit Tests Status (on master branch)

[Build-status badge matrix omitted: portable and non-portable Java, Python, Go and Website pre-commits.]

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@allenpradeep (Contributor) left a comment

LGTM.

@nielm force-pushed the skipBatchingIfDisabled branch 2 times, most recently from 2955973 to 718884b on May 18, 2020 at 23:43
@TheNeuralBit (Member)

Retest this please

nielm added 2 commits May 19, 2020 13:06

* Disable grouping by default when streaming.

  Grouping adds significant latency and memory use, and when streaming this causes both OOMs and high pipeline latencies.

* Simplify pipeline when batching is disabled.

  When batching is disabled, there is no need for SpannerIO to read the schema, group, sort, batch and write batches, so simplify the pipeline to just write the mutation.
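
As a rough sketch of the pattern these commit messages describe (the transform, its flags and the placeholder write steps below are hypothetical, not the actual SpannerIO code), a composite sink can branch in expand() so that the no-batching path skips the grouping stages entirely:

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupIntoBatches;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PDone;

/**
 * Illustrative only; not the actual SpannerIO code. A composite sink that
 * skips the grouping and batching stages entirely when batching is disabled,
 * so the streaming path is just "write each element as it arrives".
 */
class ConditionallyBatchedWrite<T> extends PTransform<PCollection<KV<Integer, T>>, PDone> {

  private final boolean batchingDisabled;
  private final long batchSize;

  ConditionallyBatchedWrite(boolean batchingDisabled, long batchSize) {
    this.batchingDisabled = batchingDisabled;
    this.batchSize = batchSize;
  }

  @Override
  public PDone expand(PCollection<KV<Integer, T>> input) {
    if (batchingDisabled) {
      // Simple pipeline: no grouping, sorting or batching; each element goes
      // straight to the (placeholder) write step.
      input.apply("WriteEach", ParDo.of(new DoFn<KV<Integer, T>, Void>() {
        @ProcessElement
        public void process(@Element KV<Integer, T> element) {
          // Write element.getValue() to the sink, one element per call.
        }
      }));
    } else {
      // Batched pipeline: group elements into batches first, then write each
      // batch with a single call to the sink.
      input
          .apply("Batch", GroupIntoBatches.<Integer, T>ofSize(batchSize))
          .apply("WriteBatches", ParDo.of(new DoFn<KV<Integer, Iterable<T>>, Void>() {
            @ProcessElement
            public void process(@Element KV<Integer, Iterable<T>> batch) {
              // Write batch.getValue() to the sink in one call.
            }
          }));
    }
    return PDone.in(input.getPipeline());
  }
}
```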
@nielm force-pushed the skipBatchingIfDisabled branch from 718884b to ade11d9 on May 19, 2020 at 11:15
@nielm (Contributor, Author) commented May 19, 2020

Retest this please

@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member) left a comment

Mostly LGTM, just one minor request

@nielm (Contributor, Author) commented May 20, 2020

Retest this please

1 similar comment
@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member)

LGTM. I'll merge when CI is green

@TheNeuralBit (Member)

Retest this please

3 similar comments
@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member)

Jenkins seems to be very sleepy today...

@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member)

retest this please

@udim (Member) commented May 20, 2020

Retest this please

1 similar comment
@TheNeuralBit (Member)

Retest this please

@TheNeuralBit (Member)

Run Java PostCommit

1 similar comment
@TheNeuralBit (Member)

Run Java PostCommit

@TheNeuralBit (Member)

Run Java PreCommit

1 similar comment
@TheNeuralBit (Member)

Run Java PreCommit

@TheNeuralBit merged commit b33ed49 into apache:master on May 21, 2020
yirutang pushed a commit to yirutang/beam that referenced this pull request Jul 23, 2020
* Disable grouping by default when streaming.

Grouping adds significant latency and memory use, and when streaming
this causes both OOMs and high pipeline latencies.

* Simplify pipeline when batching is disabled.

When batching is disabled, there is no need for SpannerIO to read the
schema, group, sort, batch and write batches, so simplify the pipeline
to just write the mutation.

* Fix noBatching test
