[BEAM-10047] Merge the stages 'Gather and Sort' and 'Create Batches' #11570
f6a09a2 to 87d22b5
This is great, Niel. With these changes, there are 3 modes of using SpannerIO write. Questions:
I'm good with these changes, except for the questions I had about the usages.
...ava/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerIO.java
Hi Niel,
We can discuss this outside the scope of this PR.
I have added a section to the javadoc explaining these 3 modes of operation and their pros and cons.
ac119f0 to 8f94438
|
Retest this please |
There is minimal benefit in separating these 2 stages, and significant benefit in merging them: Gather and Sort encodes incoming MutationGroups into a List<byte[]>, which can contain up to 1GB. This is then output (copied) to CreateBatches, where it is decoded back into MutationGroups. Removing this encode/decode should save up to 2GB of RAM.
|
Retest this please |
|
Is this ready to merge? |
|
Run Java PreCommit |
Can't tell if tests passed or not, rerunning. |
|
Can we merge this PR? I would like to send out a PR to count bytes written to Spanner, and that PR depends on this one.
|
Retest this please |
|
Run Java PostCommit |
|
Thanks. We can merge if post-commit tests pass. |
There is minimal benefit in separating these 2 stages, and significant
benefit in merging them: Gather and Sort encodes incoming
MutationGroups into a List<byte[]>, which can contain up to 1GB.
This is then output (copied) to CreateBatches, where it is decoded
back into MutationGroups.
Removing this encode/decode should save up to 2GB of RAM.
Note, this PR is dependent on PR #11528, PR #11532 and PR #11529
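For readers who want the shape of the change without opening the diff, here is a rough plain-Java sketch of the encode/copy/decode round trip described above and of the merged alternative. It is not the SpannerIO code: the class `MergedStagesSketch`, its methods, and the use of `String` as a stand-in for a MutationGroup are all hypothetical, and the sketch only shows that both shapes produce the same batches while the merged one never builds the intermediate `List<byte[]>`. It does not measure memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only. All names here are hypothetical and not taken from SpannerIO. */
public class MergedStagesSketch {

  /** Two-stage shape: stage 1 sorts and emits encoded bytes; stage 2 decodes and batches. */
  public static List<List<String>> twoStage(List<String> input, int batchSize) {
    List<String> sorted = new ArrayList<>(input);
    Collections.sort(sorted);

    // Stage 1 output: every element is encoded into an intermediate byte[] list.
    List<byte[]> encoded = new ArrayList<>();
    for (String item : sorted) {
      encoded.add(item.getBytes(StandardCharsets.UTF_8));
    }

    // Stage 2 input: the byte[] list is decoded back into elements before batching.
    List<String> decoded = new ArrayList<>();
    for (byte[] bytes : encoded) {
      decoded.add(new String(bytes, StandardCharsets.UTF_8));
    }
    return batch(decoded, batchSize);
  }

  /** Merged shape: sort and batch in one step, with no intermediate encoded copy. */
  public static List<List<String>> merged(List<String> input, int batchSize) {
    List<String> sorted = new ArrayList<>(input);
    Collections.sort(sorted);
    return batch(sorted, batchSize);
  }

  /** Splits an already sorted list into consecutive batches of at most batchSize elements. */
  private static List<List<String>> batch(List<String> sorted, int batchSize) {
    List<List<String>> batches = new ArrayList<>();
    for (int i = 0; i < sorted.size(); i += batchSize) {
      int end = Math.min(i + batchSize, sorted.size());
      batches.add(new ArrayList<>(sorted.subList(i, end)));
    }
    return batches;
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList("c", "a", "b");
    System.out.println(merged(input, 2)); // prints [[a, b], [c]]
  }
}
```

In the two-stage shape the sorted elements, the encoded list, and the decoded list can all be live around the stage boundary, which is the duplication the description refers to.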