Skip to content

Conversation

@mszb
Copy link
Contributor

@mszb mszb commented Jan 29, 2020

In addition to ReadFromSpanner (PR), this transform allows users to write on the google cloud spanner by using WriteToSpanner transform.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status
Build Status
--- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status
Build Status
Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@lukecwik
Copy link
Member

ping for tests

@mszb
Copy link
Contributor Author

mszb commented Jan 30, 2020

R: @chamikaramj
R: @aaltay

@mszb mszb force-pushed the BEAM-7246_gcp_spannerio_write branch from 9bcaf69 to c8831cb Compare January 30, 2020 07:44
@charlesccychen
Copy link
Contributor

retest this please

@charlesccychen
Copy link
Contributor

Run Python Precommit

Copy link
Contributor Author

@mszb mszb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I've added some explanations to help the review process. Feedbacks are always welcome!


def process(self, element):
if element.primary().operation == 'delete':
# As delete mutations are not batchable.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per java implementation, we can not determine the size of delete mutation, so we simply mark them as unbatchable mutations.

"provided: <%s: %s>" % (self.__class__.__name__,
str(self.__dict__)))

def __call__(self, *args, **kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in python we don't use builder pattern, I use this magic method to construct the mutation object! This class also have the static method which I thought is more convenient for the user to use.

from google.cloud.spanner import KeySet
from google.cloud.spanner_v1 import batch
from google.cloud.spanner_v1.database import BatchSnapshot
from google.cloud.spanner_v1.proto.mutation_pb2 import Mutation
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since spanner package wont expose the Mutation and batch objects, so this is the only way to import it.

values: Values to be modified.
"""
return _Mutator(
mutation=Mutation(insert=batch._make_write_pb(table, columns, values)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we need to calculate the mutation size to create the batches, we need to create Mutation object which has the method ByteSize() and we uses its value to determine how many mutations we can have in one batch on _BatchFn dofn. We are not passing these mutation object to spanner or anywhere, just using them to calculate the bytesize for write transform!


@with_input_types(MutationGroup)
@with_output_types(MutationGroup)
class _BatchableFilterFn(DoFn):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this DoFn, we are just filtering the objects whos mutation size is greater the max_batch_size provided by the user. This helps us to focus on only batchable mutation which will further passed on the _BatchFn dofn to make batches.

@iemejia
Copy link
Member

iemejia commented Jan 30, 2020

Retest this please

@lukecwik
Copy link
Member

ping for test

@aaltay aaltay requested a review from chamikaramj February 3, 2020 17:37
@chamikaramj
Copy link
Contributor

cc: @nithinsujir and @nielm

@kamilwu
Copy link
Contributor

kamilwu commented Feb 6, 2020

Just to let you know that we've just introduced Python autoformatter. Your merge conflict might be a result of this.
Here you can find an instruction on how to run autoformatter: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips, section Formatting.
Sorry for inconvenience.

@mszb
Copy link
Contributor Author

mszb commented Feb 6, 2020

Just to let you know that we've just introduced Python autoformatter. Your merge conflict might be a result of this.
Here you can find an instruction on how to run autoformatter: https://cwiki.apache.org/confluence/display/BEAM/Python+Tips, section Formatting.
Sorry for inconvenience.

No worries @kamilwu , i'll resolve the conflicts on my next commit. Thanks for the heads up :)

@mszb
Copy link
Contributor Author

mszb commented Feb 10, 2020

ping for test

@aaltay
Copy link
Member

aaltay commented Feb 10, 2020

Trigger tests.

@aaltay
Copy link
Member

aaltay commented Feb 10, 2020

Could you also add a new feature note in https://github.com/apache/beam/blob/master/CHANGES.md

@mszb
Copy link
Contributor Author

mszb commented Feb 10, 2020

Could you also add a new feature note in https://github.com/apache/beam/blob/master/CHANGES.md

Done. :)

@mszb
Copy link
Contributor Author

mszb commented Feb 10, 2020

ping for test

@aaltay
Copy link
Member

aaltay commented Feb 10, 2020

Trigger tests.

@iemejia
Copy link
Member

iemejia commented Feb 11, 2020

retest this please

@aaltay
Copy link
Member

aaltay commented Feb 11, 2020

retest this please

@aaltay
Copy link
Member

aaltay commented Feb 11, 2020

/cc @markflyhigh - Tests seems to be not triggering?

@mszb
Copy link
Contributor Author

mszb commented Feb 12, 2020

@aaltay @markflyhigh Seems like jobs are triggered and completed successfully but showing no activity on github!

https://builds.apache.org/job/beam_PreCommit_Python_Commit/11080/
https://builds.apache.org/job/beam_PreCommit_PythonFormatter_Commit/67/

@aaltay
Copy link
Member

aaltay commented Feb 12, 2020

retest this please

@mszb
Copy link
Contributor Author

mszb commented Feb 12, 2020

@aaltay All three tests are failed due to ImportError: No module named 'pycodestyle' in avro-python3 package.

Portable_Python PreCommit
https://scans.gradle.com/s/3qf5sqnettmmq/console-log?task=:sdks:python:test-suites:portable:py35:installGcpTest#L299

PreCommit
https://scans.gradle.com/s/c5ncivj7k2pko/console-log?task=:sdks:python:test-suites:dataflow:py37:installGcpTest#L658

PythonFormatter PreCommit
https://scans.gradle.com/s/i6nvgyym5tfqk/console-log?task=:sdks:python:test-suites:tox:py37:formatter#L267

Could you please rerun these test, possibly it'll fix the issue!

@aaltay
Copy link
Member

aaltay commented Feb 12, 2020

I believe this is fixed with #10844, you may need to rebase.

@mszb
Copy link
Contributor Author

mszb commented Feb 13, 2020

@aaltay I've rebased my branch, could you please trigger the tests!

  • Thanks.

@aaltay
Copy link
Member

aaltay commented Feb 13, 2020

retest this please

1 similar comment
@aaltay
Copy link
Member

aaltay commented Feb 13, 2020

retest this please

@mszb
Copy link
Contributor Author

mszb commented Feb 15, 2020

@chamikaramj @nielm could you please verify the changes.

Copy link
Contributor

@nielm nielm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mszb
Copy link
Contributor Author

mszb commented Feb 18, 2020

Thanks @nielm .. @chamikaramj is there anything you would like to add or we are ready for merge?

@aaltay
Copy link
Member

aaltay commented Feb 18, 2020

retest this please

1 similar comment
@aaltay
Copy link
Member

aaltay commented Feb 18, 2020

retest this please

@chamikaramj
Copy link
Contributor

LGTM. Thanks.

@chamikaramj chamikaramj merged commit e73e1d1 into apache:master Feb 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants