Add BigTableRead connector and new flatten feature #35696
Conversation
Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`: R: @jrmccluskey for label python.
The PR bot will only process comments in the main thread (not review comments).
...m/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadSchemaTransformProvider.java (resolved review threads)
| + "key\": ByteString\n" | ||
| + "\"type\": String\n" | ||
| + "\"value\": ByteString\n" | ||
| + "\"column_qualifier\": ByteString\n" |
The type for the qualifier is bytearray, but here it's ByteString; is this accurate? Same for key and value.
thank you for that catch!
Bump on this ^
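For readers following this thread, here is a minimal Python sketch of an element carrying the four fields named in the description string above. The field names come from that string; the dict shape and the sample values are my own assumptions for illustration, not the transform's actual API.

# Hypothetical element, for illustration only: field names are taken from the
# description string under review; the dict shape and sample values are assumptions.
element = {
    "key": b"row-key-1",
    "type": "SetCell",           # the sample value here is an assumption
    "value": b"cell-value",
    "column_qualifier": b"cq1",
}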
...src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadSchemaTransformProviderIT.java (resolved review thread)
Run PythonDocker PreCommit 3.9
+ if type(self) == type(other):
+   other_dict = other.__dict__
+ elif type(other) == type(NamedTuple):
+   other_dict = other._asdict()
+ else:
+   return False
  return (
-     type(self) == type(other) and
-     len(self.__dict__) == len(other.__dict__) and all(
-         s == o
-         for s, o in zip(self.__dict__.items(), other.__dict__.items())))
+     len(self.__dict__) == len(other_dict) and
+     all(s == o for s, o in zip(self.__dict__.items(), other_dict.items())))
Ahh I think the problem described in #35790 is because this only handles top-level NamedTuples. Should be a straightforward fix but not a blocker for this PR
Per Derrick's advice, either he or I can work on it after the PR is merged.
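For reference, here is a rough, self-contained sketch of what a comparison that also handles nested NamedTuples could look like. The helper names and overall structure are my own illustration, not code from this PR or from #35790.

from typing import Any, NamedTuple

def _is_named_tuple(obj: Any) -> bool:
  # Heuristic: NamedTuple instances are tuples that expose _asdict().
  return isinstance(obj, tuple) and hasattr(obj, "_asdict")

def _values_equal(a: Any, b: Any) -> bool:
  # Unwrap NamedTuples (top-level or nested) into dicts, then compare
  # field-by-field, recursing so nested NamedTuples are also unwrapped.
  if _is_named_tuple(a):
    a = a._asdict()
  if _is_named_tuple(b):
    b = b._asdict()
  if isinstance(a, dict) and isinstance(b, dict):
    return a.keys() == b.keys() and all(_values_equal(a[k], b[k]) for k in a)
  return a == b

# Example: a nested NamedTuple compares equal field-by-field.
class Inner(NamedTuple):
  value: bytes

class Outer(NamedTuple):
  inner: Inner

assert _values_equal(Outer(Inner(b"v")), Outer(Inner(b"v")))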
ahmedabu98
left a comment
I don't like that we have BigtableReadSchemaTransformConfiguration implements Serializable, but besides that this LGTM
Run PythonDocker PreCommit 3.9
* Refactored BigTableReadSchemaTransformConfiguration
* changed scope, working on buffer class for making BigTable yaml fully connected and actually look good on user end for mutations
* Finished up a bit of standard_io.yaml
* Finished up a bit of standard_io.yaml
* Added bigTable test
* changed some tests for BigTable
* Added new IT file for simpleWrite and also made changes integration test debugging
* Added new IT file for simpleWrite and also made changes integration test debugging
* SetCell mutation test works, I want to see if this draft PR works CI test wise
* Fixed a slight error
* Added way more changes to integrations test.py, BigTableSimpleWriteSchemaTransformProviderIT, and testing out new mutations etc
* BigTableSimpleWriteSchemaTransformProviderIT finished changes to mutated new user input, all mutations work correctly, put demo code for it
* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java Co-authored-by: Derrick Williams <myutat@gmail.com>
* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java Co-authored-by: Derrick Williams <myutat@gmail.com>
* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java Co-authored-by: Derrick Williams <myutat@gmail.com>
* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java Co-authored-by: Derrick Williams <myutat@gmail.com>
* Update sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProviderIT.java Co-authored-by: Derrick Williams <myutat@gmail.com>
* changed comments
* Added changes from derrick comments
* Added default schema maybe fixes the issues
* Added schema to every test specificly, will run tests to see if it works
* Added default schema maybe fixes the issues
* Following formatting tests
* Following formatting tests
* Following checkstyle tests
* Made schema and test changes
* Made schema and test changes
* Made schema and test changes
* Made schema and test changes
* Made schema and test changes
* Added final test
* changed timestamp values
* added all mutations test
* added all mutations test
* pushed changes to format errors
* pushed changes to format errors
* Delete 4
* pushed changes to format errors
* pushed changes to format errors
* pushed changes to format errors
* pushed changes to debugging errors
* pushed changes to debugging errors
* to see internal error added print(will remove)
* to see internal error added print(will remove)
* to see internal error added print(will remove)
* import fixes
* import fixes
* import fixes
* import fixes
* import fixes
* import fixes
* pushed changes to debugging errors
* pushed changes to debugging errors
* pushed changes to debugging errors, added pulls from other beam
* made changes to allMutations test
* made changes to allMutations test
* pushed changes to debugging errors, added pulls from other beam
* pushed changes to debugging errors, added pulls from other beam
* pushed changes to debugging errors, added pulls from other beam
* pushed changes to debugging errors, added pulls from other beam
* pushed changes to debugging errors, added pulls from other beam
* new read errors fixed
* pushed changes to debugging errors, added pulls from other beam
* consolidated schema transform files, fixed small issues and bugs
* consolidated schema transform files, fixed small issues and bugs
* consolidated schema transform files, fixed small issues and bugs
* consolidated schema transform files, fixed small issues and bugs
* pushed changes to debugging errors, added pulls from other beam
* pushed changes from ahmed
* pushed changes from ahmed
* pushed changes from ahmed
* pushed changes from ahmed
* pushed changes from ahmed
* pushed changes from ahmed
* pushed changes from ahmed
* pushed changes from ahmed
* Following checkstyle tests
* Following checkstyle tests
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true)
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff
* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadSchemaTransformProvider.java Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff
* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff
* new mongo files in branch
* fixed family_name to string
* fixed family_name to string
* fixed family_name to string
* fixed family_name to string
* fixed family_name to string
* fixed family_name to string
* fixed family_name to string
* fixed family_name to string
* fixed cmmit issues
* commented assert test, everything should work now
---------
Co-authored-by: Derrick Williams <myutat@gmail.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
@fozzie15 @ahmedabu98 @damccorm @derrickaw
I added the BigTable Read connector for Beam YAML.
Added new logic for the Bigtable YAML read with an option to simplify what the read returns: a flatten feature that makes the output more readable than the old nested output (a rough sketch of the two shapes follows below).
Added tests to integration_test.py (I commented out some bugs in integration_test.py introduced by new commits on the main branch; I will remove this if no more errors show up).
Fixed BigTableReadSchemaTransformIT to reflect the new functionality.
Made the Bigtable write family_name a string instead of bytes.
All the logic works; please let me know if anything looks off and whether anything can be improved.
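To make the flatten option more concrete, here is a rough Python illustration of the two output shapes. Only the field names key, value, column_qualifier, and family_name appear earlier in this PR; the nesting layout, the families and timestamp_micros names, whether flattening emits one element per cell, and all sample values are my own assumptions for illustration, not the exact schemas this transform produces.

# Illustration only; approximate shapes, not the exact schemas in this PR.

# Non-flattened output (assumed structure): one element per Bigtable row,
# with cells nested under column family and column qualifier.
nested_row = {
    "key": b"row-1",
    "families": {
        "cf1": {
            b"q1": [{"value": b"v1", "timestamp_micros": 1000}],
        },
    },
}

# Flattened output (the new behavior, enabled by default): cells lifted to
# the top level of each element.
flattened_cell = {
    "key": b"row-1",
    "family_name": "cf1",
    "column_qualifier": b"q1",
    "value": b"v1",
    "timestamp_micros": 1000,
}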
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
Update CHANGES.md with noteworthy changes.
See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.