@arnavarora2004 arnavarora2004 commented Jul 25, 2025

@fozzie15 @ahmedabu98 @damccorm @derrickaw

I added the Bigtable read connector for Beam YAML.

Added new logic for the Bigtable YAML read, with an option to simplify what the read returns: a flattened output that is more readable than what the old logic returned.

Added tests to integration_test.py. (I commented out some code in integration_test.py that started failing after new commits on the main branch, and will undo that if no more errors appear.)

Fixed BigTableReadSchemaTransformIT to reflect the new functionality.

Made family_name in bigtableWrite a string instead of bytes.

All logic works. Please let me know if anything looks off or could be improved.
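
To illustrate what a flattened read could look like, here is a minimal Python sketch, not the connector's actual code. It assumes a nested row shaped as key -> column families -> qualifiers -> cells and emits one flat record per (family, qualifier) pair. The function name `flatten_row` and the field names `column_families`, `family_name`, `column_qualifier`, `cells`, `value`, and `timestamp_micros` are assumptions chosen for illustration; the real schema may differ.

```python
def flatten_row(row):
    """Yield one flat record per (family, qualifier) pair of a nested row.

    The nested layout and all field names here are hypothetical.
    """
    for family_name, columns in row["column_families"].items():
        for qualifier, cells in columns.items():
            yield {
                "key": row["key"],
                "family_name": family_name,
                "column_qualifier": qualifier,
                "cells": cells,
            }


# Hypothetical nested row: one family with two qualifiers.
nested = {
    "key": b"row-1",
    "column_families": {
        "cf1": {
            b"q1": [{"value": b"a", "timestamp_micros": 1000}],
            b"q2": [{"value": b"b", "timestamp_micros": 2000}],
        },
    },
}

flat = list(flatten_row(nested))
for record in flat:
    print(record)
```

Each flat record repeats the row key, so downstream steps can filter or map on a single level of fields instead of walking nested maps.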



arnavarora2004 and others added 30 commits May 21, 2025 11:43
… connected and actually look good on user end for mutations
…hemaTransformProviderIT, and testing out new mutations etc
…ted new user input, all mutations work correctly, put demo code for it
…am/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>
…am/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>
…am/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>
…am/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>
…am/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProviderIT.java

Co-authored-by: Derrick Williams <myutat@gmail.com>
@github-actions github-actions bot removed the mongodb label Jul 31, 2025
@github-actions
Contributor

Assigning reviewers:

R: @jrmccluskey for label python.
R: @chamikaramj for label java.
R: @meeral-k for label bigtable.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

+ "key\": ByteString\n"
+ "\"type\": String\n"
+ "\"value\": ByteString\n"
+ "\"column_qualifier\": ByteString\n"
Contributor

The type for the qualifier is bytearray, but here it's ByteString. Is this accurate? Same for key and value.

Contributor Author

Thank you for that catch!

Contributor

Bump on this ^
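
A small check like the following can make this kind of type mismatch visible before a record reaches the transform. This is only a sketch: the helper `type_errors`, the `EXPECTED_TYPES` table, and the choice of Python `bytes` for key, value, and column_qualifier and `str` for type and family_name are assumptions for illustration, not the connector's documented schema.

```python
# Hypothetical expected Python types for a mutation record's fields.
EXPECTED_TYPES = {
    "key": bytes,
    "type": str,
    "family_name": str,
    "column_qualifier": bytes,
    "value": bytes,
}


def type_errors(mutation):
    """Return a list of human-readable type problems; empty if none."""
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in mutation:
            errors.append(f"{field}: missing")
        elif not isinstance(mutation[field], expected):
            actual = type(mutation[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


good = {
    "key": b"row-1",
    "type": "SetCell",
    "family_name": "cf1",
    "column_qualifier": b"q1",
    "value": b"hello",
}
# Same record, but with family_name passed as bytes instead of a string.
bad = dict(good, family_name=b"cf1")

print(type_errors(good))
print(type_errors(bad))
```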

@derrickaw
Collaborator

Run PythonDocker PreCommit 3.9

Comment on lines +679 to +687
+     if type(self) == type(other):
+       other_dict = other.__dict__
+     elif type(other) == type(NamedTuple):
+       other_dict = other._asdict()
+     else:
+       return False
      return (
-         type(self) == type(other) and
-         len(self.__dict__) == len(other.__dict__) and all(
-             s == o
-             for s, o in zip(self.__dict__.items(), other.__dict__.items())))
+         len(self.__dict__) == len(other_dict) and
+         all(s == o for s, o in zip(self.__dict__.items(), other_dict.items())))
Contributor

@ahmedabu98 ahmedabu98 Aug 6, 2025

Ahh I think the problem described in #35790 is because this only handles top-level NamedTuples. Should be a straightforward fix but not a blocker for this PR

Contributor Author

Per Derrick's advice, either Derrick or I can work on it after the PR is merged.
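
One possible shape for the follow-up fix, sketched under assumptions: normalize both sides recursively into plain dicts and lists before comparing, so nested NamedTuples are handled as well as top-level ones. The `Row` class below is a stand-in that only stores fields in `__dict__`, not the real Beam class, and `to_plain` and `rows_equal` are hypothetical helper names. NamedTuples are detected by the presence of `_asdict`, which is a duck-typing assumption. Unlike the `zip` over `items()` in the diff, this comparison ignores field order and treats lists and plain tuples as equal.

```python
from typing import NamedTuple


class Row:
    """Stand-in for a row-like object that stores its fields in __dict__."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


def to_plain(value):
    """Recursively convert rows and NamedTuples into plain dicts and lists."""
    if isinstance(value, Row):
        return {k: to_plain(v) for k, v in value.__dict__.items()}
    if isinstance(value, tuple) and hasattr(value, "_asdict"):
        # NamedTuple instance (detected by duck typing).
        return {k: to_plain(v) for k, v in value._asdict().items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    return value


def rows_equal(left, right):
    """Compare two values after normalizing nested rows and NamedTuples."""
    return to_plain(left) == to_plain(right)


class Cell(NamedTuple):
    value: bytes
    timestamp_micros: int


class Record(NamedTuple):
    key: bytes
    cells: list


nested_row = Row(key=b"k", cells=[Row(value=b"v", timestamp_micros=1)])
nested_tuple = Record(key=b"k", cells=[Cell(value=b"v", timestamp_micros=1)])
different = Record(key=b"k", cells=[Cell(value=b"x", timestamp_micros=1)])

print(rows_equal(nested_row, nested_tuple))
print(rows_equal(nested_row, different))
```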

Contributor

@ahmedabu98 ahmedabu98 left a comment

I don't like that we have BigtableReadSchemaTransformConfiguration implements Serializable, but besides that this LGTM

@derrickaw derrickaw mentioned this pull request Aug 6, 2025
17 tasks
@arnavarora2004
Contributor Author

Run PythonDocker PreCommit 3.9

@ahmedabu98 ahmedabu98 merged commit 4114f7c into apache:master Aug 6, 2025
104 of 105 checks passed
parveensania pushed a commit to parveensania/beam-dp that referenced this pull request Aug 17, 2025
* Refactored BigTableReadSchemaTransformConfiguration

* changed scope, working on buffer class for making BigTable yaml fully connected and actually look good on user end for mutations

* Finished up a bit of standard_io.yaml

* Finished up a bit of standard_io.yaml

* Added bigTable test

* changed some tests for BigTable

* Added new IT file for simpleWrite and also made changes integration test debugging

* Added new IT file for simpleWrite and also made changes integration test debugging

* SetCell mutation test works, I want to see if this draft PR works CI test wise

* Fixed a slight error

* Added way more changes to integrations test.py, BigTableSimpleWriteSchemaTransformProviderIT, and testing out new mutations etc

* BigTableSimpleWriteSchemaTransformProviderIT finished changes to mutated new user input, all mutations work correctly, put demo code for it

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProvider.java

Co-authored-by: Derrick Williams <myutat@gmail.com>

* Update sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableSimpleWriteSchemaTransformProviderIT.java

Co-authored-by: Derrick Williams <myutat@gmail.com>

* changed comments

* Added changes from derrick comments

* Added default schema maybe fixes the issues

* Added schema to every test specifically, will run tests to see if it works

* Added default schema maybe fixes the issues

* Following formatting tests

* Following formatting tests

* Following checkstyle tests

* Made schema and test changes

* Made schema and test changes

* Made schema and test changes

* Made schema and test changes

* Made schema and test changes

* Added final test

* changed timestamp values

* added all mutations test

* added all mutations test

* pushed changes to format errors

* pushed changes to format errors

* Delete 4

* pushed changes to format errors

* pushed changes to format errors

* pushed changes to format errors

* pushed changes to debugging errors

* pushed changes to debugging errors

* to see internal error added print(will remove)

* to see internal error added print(will remove)

* to see internal error added print(will remove)

* import fixes

* import fixes

* import fixes

* import fixes

* import fixes

* import fixes

* pushed changes to debugging errors

* pushed changes to debugging errors

* pushed changes to debugging errors, added pulls from other beam

* made changes to allMutations test

* made changes to allMutations test

* pushed changes to debugging errors, added pulls from other beam

* pushed changes to debugging errors, added pulls from other beam

* pushed changes to debugging errors, added pulls from other beam

* pushed changes to debugging errors, added pulls from other beam

* pushed changes to debugging errors, added pulls from other beam

* new read errors fixed

* pushed changes to debugging errors, added pulls from other beam

* consolidated schema transform files, fixed small issues and bugs

* consolidated schema transform files, fixed small issues and bugs

* consolidated schema transform files, fixed small issues and bugs

* consolidated schema transform files, fixed small issues and bugs

* pushed changes to debugging errors, added pulls from other beam

* pushed changes from ahmed

* pushed changes from ahmed

* pushed changes from ahmed

* pushed changes from ahmed

* pushed changes from ahmed

* pushed changes from ahmed

* pushed changes from ahmed

* pushed changes from ahmed

* Following checkstyle tests

* Following checkstyle tests

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true)

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff

* Update sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigtable/BigtableReadSchemaTransformProvider.java

Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff

* pushed new changes to BigTableRead, making it work with new functionality feature of allowing flatten (defaulted to true) and added a new test in IT and fixed formatting stuff

* new mongo files in branch

* fixed family_name to string

* fixed family_name to string

* fixed family_name to string

* fixed family_name to string

* fixed family_name to string

* fixed family_name to string

* fixed family_name to string

* fixed family_name to string

* fixed commit issues

* commented assert test, everything should work now

---------

Co-authored-by: Derrick Williams <myutat@gmail.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
@derrickaw
Collaborator

derrickaw commented Sep 24, 2025

#28672
#33902

@derrickaw derrickaw mentioned this pull request Sep 24, 2025
15 tasks