@mutianf mutianf commented Nov 7, 2022

Migrate BigtableIO to use the java veneer client under the hood. In the future, we'll move to java-bigtable-hbase 2.x versions and deprecate bigtable-client-core.

The new code has the following structure:

  • BigtableConfig - defines the client connection-level settings
  • BigtableReadOptions and BigtableWriteOptions - define the tables to read / write and their timeout settings
  • BigtableOptions - deprecated

BigtableIO.Read or BigtableIO.Write instances with the same configuration share the same BigtableService. The service is cached in BigtableServiceFactory. Whenever we instantiate a new BigtableService, all configurations, including BigtableOptions, are translated to Bigtable veneer settings.
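
The sharing behavior described above can be sketched as a cache keyed by configuration. This is only a minimal illustration, not the code in this PR: `ConnectionConfig`, `SharedService`, and `ServiceCache` are hypothetical stand-ins for BigtableConfig, BigtableService, and BigtableServiceFactory, and the real factory may handle lifecycle and settings translation differently.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for connection-level settings (not Beam's BigtableConfig). */
final class ConnectionConfig {
  final String projectId;
  final String instanceId;

  ConnectionConfig(String projectId, String instanceId) {
    this.projectId = projectId;
    this.instanceId = instanceId;
  }

  // Value equality is what lets two equal configurations map to one cache entry.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof ConnectionConfig)) {
      return false;
    }
    ConnectionConfig other = (ConnectionConfig) o;
    return projectId.equals(other.projectId) && instanceId.equals(other.instanceId);
  }

  @Override
  public int hashCode() {
    return Objects.hash(projectId, instanceId);
  }
}

/** Hypothetical stand-in for a service that wraps one client connection. */
final class SharedService {
  final ConnectionConfig config;

  SharedService(ConnectionConfig config) {
    this.config = config;
  }
}

/** Hypothetical cache that hands out one service per distinct configuration. */
final class ServiceCache {
  private final Map<ConnectionConfig, SharedService> services = new ConcurrentHashMap<>();

  SharedService getOrCreate(ConnectionConfig config) {
    // computeIfAbsent creates the service at most once per equal configuration.
    return services.computeIfAbsent(config, SharedService::new);
  }
}

class ServiceCacheDemo {
  public static void main(String[] args) {
    ServiceCache cache = new ServiceCache();
    SharedService read = cache.getOrCreate(new ConnectionConfig("my-project", "my-instance"));
    SharedService write = cache.getOrCreate(new ConnectionConfig("my-project", "my-instance"));
    SharedService other = cache.getOrCreate(new ConnectionConfig("my-project", "other-instance"));
    System.out.println(read == write); // true: equal configurations share a service
    System.out.println(read == other); // false: a different instance gets its own service
  }
}
```

Keying the cache on value equality rather than object identity is the design point here: a read and a write built separately still resolve to one service as long as their settings match.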


@mutianf mutianf changed the title from "Migration" to "Bigtable: migrate BigtableIO to use the veneer client under the hood" on Dec 1, 2022

codecov bot commented Dec 1, 2022

Codecov Report

Merging #24015 (ce231c4) into master (2e7584c) will increase coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #24015      +/-   ##
==========================================
+ Coverage   72.79%   72.81%   +0.02%     
==========================================
  Files         775      775              
  Lines      102840   102928      +88     
==========================================
+ Hits        74864    74949      +85     
- Misses      26522    26525       +3     
  Partials     1454     1454              
| Flag | Coverage Δ |
|---|---|
| python | 81.96% <ø> (+0.01%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| sdks/python/apache_beam/io/textio.py | 94.59% <0.00%> (-2.47%) ⬇️ |
| sdks/python/apache_beam/coders/row_coder.py | 94.26% <0.00%> (-0.70%) ⬇️ |
| sdks/python/apache_beam/coders/coders.py | 87.04% <0.00%> (-0.44%) ⬇️ |
| sdks/python/apache_beam/transforms/combiners.py | 93.05% <0.00%> (-0.39%) ⬇️ |
| sdks/python/apache_beam/coders/coder_impl.py | 93.53% <0.00%> (-0.24%) ⬇️ |
| ...hon/apache_beam/runners/worker/bundle_processor.py | 94.22% <0.00%> (-0.12%) ⬇️ |
| sdks/python/apache_beam/io/kafka.py | 80.00% <0.00%> (ø) |
| sdks/python/apache_beam/io/fileio.py | 96.12% <0.00%> (+0.01%) ⬆️ |
| ...on/apache_beam/runners/dataflow/dataflow_runner.py | 81.88% <0.00%> (+0.14%) ⬆️ |
| sdks/python/apache_beam/coders/slow_stream.py | 94.87% <0.00%> (+0.18%) ⬆️ |
| ... and 5 more | |

@mutianf mutianf changed the title from "Bigtable: migrate BigtableIO to use the veneer client under the hood" to "[Bigtable] Migrate BigtableIO to use the veneer client under the hood" on Dec 2, 2022
@mutianf mutianf marked this pull request as ready for review December 2, 2022 18:05

github-actions bot commented Dec 2, 2022

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @kennknowles for label java.
R: @Abacn for label build.
R: @Abacn for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@diegomez17 diegomez17 left a comment

Reviewed BigtableServiceImpl and its tests.

@github-actions

Reminder, please take a look at this pr: @kennknowles @Abacn @Abacn

Abacn commented Dec 21, 2022

Thanks for the work! Looking into it.

@Abacn Abacn left a comment

Left some initial comments. Not yet finished (now at BigtableHBaseVeneeringSettings)

github-actions bot commented Jan 6, 2023

Reminder, please take a look at this pr: @kennknowles @Abacn @Abacn

Abacn commented Jan 6, 2023

waiting on author

mutianf commented Feb 21, 2023

Hi @Abacn, this PR is ready for review. Can you take a look? @igorbernstein2 from my team has already taken a look and thinks it looks good. Thanks!

Abacn commented Feb 21, 2023

Thanks @mutianf, will look into it tomorrow.

@Abacn Abacn left a comment

Thanks, this looks good to me. Would you mind opening a GitHub Issue for this PR for tracking, and adding an announcement in CHANGES.md? https://github.com/apache/beam/blob/master/CHANGES.md?plain=1#L70

}

/** Tests that credentials are used from PipelineOptions if not supplied by BigtableOptions. */
@Test
Contributor

Just wondering about the purpose of removing these unit tests. Is it because the scenario is no longer applicable to the new client, or could a breaking change / regression be introduced?

Contributor Author

Good catch. The way we get BigtableService is different now so these tests are no longer applicable. I moved the tests to BigtableConfigTranslator which will test the same behavior, PTAL, thanks! 97e96e8#diff-b53a99acd5a4cd30f4b76e8b94f7cd09df6f0fc196d949842d20bc845b48661c

@Abacn Abacn left a comment

Thanks for the prompt fix. Reviewed tests. If the change is not trivial, it would be good to have other reviewers take another look at the main source change.

CredentialFactory credentialFactory = config.getCredentialFactory();
try {
// Skip resetting the credentials if it's connected to an emulator
if (!emulator) {
Contributor

Here the code path is intentionally made different for tests and real use cases. Does this mean the if {} clause is no longer covered by the unit tests that were intended to cover it?

Contributor Author

This is still covered by testUsingPipelineOptionsCredential and testUsingCredentialsFromBigtableOptions in BigtableConfigTranslatorTest to make sure the credentials are updated correctly. The emulator test case is tested by the sql tests.
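
To make the branches under discussion concrete, here is a rough sketch of the credential selection as the thread describes it: emulator connections skip the credential reset, and otherwise an explicitly configured credential wins over the one from pipeline options. `CredentialChooser` and its plain string credentials are hypothetical simplifications, not the translator code in this PR.

```java
import java.util.Optional;

/** Hypothetical helper; real credentials are objects, strings keep the sketch small. */
final class CredentialChooser {
  private CredentialChooser() {}

  /**
   * Picks the credential to set on the client settings.
   * An empty result means "leave the settings untouched".
   */
  static Optional<String> choose(boolean emulator, String fromConfig, String fromPipelineOptions) {
    if (emulator) {
      // Emulator connections skip the credential reset entirely.
      return Optional.empty();
    }
    // Prefer the explicitly configured credential, else fall back to pipeline options.
    return Optional.ofNullable(fromConfig != null ? fromConfig : fromPipelineOptions);
  }
}

class CredentialChooserDemo {
  public static void main(String[] args) {
    System.out.println(CredentialChooser.choose(true, "config-cred", "pipeline-cred").isPresent());
    System.out.println(CredentialChooser.choose(false, "config-cred", "pipeline-cred").get());
    System.out.println(CredentialChooser.choose(false, null, "pipeline-cred").get());
  }
}
```

Written this way, the non-emulator branch can be exercised by plain unit tests without an emulator, which is the kind of coverage the reply above attributes to the translator tests.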

CHANGES.md Outdated
* Support for X source added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).
* Added in JmsIO a retry policy for failed publications (Java) ([#24971](https://github.com/apache/beam/issues/24971)).
* Support for `LZMA` compression/decompression of text files added to the Python SDK ([#25316](https://github.com/apache/beam/issues/25316))
* Update BigtableIO to use the idiomatic bigtable client under the hood (Java) ([25592](https://github.com/apache/beam/issues/25592))
Contributor

Unfortunately this won't be in Beam 2.46.0. We can remove it here and re-insert it at the proper line in a follow-up PR after this is merged.

mutianf commented Feb 23, 2023

> Thanks for the prompt fix. Reviewed tests. If the change is not trivial, it would be good to have other reviewers take another look at the main source change.

Sounds good, thanks @Abacn! I'll have Igor review it again after he's back from vacation next week.

for (SampleRowKeysResponse response : sampleRowKeys) {
ByteKey responseEndKey = makeByteKey(response.getRowKey());
long responseOffset = response.getOffsetBytes();
for (KeyOffset keyOffset : sampleRowKeys) {
Contributor

In a follow-up PR, we should migrate the splitting logic to the utils in veneer. We should also fix the logic in veneer to use split points as inclusive endpoints; it currently uses the split points as inclusive start points.
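
The difference between the two conventions mentioned here can be illustrated with a small sketch. It is not the splitting code in Beam or in veneer: `SplitPoints` and `SplitConvention` are hypothetical names, and row keys are modeled as plain strings rather than byte strings to keep the example short.

```java
import java.util.Arrays;
import java.util.List;

enum SplitConvention {
  /** Segments are [prev, split): the split key starts the next segment. */
  INCLUSIVE_START,
  /** Segments are (prev, split]: the split key ends the current segment. */
  INCLUSIVE_END
}

final class SplitPoints {
  private SplitPoints() {}

  /** Returns the zero-based index of the segment that holds {@code key}. */
  static int segmentIndex(List<String> sortedSplits, String key, SplitConvention convention) {
    int index = 0;
    for (String split : sortedSplits) {
      int cmp = key.compareTo(split);
      // A key equal to the split point moves on to the next segment only
      // when split points are treated as inclusive starts.
      boolean keyIsPastSplit =
          convention == SplitConvention.INCLUSIVE_START ? cmp >= 0 : cmp > 0;
      if (!keyIsPastSplit) {
        break;
      }
      index++;
    }
    return index;
  }
}

class SplitPointsDemo {
  public static void main(String[] args) {
    List<String> splits = Arrays.asList("g", "p");
    // The row whose key equals the split point "g" lands in different segments.
    System.out.println(SplitPoints.segmentIndex(splits, "g", SplitConvention.INCLUSIVE_START)); // 1
    System.out.println(SplitPoints.segmentIndex(splits, "g", SplitConvention.INCLUSIVE_END)); // 0
  }
}
```

Only keys that are exactly equal to a split point are affected; every other key falls into the same segment under both conventions.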

}
}

@ProcessElement
Contributor

This DoFn has a bunch of additional logic for checking for failures. In a future PR, we should just use the error tracking in veneer's batching logic.

Abacn commented Mar 2, 2023

Run PostCommit_Java_DataflowV2

Abacn commented Mar 2, 2023

Run PostCommit_Java_Dataflow

Abacn commented Mar 2, 2023

Run Java PreCommit

Abacn commented Mar 2, 2023

Postcommit tests passed. Two precommit suites are failing due to known flakes. Merging for now.

5 participants