Skip to content

Conversation

@baeminbo
Copy link
Contributor

@baeminbo baeminbo commented Jun 21, 2021

The current SpannerIO creates transaction at the beginning of pipeline run. This causes SpannerIO error by session not found if the pipeline runs long with ReadAll which gets ReadOperation as input .

This change introduces a delayed transaction creation. It makes a pipeline able to create Spanner transaction when it is needed (e.g. just before ReadAll).

Take a look at the code below with this change applied. It will creates a transaction session after readOperations is finished. FYI, the code will fail to compile with the current SpannerIO because CreateTransaction only accepts PBegin at the moement.

SpannerConfig config = ...

PCollection<ReadOperation> readOperations = ...

PCollectionView<Transaction> transaction = readOperations.apply(SpannerIO.createTransaction());

PCollection<Struct> users = readOperations
  .apply(SpannerIO.readAll().withSpannerConfig(config).withTransaction(transaction))

Please add a meaningful description for your change here


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@baeminbo baeminbo changed the title [BEAM-12504] Introduce WaitConnectionTransaction in SpannerIO [BEAM-12504] Introduce WaitConnectTransaction in SpannerIO Jun 21, 2021
@baeminbo
Copy link
Contributor Author

@pabloem Can you take a look at this PR?

@pabloem
Copy link
Member

pabloem commented Jun 30, 2021

sorry about the delay. Looking...

@aaltay
Copy link
Member

aaltay commented Jul 9, 2021

@baeminbo - Pablo will be out of office until August. I do not know the level of urgency on this PR. If it can wait, that is fine. Otherwise, do you who else might review this?

/cc @chamikaramj

@baeminbo
Copy link
Contributor Author

baeminbo commented Jul 9, 2021

Hi, this is not urgent. Thanks!

@aaltay aaltay requested a review from pabloem July 23, 2021 03:39
@pabloem
Copy link
Member

pabloem commented Aug 2, 2021

I'm back from vacation! I'll take a look now

@pabloem
Copy link
Member

pabloem commented Aug 12, 2021

@baeminbo doesn't it make sense to use WaitCreateTransaction in all cases? So maybe just update CreateTransaction to always wait?

The current SpannerIO creates transaction at the beginning of pipeline run. This causes SpannerIO error by session not found if the pipeline runs long with `ReadAll` which gets `ReadOperation` as input .

This change introduces a delayed transaction creation. It makes a pipeline able to create Spanner transaction when it is needed (e.g. just before ReadAll).

```
SpannerConfig config = ...

PCollection<ReadOperation> readOperations = ...

// Transaction will be created after readOperations is ready. So, it can avoid session expiration error.
PCollectionView<Transaction> transaction = readOperations.apply(SpannerIO.createTransaction());

PCollection<Struct> users = readOperations
  .apply(SpannerIO.readAll().withSpannerConfig(config).withTransaction(transaction))
```
@baeminbo baeminbo changed the title [BEAM-12504] Introduce WaitConnectTransaction in SpannerIO [BEAM-12504] Make SpannerIO.ConnectTransaction wait on input signal Aug 15, 2021
@baeminbo
Copy link
Contributor Author

Hi @pabloem, I changed the code to CreateTransaction can accept PCollection so that it will wait until the input PCollection is closed. Accordingly, I updated the PR title and summary.

@pabloem
Copy link
Member

pabloem commented Aug 16, 2021

Run Java PostCommit

@pabloem pabloem merged commit 6774062 into apache:master Aug 16, 2021
@pabloem
Copy link
Member

pabloem commented Aug 16, 2021

LGTM thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants