[BEAM-3456] Enable jenkins and large scale scenario in JDBC #4392

lgajowy · 2018-01-11T16:16:13Z

This PR lets the JDBC test use the numberOfRecords pipeline option too, so that tests at different scales are possible.

Note that:

I changed the type of numberOfRecords from Long to Integer: an Integer is enough to reach even tens of GBs, and it avoids unnecessary long -> int conversions.
I double-checked, and JdbcIO seems to be fine with large amounts of data: a test with 5 000 000 db rows works well.
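
As a rough sanity check on the Integer range, the arithmetic can be sketched as below. The 50-byte average row size is purely an assumption for illustration (the PR does not state the real row size), and the class and method names are hypothetical, not part of Beam.

```java
public class RecordCountHeadroom {
    // ASSUMPTION: average size of one test row in bytes. The PR does not state the real value.
    static final long ASSUMED_BYTES_PER_ROW = 50L;

    /** Approximate data volume in bytes for a given record count. */
    static long approxBytes(int numberOfRecords) {
        // Widen to long before multiplying so the product cannot overflow int.
        return (long) numberOfRecords * ASSUMED_BYTES_PER_ROW;
    }

    public static void main(String[] args) {
        int tested = 5_000_000;          // the row count mentioned in the PR description
        int largest = Integer.MAX_VALUE; // upper bound of an Integer option

        System.out.println(tested + " rows ~ " + approxBytes(tested) / 1_000_000 + " MB");
        System.out.println(largest + " rows ~ " + approxBytes(largest) / 1_000_000_000 + " GB");
    }
}
```

Under that assumed row size, the 5 000 000 row run is about a quarter of a GB, while the largest Integer value corresponds to roughly a hundred GB, so the record count itself is unlikely to be the limiting factor.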

@iemejia @chamikaramj could you take a look?

lgajowy · 2018-01-11T16:20:11Z

Run seed job

lgajowy · 2018-01-11T16:26:39Z

Run Java JdbcIO Performance Test

lgajowy · 2018-01-11T16:56:48Z

Run seed job

lgajowy · 2018-01-11T17:02:09Z

Run Java JdbcIO Performance Test

lgajowy · 2018-01-11T17:17:08Z

https://builds.apache.org/job/beam_PerformanceTests_JDBC/217/console

Jenkins failed. I'm hitting one of the same Jenkins issues as in the HadoopInputFormatIOIT PR: #4332

I think the problem is the path to the kubeconfig file in both cases. @jbonofre could you help determine what that path should be? We don't know the Jenkins setup.

jbonofre · 2018-01-11T18:08:31Z

Thanks ! Sure, gonna take a look.

chamikaramj · 2018-01-11T18:46:07Z

Can you try running in Jenkins executor 'beam1' ? It's possible that kubectl is only available in 'beam1'. See https://issues.apache.org/jira/browse/INFRA-14819. (I'm not sure if Apache infra acted on this after my last ping)

The following config shows how to restrict execution to a given set of Jenkins executors.
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy

lgajowy · 2018-01-12T12:25:52Z

Run seed job

lgajowy · 2018-01-12T12:39:07Z

Run Java JdbcIO Performance Test

lgajowy · 2018-01-12T13:20:12Z

Run Java JdbcIO Performance Test

chamikaramj · 2018-01-16T22:15:42Z

BTW are we assuming that there's an already running Kubernetes cluster in Jenkins with a proper kubeconfig ? This might not be the case.

Does it make a difference if we try to run the Jenkins test against a Kubernetes cluster running in GKE ?

cc: @jasonkuster

lgajowy · 2018-01-17T13:29:15Z

> BTW are we assuming that there's an already running Kubernetes cluster in Jenkins with a proper kubeconfig ? This might not be the case.
> Does it make a difference if we try to run the Jenkins test against a Kubernetes cluster running in GKE ?

That's right: judging by the previous JDBC test runs, we need kubectl and a kubeconfig present on all Jenkins executors. As @chamikaramj mentioned, those indeed seem not to be there yet. It seems that we need to provide all this before we add the Jenkins job. By "all this", I mean:

kubectl, installed on all executors, not only on 'beam1'
kubernetes cluster instance hosted on GKE
kubeconfig available on all executors, that allows a connection to the GKE cluster instance. IMO this file shouldn't be in beam's repository as it contains sensitive credentials
probably it would be convenient to have a KUBECONFIG environment variable on Jenkins that points to the config location
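
A minimal sketch of how a test harness could pick up such a variable, assuming a single path in KUBECONFIG and falling back to kubectl's default location of ~/.kube/config. The class and method names are hypothetical and are not part of Beam or the Jenkins setup; kubectl also accepts a list of paths in KUBECONFIG, which this sketch does not handle.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class KubeconfigLocator {
    /**
     * Returns the kubeconfig path from the KUBECONFIG variable if it is set,
     * otherwise the default location under the given home directory.
     * ASSUMPTION: KUBECONFIG holds one path, not a list of paths.
     */
    static Path resolve(Map<String, String> env, String userHome) {
        String fromEnv = env.get("KUBECONFIG");
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Paths.get(fromEnv.trim());
        }
        return Paths.get(userHome, ".kube", "config");
    }

    public static void main(String[] args) {
        Path kubeconfig = resolve(System.getenv(), System.getProperty("user.home"));
        // Fail early with a clear message instead of a late kubectl error.
        if (!Files.isReadable(kubeconfig)) {
            System.err.println("kubeconfig not readable: " + kubeconfig);
        } else {
            System.out.println("using kubeconfig: " + kubeconfig);
        }
    }
}
```

Passing the environment in as a map keeps the lookup easy to check on an executor without touching the real environment, and the credentials file itself stays outside the repository.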

Given the Kubernetes problems and other Jenkins problems (the "permission denied" problem mentioned in JIRA 3480), I decided not to enable the JDBC job again yet (hence the new commit, reverting the file change). The test can still be run manually on any Kubernetes cluster, including a local one, using the io-it-suite/io-it-suite-local profiles. This is still valuable. Let's add the Jenkins job in a separate PR once Kubernetes is set up properly on Jenkins and Jenkins itself works properly.

@jasonkuster Could you help with setting up the proper Kubernetes infrastructure?

chamikaramj · 2018-01-17T23:39:07Z

SGTM. I'll review the PR without Jenkins updates.

@alanmyrvold might be able to help with setting up a GKE-based Kubernetes cluster for I/O ITs.

chamikaramj · 2018-01-18T07:18:43Z

LGTM

[BEAM-3456] Enable jenkins and large scale scenario in JDBC

f811d7e

fixup! [BEAM-3456] Enable jenkins and large scale scenario in JDBC

205aee6

jbonofre self-requested a review January 11, 2018 17:59

[BEAM-3456] Allow only beam1 jenkins executor which has kubernetes on it

3296bb6

[BEAM-3456] Revert Jdbc Performance test job enabling

eaef1ed

The kubernetes infrastructure that is needed for the Jenkins job to run is not available for now. We should add it once the infrastructure is there.

lgajowy mentioned this pull request Jan 17, 2018

[BEAM-3217] add HadoopInputFormatIO integration test using DBInputFormat #4332

Merged

6 tasks

chamikaramj merged commit 2f235dd into apache:master Jan 18, 2018

lgajowy mentioned this pull request Feb 20, 2018

[BEAM-3456] Re-enable JDBC performance test #4714

Merged

10 tasks

lgajowy deleted the jdbc-large-scale branch March 14, 2018 11:38

pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023

Field path class (apache#4392)

93bb088

* apache#4378 - Field Path * review changes * 2nd review changes * 3rd review changes
