@lgajowy
Contributor

@lgajowy lgajowy commented Jan 11, 2018

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

This change allows the JDBC test to use the numberOfRecords pipeline option too, so that tests at different scales are possible.

Note that:

  • I changed the type of numberOfRecords from Long to Integer: an Integer is large enough to reach even tens of GBs of data, and it avoids unnecessary long -> int conversions.
  • I double-checked, and JdbcIO seems to be fine with large amounts of data: a test with 5 000 000 db rows works well.
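
A back-of-the-envelope sketch of the Integer headroom mentioned above. The class name, the method name, and the 50-byte average row size are all hypothetical assumptions for illustration and are not taken from this PR; the real row size depends on the test table schema.

```java
public class RecordCountHeadroom {
    // ASSUMPTION: hypothetical average row size in bytes, not taken from the PR.
    static final long ASSUMED_BYTES_PER_ROW = 50L;

    /** Approximate dataset size in bytes for the given number of rows. */
    static long approxBytes(int numberOfRecords) {
        // Widen to long before multiplying so the product cannot overflow.
        return (long) numberOfRecords * ASSUMED_BYTES_PER_ROW;
    }

    public static void main(String[] args) {
        long bytesPerGiB = 1024L * 1024L * 1024L;
        long maxBytes = approxBytes(Integer.MAX_VALUE);
        System.out.println("Largest row count that fits in an int: " + Integer.MAX_VALUE);
        System.out.println("Approximate size at that row count (GiB): " + maxBytes / bytesPerGiB);
    }
}
```

Under that assumed row size, the largest int row count corresponds to a dataset on the order of 100 GiB, and 5 000 000 rows to about 250 MB, which is consistent with the "tens of GBs" estimate.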

@iemejia @chamikaramj could you take a look?

@lgajowy
Contributor Author

lgajowy commented Jan 11, 2018

Run seed job

@lgajowy
Contributor Author

lgajowy commented Jan 11, 2018

Run Java JdbcIO Performance Test

@lgajowy
Contributor Author

lgajowy commented Jan 11, 2018

Run seed job

@lgajowy
Contributor Author

lgajowy commented Jan 11, 2018

Run Java JdbcIO Performance Test

@lgajowy
Contributor Author

lgajowy commented Jan 11, 2018

https://builds.apache.org/job/beam_PerformanceTests_JDBC/217/console

Jenkins failed. I'm hitting one of the same Jenkins issues as in the HadoopInputFormatIOIT PR: #4332

I think the cause in both cases is the path to the kubeconfig file. @jbonofre could you help determine what the path to the kubeconfig file should be? We don't know the Jenkins setup.

@jbonofre jbonofre self-requested a review January 11, 2018 17:59
@jbonofre
Member

Thanks ! Sure, gonna take a look.

@chamikaramj
Contributor

Can you try running in Jenkins executor 'beam1' ? It's possible that kubectl is only available in 'beam1'. See https://issues.apache.org/jira/browse/INFRA-14819. (I'm not sure if Apache infra acted on this after my last ping)

The following config shows how to restrict execution to a given set of Jenkins executors.
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_beam_PostCommit_Python_ValidatesRunner_Dataflow.groovy

@lgajowy
Contributor Author

lgajowy commented Jan 12, 2018

Run seed job

@lgajowy
Contributor Author

lgajowy commented Jan 12, 2018

Run Java JdbcIO Performance Test

@lgajowy
Contributor Author

lgajowy commented Jan 12, 2018

Run Java JdbcIO Performance Test

@chamikaramj
Contributor

BTW are we assuming that there's an already running Kubernetes cluster in Jenkins with a proper kubeconfig ? This might not be the case.

Does it make a difference if we try to run the Jenkins test against a Kubernetes cluster running in GKE ?

cc: @jasonkuster

The kubernetes infrastructure that is needed for the
Jenkins job to run is not available for now.
We should add it once the infrastructure is there.
@lgajowy
Contributor Author

lgajowy commented Jan 17, 2018

BTW are we assuming that there's an already running Kubernetes cluster in Jenkins with a proper kubeconfig ? This might not be the case.
Does it make a difference if we try to run the Jenkins test against a Kubernetes cluster running in GKE ?

That's right: judging by the previous JDBC test runs, we need kubectl and a kubeconfig present on all Jenkins executors. Those indeed seem not to be there yet, as @chamikaramj mentioned. It seems that we need to provide all this before we add the Jenkins job. By "all this", I mean:

  • kubectl, installed on all executors, not only on 'beam1'
  • kubernetes cluster instance hosted on GKE
  • kubeconfig available on all executors, that allows a connection to the GKE cluster instance. IMO this file shouldn't be in beam's repository as it contains sensitive credentials
  • probably it would be convenient to have a KUBECONFIG environment variable on Jenkins that points to the config location
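
A minimal sketch of how a test harness could locate the kubeconfig under the setup listed above: prefer the KUBECONFIG environment variable and otherwise fall back to kubectl's default location, ~/.kube/config. The class and method names are hypothetical and not part of Beam. The sketch also assumes KUBECONFIG holds a single path, although kubectl itself accepts a list of paths there.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class KubeconfigLocator {
    /**
     * Returns the kubeconfig path: the value of KUBECONFIG if it is set and
     * non-blank, otherwise the default location under the given home directory.
     * ASSUMPTION: KUBECONFIG contains one path, not a list of paths.
     */
    static Path resolve(Map<String, String> env, String userHome) {
        String fromEnv = env.get("KUBECONFIG");
        if (fromEnv != null && !fromEnv.trim().isEmpty()) {
            return Paths.get(fromEnv.trim());
        }
        return Paths.get(userHome, ".kube", "config");
    }

    public static void main(String[] args) {
        Path path = resolve(System.getenv(), System.getProperty("user.home"));
        // Report whether the resolved file is actually present on this executor.
        System.out.println("kubeconfig: " + path + " (exists: " + Files.exists(path) + ")");
    }
}
```

Passing the environment in as a map keeps the lookup easy to check without touching the real Jenkins environment.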

Given the Kubernetes problems and other Jenkins problems (the "permission denied" problem mentioned in JIRA 3480), I decided not to enable the JDBC job again yet (hence the new commit, reverting the file change). Still, the test can be run manually using the io-it-suite/io-it-suite-local profiles on any Kubernetes cluster, locally. This is still valuable. Let's add the Jenkins job in a separate PR once Kubernetes is set up properly on Jenkins and Jenkins itself works properly.

@jasonkuster Could you help with setting up the proper Kubernetes infrastructure?

@chamikaramj
Contributor

SGTM. I'll review the PR without Jenkins updates.

@alanmyrvold might be able to help with setting up a GKE-based Kubernetes cluster for I/O ITs.

@chamikaramj
Contributor

LGTM

@chamikaramj chamikaramj merged commit 2f235dd into apache:master Jan 18, 2018
boyuanzz pushed a commit to boyuanzz/beam that referenced this pull request Feb 6, 2018

[BEAM-3456] Enable jenkins and large scale scenario in JDBC

The kubernetes infrastructure that is needed for the
Jenkins job to run is not available for now.
We should add it once the infrastructure is there.
@lgajowy lgajowy deleted the jdbc-large-scale branch March 14, 2018 11:38
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
* apache#4378 - Field Path

* review changes

* 2nd review changes

* 3rd review changes