[BEAM-7502] Create ParDo Python Load Test Jenkins job #9042

kamilwu · 2019-07-11T13:06:10Z

Based on the following proposal: https://s.apache.org/load-test-basic-operations

kamilwu · 2019-07-12T07:08:18Z

Run Seed Job

kamilwu · 2019-07-12T07:17:57Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-12T09:24:35Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-12T11:17:15Z

Run Seed Job

kamilwu · 2019-07-12T11:52:39Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-12T14:17:13Z

Run Seed Job

kamilwu · 2019-07-12T14:26:05Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-12T15:29:09Z

@lgajowy Please take a look.
I had to increase the number of workers from 5 (as in the proposal) to 10, because with 5 workers the job ran for longer than 2 hours and, as a result, was always aborted.

kamilwu · 2019-07-15T10:05:49Z

Run Seed Job

lgajowy

Added comments. Thanks!

lgajowy · 2019-07-15T09:57:35Z

.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy

+                jobProperties: [
+                        job_name             : 'load-tests-python-dataflow-batch-pardo-1-' + now,
+                        project              : 'apache-beam-testing',
+                        temp_location        : 'gs://temp-storage-for-perf-tests/smoketests',


Is this the correct temp location?

It's correct, but 'gs://temp-storage-for-perf-tests/loadtests' seems to be used more often in load tests.

Could you explain why this is a problem? Did you experience any issues when using /loadtests?

I think you misunderstood. It's not a problem. /smoketests was there accidentally, and I've already replaced it with /loadtests, which is a better fit.

Right, we're seeing stale code in the thread. Thanks! :)

lgajowy · 2019-07-15T10:02:18Z

.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy

+                                '"value_size": 90}\'',
+                        iterations           : 10,
+                        number_of_counter_operations: 0,
+                        number_of_counters   : 1,


Why is there 1 counter? According to the proposal, the goal of this test is to measure the inter-operation overhead (not metrics), so there should be no counters. Counters are needed in tests #3 and #4, where we examine metrics overhead.

The number_of_counter_operations is zero, so there is actually no metrics overhead.
But I guess this 1 counter can be misleading, so I'll change it.
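
The point about `number_of_counter_operations` can be sketched in plain Python. This is a minimal model, not the Beam test code: `FakeCounter`, `counter_step` and `run` are hypothetical names, and the sketch assumes that each step loops `number_of_counter_operations` times over all configured counters for every element.

```python
class FakeCounter:
    """Hypothetical stand-in for a metrics counter; it only records inc() calls."""

    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1


def counter_step(elements, counters, number_of_counter_operations):
    """Pass elements through unchanged, updating every counter
    number_of_counter_operations times per element (assumed behaviour)."""
    for element in elements:
        for _ in range(number_of_counter_operations):
            for counter in counters:
                counter.inc()
        yield element


def run(number_of_counters, number_of_counter_operations, elements):
    """Return the step output and the total number of counter updates performed."""
    counters = [FakeCounter() for _ in range(number_of_counters)]
    output = list(counter_step(elements, counters, number_of_counter_operations))
    return output, sum(counter.value for counter in counters)
```

Under this model, `run(1, 0, elements)` creates one counter but never updates it, which is consistent with the reply above: the counter is harmless, but setting it to 0 avoids confusion.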

lgajowy · 2019-07-15T10:04:20Z

.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy

+                        iterations           : 10,
+                        number_of_counter_operations: 0,
+                        number_of_counters   : 1,
+                        num_workers          : 10,


Why are we using 10 workers here?

I've already changed it to 5.

lgajowy · 2019-07-15T10:15:13Z

.test-infra/jenkins/job_LoadTests_ParDo_Python.groovy

+        ],
+]}
+
+def loadTestConfigurationManyCounters = { datasetName -> [


I don't think splitting the job into two jobs is the solution here. It is very weird that it takes so long (over 2 hours) to run, whereas Java needs 28 minutes to run all the tests (as we discussed offline). Could you investigate this a little more? Maybe the pipeline shape is not what we think it is?

I have double-checked the code and it seems fine. The problem is the poor performance of metrics operations in Python: the test case with 10 iterations and 100 counter operations needed almost 3 hours to complete.

The solution is to lower the number of iterations to 1, which makes the test much faster. The job split will also be unnecessary.
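
As a rough back-of-the-envelope sketch of why lowering `iterations` helps: the snippet below assumes that the number of counter updates per element grows as iterations × counter operations × counters, and that runtime in the metrics-heavy case is dominated by those updates. Both are assumptions for illustration, not measurements from this job; the function name and the single-counter default are hypothetical.

```python
def counter_updates_per_element(iterations, number_of_counter_operations,
                                number_of_counters=1):
    """Counter updates one element triggers across all steps (assumed model)."""
    return iterations * number_of_counter_operations * number_of_counters


# Configuration described in the comment above: 10 iterations, 100 counter operations.
before = counter_updates_per_element(iterations=10, number_of_counter_operations=100)

# Proposed configuration: a single iteration with the same counter operations.
after = counter_updates_per_element(iterations=1, number_of_counter_operations=100)

# Ratio of counter updates between the two configurations.
reduction_factor = before / after
```

If metrics updates really are the bottleneck, this model suggests a tenfold drop in counter work per element, though the actual speedup would need to be confirmed from the job's run time.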

kamilwu · 2019-07-15T10:31:23Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-15T12:09:29Z

Run Seed Job

kamilwu · 2019-07-15T13:06:53Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-15T14:03:30Z

Run Seed Job

kamilwu · 2019-07-15T14:15:29Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-18T09:19:27Z

Run Seed Job

kamilwu · 2019-07-18T09:28:27Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu · 2019-07-22T08:09:11Z

Run Seed Job

kamilwu · 2019-07-22T08:38:03Z

Run Python Load Tests ParDo Dataflow Batch

lgajowy

Code LGTM. Please reorganize the commit history and we're good to go. Remember to add "[BEAM-7502]" to each commit title.

Thanks!

.test-infra/jenkins/job_LoadTests_ParDo_Java.groovy

To keep it compliant with other files in the same directory.

kamilwu · 2019-07-22T12:30:28Z

@lgajowy It's done, commits are ready

kamilwu · 2019-07-23T07:36:07Z

Run Seed Job

kamilwu · 2019-07-23T07:45:27Z

Run Python Load Tests ParDo Dataflow Batch

kamilwu force-pushed the pardo-jenkins branch from 102e9b7 to 9629126 Compare July 12, 2019 14:16

lgajowy requested changes Jul 15, 2019

kamilwu force-pushed the pardo-jenkins branch from 75ee6bf to 29f383a Compare July 15, 2019 12:08

kamilwu force-pushed the pardo-jenkins branch from 29f383a to eaa5a00 Compare July 15, 2019 13:54

kamilwu force-pushed the pardo-jenkins branch 3 times, most recently from 74cdcdd to d58fbee Compare July 18, 2019 09:18

kamilwu force-pushed the pardo-jenkins branch from d58fbee to 0ddbe24 Compare July 22, 2019 08:08

lgajowy requested changes Jul 22, 2019

kamilwu added 3 commits July 22, 2019 14:24

[BEAM-7502] Create ParDo Python Load Test Jenkins job

c604768

[BEAM-7502] Renamed file with Python GBK Load Test job definition

69e3b3e

To keep it compliant with other files in the same directory.

[BEAM-7502] Reduced number of iterations to 1 in Java ParDo job

5d473b5

kamilwu force-pushed the pardo-jenkins branch from 0ddbe24 to 5d473b5 Compare July 22, 2019 12:28

lgajowy merged commit 1640133 into apache:master Jul 23, 2019

kamilwu deleted the pardo-jenkins branch July 23, 2019 09:54
