-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-7502] Create ParDo Python Load Test Jenkins job #9042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
1 similar comment
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
@lgajowy Please take a look. |
|
Run Seed Job |
lgajowy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments. Thanks!
| jobProperties: [ | ||
| job_name : 'load-tests-python-dataflow-batch-pardo-1-' + now, | ||
| project : 'apache-beam-testing', | ||
| temp_location : 'gs://temp-storage-for-perf-tests/smoketests', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct temp location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's correct, but 'gs://temp-storage-for-perf-tests/loadtests' seems to be used more often in load tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why is this a problem? Did you experience any issues when using /loadtests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you misunderstood. It's not a problem. /smoketests was there accidentally, and I've already replaced it to /loadtests, which suits better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we're seeing stale code in the thread. Thanks! :)
| '"value_size": 90}\'', | ||
| iterations : 10, | ||
| number_of_counter_operations: 0, | ||
| number_of_counters : 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The number_of_counter_operations is zero, so there is actually no metrics overhead.
But I guess this 1 counter can be misleading, so I'll change it.
| iterations : 10, | ||
| number_of_counter_operations: 0, | ||
| number_of_counters : 1, | ||
| num_workers : 10, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using 10 workers here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've already changed it to 5.
| ], | ||
| ]} | ||
|
|
||
| def loadTestConfigurationManyCounters = { datasetName -> [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think splitting the job to 2 jobs is the solution here. It is very weird that it takes so much time (above 2 hours) to run, whereas java needs 28 minutes to run all the tests (as we discussed offline). Could you investigate this a little bit more? Maybe the pipeline shape is not as we think it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have double-checked the code and it seems fine. The problem is poor performance of metrics operations in Python — test case with 10 iterations and 100 counter operations needed almost 3 hours to complete.
The solution is to lower the number of iterations to 1, which makes the test way faster. Job split will be also unnecessary.
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
74cdcdd to
d58fbee
Compare
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
lgajowy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. Please reorganize commit history and we're good to go. Remeber about adding "[BEAM-7502]" to each commit title.
Thanks!
To keep it compliant with other files in the same directory.
|
@lgajowy It's done, commits are ready |
|
Run Seed Job |
|
Run Python Load Tests ParDo Dataflow Batch |
Based on the following proposal: https://s.apache.org/load-test-basic-operations
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.