Continuous benchmark tests as part of the CI #1174
Conversation
Codecov Report
@@           Coverage Diff           @@
##             main    #1174   +/-   ##
=======================================
  Coverage   93.29%   93.29%
=======================================
  Files         174      174
  Lines        6404     6404
=======================================
  Hits         5974     5974
  Misses        430      430
    github-token: ${{ secrets.GITHUB_TOKEN }}
    auto-push: true
    # Show alert with commit comment on detecting possible performance regression
    alert-threshold: '200%'
Is this the threshold for comparison with the previous result from the main branch? Trying to understand the flow here.
Yes, there will be an alert similar to this one when there is a slowdown bigger than 200%.
A smaller threshold could be an issue, as the machines used for CI are not always the same.
OK, will `fail-on-alert` block the PR merge if the threshold is exceeded? If yes, is it possible to let the CI job fail but still allow the merge on a case-by-case basis?
> OK, will `fail-on-alert` block the PR merge if the threshold is exceeded? If yes, is it possible to let the CI job fail but still allow the merge on a case-by-case basis?
No, the merge will go through; only the job will fail, with an alert posted as a comment on the commit.
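To make the threshold semantics concrete, here is a minimal sketch in plain shell of how a 200% `alert-threshold` can be interpreted. The values and variable names (`prev_ns`, `curr_ns`) are hypothetical stand-ins for the stored `main`-branch result and the current run; the real comparison is done inside github-action-benchmark.

```shell
# Hypothetical benchmark timings (ns): previous main-branch result vs. current run.
prev_ns=100
curr_ns=250

# alert-threshold '200%' means: alert when the current time exceeds 2x the previous.
threshold_pct=200

# Coarse integer ratio in percent (100 * current / previous).
ratio_pct=$(( 100 * curr_ns / prev_ns ))

if [ "$ratio_pct" -gt "$threshold_pct" ]; then
  alert="possible performance regression: ${ratio_pct}% of previous time"
else
  alert=""
fi
echo "$alert"
```

With `fail-on-alert` unset, such an alert only produces a commit comment; the job outcome does not gate the merge.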
    # collect benchmark results into one array
    components=(api sdk exporters)
    pushd $BENCHMARK_DIR
#include "opentelemetry/exporters/otlp/otlp_recordable.h"

#include <benchmark/benchmark.h>
#include "opentelemetry/exporters/otlp/otlp_grpc_exporter.h"
  trace::Provider::SetTracerProvider(provider);
}

void BM_otlp_grpc_with_collector(benchmark::State &state)
Do we need to run the benchmark against the actual collector, or would it be sufficient to fake the service stub, given that we are more interested in the stats resulting from the otel-cpp code?
I think it's a good-to-have number for the users.
Can we have both with and without the actual collector?
I would ideally like to keep the stats with an actual collector. However, we have lately seen CI failures because of transient network timeouts, and I am concerned that testing with a real collector instance may add to that. Also, spawning docker instances on the VM may consume more resources and slow down the CI jobs, and we are spawning another docker instance for jq parsing. We can keep them if you don't see any such slowness in CI over multiple iterations.
The test with the collector was pretty stable in my test CI, though this can't be guaranteed to always be the case. The job will be executed only when we merge a commit to the main branch, so we will see the issues, if any, only after merging to main.
This can't be part of the checks for PRs, as it would be noisy.
Shall we keep it as it is? In case we get failures, I can raise a PR to use a mock.
> Shall we keep it as it is? In case we get failures, I can raise a PR to use a mock.
Should be fine for me. Let's wait for suggestions from @ThomsonTan too.
Agree with @lalitb. I would prefer to use a mock so the result is not affected by environment workload and network traffic, but we could do this in later PRs.
    do
      out=$component-benchmark_result.json
      find ./$component -type f -name "*_result.json" -exec cat {} \; > ${component}_tmp_bench.json
      cat ${component}_tmp_bench.json | docker run -i --rm itchyny/gojq:0.12.6 -s \
nit: if it's not a major rework, can we use jq instead of gojq? It's more lightweight, without any external dependencies.
I've tried jq; it showed different behavior on my local machine (Ubuntu 20.04) and on the GHA VM (Ubuntu latest). Collecting the benchmark results of all the tests into one array didn't work with the jq on the GHA VM, so I switched to gojq, which has a docker image.
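For reference, the effect of slurping every `*_result.json` for a component into one JSON array (what the `gojq -s` invocation does in the workflow) can be sketched in plain shell with no external dependency. The file names and contents below are hypothetical examples, not the actual benchmark output:

```shell
# Hypothetical per-test result files for the "api" component.
mkdir -p api
printf '{"name":"BM_SpanCreation","real_time":12.3}\n' > api/span_result.json
printf '{"name":"BM_SpanEnd","real_time":4.5}\n'       > api/end_result.json

component=api
out=$component-benchmark_result.json

# Equivalent of slurping with `gojq -s .`: wrap every *_result.json
# object in one JSON array, comma-separated.
{
  printf '['
  first=1
  for f in ./$component/*_result.json; do
    [ $first -eq 1 ] || printf ','
    first=0
    cat "$f" | tr -d '\n'
  done
  printf ']\n'
} > "$out"

cat "$out"
```

The real workflow delegates this to gojq inside a docker container so that the behavior is identical on local machines and the GHA VM.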
Fixes #1170 (issue)
Changes
Runs benchmark tests at every push to the `main` branch and pushes the results to the `gh-pages` branch. The results are grouped into api, sdk, and exporters. An example of working graphs can be seen here.
For significant contributions please make sure you have completed the following items:
`CHANGELOG.md` updated for non-trivial changes