
Conversation

@fsaintjacques (Contributor) commented May 3, 2019

  • Implements the archery benchmark run sub-command, which performs a single run without comparing to another run. This is the basic building block for running a benchmark and publishing the result to a database, e.g. codespeed and/or a custom db. Push is not provided in this PR.
  • Implements serializing/deserializing the result of a run in JSON.
  • Improves the from_rev_or_path method to also accept the JSON output from the previous point, so you can compare any mixture of commit(s), build(s), and run output(s). This effectively adds support for comparing against the cached results of a run.
  • Implements redirecting the output to a named file via --output; - means stdout (the default). See the usage sketch after this list.
  • Improves the GoogleBenchmarkRunner to output partial results to stdout for progress feedback.
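
For example, the intended usage might look like this (a sketch based on the flags described above; the exact invocation may differ):

```console
# Run the benchmark suite once and write the JSON result to a file
archery benchmark run --output=run-1.json

# `-` (the default) writes the JSON result to stdout instead
archery benchmark run --output=-
```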

The C++ benchmarks would not provide feedback because the JSON
result was captured from stdout. Refactored this such that benchmarks
output both text to stdout and JSON to a named file.
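
For reference, Google Benchmark supports this dual output natively; a minimal sketch of such an invocation (the benchmark binary name is hypothetical):

```console
# Human-readable progress still goes to stdout, while the JSON
# result is written to the named file via Google Benchmark's flags.
./arrow-builder-benchmark --benchmark_out=result.json --benchmark_out_format=json
```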
Supports writing the result into a file. This allows more stable
composition.
Allows running benchmark suites for a specific build.
@fsaintjacques (Contributor, Author)

@pitrou this adds your requested feature to compare against offline data.

- Implement JSON serialization for Benchmark and BenchmarkSuite
@fsaintjacques force-pushed the ARROW-5071-benchmark-run branch from 29034c6 to 1ca0aa9 on May 3, 2019 19:36
@codecov-io

Codecov Report

Merging #4249 into master will decrease coverage by 24.57%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           master    #4249       +/-   ##
===========================================
- Coverage   88.19%   63.62%   -24.58%     
===========================================
  Files         660      375      -285     
  Lines       72992    53019    -19973     
  Branches     1251        0     -1251     
===========================================
- Hits        64378    33735    -30643     
- Misses       8501    19284    +10783     
+ Partials      113        0      -113
| Impacted Files | Coverage Δ |
|---|---|
| cpp/src/arrow/util/memory.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/parquet/hasher.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/extension_type.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/util/sse-util.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/flight/server.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/json/chunker.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/util/uri.cc | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/flight/client_auth.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/compute/logical_type.h | 0% <0%> (-100%) ⬇️ |
| cpp/src/arrow/compute/kernels/filter.cc | 0% <0%> (-100%) ⬇️ |

... and 695 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6c626c8...1ca0aa9.

@nealrichardson (Member)

Should the JSON benchmark output format be documented? Any special instructions for those writing or modifying benchmarks (naming, how to maintain continuity with historical data and when to break it, etc.) so that they work correctly in this system?

@fsaintjacques (Contributor, Author)

I don't think it's worth documenting yet, as the goal is simply to pass the serialized state from one archery sub-command to another.

Adding documentation on writing/modifying benchmarks is definitely worth it. It reminds me that we need to add a version to the benchmark metadata once we want to track this in a (persisted) database.
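
For example, the composition looks like this (a sketch; the diff sub-command name is assumed here, but from_rev_or_path now accepts a cached JSON output wherever a commit or build is accepted):

```console
# Cache a baseline run, then compare another revision against it later
archery benchmark run --output=baseline.json
archery benchmark diff HEAD baseline.json
```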

@nealrichardson (Member)

Ok.

How necessary is a version in the benchmark metadata itself? Isn't the version knowable based on the timestamp?

@fsaintjacques (Contributor, Author)

I'd say the git commit is a better proxy for version. The problem with having only a proxy (say, commit or timestamp) is that you can't decide whether two samples are comparable.

By version, I didn't mean an actual human/semver version, but a key to differentiate whether two named benchmarks are comparable. Usually, the shasum of the function body (the file, in our case, because it's the easiest proxy) dictates whether the benchmarks are comparable.

Note that this is low-to-medium priority since benchmark bodies don't change often, and you can always work around it by renaming the benchmark (making the old version not comparable with the new name) and making that an official policy.
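
A rough sketch of that comparability key in Python (the helper is hypothetical, not archery's actual API):

```python
import hashlib


def benchmark_version(source_path):
    """Comparability key: checksum of the file defining the benchmark.

    Two samples of the same named benchmark are comparable only if their
    version keys match, i.e. the benchmark body did not change between runs.
    """
    with open(source_path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()
```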

@kszucs self-requested a review May 7, 2019 18:18

@kszucs (Member) left a comment


LGTM, thanks @fsaintjacques!
