Skip to content

Conversation

@mosche
Copy link
Member

@mosche mosche commented Mar 29, 2022

(PR just for reference, not meant to be merged)

JMH benchmarks on reading field values from RowWithGetters. This is meant to establish a baseline for the current code on master.

Here's a visualization of the results for master comparing two benchmark runs reading field values once & three times with

  • caching disabled.
  • caching enabled (using lazy initialisation of the cache data structure).

NOTE:

  • The score doesn't reflect read access only, measurement includes iterating over a large number of rows. What matters are the relative changes.
  • You cannot easily compare scores between different benchmarks as rows contain different number of fields. Also, depending on types, fields are read recursively to measure the impact of lazy vs eager data structures.
  • Any cache initialization is done lazily on first read (if caching is enabled) to include associated costs in the measurement.

Each benchmark method invocation processes a bundle of 50k rows and
recursively reads all values per row. The rows are created upfront using JMH State to
exclude any initialization costs from the measurement. To prevent unintended cache hits in RowWithGetters a new bundle of rows must be generated before every invocation.

Using state setup per Level#Invocation has significant drawbacks by itself! Though,
given that reading bundles of 50k rows takes well above 1 ms, each
individual invocation can be adequately timestamped without risking generating wrong results.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@codecov
Copy link

codecov bot commented Mar 29, 2022

Codecov Report

Merging #17203 (3b5e1d6) into master (14862cc) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17203      +/-   ##
==========================================
- Coverage   73.96%   73.96%   -0.01%     
==========================================
  Files         672      672              
  Lines       88259    88269      +10     
==========================================
+ Hits        65283    65284       +1     
- Misses      21863    21872       +9     
  Partials     1113     1113              
Flag Coverage Δ
python 83.63% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/io/source_test_utils.py 88.01% <0.00%> (-1.39%) ⬇️
sdks/python/apache_beam/internal/gcp/auth.py 81.03% <0.00%> (-0.45%) ⬇️
...hon/apache_beam/runners/worker/bundle_processor.py 93.39% <0.00%> (-0.25%) ⬇️
...ks/python/apache_beam/runners/worker/sdk_worker.py 88.90% <0.00%> (-0.16%) ⬇️
setup.py 0.00% <0.00%> (ø)
sdks/python/apache_beam/transforms/util.py 95.96% <0.00%> (ø)
...on/apache_beam/runners/dataflow/dataflow_runner.py 82.36% <0.00%> (ø)
...am/testing/benchmarks/chicago_taxi/trainer/task.py 0.00% <0.00%> (ø)
...m/testing/benchmarks/chicago_taxi/trainer/model.py 0.00% <0.00%> (ø)
sdks/python/apache_beam/typehints/decorators.py 92.73% <0.00%> (+0.06%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14862cc...3b5e1d6. Read the comment docs.

@TheNeuralBit
Copy link
Member

It could make sense to run these benchmarks continuously and upload metrics to s.apache.org/beam-community-metrics

@TheNeuralBit
Copy link
Member

CC @apilloud

@lukecwik
Copy link
Member

It would be great if we could do t

It could make sense to run these benchmarks continuously and upload metrics to s.apache.org/beam-community-metrics

It would be great if we did this for all JMH benchmarks.

import static org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils.random;
import static org.apache.beam.sdk.values.RowWithGettersBenchmark.MapOfPrimitiveBundle.primitiveMapPojo;
import static org.apache.beam.sdk.values.RowWithGettersBenchmark.SimplePojoBundle.simplePojo;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please create a separate JMH subproject that is under sdks/java/core/jmh

@mosche
Copy link
Member Author

mosche commented May 12, 2022

Ok, I'll have a look at that @TheNeuralBit and @lukecwik 👍 Could you point me to any existing build task that would do something similar?

@lukecwik
Copy link
Member

See https://github.com/apache/beam/blob/master/sdks/java/harness/jmh/build.gradle

Key part is enableJmh in applyJavaNature. The rest of the setup is like a normal gradle sub-project by declaring your dependencies.

@mosche
Copy link
Member Author

mosche commented Jul 7, 2022

Closing this, follow up is here #22182

@mosche mosche deleted the RowWithGetters-JMH-master branch July 7, 2022 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants