JMH benchmarks on reading field values from RowWithGetters. #17203

mosche · 2022-03-29T08:09:35Z

(PR just for reference, not meant to be merged)

JMH benchmarks on reading field values from RowWithGetters. This is meant to establish a baseline for the current code on master.

Here's a visualization of the results for master, comparing two benchmark runs (each reading field values once and three times): one with caching disabled, and one with caching enabled (using lazy initialisation of the cache data structure).
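A minimal plain-Java sketch of the two configurations, assuming a row that computes each field through a getter function. The class `LazyCachedRow` and its `cachingEnabled` flag are hypothetical and are not Beam's `RowWithGetters` API. The sketch only shows that with caching enabled the cache arrays are allocated on the first read (so that cost falls inside the measurement), and that a field read three times invokes its getter once instead of three times.

```java
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical getter-backed row (NOT Beam's RowWithGetters) illustrating
 * lazily initialised per-field caching.
 */
final class LazyCachedRow<T> {
  private final T target;                          // underlying object, e.g. a POJO
  private final List<Function<T, Object>> getters; // one getter per schema field
  private final boolean cachingEnabled;

  // Allocated on the first read, and only when caching is enabled.
  private Object[] cache;
  private boolean[] isCached; // values may legitimately be null, so presence is tracked separately

  LazyCachedRow(T target, List<Function<T, Object>> getters, boolean cachingEnabled) {
    this.target = target;
    this.getters = getters;
    this.cachingEnabled = cachingEnabled;
  }

  Object getValue(int fieldIdx) {
    if (!cachingEnabled) {
      // Every read calls the getter again.
      return getters.get(fieldIdx).apply(target);
    }
    if (cache == null) {
      // Lazy initialisation: the allocation happens inside the first (measured) read.
      cache = new Object[getters.size()];
      isCached = new boolean[getters.size()];
    }
    if (!isCached[fieldIdx]) {
      cache[fieldIdx] = getters.get(fieldIdx).apply(target);
      isCached[fieldIdx] = true;
    }
    return cache[fieldIdx];
  }
}
```

Reading the same field three times through a row built with `cachingEnabled = true` calls its getter once; with `false` it calls it three times, which is the difference the two runs are meant to expose.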

NOTE:

The score doesn't reflect read access alone; the measurement also includes iterating over a large number of rows. What matters are the relative changes.
You cannot easily compare scores between different benchmarks, as rows contain different numbers of fields. Also, depending on their types, fields are read recursively to measure the impact of lazy vs. eager data structures.
Any cache initialization is done lazily on first read (if caching is enabled) to include associated costs in the measurement.
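Reading fields recursively can be sketched as follows, assuming a minimal row interface: nested rows, iterables and maps are descended into, and every leaf value is counted, which forces lazily computed nested values to be materialised. `ReadableRow`, `ListRow` and `RecursiveReader` are hypothetical names for illustration only; in an actual JMH benchmark each leaf would more typically be passed to a `Blackhole` than counted.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical minimal row abstraction used only for this sketch. */
interface ReadableRow {
  int fieldCount();

  Object getValue(int fieldIdx);
}

/** Hypothetical eager row backed by a list of values. */
final class ListRow implements ReadableRow {
  private final List<Object> values;

  ListRow(List<Object> values) {
    this.values = values;
  }

  @Override
  public int fieldCount() {
    return values.size();
  }

  @Override
  public Object getValue(int fieldIdx) {
    return values.get(fieldIdx);
  }
}

final class RecursiveReader {
  private RecursiveReader() {}

  /** Reads every field, descending into nested rows, iterables and maps; returns the number of leaves read. */
  static long readAll(ReadableRow row) {
    long leaves = 0;
    for (int i = 0; i < row.fieldCount(); i++) {
      leaves += readValue(row.getValue(i));
    }
    return leaves;
  }

  private static long readValue(Object value) {
    if (value instanceof ReadableRow) {
      return readAll((ReadableRow) value);
    } else if (value instanceof Iterable) {
      long leaves = 0;
      for (Object element : (Iterable<?>) value) {
        leaves += readValue(element);
      }
      return leaves;
    } else if (value instanceof Map) {
      long leaves = 0;
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        leaves += readValue(entry.getKey()) + readValue(entry.getValue());
      }
      return leaves;
    }
    // Leaf value (String, Integer, null, ...).
    return 1;
  }
}
```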

Each benchmark method invocation processes a bundle of 50k rows and recursively reads all values per row. The rows are created upfront using a JMH State to exclude any initialization costs from the measurement. To prevent unintended cache hits in RowWithGetters, a new bundle of rows must be generated before every invocation.
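The per-invocation setup can be approximated in plain Java as below. This is a hypothetical stand-in for a JMH `@State` class with a `@Setup(Level.Invocation)` method, not the benchmark code from this PR: a fresh bundle is built (untimed) before every invocation, so no row is read twice and no value cached by a previous invocation can be hit, and only the read pass is timed. `FreshBundleHarness`, `rowFactory` and `readRow` are assumed names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.function.ToLongFunction;

/**
 * Hypothetical plain-Java analogue of a JMH benchmark whose state is set up per
 * Level.Invocation: every timed invocation reads a freshly created bundle.
 */
final class FreshBundleHarness<R> {
  private final Supplier<R> rowFactory;
  private final int bundleSize;
  private long checksum; // accumulated read results, kept so the reads cannot be optimised away

  FreshBundleHarness(Supplier<R> rowFactory, int bundleSize) {
    this.rowFactory = rowFactory;
    this.bundleSize = bundleSize;
  }

  /** Untimed setup step: creates brand-new rows so no cache state carries over. */
  private List<R> newBundle() {
    List<R> bundle = new ArrayList<>(bundleSize);
    for (int i = 0; i < bundleSize; i++) {
      bundle.add(rowFactory.get());
    }
    return bundle;
  }

  /** Runs {@code invocations} timed read passes and returns the elapsed nanoseconds of each. */
  long[] run(int invocations, ToLongFunction<R> readRow) {
    long[] elapsedNanos = new long[invocations];
    checksum = 0;
    for (int inv = 0; inv < invocations; inv++) {
      List<R> bundle = newBundle(); // excluded from the measurement
      long start = System.nanoTime();
      for (R row : bundle) {
        checksum += readRow.applyAsLong(row); // only reading is timed
      }
      elapsedNanos[inv] = System.nanoTime() - start;
    }
    return elapsedNanos;
  }

  long checksum() {
    return checksum;
  }
}
```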

Using a state setup at Level#Invocation has significant drawbacks of its own. However, given that reading a bundle of 50k rows takes well above 1 ms, each individual invocation can be timestamped adequately without risking wrong results.
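The timestamping argument can be checked with a rough calculation: two timestamps are taken around each invocation, so their cost relative to the invocation length bounds the distortion. The sketch below is a crude, hypothetical estimate (`TimestampOverhead` is an illustrative name, and a back-to-back loop over `System.nanoTime()` only approximates the real per-call cost). It prints the estimated overhead fraction for a 1 ms and a 100 ns invocation, showing why long invocations make the timer cost negligible while very short ones would be dominated by it.

```java
/** Hypothetical, rough estimate of how much per-invocation timestamping distorts a measurement. */
final class TimestampOverhead {
  private static volatile long sink; // keeps the timing loop from being optimised away

  private TimestampOverhead() {}

  /** Average cost in nanoseconds of one System.nanoTime() call, over {@code calls} back-to-back calls. */
  static double averageNanoTimeCost(int calls) {
    long acc = 0;
    long start = System.nanoTime();
    for (int i = 0; i < calls; i++) {
      acc ^= System.nanoTime();
    }
    long elapsed = System.nanoTime() - start;
    sink = acc;
    return (double) elapsed / calls;
  }

  /** Fraction of a measured invocation attributable to the two timestamps taken around it. */
  static double relativeOverhead(double timestampCostNanos, long invocationNanos) {
    return 2 * timestampCostNanos / invocationNanos;
  }

  public static void main(String[] args) {
    double cost = averageNanoTimeCost(1_000_000);
    System.out.printf("nanoTime cost ~%.1f ns%n", cost);
    System.out.printf("overhead for a 1 ms invocation:   %.6f%n", relativeOverhead(cost, 1_000_000L));
    System.out.printf("overhead for a 100 ns invocation: %.6f%n", relativeOverhead(cost, 100L));
  }
}
```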

codecov · 2022-03-29T08:33:06Z

Codecov Report

Merging #17203 (3b5e1d6) into master (14862cc) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17203      +/-   ##
==========================================
- Coverage   73.96%   73.96%   -0.01%     
==========================================
  Files         672      672              
  Lines       88259    88269      +10     
==========================================
+ Hits        65283    65284       +1     
- Misses      21863    21872       +9     
  Partials     1113     1113

| Flag | Coverage Δ | |
|---|---|---|
| python | `83.63% <ø> (-0.02%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdks/python/apache_beam/io/source_test_utils.py | `88.01% <0.00%> (-1.39%)` | ⬇️ |
| sdks/python/apache_beam/internal/gcp/auth.py | `81.03% <0.00%> (-0.45%)` | ⬇️ |
| ...hon/apache_beam/runners/worker/bundle_processor.py | `93.39% <0.00%> (-0.25%)` | ⬇️ |
| ...ks/python/apache_beam/runners/worker/sdk_worker.py | `88.90% <0.00%> (-0.16%)` | ⬇️ |
| setup.py | `0.00% <0.00%> (ø)` | |
| sdks/python/apache_beam/transforms/util.py | `95.96% <0.00%> (ø)` | |
| ...on/apache_beam/runners/dataflow/dataflow_runner.py | `82.36% <0.00%> (ø)` | |
| ...am/testing/benchmarks/chicago_taxi/trainer/task.py | `0.00% <0.00%> (ø)` | |
| ...m/testing/benchmarks/chicago_taxi/trainer/model.py | `0.00% <0.00%> (ø)` | |
| sdks/python/apache_beam/typehints/decorators.py | `92.73% <0.00%> (+0.06%)` | ⬆️ |

... and 1 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14862cc...3b5e1d6.

TheNeuralBit · 2022-05-11T21:52:48Z

It could make sense to run these benchmarks continuously and upload metrics to s.apache.org/beam-community-metrics

TheNeuralBit · 2022-05-11T21:54:46Z

CC @apilloud

lukecwik · 2022-05-11T21:55:05Z

> It could make sense to run these benchmarks continuously and upload metrics to s.apache.org/beam-community-metrics

It would be great if we did this for all JMH benchmarks.

lukecwik · 2022-05-11T21:57:12Z

sdks/java/harness/jmh/src/main/java/org/apache/beam/sdk/values/RowWithGettersBenchmark.java

+import static org.apache.beam.repackaged.core.org.apache.commons.lang3.RandomStringUtils.random;
+import static org.apache.beam.sdk.values.RowWithGettersBenchmark.MapOfPrimitiveBundle.primitiveMapPojo;
+import static org.apache.beam.sdk.values.RowWithGettersBenchmark.SimplePojoBundle.simplePojo;
+


Please create a separate JMH subproject under sdks/java/core/jmh.

mosche · 2022-05-12T12:32:35Z

Ok, I'll have a look at that @TheNeuralBit and @lukecwik 👍 Could you point me to any existing build task that would do something similar?

lukecwik · 2022-05-16T19:18:08Z

See https://github.com/apache/beam/blob/master/sdks/java/harness/jmh/build.gradle

The key part is enableJmh in applyJavaNature. Otherwise, the setup is like any normal Gradle sub-project: you just declare your dependencies.

mosche · 2022-07-07T14:50:49Z

Closing this; the follow-up is #22182.

JMH benchmarks on reading field values from RowWithGetters.

3b5e1d6

github-actions bot added build java labels Mar 29, 2022

mosche mentioned this pull request Mar 29, 2022

[BEAM-14166] Performance improvements for RowWithGetter #17172

Merged

lukecwik reviewed May 11, 2022

mosche mentioned this pull request Jul 7, 2022

JMH module for sdks:java:core with benchmarks for GetterBasedSchemaProvider (resolves #22181) #22182

Merged

mosche closed this Jul 7, 2022

mosche deleted the RowWithGetters-JMH-master branch July 7, 2022 14:50
