Skip to content

MINOR: Make ByteUtilsBenchmark deterministic#13307

Closed
divijvaidya wants to merge 2 commits intoapache:trunkfrom
divijvaidya:minor-Improve-ByteUtilsBenchmark
Closed

MINOR: Make ByteUtilsBenchmark deterministic#13307
divijvaidya wants to merge 2 commits intoapache:trunkfrom
divijvaidya:minor-Improve-ByteUtilsBenchmark

Conversation

@divijvaidya
Copy link
Copy Markdown
Member

Motivation

The current implementation of the benchmark may produce different set of integers across benchmarks and hence, may not provide apples to apples comparison.

Changes

  1. With this change, we initialize random number generator with a fixed seed at the start of a benchmark. This ensures that for each benchmark, random number generator produces the same sequence of random values. Hence, the input data across benchmarks would be consistent providing a reliable apples to apples comparison.
  2. We ensure that a new set of random numbers are generated per iteration, so that the bechmark calculation is performed over a diverse range of values

Sample result of a benchmark run:

Benchmark                                           Mode  Cnt     Score   Error  Units
ByteUtilsBenchmark.testSizeOfUnsignedVarint        thrpt   30  1302.193 ± 0.396  ops/s
ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple  thrpt   30   328.678 ± 0.269  ops/s
ByteUtilsBenchmark.testSizeOfVarlong               thrpt   30   880.113 ± 0.676  ops/s
ByteUtilsBenchmark.testSizeOfVarlongSimple         thrpt   30   109.592 ± 0.071  ops/s
JMH benchmarks done

Copy link
Copy Markdown
Member

@showuon showuon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement! The first point makes sense to me. But for the second point:

We ensure that a new set of random numbers are generated per iteration, so that the bechmark calculation is performed over a diverse range of values

I didn't understand why we need this change. Could you elaborate more on it? From what I can see, it should make no difference if we use the same values or not for each iteration.

Thank you.

@divijvaidya
Copy link
Copy Markdown
Member Author

I didn't understand why we need this change. Could you elaborate more on it? From what I can see, it should make no difference if we use the same values or not for each iteration.

Having different values for each iteration gives us ability to benchmark over diverse sample set. This increases the probability that the algorithm we are testing is optimal over a larger set of values.

@divijvaidya
Copy link
Copy Markdown
Member Author

@showuon I am discarding this PR as I have included it's changes in #13312

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants