ARROW-5209: [Java] Provide initial performance benchmarks from SQL workloads #4198
Question for @jacques-n and @siddharthteotia: what might be the best long-term strategy for benchmarking in Java? We would like to collect these benchmark results in a common benchmark database so that we can monitor the performance of Java alongside other languages.
I'm not familiar with Java benchmark frameworks. If the framework can emit the C++ benchmark output format (and Go), capturing the results would not take much work.
JMH (https://openjdk.java.net/projects/code-tools/jmh/) is the go-to for benchmarking in Java. We definitely don't want to write our own harness. Getting this right is its own science because of the JIT. I don't really understand the example benchmarks in this commit. They seem somewhat arbitrary. When we want to achieve maximum performance, we interact directly with the memory using PlatformDependent (or similar).
One other comment, if people want to know more about the precariousness of Java microbenchmarks like this, they should spend some time on this blog: http://psy-lob-saw.blogspot.com/ |
@jacques-n Thank you for your comments. JMH is a reasonable benchmark framework.
Codecov Report
@@ Coverage Diff @@
## master #4198 +/- ##
==========================================
+ Coverage 87.78% 89.43% +1.65%
==========================================
Files 758 622 -136
Lines 92513 83275 -9238
Branches 1251 0 -1251
==========================================
- Hits 81210 74477 -6733
+ Misses 11186 8798 -2388
+ Partials 117 0 -117
@liyafan82 I think we have since added more targeted micro-benchmarks for data access. Do you think this PR is still relevant?
Since there is no agreement that such benchmarks should be added, let's close this PR.
In this PR, we provide an initial benchmark that runs a TPC-H SQL workload using Arrow.
The output looks like this:
Statistics for benchmark TPC-H Q1#Project & Filter:
Num of repeats = 10
Max duration = 0.64111ms
Min duration = 0.381478ms
Avg duration = 0.48580240000000013ms
If this is OK, we will provide more benchmarks in a similar manner.
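For readers who want to see how statistics of this shape can be produced, here is a minimal sketch in plain Java. It is not the code from this PR: the class `DurationStats` and its methods `fromNanos`, `measure`, and `report` are hypothetical names chosen for illustration, and the timed task in `main` is a stand-in rather than a TPC-H query. As the discussion above points out, a hand-rolled timing loop like this ignores JIT warm-up and other microbenchmark pitfalls, so JMH remains the better choice for real measurements.

```java
/** Hypothetical helper (not part of Arrow): times a task and summarizes the durations. */
final class DurationStats {
    final int repeats;
    final double maxMs;
    final double minMs;
    final double avgMs;

    private DurationStats(int repeats, double maxMs, double minMs, double avgMs) {
        this.repeats = repeats;
        this.maxMs = maxMs;
        this.minMs = minMs;
        this.avgMs = avgMs;
    }

    /** Summarizes raw nanosecond timings as max, min, and average in milliseconds. */
    static DurationStats fromNanos(long[] nanos) {
        if (nanos.length == 0) {
            throw new IllegalArgumentException("need at least one timing");
        }
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        long sum = 0;
        for (long n : nanos) {
            max = Math.max(max, n);
            min = Math.min(min, n);
            sum += n;
        }
        return new DurationStats(nanos.length, max / 1e6, min / 1e6, (sum / 1e6) / nanos.length);
    }

    /** Runs the task the given number of times, timing each run with System.nanoTime(). */
    static DurationStats measure(Runnable task, int repeats) {
        long[] nanos = new long[repeats];
        for (int i = 0; i < repeats; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return fromNanos(nanos);
    }

    /** Formats the summary in the same layout as the output shown in this PR. */
    String report(String name) {
        return "Statistics for benchmark " + name + ":\n"
            + "Num of repeats = " + repeats + "\n"
            + "Max duration = " + maxMs + "ms\n"
            + "Min duration = " + minMs + "ms\n"
            + "Avg duration = " + avgMs + "ms";
    }

    public static void main(String[] args) {
        // Stand-in workload: sum an array. No warm-up is done, so early runs include JIT cost.
        int[] data = new int[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        long[] sink = new long[1]; // keeps the result observable so the loop is not trivially dead
        DurationStats stats = measure(() -> {
            long total = 0;
            for (int v : data) {
                total += v;
            }
            sink[0] = total;
        }, 10);
        System.out.println(stats.report("array sum (stand-in)"));
    }
}
```

The measured numbers will vary from run to run and machine to machine, which is part of why the reviewers recommend a dedicated harness.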