
Conversation

@liyafan82
Contributor

In this PR, we provide initial benchmarks for running the TPC-H SQL benchmark using Arrow.
The output looks like this:

Statistics for benchmark TPC-H Q1#Project & Filter:
Num of repeats = 10
Max duration = 0.64111ms
Min duration = 0.381478ms
Avg duration = 0.485802ms
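For context, statistics like the ones above can be collected with a simple timing loop. The sketch below is illustrative only, not the PR's actual harness; the `BenchmarkStats` class and its stand-in workload are hypothetical:

```java
// Hypothetical sketch of collecting per-benchmark timing statistics.
// The workload (summing an array) is a stand-in, not the PR's code.
public class BenchmarkStats {

    // Run the task `repeats` times, recording each duration in milliseconds.
    static double[] measure(int repeats, Runnable task) {
        double[] durationsMs = new double[repeats];
        for (int i = 0; i < repeats; i++) {
            long start = System.nanoTime();
            task.run();
            durationsMs[i] = (System.nanoTime() - start) / 1e6;
        }
        return durationsMs;
    }

    // Returns {min, max, avg} of the recorded durations.
    static double[] stats(double[] durations) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double total = 0;
        for (double d : durations) {
            min = Math.min(min, d);
            max = Math.max(max, d);
            total += d;
        }
        return new double[] {min, max, total / durations.length};
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];  // stand-in workload
        double[] durations = measure(10, () -> {
            long sum = 0;
            for (long v : data) sum += v;
        });
        double[] s = stats(durations);
        System.out.println("Num of repeats = " + durations.length);
        System.out.println("Max duration = " + s[1] + "ms");
        System.out.println("Min duration = " + s[0] + "ms");
        System.out.println("Avg duration = " + s[2] + "ms");
    }
}
```

Note that a hand-rolled loop like this is exactly what the JMH discussion below cautions against for serious measurement, since it does no warmup and is exposed to JIT effects.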

If this is OK, we will provide more benchmarks in a similar manner.

@wesm wesm changed the title [ARROW-5209][Java]Provide initial performance benchmarks from SQL workloads ARROW-5209: [Java] Provide initial performance benchmarks from SQL workloads Apr 25, 2019
@wesm
Member

wesm commented Apr 25, 2019

Question to @jacques-n and @siddharthteotia about what might be the best long-term strategy for benchmarking in Java. We would like to collect these benchmark results in a common benchmark database so we can monitor the performance of Java alongside the other languages.

cc @fsaintjacques

@fsaintjacques
Contributor

I'm not familiar with Java benchmark frameworks. If one can emit the C++ (and Go) benchmark output format, there's not a lot of work involved to capture the results.

@jacques-n
Contributor

[JMH](https://openjdk.java.net/projects/code-tools/jmh/) is the go-to tool for benchmarking in Java. We definitely don't want to write our own harness. Getting this right is its own science with the JIT.

I don't really understand the example benchmarks in this commit. They seem somewhat arbitrary. To achieve maximum performance, we interact directly with memory using PlatformDependent (or similar).
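For readers unfamiliar with JMH, a minimal annotated benchmark along these lines might look like the sketch below. This is illustrative only and assumes the `jmh-core` and `jmh-generator-annprocess` dependencies are on the classpath; `SumBenchmark` and its workload are hypothetical, not code from this PR:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Hypothetical JMH benchmark sketch; names and workload are illustrative.
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 5)        // let the JIT settle before measuring
@Measurement(iterations = 10)
@Fork(1)
public class SumBenchmark {

    long[] values;

    @Setup
    public void setup() {
        values = new long[1_000_000];
    }

    @Benchmark
    public long sum() {
        long total = 0;
        for (long v : values) total += v;
        return total;          // return the result so it is not dead-code eliminated
    }
}
```

JMH handles warmup iterations, forked JVMs, and dead-code elimination pitfalls automatically, which is the "science with the JIT" a hand-rolled harness tends to get wrong.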

@jacques-n
Contributor

One other comment, if people want to know more about the precariousness of Java microbenchmarks like this, they should spend some time on this blog: http://psy-lob-saw.blogspot.com/

@liyafan82
Contributor Author

> [JMH](https://openjdk.java.net/projects/code-tools/jmh/) is the go-to tool for benchmarking in Java. We definitely don't want to write our own harness. Getting this right is its own science with the JIT.
>
> I don't really understand the example benchmarks in this commit. They seem somewhat arbitrary. To achieve maximum performance, we interact directly with memory using PlatformDependent (or similar).

@jacques-n Thank you for your comments. JMH is a reasonable benchmark framework.
The benchmarks are extracted from our SQL engine while it processes the open TPC-H SQL benchmark.
Our SQL engine is going to be open-sourced and merged into Apache Flink. Do you think that is reasonable?

@codecov-io

Codecov Report

Merging #4198 into master will increase coverage by 1.65%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master    #4198      +/-   ##
==========================================
+ Coverage   87.78%   89.43%   +1.65%     
==========================================
  Files         758      622     -136     
  Lines       92513    83275    -9238     
  Branches     1251        0    -1251     
==========================================
- Hits        81210    74477    -6733     
+ Misses      11186     8798    -2388     
+ Partials      117        0     -117
Impacted Files Coverage Δ
cpp/src/arrow/array/builder_union.h 61.9% <0%> (-38.1%) ⬇️
cpp/src/arrow/array/builder_base.cc 76.36% <0%> (-5.99%) ⬇️
cpp/src/arrow/array/builder_dict.cc 64.41% <0%> (-3.33%) ⬇️
cpp/src/arrow/array/builder_base.h 94.11% <0%> (-2.55%) ⬇️
cpp/src/parquet/statistics.cc 87.96% <0%> (-2.3%) ⬇️
cpp/src/arrow/csv/column-builder.cc 95.32% <0%> (-1.76%) ⬇️
cpp/src/arrow/util/thread-pool-test.cc 97.66% <0%> (-0.94%) ⬇️
cpp/src/arrow/array/builder_binary.h 97% <0%> (-0.81%) ⬇️
python/pyarrow/tests/test_array.py 96.31% <0%> (-0.49%) ⬇️
cpp/src/parquet/metadata.cc 90% <0%> (-0.44%) ⬇️
... and 193 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 67efb73...1bc9002.

@emkornfield
Contributor

@liyafan82 I think we have since added more targeted micro-benchmarks for accessing data. Do you think this PR is still relevant?

@liyafan82
Contributor Author

> @liyafan82 I think we have since added more targeted micro-benchmarks for accessing data. Do you think this PR is still relevant?

Since there is no agreement that such benchmarks should be added, let's close this PR.
