Skip to content

Conversation

@liyafan82
Copy link
Contributor

@liyafan82 liyafan82 commented May 10, 2019

This PR adds a flag to enable/disable null checking in "get" methods of vectors. By disabling null checking, vector APIs can have better performance.

To make it more convenient to review this PR. It is split into two parts: this is the first part to provide a flag. If it is OK, another PR will be opened for the second part: to use the flag in vector "get" methods.

@codecov-io
Copy link

codecov-io commented May 10, 2019

Codecov Report

Merging #4288 into master will increase coverage by 0.48%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4288      +/-   ##
==========================================
+ Coverage   88.46%   88.95%   +0.48%     
==========================================
  Files         773      628     -145     
  Lines       95602    84909   -10693     
  Branches     1251        0    -1251     
==========================================
- Hits        84578    75528    -9050     
+ Misses      10788     9381    -1407     
+ Partials      236        0     -236
Impacted Files Coverage Δ
cpp/src/arrow/vendored/datetime/date.h 30.62% <0%> (-26.68%) ⬇️
cpp/src/arrow/vendored/datetime/tz.h 43.47% <0%> (-0.97%) ⬇️
cpp/src/arrow/pretty_print-test.cc 100% <0%> (ø) ⬆️
go/arrow/ipc/writer.go
go/arrow/math/uint64_amd64.go
go/arrow/memory/memory_avx2_amd64.go
go/arrow/ipc/file_reader.go
js/src/enum.ts
go/arrow/array/builder.go
js/src/Arrow.node.ts
... and 139 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a1cb373...588ddbf. Read the comment docs.

@liyafan82
Copy link
Contributor Author

Hi @jacques-n and @emkornfield, would you please take a look at this PR?

@jacques-n
Copy link
Contributor

I think this generally looks fine. It seems like the patch should also include the changes to the vectors with the assembly/microbenchmark observations of impact.

@liyafan82 liyafan82 changed the title [ARROW-5290][Java]Provide a flag to enable/disable null-checking in v… ARROW-5290: [Java]Provide a flag to enable/disable null-checking in v… May 10, 2019
@liyafan82
Copy link
Contributor Author

I think this generally looks fine. It seems like the patch should also include the changes to the vectors with the assembly/microbenchmark observations of impact.

@jacques-n Good suggestion. I have included them as well. Thanks a lot.

@emkornfield
Copy link
Contributor

It seems like maybe this belongs in vector package not memory?

@liyafan82
Copy link
Contributor Author

It seems like maybe this belongs in vector package not memory?

Sounds reasonable. I have moved them accordingly.

@emkornfield
Copy link
Contributor

I think we should refactor the check method into the parent class, but not as part of this PR. I filed https://issues.apache.org/jira/browse/ARROW-5305 to track this.

I'm going to merge the change. Thanks for working through this on the mailing list.

@liyafan82
Copy link
Contributor Author

I think we should refactor the check method into the parent class, but not as part of this PR. I filed https://issues.apache.org/jira/browse/ARROW-5305 to track this.

I'm going to merge the change. Thanks for working through this on the mailing list.

Hi @emkornfield , thanks a lot for all your comments and helpful suggestions, and thank you for opening ARROW-5305 to refine the changes.
I am watching ARROW-5305, since we are looking forward to using the features in our code!

Copy link
Contributor

@jacques-n jacques-n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 until we see JIT analysis of both cases (check enabled and not enabled). I want to make sure this condition disappears as it should.

@emkornfield
Copy link
Contributor

@jacques-n sorry misunderstood the original question will hold off on the merge until @liyafan82 links the results (or do you want to handle the merge when you are happy with the change?

@liyafan82
Copy link
Contributor Author

liyafan82 commented May 13, 2019

@jacques-n and @emkornfield , thanks a lot for your comments.
I made JIT analysis on our previous micro-benchmarks. First, I run JMH performance tests to compare the safe API (with null-checking enabled and disabled) with the unsafe API (in our PR, which is to be discarded).

When the null-checking is enabled, we have:

Benchmark Mode Cnt Score Error Units
VectorAPIBenchmarks.testSafe avgt 5 4.067 ± 0.005 us/op
VectorAPIBenchmarks.testUnSafe avgt 5 2.818 ± 0.001 us/op

This is consistent with the 30% performance difference we observed before.

When the null-checking flag is disabled, we have:

Benchmark Mode Cnt Score Error Units
VectorAPIBenchmarks.testSafe avgt 5 2.817 ± 0.002 us/op
VectorAPIBenchmarks.testUnSafe avgt 5 2.817 ± 0.001 us/op

We can see that by disabling the flag, the performance is almost identical.
Therefore, the null-check accounts for the 30% performance difference, and when it is disabled, the effects disappear completely.

In addition, by inspecting the assembly code, we can see that the code for the safe API (with null-checking disabled) is almost identical to the code for the unsafe API:

The code for the safe API (with null-checking disabled)

safe

The code for the unsafe API

unsafe

So it can be seen that the flag solves the performance problem completely.

@liyafan82
Copy link
Contributor Author

liyafan82 commented May 13, 2019

Please also confirm that your changes do not negatively impact the performance of the path with checking enabled versus without the static.

@jacques-n Sure. Thanks for your kind reminder.
When the static is enabled:

Benchmark Mode Cnt Score Error Units
VectorAPIBenchmarks.testSafe avgt 5 4.068 ± 0.001 us/op

When the static is removed:

Benchmark Mode Cnt Score Error Units
VectorAPIBenchmarks.testSafe avgt 5 4.069 ± 0.001 us/op

So the performance is almost identical.

Assembly code analysis on-going ...

@liyafan82
Copy link
Contributor Author

liyafan82 commented May 13, 2019

Please also confirm that your changes do not negatively impact the performance of the path with checking enabled versus without the static.

@jacques-n Assembly code are also almost identical:

The static enabled
static_enabled

The static removed
static_removed

@jacques-n
Copy link
Contributor

+1 Looks good. Thanks for checking the benchmark, etc!

@wesm wesm changed the title ARROW-5290: [Java]Provide a flag to enable/disable null-checking in v… ARROW-5290: [Java] Provide a flag to enable/disable null-checking in vector's get methods May 13, 2019
@wesm wesm closed this in 0ee2ff1 May 13, 2019
pribor pushed a commit to GlobalWebIndex/arrow that referenced this pull request Oct 24, 2025
…vector's get methods

This PR adds a flag to enable/disable null checking in "get" methods of vectors. By disabling null checking, vector APIs can have better performance.

To make it more convenient to review this PR. It is split into two parts: this is the first part to provide a flag. If it is OK, another PR will be opened for the second part: to use the flag in vector "get" methods.

Author: liyafan82 <fan_li_ya@foxmail.com>

Closes apache#4288 from liyafan82/fly_5290 and squashes the following commits:

588ddbf <liyafan82> Move the option to vector package
456e4ee <liyafan82> Modify get methods and provide the performance results
0c6850e <liyafan82> Provide a flag to enable/disable null-checking in vectors' get methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants