Conversation
Codecov Report
```
@@             Coverage Diff              @@
##             master     #389      +/-   ##
============================================
- Coverage     66.36%   66.35%   -0.02%
+ Complexity     5362     5359       -3
============================================
  Files           154      154
  Lines         22619    22622       +3
  Branches       4083     4083
============================================
- Hits          15011    15010       -1
- Misses         6249     6251       +2
- Partials       1359     1361       +2
```
Continue to review full report at Codecov.
This trade-off seems fine to me. Could you add a comment that says "there's a race condition here that we allow deliberately as an optimization" so no one tries to fix it without performance testing down the road?
…cator, improving performance when the queue gets large.
88f2c95 to 9a74a9d
@zslayton Done.
Issue #, if available:
Fixes #371
Description of changes:
When the size of the binary writer's block pool queue gets large, ConcurrentLinkedQueue.size() starts dominating profiles because it's not a constant-time operation.

One way of fixing this is to use a concurrent queue implementation that does have a constant-time size(). Two such implementations are ArrayBlockingQueue and LinkedBlockingQueue. Unlike ConcurrentLinkedQueue, which is a lock-free, non-blocking implementation, both of the BlockingQueue implementations use locks. In addition, ArrayBlockingQueue is fixed-size and requires allocation of a backing array of that size up front. I tried both implementations and found that they did not perform as well as the proposed solution, which retains ConcurrentLinkedQueue.

The proposed solution simply tracks the approximate size of the queue externally using an AtomicInteger (which is also lock-free).
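To make the tradeoff concrete, here is a minimal sketch of that approach. This is not the actual ion-java code; the class name, the approximateSize field, and the acquire/release methods are illustrative. The point is that the pool keeps the lock-free ConcurrentLinkedQueue but tracks its approximate size in an AtomicInteger, so the hot path never calls size():

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a block pool that tracks its size externally (illustrative names only). */
final class BlockPoolSketch {
    private final ConcurrentLinkedQueue<byte[]> freeBlocks = new ConcurrentLinkedQueue<>();
    private final AtomicInteger approximateSize = new AtomicInteger(0); // hypothetical field name
    private final int blockLimit;
    private final int blockSize;

    BlockPoolSketch(int blockLimit, int blockSize) {
        this.blockLimit = blockLimit;
        this.blockSize = blockSize;
    }

    /** Returns a pooled block if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] block = freeBlocks.poll();
        if (block == null) {
            return new byte[blockSize];
        }
        approximateSize.decrementAndGet();
        return block;
    }

    /** Returns a block to the pool, dropping it if the pool is (approximately) full. */
    void release(byte[] block) {
        // Optimistically reserve a slot; this is the single atomic op on the common path.
        if (approximateSize.incrementAndGet() <= blockLimit) {
            freeBlocks.add(block);
        } else {
            // Pool is full: undo the reservation and let the block be garbage collected.
            approximateSize.decrementAndGet();
        }
    }
}
```

In this sketch the common (pool-not-full) path of release costs a single incrementAndGet; the decrement branch is only taken when the pool is already at its limit.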
Let's talk about race conditions

First, the existing implementation has a race condition. The freeBlocks.size() < blockLimit condition could be satisfied by multiple threads before the following freeBlocks.add, resulting in the pool growing beyond its capacity under high contention. This isn't a big deal; keeping an extra block or two around for what is likely a short amount of time isn't going to cause a major headache.

The proposal actually solves that race condition by atomically incrementing the size before adding the block. However, because the size is optimistically incremented, there is a race condition in the uncommon case where the pool ends up being full. Looking at the proposed diff, multiple threads could get to line 71 before the "first" one completes it. In this case, a few blocks that could have fit in the pool would get dropped. They'd be re-allocated if the pool ever needed to grow to that size again.
We could make this change such that it has the same race condition behavior as the existing solution; namely, that it may allow the pool to exceed capacity rather than unnecessarily freeing blocks. I like the proposed behavior slightly better because it's more conservative with heap size and it only requires one operation (increment) on the common path (pool not full) instead of two (check then increment). However, I'm open to other opinions.
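For comparison, here is a sketch of that alternative release path, under the same assumptions as the earlier sketch (approximateSize remains a hypothetical field name). It mirrors the existing check-then-add behavior, so under contention the pool may briefly exceed blockLimit, but no block that would have fit is dropped at the boundary:

```java
// Alternative to the release method in the sketch above: check-then-increment,
// mirroring the existing freeBlocks.size() < blockLimit behavior. Several threads
// can pass the check before any of them increments, so the pool may briefly
// exceed blockLimit under high contention.
void release(byte[] block) {
    if (approximateSize.get() < blockLimit) {
        approximateSize.incrementAndGet();
        freeBlocks.add(block);
    }
    // Otherwise drop the block; it will be re-allocated if the pool needs to grow again.
}
```

This version requires two operations (a read and an increment) on the common path, which is why the optimistic-increment version above is preferred here.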
Performance
I tested a variety of different conditions to make sure there wouldn't be unintended side-effects. For the sake of brevity I'm only including the results for the case that targets large queue size under high contention here because it illustrates the benefits of the solution. The full results for all of the conditions I tried can be found here.
For the following test, I made a temporary modification to ion-java-benchmark-cli to write the same binary Ion stream with 10 different threads, 10 times each. I used the
--ion-writer-block-size option to reduce the block size to 1K from the default 32K, resulting in an increase in the number of blocks in the pool under high contention. Here's the ion-java-benchmark-cli command:

Before:
After:
That's a 71% improvement (39.735s -> 11.345s).
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.