
Eliminates the use of ConcurrentLinkedQueue.size() in PooledBlockAllocator, improving performance when the queue gets large. #389

Merged
tgregg merged 1 commit into master from block-pool-size on Oct 13, 2021
Conversation

@tgregg (Contributor) commented Oct 11, 2021

Issue #, if available:
Fixes #371

Description of changes:
When the size of the binary writer's block pool queue gets large, ConcurrentLinkedQueue.size() starts dominating profiles because it's not a constant time operation.

One way of fixing this is to use a concurrent queue implementation that does have a constant-time size(). Two such implementations are ArrayBlockingQueue and LinkedBlockingQueue. Unlike ConcurrentLinkedQueue, which is a lock-free, non-blocking implementation, both BlockingQueue implementations use locks. In addition, ArrayBlockingQueue is fixed-size and requires allocating a backing array of that size up front. I tried both implementations and found that neither performed as well as the proposed solution, which retains ConcurrentLinkedQueue.

This proposed solution simply tracks the approximate size of the queue externally using an AtomicInteger (which is also lock-free).
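The mechanism can be sketched as follows. This is a minimal illustration of the strategy described above, not the actual ion-java source; the class and member names (BlockPool, freeBlocks, blockLimit) are stand-ins:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: track the approximate queue size externally with an
// AtomicInteger so that ConcurrentLinkedQueue.size() (an O(n) traversal)
// is never called.
class BlockPool {
    private final ConcurrentLinkedQueue<byte[]> freeBlocks = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger(0); // approximate pool size
    private final int blockLimit;
    private final int blockSize;

    BlockPool(int blockLimit, int blockSize) {
        this.blockLimit = blockLimit;
        this.blockSize = blockSize;
    }

    byte[] allocate() {
        byte[] block = freeBlocks.poll();
        if (block != null) {
            size.decrementAndGet();
            return block;
        }
        return new byte[blockSize]; // pool empty: allocate a fresh block
    }

    void free(byte[] block) {
        // Optimistically increment: a single atomic operation on the common
        // (pool-not-full) path. Deliberate race: concurrent callers may all
        // observe the pool as full and drop blocks that could have fit.
        if (size.incrementAndGet() <= blockLimit) {
            freeBlocks.add(block);
        } else {
            size.decrementAndGet(); // pool full: drop the block
        }
    }
}
```

On the common path, free() costs one atomic increment; the compensating decrement only runs in the uncommon full-pool case.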

Let's talk about race conditions

First, the existing implementation has a race condition. The freeBlocks.size() < blockLimit condition could be satisfied by multiple threads before the following freeBlocks.add, resulting in the pool growing beyond its capacity under high contention. This isn't a big deal; keeping an extra block or two around for what is likely a short amount of time isn't going to cause a major headache.
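The pre-change check-then-add pattern can be sketched like this (again with illustrative names, not the actual source). Note that ConcurrentLinkedQueue.size() traverses the entire queue, which is the O(n) cost this change removes:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch of the pre-change behavior: check-then-add on the
// queue itself. The check and the add are not atomic, so under contention
// several threads can pass the size check before any add() lands, letting
// the pool briefly exceed blockLimit.
class CheckThenAddPool {
    final ConcurrentLinkedQueue<byte[]> freeBlocks = new ConcurrentLinkedQueue<>();
    final int blockLimit;

    CheckThenAddPool(int blockLimit) {
        this.blockLimit = blockLimit;
    }

    void free(byte[] block) {
        if (freeBlocks.size() < blockLimit) { // O(n) traversal of the queue
            freeBlocks.add(block);
        }
        // else: pool full, drop the block
    }
}
```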

The proposal actually solves that race condition by atomically incrementing the size before adding the block. However, because the size is optimistically incremented, there is a race condition in the uncommon case where the pool ends up being full. Looking at the proposed diff, multiple threads could get to line 71 before the "first" one completes it. In this case, a few blocks that could have fit in the pool would get dropped. They'd be re-allocated if the pool ever needed to grow to that size again.

We could make this change such that it has the same race condition behavior as the existing solution; namely, that it may allow the pool to exceed capacity rather than unnecessarily freeing blocks. I like the proposed behavior slightly better because it's more conservative with heap size and it only requires one operation (increment) on the common path (pool not full) instead of two (check then increment). However, I'm open to other opinions.
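For comparison, the variant that preserves the old race behavior would check the external counter before incrementing it. A sketch under the same illustrative naming:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the alternative: keep the external AtomicInteger
// but check before incrementing. This mirrors the old race behavior (the
// pool may exceed capacity under contention rather than dropping blocks),
// at the cost of two operations on the common path instead of one.
class CheckThenIncrementPool {
    final ConcurrentLinkedQueue<byte[]> freeBlocks = new ConcurrentLinkedQueue<>();
    final AtomicInteger size = new AtomicInteger(0);
    final int blockLimit;

    CheckThenIncrementPool(int blockLimit) {
        this.blockLimit = blockLimit;
    }

    void free(byte[] block) {
        if (size.get() < blockLimit) {   // operation 1: check
            size.incrementAndGet();      // operation 2: increment
            freeBlocks.add(block);       // two threads can both pass the check
        }
    }
}
```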

Performance

I tested a variety of different conditions to make sure there wouldn't be unintended side effects. For the sake of brevity, I'm only including the results for the case that targets a large queue size under high contention, because it illustrates the benefits of the solution. The full results for all of the conditions I tried can be found here.

For the following test, I made a temporary modification to ion-java-benchmark-cli to write the same binary Ion stream with 10 different threads, 10 times each. I used the --ion-writer-block-size option to reduce the block size to 1K from the default 32K, resulting in an increase in the number of blocks in the pool under high contention. Here's the ion-java-benchmark-cli command:

ion-java-benchmark write --io-type buffer --format ion_binary --iterations 2 --warmups 2 --ion-writer-block-size 1024 log.ion

Before:

Benchmark                                   (input)                                          (options)  Mode  Cnt           Score   Error   Units
Bench.run                                   log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2       39735.502           ms/op
Bench.run:Heap usage                        log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2        1557.834              MB
Bench.run:Serialized size                   log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          23.545              MB
Bench.run:·gc.alloc.rate                    log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          ≈ 10⁻⁴          MB/sec
Bench.run:·gc.alloc.rate.norm               log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2        9128.000            B/op
Bench.run:·gc.churn.PS_Eden_Space           log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2         174.076          MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2  7050337912.000            B/op
Bench.run:·gc.churn.PS_Survivor_Space       log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          18.090          MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2   747206192.000            B/op
Bench.run:·gc.count                         log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          30.000          counts
Bench.run:·gc.time                          log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2        2186.000              ms

After:

Benchmark                                   (input)                                          (options)  Mode  Cnt           Score   Error   Units
Bench.run                                   log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2       11345.630           ms/op
Bench.run:Heap usage                        log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2        1579.797              MB
Bench.run:Serialized size                   log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          23.545              MB
Bench.run:·gc.alloc.rate                    log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2           0.001          MB/sec
Bench.run:·gc.alloc.rate.norm               log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2        9140.000            B/op
Bench.run:·gc.churn.PS_Eden_Space           log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2         632.363          MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2  7792570168.000            B/op
Bench.run:·gc.churn.PS_Old_Gen              log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          49.544          MB/sec
Bench.run:·gc.churn.PS_Old_Gen.norm         log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2   681872076.000            B/op
Bench.run:·gc.churn.PS_Survivor_Space       log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          73.401          MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2   942625592.000            B/op
Bench.run:·gc.count                         log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2          34.000          counts
Bench.run:·gc.time                          log.ion  write::{f:ION_BINARY,t:BUFFER,a:STREAMING,b:1024}    ss    2        4155.000              ms

That's a 71% improvement (39.735s -> 11.345s).

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov bot commented Oct 11, 2021

Codecov Report

Merging #389 (88f2c95) into master (2518f31) will decrease coverage by 0.01%.
The diff coverage is 50.00%.

❗ Current head 88f2c95 differs from pull request most recent head 9a74a9d. Consider uploading reports for the commit 9a74a9d to get more accurate results

@@             Coverage Diff              @@
##             master     #389      +/-   ##
============================================
- Coverage     66.36%   66.35%   -0.02%     
+ Complexity     5362     5359       -3     
============================================
  Files           154      154              
  Lines         22619    22622       +3     
  Branches       4083     4083              
============================================
- Hits          15011    15010       -1     
- Misses         6249     6251       +2     
- Partials       1359     1361       +2     
Impacted Files Coverage Δ
...zon/ion/impl/bin/PooledBlockAllocatorProvider.java 81.81% <50.00%> (-1.52%) ⬇️
src/com/amazon/ion/impl/BlockedBuffer.java 50.97% <0.00%> (-0.37%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2518f31...9a74a9d.

tgregg changed the title from "Eliminates the use of ConcurrentLinkedQueue.size() is PooledBlockAllocator, improving performance when the queue gets large." to "Eliminates the use of ConcurrentLinkedQueue.size() in PooledBlockAllocator, improving performance when the queue gets large." on Oct 11, 2021
jobarr-amzn
jobarr-amzn previously approved these changes Oct 12, 2021
@zslayton (Contributor) commented:

> The proposal actually solves that race condition by atomically incrementing the size before adding the block. However, because the size is optimistically incremented, there is a race condition in the uncommon case where the pool ends up being full. Looking at the proposed diff, multiple threads could get to line 71 before the "first" one completes it. In this case, a few blocks that could have fit in the pool would get dropped. They'd be re-allocated if the pool ever needed to grow to that size again.
>
> We could make this change such that it has the same race condition behavior as the existing solution; namely, that it may allow the pool to exceed capacity rather than unnecessarily freeing blocks. I like the proposed behavior slightly better because it's more conservative with heap size and it only requires one operation (increment) on the common path (pool not full) instead of two (check then increment). However, I'm open to other opinions.

This trade-off seems fine to me. Could you add a comment that says "there's a race condition here that we allow deliberately as an optimization" so no one tries to fix it without performance testing down the road?

zslayton
zslayton previously approved these changes Oct 12, 2021
@tgregg tgregg dismissed stale reviews from zslayton and jobarr-amzn via 9a74a9d October 12, 2021 22:10
@tgregg (Contributor, Author) commented Oct 12, 2021

@zslayton Done.



Development

Successfully merging this pull request may close these issues.

Switch PooledBlockAllocatorProvider's ConcurrentLinkedQueue to an ArrayBlockingQueue

3 participants