Backs Pool with ConcurrentLinkedQueue instead of ArrayBlockingQueue. #405

Merged: tgregg merged 1 commit into master from pool-concurrentlinkedqueue on Jan 25, 2022

Conversation

@tgregg (Contributor) commented on Jan 25, 2022

Issue #, if available:

Resolves #403

Description of changes:

#403 was created in response to a user report that ArrayBlockingQueue was performing poorly under high contention. This PR proposes to use ConcurrentLinkedQueue instead, as we already do in the binary writer's PooledBlockAllocator. With this change, the implementations of the two pools are practically identical.
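
For reference, the change amounts to something like the minimal sketch below. The class and member names are illustrative and the size-capping details are simplified assumptions, not the exact Pool<T> source:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of a ConcurrentLinkedQueue-backed pool; names and the
// exact capping strategy are assumptions, not the real Pool<T> implementation.
final class SimplePool<T> {

    interface Allocator<T> {
        T newInstance();
    }

    private static final int MAX_SIZE = 16; // hypothetical cap on pooled objects

    // A queue of previously initialized objects that can be loaned out.
    private final Queue<T> objectQueue = new ConcurrentLinkedQueue<>();
    // ConcurrentLinkedQueue.size() is O(n), so track an approximate size separately.
    private final AtomicInteger approximateSize = new AtomicInteger(0);
    private final Allocator<T> allocator;

    SimplePool(Allocator<T> allocator) {
        this.allocator = allocator;
    }

    T getOrCreate() {
        T pooled = objectQueue.poll(); // non-blocking; returns null when empty
        if (pooled == null) {
            return allocator.newInstance();
        }
        approximateSize.decrementAndGet();
        return pooled;
    }

    void returnToPool(T object) {
        if (approximateSize.get() >= MAX_SIZE) {
            // Under high contention, multiple threads could end up here before the
            // first one decrements the size, causing objects to be dropped
            // wastefully. This is harmless: objects are re-allocated when needed.
            return;
        }
        objectQueue.offer(object);
        approximateSize.incrementAndGet();
    }
}
```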

I tested the performance by reading a small binary Ion payload from 256 threads simultaneously. My results showed little impact on raw performance, as shown below.
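
The test setup looked roughly like the following. This is a hypothetical reconstruction (the actual numbers came from our benchmark harness), and the writer/reader configuration shown here is an assumption:

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import com.amazon.ion.IonReader;
import com.amazon.ion.IonType;
import com.amazon.ion.IonWriter;
import com.amazon.ion.system.IonBinaryWriterBuilder;
import com.amazon.ion.system.IonReaderBuilder;

// Hypothetical reconstruction of the contention scenario: 256 threads
// repeatedly reading the same small binary Ion payload. Each reader is
// assumed to borrow and return pooled resources, so many concurrent
// readers drive contention on the Pool.
public class PoolContentionRepro {
    public static void main(String[] args) throws Exception {
        // Build a small binary Ion payload containing a string value.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (IonWriter writer = IonBinaryWriterBuilder.standard().build(out)) {
            writer.stepIn(IonType.STRUCT);
            writer.setFieldName("f");
            writer.writeString("hello");
            writer.stepOut();
        }
        final byte[] payload = out.toByteArray();

        ExecutorService executor = Executors.newFixedThreadPool(256);
        for (int i = 0; i < 256; i++) {
            executor.submit(() -> {
                for (int j = 0; j < 100_000; j++) {
                    try (IonReader reader = IonReaderBuilder.standard().build(payload)) {
                        while (reader.next() != null) {
                            if (reader.getType() == IonType.STRUCT) {
                                reader.stepIn();
                                while (reader.next() != null) {
                                    reader.stringValue(); // force string decoding
                                }
                                reader.stepOut();
                            }
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.MINUTES);
    }
}
```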

Before:

Benchmark                                     (input)                                                    (options)  Mode  Cnt        Score         Error   Units
Bench.run                                   offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      625.710 ±      98.023   us/op
Bench.run:Heap usage                        offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4       69.722 ±     270.399      MB
Bench.run:Serialized size                   offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4        0.001                    MB
Bench.run:·gc.alloc.rate                    offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4     1390.198 ±    5897.273  MB/sec
Bench.run:·gc.alloc.rate.norm               offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4   951955.882 ± 4032655.736    B/op
Bench.run:·gc.churn.PS_Eden_Space           offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4     1853.046 ±     295.808  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4  1278128.574 ±   17563.878    B/op
Bench.run:·gc.churn.PS_Survivor_Space       offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4        0.011 ±       0.109  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4        7.463 ±      72.235    B/op
Bench.run:·gc.count                         offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      525.000                counts
Bench.run:·gc.time                          offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      386.000                    ms

Bench.run                                   offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4     1066.717 ±     131.231   us/op
Bench.run:Heap usage                        offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4       40.355 ±     119.060      MB
Bench.run:Serialized size                   offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4        0.001                    MB
Bench.run:·gc.alloc.rate                    offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      690.663 ±    2917.179  MB/sec
Bench.run:·gc.alloc.rate.norm               offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4   817010.976 ± 3451364.936    B/op
Bench.run:·gc.churn.PS_Eden_Space           offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      934.683 ±     106.301  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4  1098612.249 ±   12630.623    B/op
Bench.run:·gc.churn.PS_Survivor_Space       offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4        0.034 ±       0.104  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4       40.135 ±     122.807    B/op
Bench.run:·gc.count                         offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      510.000                counts
Bench.run:·gc.time                          offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      382.000                    ms

After:

Benchmark                                     (input)                                                    (options)  Mode  Cnt        Score         Error   Units
Bench.run                                   offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      620.585 ±     213.008   us/op
Bench.run:Heap usage                        offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      105.876 ±     219.392      MB
Bench.run:Serialized size                   offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4        0.001                    MB
Bench.run:·gc.alloc.rate                    offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4     1367.362 ±    5785.516  MB/sec
Bench.run:·gc.alloc.rate.norm               offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4   958212.760 ± 4059694.097    B/op
Bench.run:·gc.churn.PS_Eden_Space           offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4     1880.581 ±     676.380  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4  1284008.912 ±   14065.695    B/op
Bench.run:·gc.churn.PS_Survivor_Space       offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4        0.021 ±       0.095  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4       14.082 ±      66.259    B/op
Bench.run:·gc.count                         offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      496.000                counts
Bench.run:·gc.time                          offer.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    4      374.000                    ms

Bench.run                                   offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4     1022.732 ±     114.783   us/op
Bench.run:Heap usage                        offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4       42.911 ±     135.640      MB
Bench.run:Serialized size                   offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4        0.001                    MB
Bench.run:·gc.alloc.rate                    offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      723.678 ±    3059.124  MB/sec
Bench.run:·gc.alloc.rate.norm               offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4   812518.216 ± 3431796.993    B/op
Bench.run:·gc.churn.PS_Eden_Space           offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      969.458 ±     101.871  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4  1092581.180 ±   15322.962    B/op
Bench.run:·gc.churn.PS_Survivor_Space       offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4        0.030 ±       0.028  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4       34.207 ±      33.903    B/op
Bench.run:·gc.count                         offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      531.000                counts
Bench.run:·gc.time                          offer.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    4      383.000                    ms

However, I did notice a significant difference in the CPU profiles before and after the change. The "before" profile closely resembles the profile the reporting user shared.

Before:

[CPU profile screenshot: before the change]

After:

[CPU profile screenshot: after the change]

After the change, polling and offering on the queue basically disappear from the profile. Even if this doesn't have a large performance impact in most cases, I'd be happy to merge it just to clean up performance profiles. It also has the benefit of consistency with our other pool implementation.
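
For context on why this helps: ArrayBlockingQueue guards both ends of the queue with a single ReentrantLock, whereas ConcurrentLinkedQueue is a lock-free, CAS-based queue, so under heavy concurrent offer/poll traffic the lock acquisition is what shows up in the profile. That difference can be isolated with a JMH micro-benchmark along these lines (a hypothetical sketch, not the harness that produced the numbers above):

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

// Hypothetical micro-benchmark isolating queue contention: every thread does an
// offer immediately followed by a poll, mimicking the pool's return/borrow cycle.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@Threads(256)
public class QueueContentionBench {

    private Queue<Object> blockingQueue;
    private Queue<Object> lockFreeQueue;

    @Setup
    public void setup() {
        // Capacity large enough that offers never fail with 256 threads in flight.
        blockingQueue = new ArrayBlockingQueue<>(1024);
        lockFreeQueue = new ConcurrentLinkedQueue<>();
    }

    @Benchmark
    public Object arrayBlockingQueue() {
        blockingQueue.offer(this);
        return blockingQueue.poll();
    }

    @Benchmark
    public Object concurrentLinkedQueue() {
        lockFreeQueue.offer(this);
        return lockFreeQueue.poll();
    }
}
```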

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov codecov bot commented on Jan 25, 2022

Codecov Report

Merging #405 (1d29044) into master (eb3766e) will increase coverage by 0.01%.
The diff coverage is 71.42%.


@@             Coverage Diff              @@
##             master     #405      +/-   ##
============================================
+ Coverage     66.42%   66.43%   +0.01%     
+ Complexity     5386     5385       -1     
============================================
  Files           155      155              
  Lines         22701    22705       +4     
  Branches       4093     4093              
============================================
+ Hits          15080    15085       +5     
+ Misses         6261     6259       -2     
- Partials       1360     1361       +1     
Impacted Files                               Coverage Δ
src/com/amazon/ion/impl/bin/utf8/Pool.java   85.71% <71.42%> (-14.29%) ⬇️
src/com/amazon/ion/impl/BlockedBuffer.java   51.33% <0.00%> (+0.36%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@tgregg (Contributor, Author) commented on Jan 25, 2022

Note: the performance regression test failed for an unrelated reason (The performance regression detected when benchmark the ion-java from the new commit with the test data: testList.10n and parameters: write::{format:"JSON",type:"FILE",api:"STREAMING"} The following aspects have regressions: {Heap usage=-0.0614610633338160}); this is just noise. We occasionally get false positives from the regression detector. We should try to drive that rate down, but I'll take a false positive over a false negative.

@zslayton (Contributor) left a comment


> After the change, polling and offering on the queue basically disappear from the profile. Even if this doesn't have a large performance impact in most cases, I'd be happy to merge it just to clean up performance profiles. It also has the benefit of consistency with our other pool implementation.

Agreed; this is a good change but I would love to know where the JVM is spending its newfound free time.

Some minor cleanup thoughts below.

@@ -1,6 +1,8 @@
package com.amazon.ion.impl.bin.utf8;
@zslayton (Contributor) commented:

This class started out as a Utf8EncoderPool and was generalized when we realized how many different resources we could be pooling. We should consider promoting Pool<T> to a higher-level package.

@tgregg (Contributor, Author) replied:

True; I'll leave that for its own PR. Issue to track: #406


```diff
 // A queue of previously initialized objects that can be loaned out.
-private final ArrayBlockingQueue<T> bufferQueue;
+private final Queue<T> bufferQueue;
```
@zslayton (Contributor) commented:

I notice some pre-existing comments/variables mention things like buffer and block; these aren't accurate for the generalized Pool<T> type.

Suggested change:

```diff
-private final Queue<T> bufferQueue;
+private final Queue<T> objectQueue;
```

@tgregg (Contributor, Author) replied:

Done.

Comment on lines 77 to 79:

```java
// Under high contention, multiple threads could end up here before the first one
// decrements the size, causing blocks to be dropped wastefully. This is not harmful
// because blocks will be re-allocated when necessary; the pool is kept as close as
```
@zslayton (Contributor) commented:

Suggested change:

```diff
-// Under high contention, multiple threads could end up here before the first one
-// decrements the size, causing blocks to be dropped wastefully. This is not harmful
-// because blocks will be re-allocated when necessary; the pool is kept as close as
+// Under high contention, multiple threads could end up here before the first one
+// decrements the size, causing objects to be dropped wastefully. This is not harmful
+// because objects will be re-allocated when necessary; the pool is kept as close as
```

@tgregg (Contributor, Author) replied:

Done.

@tgregg force-pushed the pool-concurrentlinkedqueue branch from 91f97f0 to 1d29044 on January 25, 2022 at 20:18.
@tgregg merged commit 0747831 into master on Jan 25, 2022.
@tgregg deleted the pool-concurrentlinkedqueue branch on January 25, 2022 at 20:56.
Linked issue: Consider replacing ArrayBlockingQueue with ConcurrentLinkedQueue in Pool (#403)