Adds a pool of UTF-8 decoders, making reader instantiation less expensive. #388

Merged
tgregg merged 1 commit into master from reader-decoder-pool on Oct 11, 2021

Conversation


@tgregg tgregg commented Oct 8, 2021

Issue #, if available:
Fixes #370

Description of changes:

  1. Created generic base classes for poolable objects (Poolable - the negative diff for Utf8StringEncoder shows what was extracted) and for pools of poolable objects (Pool - the negative diff for Utf8StringEncoderPool shows what was extracted).
  2. Created Utf8StringDecoder (extends Poolable) and Utf8StringDecoderPool (extends Pool) to provide reusable logic and character buffers for decoding UTF-8 strings.
  3. Modified IonReaderBinaryIncremental (the new incremental binary reader) and IonReaderBinaryRawX (the old binary reader) to use Utf8StringDecoderPool and Utf8StringDecoder.
  4. Because IonReaderBinaryRawX requires an additional ByteBuffer for its string decoding (not required by the incremental reader), created PoolableByteBuffer (extends Poolable) and ByteBufferPool (extends Pool) to provide reusable ByteBuffers to this reader.
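The Poolable/Pool split described in steps 1–4 can be sketched roughly as follows. Only the names `Poolable` and `Pool` come from this PR; the method names, the queue choice, and the `PooledBuffer` example are illustrative assumptions, and the actual ion-java implementation may differ.

```java
import java.io.Closeable;
import java.util.concurrent.ConcurrentLinkedQueue;

// Rough sketch of the Poolable/Pool pattern (hypothetical details).
abstract class Poolable<T extends Poolable<T>> implements Closeable {
    private final Pool<T> pool;

    protected Poolable(Pool<T> pool) {
        this.pool = pool;
    }

    // Closing a poolable object returns it to its pool for reuse rather
    // than leaving it to be garbage collected.
    @Override
    @SuppressWarnings("unchecked")
    public void close() {
        pool.returnToPool((T) this);
    }
}

abstract class Pool<T extends Poolable<T>> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();

    // Creates a new instance when the pool has none to hand out.
    protected abstract T newInstance();

    // Borrows a pooled instance, allocating only when the pool is empty.
    public T getOrCreate() {
        T instance = queue.poll();
        return (instance == null) ? newInstance() : instance;
    }

    void returnToPool(T instance) {
        queue.offer(instance);
    }
}

// Tiny concrete example (hypothetical): a poolable scratch buffer.
final class PooledBuffer extends Poolable<PooledBuffer> {
    final byte[] bytes = new byte[1024];

    PooledBuffer(Pool<PooledBuffer> pool) {
        super(pool);
    }
}

final class PooledBufferPool extends Pool<PooledBuffer> {
    @Override
    protected PooledBuffer newInstance() {
        return new PooledBuffer(this);
    }
}
```

Under this scheme a reader would call `getOrCreate()` at construction time and `close()` the borrowed objects when it closes, so repeated reader instantiation reuses the same decoder and character buffers instead of allocating fresh ones each time.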

This takes care of a lot of the expense that comes with instantiating binary readers. To test, I benchmarked a loop like the following on a binary Ion file containing only "foo" (using a manual change to ion-java-benchmark-cli that I plan to add as a proper option at some point):

for (int i = 0; i < 10000; i++) {
    IonReader reader = readerBuilder.build(buffer);
    fullyTraverse(reader, false);
    reader.close();
}

Results before the proposed change:

Benchmark                                               (input)                                                    (options)  Mode  Cnt          Score         Error   Units
Bench.run                                   singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3       6136.583 ±    1525.138   us/op
Bench.run:Heap usage                        singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3        218.982 ±    4715.263      MB
Bench.run:Serialized size                   singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3         ≈ 10⁻⁵                    MB
Bench.run:·gc.alloc.rate                    singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3      14709.968 ±    3647.687  MB/sec
Bench.run:·gc.alloc.rate.norm               singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3   99440016.278 ±       0.260    B/op
Bench.run:·gc.churn.PS_Eden_Space           singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3      14694.721 ±    4046.463  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3   99335597.947 ± 3029246.426    B/op
Bench.run:·gc.churn.PS_Survivor_Space       singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3          0.144 ±       0.273  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3        971.336 ±    1938.260    B/op
Bench.run:·gc.count                         singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3        703.000                counts
Bench.run:·gc.time                          singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3        272.000                    ms

Bench.run                                   singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3       7526.666 ±     588.629   us/op
Bench.run:Heap usage                        singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3        700.571 ±    3701.472      MB
Bench.run:Serialized size                   singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3         ≈ 10⁻⁵                    MB
Bench.run:·gc.alloc.rate                    singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3      16081.923 ±    1245.981  MB/sec
Bench.run:·gc.alloc.rate.norm               singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3  133360016.341 ±       0.232    B/op
Bench.run:·gc.churn.PS_Eden_Space           singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3      16022.834 ±    1376.757  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3  132869899.349 ± 2851250.571    B/op
Bench.run:·gc.churn.PS_Survivor_Space       singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3          0.187 ±       0.541  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3       1552.268 ±    4454.582    B/op
Bench.run:·gc.count                         singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3        674.000                counts
Bench.run:·gc.time                          singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3        271.000                    ms

Results after the change:

Benchmark                                               (input)                                                    (options)  Mode  Cnt         Score         Error   Units
Bench.run                                   singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3      2492.856 ±     199.448   us/op
Bench.run:Heap usage                        singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3       181.488 ±    4739.342      MB
Bench.run:Serialized size                   singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3        ≈ 10⁻⁵                    MB
Bench.run:·gc.alloc.rate                    singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3      5971.134 ±     476.275  MB/sec
Bench.run:·gc.alloc.rate.norm               singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3  16400016.116 ±       0.010    B/op
Bench.run:·gc.churn.PS_Eden_Space           singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3      5956.038 ±     873.075  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3  16358390.389 ± 1170599.800    B/op
Bench.run:·gc.churn.PS_Survivor_Space       singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3         0.195 ±       0.633  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3       536.332 ±    1761.203    B/op
Bench.run:·gc.count                         singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3       676.000                counts
Bench.run:·gc.time                          singleStringFoo.10n      read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:INCREMENTAL}  avgt    3       264.000                    ms

Bench.run                                   singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3      2611.016 ±     239.516   us/op
Bench.run:Heap usage                        singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3        69.573 ±     317.594      MB
Bench.run:Serialized size                   singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3        ≈ 10⁻⁵                    MB
Bench.run:·gc.alloc.rate                    singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3      3031.101 ±     280.326  MB/sec
Bench.run:·gc.alloc.rate.norm               singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3   8720016.116 ±       0.095    B/op
Bench.run:·gc.churn.PS_Eden_Space           singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3      3025.731 ±     308.857  MB/sec
Bench.run:·gc.churn.PS_Eden_Space.norm      singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3   8704552.518 ±   84175.653    B/op
Bench.run:·gc.churn.PS_Survivor_Space       singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3         0.150 ±       0.332  MB/sec
Bench.run:·gc.churn.PS_Survivor_Space.norm  singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3       430.269 ±     918.321    B/op
Bench.run:·gc.count                         singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3       647.000                counts
Bench.run:·gc.time                          singleStringFoo.10n  read::{f:ION_BINARY,t:BUFFER,a:STREAMING,R:NON_INCREMENTAL}  avgt    3       254.000                    ms

I tried with various other file sizes as well, and observed (as expected) that the benefit diminishes as the size of the stream increases and more time is spent on parsing than reader instantiation.

I'm going to continue to experiment with pooling in the readers and writers. At the extreme, we could pool an object that holds everything readers and writers need. I'll determine whether the performance benefits would be worth the complexity.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.


codecov bot commented Oct 8, 2021

Codecov Report

Merging #388 (870d8f8) into master (e1a24e8) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@             Coverage Diff              @@
##             master     #388      +/-   ##
============================================
+ Coverage     66.32%   66.35%   +0.02%     
- Complexity     5348     5360      +12     
============================================
  Files           148      154       +6     
  Lines         22602    22619      +17     
  Branches       4084     4083       -1     
============================================
+ Hits          14990    15008      +18     
  Misses         6252     6252              
+ Partials       1360     1359       -1     
Impacted Files Coverage Δ
...om/amazon/ion/impl/IonReaderBinaryIncremental.java 95.13% <100.00%> (-0.06%) ⬇️
src/com/amazon/ion/impl/IonReaderBinaryRawX.java 80.03% <100.00%> (+<0.01%) ⬆️
...rc/com/amazon/ion/impl/bin/IonRawBinaryWriter.java 91.19% <100.00%> (ø)
...c/com/amazon/ion/impl/bin/utf8/ByteBufferPool.java 100.00% <100.00%> (ø)
src/com/amazon/ion/impl/bin/utf8/Pool.java 100.00% <100.00%> (ø)
src/com/amazon/ion/impl/bin/utf8/Poolable.java 100.00% <100.00%> (ø)
...m/amazon/ion/impl/bin/utf8/PoolableByteBuffer.java 100.00% <100.00%> (ø)
...om/amazon/ion/impl/bin/utf8/Utf8StringDecoder.java 100.00% <100.00%> (ø)
...mazon/ion/impl/bin/utf8/Utf8StringDecoderPool.java 100.00% <100.00%> (ø)
...om/amazon/ion/impl/bin/utf8/Utf8StringEncoder.java 100.00% <100.00%> (+8.10%) ⬆️
... and 3 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data


tgregg commented Oct 8, 2021

I reviewed the results of the failing performance regression detector task, and I'm comfortable with them. It detected a very small regression (0.847ms vs. 0.860ms, and 0.793ms vs. 0.803ms) in two of the three binary read tests, and an improvement in the third (0.926ms vs. 0.671ms). In all cases, the slower data point had a very high error (>40%), indicating either that something was competing for resources on the host or that we need to run more iterations, add more warmups, throw out runs with an error above a certain threshold, etc. Note: these tests currently instantiate a single reader, read a small amount of data (~50KB), and close the reader, so no performance impact is expected; I confirmed this locally with my ion-java-benchmark-cli runs.

@zslayton zslayton left a comment


Nice work 👍

/**
 * Base class for types that may be pooled.
 * @param <T> the concrete type.
 */
abstract class Poolable<T extends Poolable<T>> implements Closeable {
zslayton: This type bound is to ensure that only subclasses of Poolable can implement Poolable, right? Which provides the guarantee that you'll get instantiated with a reference to an appropriate Pool?

tgregg: Yeah, it allows us to guarantee that we're getting a Pool<T> (i.e., a Pool for this type of Poolable) in the constructor.
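For readers unfamiliar with this pattern, a minimal, simplified illustration of the self-referential ("F-bounded") type parameter being discussed; the names mirror the PR but the bodies are stubs, not the actual ion-java code:

```java
// Simplified illustration of the recursive bound T extends Poolable<T>.
interface Pool<T extends Poolable<T>> {
}

abstract class Poolable<T extends Poolable<T>> {
    protected final Pool<T> pool;

    // Each subclass must supply a Pool parameterized by a type that is
    // itself a Poolable, so the pool and the pooled type stay matched.
    protected Poolable(Pool<T> pool) {
        this.pool = pool;
    }
}

// OK: Decoder pairs with Pool<Decoder>, a pool of its own concrete type.
class Decoder extends Poolable<Decoder> {
    Decoder(Pool<Decoder> pool) {
        super(pool);
    }
}

// Does not compile: String does not satisfy "T extends Poolable<T>",
// so a subclass cannot be parameterized by an unrelated type.
// class Broken extends Poolable<String> { }
```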

@tgregg tgregg merged commit 2518f31 into master Oct 11, 2021
@tgregg tgregg deleted the reader-decoder-pool branch October 11, 2021 23:17

Development

Successfully merging this pull request may close these issues.

Use a memory pool for binary reader string decoding buffers