
Only initiate PatchPoint when needed. #565

Merged
linlin-s merged 3 commits into update-patches from patchpoint-optimization on Sep 20, 2023
Conversation

linlin-s (Contributor):

Issue #, if available:
N/A

Note: This is the implementation on top of PR #521

Description of changes:

Previously, we allocated a placeholder patch point for every container. During the popContainer process, these placeholders were assigned the appropriate patch point values. However, when a container turned out not to require a patch point, we had to reclaim its placeholder. To eliminate the need to reclaim unused placeholder patch points, we implemented the following changes:

Instead of initializing a placeholder patch point for each container, we now maintain the index of the patch point associated with each container. By default, the patch point index is set to -1, indicating that no patch point has been assigned to that container.

During the popContainer process, child containers are popped first, while their parent containers remain on the stack. If the current (child) container meets a condition that requires a patch point, its ancestors must also need patch points, since no container can be smaller than its contents. At that point, we trace back through the ancestors, allocating a placeholder patch point and assigning a patchIndex to each ancestor container, until we encounter an ancestor with an already-assigned patch point. The placeholder values are replaced with the correct data as the ancestors are popped. (A sketch of this back-fill step appears after the next paragraph.)

To verify that the new changes show the expected performance improvements, we also include benchmark results using generated test data. The generated test data contains a stream of 500,000 nested container values, and every container requires patch point allocation.
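A minimal sketch of the back-fill step, assuming hypothetical surrounding types (only ContainerInfo, the patchIndex field, and its -1 default come from this PR; the patch list, stack type, and method names below are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; not the PR's actual classes.
class PatchPoint {
    long position;  // placeholder data, overwritten when the container is popped
    long length;
}

class ContainerInfo {
    int patchIndex = -1;  // -1 means no patch point has been assigned yet
}

class PatchBackfillSketch {
    private final List<PatchPoint> patchPoints = new ArrayList<>();
    private final Deque<ContainerInfo> containerStack = new ArrayDeque<>();

    // Called when the container currently being popped turns out to need a
    // patch point: every enclosing container must then get one as well.
    void ensureAncestorsHavePatchPoints() {
        // Iterate from the innermost remaining container outward.
        for (ContainerInfo ancestor : containerStack) {
            if (ancestor.patchIndex != -1) {
                // This ancestor already has a patch point, so all of its
                // ancestors do too; stop tracing back.
                break;
            }
            // Allocate a placeholder and remember its index; the real values
            // are filled in when this ancestor is popped.
            patchPoints.add(new PatchPoint());
            ancestor.patchIndex = patchPoints.size() - 1;
        }
    }
}
```

Because the loop stops at the first ancestor that already has a patch point, each container is assigned at most once, so the total back-fill work stays proportional to the number of containers.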

Results summary:
When we benchmarked the new implementation with the container-only test data, we found a 7.12% performance regression in the current implementation compared to the previous implementation (patch list as a single contiguous array). However, when we benchmarked with real-world data, the current implementation showed a 5.42% speed improvement over the previous implementation on test data log_59155.ion, and a 6.12% speed improvement on test data log_194617.ion. Overall, after the new patch point changes, we gained a 1.84% improvement over the original implementation on dataset log_59155.ion, a 4.38% improvement on dataset log_194617.ion, and a 28.23% improvement on dataset generatedContainerOnlyTestData.10n.

| Test Data | No Change | Patchpoints as Single Contiguous Array | PatchPoint Initialization Optimization (Current) | Improvement vs. No Change | Improvement vs. Single Contiguous Array |
| --- | --- | --- | --- | --- | --- |
| log_59155.ion | 507.505 | 526.728 | 498.13 | 1.84% | 5.42% |
| log_194617.ion | 4108.958 | 4185.182 | 3928.665 | 4.38% | 6.12% |
| generatedContainerOnlyTestData.10n | 1060.129 | 710.243 | 760.815 | 28.23% | -7.12% |

All timings in ms/op.

Next step:
We should investigate the cause of the performance regression observed when benchmarking with the container-only test data. Comparing the profiling results of the two implementations should give us more insight into how to improve the current implementation.

Full benchmark results:
Benchmark a write of data equivalent to a stream of 500,000 nested container values using IonWriter (binary). The output is written into an in-memory buffer. (3 forks, 2 warmups, 2 iterations, preallocation 1)

| Benchmark | No Change | Patchpoints as Single Contiguous Array | PatchPoint Initialization Optimization (Current) | Units |
| --- | --- | --- | --- | --- |
| Bench.run | 1060.129 | 710.243 | 760.815 | ms/op |
| Bench.run:Heap usage | 2494.653 | 1835.835 | 1542.986 | MB |
| Bench.run:Serialized size | 219 | 219 | 219 | MB |
| Bench.run:·gc.alloc.rate | 710.967 | 928.757 | 867.017 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 827411053 | 725002740 | 725002290 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 295.055 | 290.346 | 271.915 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 343828070 | 226638884 | 227370070 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 146.525 | 194.51 | 164.57 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 170611935 | 151583561 | 137733978 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 12.524 | 21.643 | 25.668 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 14518855.5 | 16913286.1 | 21485215.9 | B/op |
| Bench.run:·gc.count | 83 | 137 | 128 | counts |
| Bench.run:·gc.time | 21521 | 7134 | 7007 | ms |
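For context, a container-only stream like the one benchmarked above could be generated along these lines. This is a hypothetical sketch using the public ion-java writer API (IonBinaryWriterBuilder, stepIn/stepOut), not the actual generator behind generatedContainerOnlyTestData.10n; whether a given container truly requires a patch point depends on its encoded length relative to the preallocated length bytes.

```java
import com.amazon.ion.IonType;
import com.amazon.ion.IonWriter;
import com.amazon.ion.system.IonBinaryWriterBuilder;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical generator for nested-container test data.
public class GenerateNestedContainers {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (IonWriter writer = IonBinaryWriterBuilder.standard().build(out)) {
            for (int i = 0; i < 500_000; i++) {
                writer.stepIn(IonType.LIST);   // outer container
                writer.stepIn(IonType.SEXP);   // nested container
                // Write enough values that each container's encoded length
                // exceeds the preallocated length bytes and needs patching.
                for (int j = 0; j < 20; j++) {
                    writer.writeInt(Integer.MAX_VALUE);
                }
                writer.stepOut();
                writer.stepOut();
            }
        }
        System.out.println("Wrote " + out.size() + " bytes of binary Ion");
    }
}
```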

Benchmark a write of data equivalent to a stream of 194,617 nested binary Ion values using IonWriter (binary). The output is written into an in-memory buffer. (3 forks, 2 warmups, 2 iterations, preallocation 1)

| Benchmark | No Change | Patchpoints as Single Contiguous Array | PatchPoint Initialization Optimization (Current) | Units |
| --- | --- | --- | --- | --- |
| Bench.run | 4108.958 | 4185.182 | 3928.665 | ms/op |
| Bench.run:Heap usage | 3082.915 | 2621.584 | 2626.018 | MB |
| Bench.run:Serialized size | 201.663 | 201.663 | 201.663 | MB |
| Bench.run:·gc.alloc.rate | 155.321 | 152.185 | 159.247 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 703012445 | 696433814.2 | 685816196 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 28.499 | 34.858 | 34.25 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 129324373 | 159616568.9 | 147499691 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 49.755 | 74.813 | 79.79 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 225884103 | 342237838.2 | 343763940 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 2.343 | 0.176 | 2.934 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 10555428.9 | 815559.111 | 12582912 | B/op |
| Bench.run:·gc.count | 13 | 12 | 14 | counts |
| Bench.run:·gc.time | 549 | 210 | 229 | ms |

Benchmark a write of data equivalent to a stream of 59,155 nested binary Ion values. The output is written into an in-memory buffer. (3 forks, 2 warmups, 2 iterations, preallocation 1)

| Benchmark | No Change | Patchpoints as Single Contiguous Array | PatchPoint Initialization Optimization (Current) | Units |
| --- | --- | --- | --- | --- |
| Bench.run | 507.505 | 526.728 | 498.13 | ms/op |
| Bench.run:Heap usage | 359.78 | 333.188 | 409.035 | MB |
| Bench.run:Serialized size | 21.271 | 21.271 | 21.271 | MB |
| Bench.run:·gc.alloc.rate | 125.91 | 119.233 | 126.107 | MB/sec |
| Bench.run:·gc.alloc.rate.norm | 70381489.1 | 69094211.64 | 69108945.6 | B/op |
| Bench.run:·gc.churn.G1_Eden_Space | 7.382 | 3.498 | 3.247 | MB/sec |
| Bench.run:·gc.churn.G1_Eden_Space.norm | 4127727.75 | 2024487.523 | 1780914.79 | B/op |
| Bench.run:·gc.churn.G1_Old_Gen | 131.22 | 128.452 | 129.58 | MB/sec |
| Bench.run:·gc.churn.G1_Old_Gen.norm | 73365937 | 74415311.5 | 71001078.2 | B/op |
| Bench.run:·gc.churn.G1_Survivor_Space | 0.081 | 0.048 | 0.035 | MB/sec |
| Bench.run:·gc.churn.G1_Survivor_Space.norm | 46186.201 | 28189.211 | 19001.263 | B/op |
| Bench.run:·gc.count | 95 | 86 | 68 | counts |
| Bench.run:·gc.time | 728 | 145 | 113 | ms |

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Contributor:

Is this intentionally committed?

linlin-s (Contributor Author):

> Is this intentionally committed?

This shouldn't be committed, will be removed in the next commit.

// Before this PR: fields were assigned directly on the pushed ContainerInfo.
ContainerInfo containerInfo = containerStack.push();
containerInfo.type = valueType;
containerInfo.endPosition = valueEndPosition;
// After this PR: initialization is supplied as a lambda to push().
containerStack.push(c -> c.initialize(valueType, valueEndPosition));
Contributor:
This is a change that would warrant performance testing of binary incremental reading to ensure it does not introduce a regression. Do we see extra overhead introduced by creation and invocation of a lambda on each stepIn?
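For reference, the allocation behavior in question can be seen with a snippet like the following (hypothetical, not from this PR): on HotSpot, a lambda that captures locals materializes a new object each time the expression is evaluated, while a non-capturing lambda is typically a cached singleton.

```java
import java.util.function.Supplier;

public class LambdaAllocationDemo {
    static Supplier<Integer> capturing(int x) {
        return () -> x;    // captures x: a new instance per evaluation
    }

    static Supplier<Integer> nonCapturing() {
        return () -> 42;   // captures nothing: typically one cached instance
    }

    public static void main(String[] args) {
        System.out.println(capturing(1) == capturing(1));       // false on HotSpot: distinct objects
        System.out.println(nonCapturing() == nonCapturing());   // true on HotSpot: reused object
    }
}
```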

linlin-s (Contributor Author):
I will check whether the change introduces a regression in binary incremental reading.

linlin-s (Contributor Author):
Here are the benchmark results from running the incremental reader against five sets of test data; the results are neutral.

| Test Data | Before | After |
| --- | --- | --- |
| log_59155.ion | 242.721 | 249.478 |
| log_194617.ion | 1858.668 | 1854.447 |
| test.json | 4273.663 | 4266.974 |
| catalog.ion | 0.476 | 0.478 |
| singleValue.10n | 0.063 | 0.063 |

{
    // If we're adding a patch point we first need to ensure that all of our ancestors (containing values)
    // already have a patch point. No container can be smaller than its contents, so all outer layers also
    // require patches.
    ListIterator<ContainerInfo> stackIterator = containers.iterator();
Contributor:
This allocates a new $Iterator every time. It looks like we could store and reuse a single instance, which may improve performance.

linlin-s (Contributor Author):
Here are the benchmark comparison results between commits ad7c0f9 and 05acf07. According to the results, there is a 0.78% performance improvement using test data generatedContainerOnlyTestData.10n, a 1.13% improvement using test data log_59155.ion, and a 0.69% improvement using test data log_194617.ion.

| Test Data | Before | After | Improvement |
| --- | --- | --- | --- |
| generatedContainerOnlyTestData.10n | 770.678 | 764.687 | 0.78% |
| log_59155.ion | 500.689 | 495.008 | 1.13% |
| log_194617.ion | 3994.782 | 3967.299 | 0.69% |

Here are the overall benchmark results after the change from commit 05acf07. (We re-ran ion-java-benchmark-cli on every change to obtain this round of benchmark results.)

| Test Data | No Change | Patchpoints as Single Contiguous Array | PatchPoint Initialization Optimization (Updated) | Improvement vs. No Change | Improvement vs. Single Contiguous Array |
| --- | --- | --- | --- | --- | --- |
| generatedContainerOnlyTestData.10n | 1125.62 | 709.77 | 764.687 | 32.06% | -7.73% |
| log_59155.ion | 506.531 | 522.596 | 495.008 | 2.27% | 5.28% |
| log_194617.ion | 4087.047 | 4192.821 | 3967.299 | 2.92% | 5.38% |

Comment on lines 17 to 18
stackIterator = new $Iterator();
return stackIterator;
Contributor:
I think this will be cleaner if, in this method, you:

  1. Allocate the iterator only if it has not yet been allocated, and
  2. Reset the iterator

That way you don't need a public resetIterator method, which you call every time you need to retrieve the iterator anyway. Also, you don't need to store a stackIterator variable in IonRawBinaryWriter, as retrieving it via containers.iterator() where you need it will be sufficient. In other words, iterator reuse can be achieved without any changes to IonRawBinaryWriter.

linlin-s (Contributor Author), Sep 19, 2023:
Thanks for the suggestions. I experimented with two versions of the iterator() method.

Conditional initialization:

public ListIterator<T> iterator() {
    if (stackIterator != null) {
        stackIterator.cursor = _Private_RecyclingStack.this.currentIndex;
    } else {
        stackIterator = new $Iterator();
    }
    return stackIterator;
}

One concern I have with this change is that we need to run the if/else check every time we call containers.iterator().

Unconditional initialization:

I also tried initializing the $Iterator while constructing the _Private_RecyclingStack, so that iterator() only resets the cursor:

public final class _Private_RecyclingStack<T> implements Iterable<T> {
    // The stackIterator is allocated once, in the constructor.
    private $Iterator stackIterator;

    public _Private_RecyclingStack(int initialCapacity, ElementFactory<T> elementFactory) {
        elements = new ArrayList<>(initialCapacity);
        this.elementFactory = elementFactory;
        currentIndex = -1;
        top = null;
        stackIterator = new $Iterator();
    }

    public ListIterator<T> iterator() {
        // Reuse the single iterator instance; only the cursor is reset.
        stackIterator.cursor = _Private_RecyclingStack.this.currentIndex;
        return stackIterator;
    }

This change might cause an unnecessary iterator allocation when we never need to iterate over the stack. From the benchmark results, the first method (conditional iterator initialization) is more performant. Are there any alternative implementations I wasn't aware of? Thanks.

| Test Data | Conditional Allocation | Unconditional Allocation |
| --- | --- | --- |
| log_59155.ion | 496.576 | 501.664 |
| log_194617.ion | 3924.792 | 3965.73 |

Contributor:

The if/else is only a problem if it results in a noticeable performance degradation, but it doesn't look like it does. It's surprising that unconditional allocation is slower, but I think the conditional allocation is fine.

linlin-s merged commit 6a1cba1 into update-patches on Sep 20, 2023.
linlin-s added commits that referenced this pull request on Nov 7, 2023.
linlin-s deleted the patchpoint-optimization branch on January 16, 2024.