Conversation
popematt
reviewed
Feb 28, 2024
|
|
||
| // A map of symbol ID to SymbolToken representation. Because most use cases only require symbol text, this | ||
| // is used only if necessary to avoid imposing the extra expense on all symbol lookups. | ||
| private List<SymbolToken> symbolTokensById = null; |
Contributor
There was a problem hiding this comment.
Another solution would be to change this to a Map<Int, SymbolToken> and then lazily populate it as symbols are requested. This means that (a) the size of the cache is determined by the number of unique SymbolTokens that are requested, and (b) each SymbolToken is allocated only once. However, it does incur the cost of looking something up in a Map.
Contributor
Author
There was a problem hiding this comment.
Yeah, I called this out in the PR description too. I'd prefer to remove the cache until it's clear that there is a use case that demands it.
Contributor
There was a problem hiding this comment.
Wow, I don't know how I missed that.
popematt
approved these changes
Feb 28, 2024
… optimization for certain cases but a major regression for others.
4bc4145 to
a7502a7
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of changes:
The
symbolTokensByIdcache was added to avoid repeatedly allocatingSymbolTokenwrappers when symbol lookup APIs that returnSymbolTokenwere called. This provided a minor (<10%) speedup due to reduced allocation rate / GC churn in cases where suchSymbolTokenAPIs were called frequently. However, it should be very uncommon for this to be the case, as users tend to use the equivalent APIs that returnStringinstead, as most do not need to deal with symbols with unknown text.There is some usage of these APIs within the IonJava DOM, but when using the DOM, other sources of slowdown tend to dominate. There is, however, a case where this cache can significantly hurt performance within the DOM: when a small binary Ion value is loaded into the DOM from a stream that imports a large shared symbol table. In that case, the cache is allocated to fit all symbols, even the imported ones. This can cause allocation of this large list, and the resulting GC churn, to dominate.
Because the cache only has minor benefits to an uncommon case, I propose simply removing it. If in the future we find the need to optimize this uncommon case, we can consider more robust options, like:
Note: this cache was introduced in the incremental reader implementation made opt-in in v1.9.0, and was ultimately adopted by the unified implementation introduced in 1.11.0.
Performance comparisons:
To demonstrate the regression, I ran the following data through
ion-java-benchmark-cli:In a separate PR, we will add this data and the following benchmark CLI invocation to our regression detection workflow.
v1.10.5
v1.11.3
v1.11.4-SNAPSHOT (with proposed fix)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.