[improve][broker] Improve the performance of TopicName constructor#24463
|
Before 48ffb7a, there is a constructor parameter that determines whether to initialize the The benchmark result is: As you can see
|
I wouldn't trust JMH results from benchmarking on macOS. In the case of PR #24457, the results are very different when benchmarking on Linux x86_64 with an Intel i9 processor (Dell XPS 2019 laptop). |
|
@lhotari So could you help run the benchmark in your Linux environment to see the difference? I can also create a workflow via GitHub Actions to see the result. |
I don't want to debate about it in this PR. If you have a chance to look at how The ultimate solution is to create some utils methods instead of leveraging the |
|
Updated test results to compare it with the legacy implementation in https://github.com/BewareMyPower/pulsar/actions/runs/15872434081/job/44752001433?pr=45
|
48ffb7a to
355140a
|
@lhotari @codelipenghui @nodece @dao-jun @poorbarcode @coderzc This PR is now ready to review, PTAL |
|
@BewareMyPower My point from an earlier comment isn't addressed:
In this case, I think it's irrelevant to just benchmark the performance of creating/looking up TopicName instances. Duplicate I do agree that a lot of the TopicName handling code is a mess. For example in Topic listing, the topic name is converted from/to String multiple times. That's adding up a lot of pressure for a fast lookup solution when instances are cached. So I'm not against changing the current caching, it's just that there's a need to consider duplicate instances as well. It's likely that the caching is more relevant for NamespaceName than TopicName regarding duplicate instances. We might not be keeping a reference to TopicName instance in that many places when broker is running. |
I've read more code and changed my idea a bit. Caching could be helpful in many cases. But how to establish the cache might depend on specific use cases. Writing a common cache is hard. I don't like the solution in #23052, but it's anyway good to resolve the issue encountered at that time. As I've mentioned here, exposing the public method is helpful for downstream to construct its own cache. In addition, the
It's just an example, |
|
Regarding this PR, I'm going to revert the other changes and leave only the improvement on Exposing a public method will be easier for the downstream to maintain its custom cache, but it will also be confusing for the core Pulsar developers to make the decision on Anyway, we should improve the use of
The In this case, |
|
This PR will be an improvement. Regarding the argument I made in a previous comment about duplicate tenant and namespace |
lhotari
left a comment
For deduplicating the tenant and namespace String instances, it would be useful to assign the tenant and namespacePortion fields from the NamespaceName instance. This wouldn't add much overhead, but it would reduce heap memory usage, since there would be less duplication of java.lang.String instances at runtime.
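The deduplication idea in this review comment could look roughly like the following. This is a minimal sketch, not the Pulsar implementation: `DedupSketch`, `Ns`, `Topic`, `NS_CACHE` and `parse` are hypothetical names, and the sketch assumes a simple `tenant/namespace/topic` input format. The point is only that the topic object takes its tenant and namespace strings from the cached namespace object instead of keeping its own freshly created substrings.

```java
import java.util.concurrent.ConcurrentHashMap;

final class DedupSketch {
    // Hypothetical stand-in for NamespaceName: owns the canonical String instances.
    static final class Ns {
        final String tenant;
        final String local;
        Ns(String tenant, String local) {
            this.tenant = tenant;
            this.local = local;
        }
    }

    // Hypothetical stand-in for TopicName.
    static final class Topic {
        final String tenant;
        final String namespacePortion;
        final String localName;
        Topic(String tenant, String namespacePortion, String localName) {
            this.tenant = tenant;
            this.namespacePortion = namespacePortion;
            this.localName = localName;
        }
    }

    private static final ConcurrentHashMap<String, Ns> NS_CACHE = new ConcurrentHashMap<>();

    // Parses "tenant/namespace/topic" (an assumed, simplified format).
    static Topic parse(String name) {
        int first = name.indexOf('/');
        int second = name.indexOf('/', first + 1);
        if (first <= 0 || second <= first + 1 || second == name.length() - 1) {
            throw new IllegalArgumentException("Invalid topic name: " + name);
        }
        // These substrings are new String instances on every call.
        String tenant = name.substring(0, first);
        String local = name.substring(first + 1, second);
        Ns ns = NS_CACHE.computeIfAbsent(name.substring(0, second), k -> new Ns(tenant, local));
        // Assign the fields from the cached namespace object so that the fresh
        // substrings above become garbage instead of being retained per topic.
        return new Topic(ns.tenant, ns.local, name.substring(second + 1));
    }

    public static void main(String[] args) {
        Topic a = parse("my-tenant/my-ns/topic-a");
        Topic b = parse("my-tenant/my-ns/topic-b");
        System.out.println(a.tenant == b.tenant);
    }
}
```

With this arrangement, every topic in the same namespace references one shared tenant String and one shared namespace String, at the cost of a single map lookup per construction.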
e147446 to
4755500
|
@lhotari Do you have a chance to review this PR again according to #25367 (comment)? I've minimized the changes in this PR to only optimize the constructor of the |
|
Updated benchmark can be found here: https://github.com/BewareMyPower/JavaBenchmark/actions/runs/24712104796/job/72279087737 my local run: It's 2.5x ~ 2.8x faster. Actually, the performance is also affected by the slow It's 7x faster |
lhotari
left a comment
LGTM, great work @BewareMyPower
|
I backported this to maintenance branches too. |
One additional detail to check is to see if optimizing org.apache.pulsar.common.naming.TopicDomain#getEnum would help. A general performance advice has been in the past to avoid calling Enum's something like this:
public static TopicDomain getEnum(String value) {
    if (persistent.value.equalsIgnoreCase(value)) {
        return persistent;
    }
    if (non_persistent.value.equalsIgnoreCase(value)) {
        return non_persistent;
    }
    if (topic.value.equalsIgnoreCase(value)) {
        return topic;
    }
    if (segment.value.equalsIgnoreCase(value)) {
        return segment;
    }
    throw new IllegalArgumentException("Invalid topic domain: '" + value + "'");
}
I'm not sure if this type of optimization applies to modern JVMs. |
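For comparison, a different and commonly used way to keep a loop-based lookup cheap is to cache the constants array once. This is not what the comment above proposes, and it assumes the concern is the array copy that the compiler-generated `values()` method makes on each call. `DomainSketch` and `CACHED` are hypothetical names, and only two constants are included to keep the sketch short.

```java
enum DomainSketch {
    persistent("persistent"),
    non_persistent("non-persistent");

    private final String value;

    // values() returns a new array copy on every call, so keep one copy for lookups.
    private static final DomainSketch[] CACHED = values();

    DomainSketch(String value) {
        this.value = value;
    }

    static DomainSketch getEnum(String value) {
        for (DomainSketch domain : CACHED) {
            if (domain.value.equalsIgnoreCase(value)) {
                return domain;
            }
        }
        throw new IllegalArgumentException("Invalid topic domain: '" + value + "'");
    }
}
```

Whether either variant is measurably faster on a modern JVM would need a benchmark; the sketch only shows the shape of the alternative.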
…pache#24463) (cherry picked from commit 3130a93) (cherry picked from commit fbbfe87)
…pache#24463) (cherry picked from commit 3130a93) (cherry picked from commit dbf7626)
Motivation
The TopicName's constructor has poor performance:
- NamespaceName#get is very slow
- Splitter.on("/").splitToList(rest) is slow
- String.format is slower than the + operation on strings, and completeTopicName is unnecessarily created again
Modifications
- Initialize NamespaceName in a lazy way (don't do that because it assumes the constructor is called more frequently than getNamespace or getNamespaceObject)
- Replace Splitter with a manually written splitBySlash introduced in [fix][proxy] Fix proxy OOM by replacing TopicName with a simple conversion method #24465. Actually, StringUtils#split has good performance as well. But it will split "//xxx/yyy/zzz" to ["xxx", "yyy", "zzz"] without reporting any error.
- Initialize completeTopicName from the argument directly without any concatenation operation
- Replace String.format with + in fromPersistenceNamingEncoding
NamespaceName and fromPersistenceNamingEncoding will not be handled in this PR.
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete
Matching PR in forked repository
PR in forked repository: BewareMyPower#44
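The hand-written splitter mentioned under Modifications can be sketched as follows. This is an illustrative version under assumptions, not the `splitBySlash` code from #24465: the class name `SlashSplitSketch` and the exact signature are hypothetical. It shows why keeping empty segments matters: malformed input such as `//xxx/yyy/zzz` yields a different segment count than a well-formed name, so the caller can reject it.

```java
import java.util.ArrayList;
import java.util.List;

final class SlashSplitSketch {
    // Splits on '/' in a single pass and keeps empty segments,
    // so leading or doubled slashes stay visible to the caller.
    static List<String> splitBySlash(String s) {
        List<String> parts = new ArrayList<>(4);
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '/') {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        // The remainder after the last slash (possibly empty).
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitBySlash("tenant/ns/topic").size());
        System.out.println(splitBySlash("//xxx/yyy/zzz").size());
    }
}
```

A caller that expects exactly three segments can compare the returned size and throw on a mismatch, which is the error reporting that a splitter dropping empty segments would lose.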