-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][broker] Improve the topic loading observability and performance #24577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
BewareMyPower
wants to merge
14
commits into
apache:master
from
BewareMyPower:bewaremypower/topic-policies-temp-stuck
Closed
[improve][broker] Improve the topic loading observability and performance #24577
BewareMyPower
wants to merge
14
commits into
apache:master
from
BewareMyPower:bewaremypower/topic-policies-temp-stuck
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
0214bd2 to
4ffe9e8
Compare
Contributor
Author
|
I will write a PIP for the topic initialization optimization later, close this PR |
Contributor
Author
|
Regarding the duplicated ownership check method, I will use split it into a smaller PR. |
Contributor
Author
|
I'm splitting this PR into multiple small PRs, the 1st one is #24780 |
Contributor
Author
|
2nd PR: #24785 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/broker
doc-not-needed
Your PR changes do not impact docs
release/4.0.7
type/enhancement
The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
I observed a topic has been loaded for more than 30 seconds in production environment recently after its namespace bundle ownership was transferred. From the logs and heap dump, the conclusion is that the topic has been stuck before calling
asyncOpenincreatePersistentTopic0.After revisiting the long code path of topic loading, I found many places are not efficient, and the existing
pulsar_topic_load_timesmetric is incorrect. This metric only counts the time from the beginning ofcreatePersistentTopic0. However, there are some other time-consuming tasks before it's called.pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1171 in c96f27a
It waits for topic policies of this topic are available before inserting a topic future to
BrokerService#topics. In extreme cases, there could be many concurrentTopicPoliciesService#getTopicPoliciesAsynccall.pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1670 in c96f27a
In
loadOrCreatePersistentTopic, it will check the ownership viacheckTopicNsOwnershipbefore acquiring the topic load semaphore here:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1674 in c96f27a
This violates the semantics of the
maxConcurrentTopicLoadRequestconfig.After acquiring the semaphore, it calls
checkOwnershipAndCreatePersistentTopic, which validates the ownership again viaNamespaceService#isServiceUnitActiveAsync:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1739 in c96f27a
Actually, the implementation of
isServiceUnitActiveAsyncis exactly the same withcheckTopicNsOwnership, where the only difference is that the previous one returns a boolean future, while the latter one returns a failed future if it's false.Even after that, it could fetch the topic policies before
createPersistentTopic0:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1745 in c96f27a
Modifications
Major changes:
getTopicPoliciesBypassSystemTopiccall before inserting a topic future. This method is only used ingetManagedLedgerConfig.BrokerService#getTopicwill succeed even when the topic's bundle is not owned. These tests could pass just because the system topic reader creation will trigger acquiring the bundles in the same namespace.getManagedLedgerConfigandfetchPartitionedTopicMetadataAsyncrepeatedly by executing these tasks before other tasks that depend on them.checkMaxTopicsPerNamespace,checkTopicAlreadyMigrated,validateTopicConsistency.Though many tasks are metadata store access with
MetadataCacheused, executing them concurrently is still more efficient.For observability,
pulsar_topic_load_timesmetric.hasMessageAvailableandreadNextloop could have poor performance when CPU pressure is high. This read loop is also too heavy.Other changes and refactoring:
TopicNameinstance across the whole flow to avoid unnecessary conversions betweenTopicNameandString.isTransactionInternalNameat the beginningisServiceUnitActiveAsyncwithcheckTopicNsOwnershipand remove this method to avoid users using this method, which makes code hard to read.failTopicFutureto invalidate the topic cache for failures during topic loadingDocumentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: BewareMyPower#50