[improve][broker] Avoid reconnection when a partitioned topic was created concurrently #16043
Merged
BewareMyPower
merged 3 commits into
apache:master
from
BewareMyPower:bewaremypower/reduce-warn-logs
Jun 14, 2022
…ated concurrently
### Motivation
When a partitioned topic is created concurrently, especially when it is
automatically created by many producers, the clients log warnings and
reconnect. This case can be reproduced
easily by configuring `allowAutoTopicCreationType=non-partitioned` and
starting a Pulsar standalone. Then, run the following code:
```java
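// Requires org.apache.pulsar.client.api.PulsarClient (pulsar-client dependency).
try (PulsarClient client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650").build()) {
    // Start creating 10 producers on the same topic without waiting for each one.
    for (int i = 0; i < 10; i++) {
        client.newProducer().topic("topic").createAsync();
    }
    // Give the asynchronous creations time to run before the client is closed.
    Thread.sleep(1000);
}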
```
We can see a lot of "Could not get connection while
getPartitionedTopicMetadata" warning logs on the client side, and even
more warning logs with full stack traces on the broker side:
```
2022-06-14T02:04:20,522+0800 [metadata-store-22-1] WARN org.apache.pulsar.broker.service.ServerCnx - Failed to get Partitioned Metadata [/127.0.0.1:64846] persistent://public/default/topic: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /admin/partitioned-topics/public/default/persistent/topic
org.apache.pulsar.metadata.api.MetadataStoreException$AlreadyExistsException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /admin/partitioned-topics/public/default/persistent/topic
```
This happens because when the broker handles the partitioned metadata command, it
calls `fetchPartitionedTopicMetadataCheckAllowAutoCreationAsync`, which
tries to create a partitioned topic if it doesn't exist. This is a race
condition: if many connections are established within a short time
interval and one of them creates the topic successfully, the others fail
with the `AlreadyExistsException`.
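The interleaving described above can be replayed deterministically with a small in-memory stand-in. This is only a sketch: `CheckThenCreateRace`, its nested `AlreadyExistsException`, and the partition count of 4 are hypothetical and are not the broker's classes or defaults. The path is the one shown in the broker log.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the metadata store; not Pulsar code. */
class CheckThenCreateRace {
    /** Stand-in for MetadataStoreException.AlreadyExistsException. */
    static class AlreadyExistsException extends RuntimeException {
        AlreadyExistsException(String path) {
            super("Already exists: " + path);
        }
    }

    private final ConcurrentMap<String, Integer> store = new ConcurrentHashMap<>();

    /** Step 1 of the flow: check whether metadata exists for this path. */
    boolean exists(String path) {
        return store.containsKey(path);
    }

    /** Step 2: create the metadata; fails if another caller created it first. */
    void create(String path, int partitions) {
        if (store.putIfAbsent(path, partitions) != null) {
            throw new AlreadyExistsException(path);
        }
    }

    /** Replays the race: both callers run the check before either one creates. */
    static String replay() {
        CheckThenCreateRace s = new CheckThenCreateRace();
        String path = "/admin/partitioned-topics/public/default/persistent/topic";
        boolean aSawMissing = !s.exists(path);
        boolean bSawMissing = !s.exists(path);
        StringBuilder outcome = new StringBuilder();
        if (aSawMissing) {
            s.create(path, 4);
            outcome.append("A:created");
        }
        try {
            if (bSawMissing) {
                s.create(path, 4);
                outcome.append(" B:created");
            }
        } catch (AlreadyExistsException e) {
            // Caller B loses the race even though its own check said "missing".
            outcome.append(" B:AlreadyExists");
        }
        return outcome.toString();
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints "A:created B:AlreadyExists"
    }
}
```

Neither caller does anything wrong in isolation. The failure comes only from the gap between the existence check and the creation.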
### Modifications
Handle the `MetadataStoreException.AlreadyExistsException` in
`unsafeGetPartitionedTopicMetadataAsync`. In this case, invoke
`fetchPartitionedTopicMetadataAsync` to get the partitioned metadata
again.
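The general shape of this fallback can be sketched with `CompletableFuture`. This is an illustration under assumptions, not the patch itself: `AlreadyExistsFallback`, `fetchMetadataAsync`, `createAsync`, and `createOrFetchAsync` are hypothetical names backed by an in-memory map, and the nested exception only stands in for `MetadataStoreException.AlreadyExistsException`.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of the fallback idea; this is not the broker code. */
class AlreadyExistsFallback {
    /** Stand-in for MetadataStoreException.AlreadyExistsException. */
    static class AlreadyExistsException extends RuntimeException {
        AlreadyExistsException(String topic) {
            super("Already exists: " + topic);
        }
    }

    private final Map<String, Integer> store = new ConcurrentHashMap<>();

    /** Plain read: the partition count, or 0 when no partitioned metadata exists. */
    CompletableFuture<Integer> fetchMetadataAsync(String topic) {
        return CompletableFuture.completedFuture(store.getOrDefault(topic, 0));
    }

    /** Creation that completes exceptionally if another caller won the race. */
    CompletableFuture<Integer> createAsync(String topic, int partitions) {
        if (store.putIfAbsent(topic, partitions) != null) {
            return CompletableFuture.failedFuture(new AlreadyExistsException(topic));
        }
        return CompletableFuture.completedFuture(partitions);
    }

    /** Try to create; on AlreadyExistsException, read the existing metadata instead of failing. */
    CompletableFuture<Integer> createOrFetchAsync(String topic, int partitions) {
        return createAsync(topic, partitions)
                .<CompletableFuture<Integer>>handle((created, error) -> {
                    if (error == null) {
                        return CompletableFuture.completedFuture(created);
                    }
                    // Dependent stages wrap failures in CompletionException, so unwrap first.
                    Throwable cause = error instanceof CompletionException ? error.getCause() : error;
                    if (cause instanceof AlreadyExistsException) {
                        return fetchMetadataAsync(topic);
                    }
                    // Any other failure is propagated unchanged.
                    return CompletableFuture.failedFuture(cause);
                })
                .thenCompose(future -> future);
    }
}
```

With this shape, a caller that loses the race receives the metadata written by the winner rather than an error, so the client would have no reason to reconnect.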
### Verifying this change
Even without this patch, the creation of producers still succeeds
because the broker returns a `ServiceNotReady` error in this case and
the producers reconnect to the broker after 100 ms. The only way to
verify this fix is to reproduce the bug with this patch applied and
confirm from the logs that no reconnection happens.
Member
To me it would help if this PR were tested using OMB which creates many topic-partitions at once from several clients.
RobertIndie
reviewed
Jun 14, 2022
pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java
Outdated
… was created concurrently" This reverts commit c259c0f.
…lowAutoCreationAsync
lhotari
approved these changes
Jun 14, 2022
Member
lhotari
left a comment
LGTM. Good catch @BewareMyPower
RobertIndie
approved these changes
Jun 14, 2022
codelipenghui
pushed a commit
that referenced
this pull request
Jun 28, 2022
…ated concurrently (#16043)
* [improve][broker] Avoid reconnection when a partitioned topic was created concurrently
* Revert "[improve][broker] Avoid reconnection when a partitioned topic was created concurrently" This reverts commit c259c0f.
* Handle AlreadyExistsException in fetchPartitionedTopicMetadataCheckAllowAutoCreationAsync
(cherry picked from commit 2a7a855)
mattisonchao
pushed a commit
that referenced
this pull request
Jul 2, 2022
…ated concurrently (#16043) (cherry picked from commit 2a7a855)
nicoloboschi
pushed a commit
to datastax/pulsar
that referenced
this pull request
Jul 4, 2022
…ated concurrently (apache#16043) (cherry picked from commit 2a7a855) (cherry picked from commit cc9ff59)
BewareMyPower
added a commit
that referenced
this pull request
Jul 27, 2022
…ated concurrently (#16043) (cherry picked from commit 2a7a855)
BewareMyPower
added a commit
that referenced
this pull request
Jul 29, 2022
4 tasks
Contributor
Author
|
Move the |
BewareMyPower
added a commit
that referenced
this pull request
Aug 1, 2022
Labels
cherry-picked/branch-2.9
Archived: 2.9 is end of life
cherry-picked/branch-2.10
doc-not-needed
Your PR changes do not impact docs
release/2.9.4
release/2.10.2
type/enhancement
The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages