Skip to content
This repository was archived by the owner on Jan 24, 2024. It is now read-only.

Conversation

@BewareMyPower
Copy link
Collaborator

@BewareMyPower BewareMyPower commented Jul 26, 2021

Motivation

In K8s environment, the start of a KoP node might depend on a successful start of another KoP node. It's caused by the lookup request in loadOffsetTopics, here're the logs:

13:33:11.584 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 88
13:33:11.593 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 8
13:33:11.673 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 78
13:33:11.685 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 9
13:33:11.767 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 80
13:33:11.771 [main] ERROR io.streamnative.pulsar.handlers.kop.KafkaProtocolHandler - initGroupCoordinator failed with
org.apache.pulsar.client.admin.PulsarAdminException: java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: Could not complete the operation. Number of retries has been exhausted. Failed reason: test-broker-1.test-broker-headless.kop-test.svc.cluster.local
reason: test-broker-1.test-broker-headless.kop-test.svc.cluster.local
	at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:247) ~[io.streamnative-pulsar-client-admin-original-2.8.0-rc-202106071430.jar:2.8.0-rc-202106071430]
	at org.apache.pulsar.client.admin.internal.LookupImpl$1.failed(LookupImpl.java:85) ~[io.streamnative-pulsar-client-admin-original-2.8.0-rc-202106071430.jar:2.8.0-rc-202106071430]

10.8.5.72 is the IP of another KoP node that had not started completely.

Actually loadOffsetTopics is not necessary, when a bundle load event happened, the offsets will be loaded in the bundle ownership listener. There's no reason to load the offsets in start() method, also when Kafka starts, it doesn't load offsets.

Modifications

The most important change is to remove the loadOffsetTopics method.

In addition, this PR does some code factors to KafkaProtocolHandler:

  • Remove unused fields
  • Don't catch unchecked exceptions to make start fail fast
  • Adjust the order of fields and methods to meet Codacy's requirement

@BewareMyPower BewareMyPower requested a review from jiazhai as a code owner July 26, 2021 12:48
@BewareMyPower BewareMyPower self-assigned this Jul 26, 2021
@BewareMyPower BewareMyPower added type/bug type/enhancement Indicates an improvement to an existing feature and removed type/bug labels Jul 26, 2021
@BewareMyPower BewareMyPower changed the title Optimize the protocol handler start [WIP] Optimize the protocol handler start Jul 27, 2021
@BewareMyPower
Copy link
Collaborator Author

Need to confirm if the tests became flaky after my change.

@BewareMyPower
Copy link
Collaborator Author

The GroupCoordinatorTest and GroupMetadataTest are flaky because of TwoPhaseCompactor$MockitoMock issues and can be fixed by #628. I'd rebase to master once #628 was merged.

@BewareMyPower BewareMyPower force-pushed the bewaremypower/optimize-kop-start branch from f969eae to 5cd99dc Compare July 29, 2021 03:54
@BewareMyPower BewareMyPower changed the title [WIP] Optimize the protocol handler start Optimize the protocol handler start Jul 29, 2021
@BewareMyPower
Copy link
Collaborator Author

@jiazhai PTAL

@jiazhai jiazhai merged commit c054a02 into streamnative:master Aug 2, 2021
@BewareMyPower BewareMyPower deleted the bewaremypower/optimize-kop-start branch August 2, 2021 10:19
@codelipenghui
Copy link
Collaborator

@BewareMyPower We need this one in the following 2.8.x release, do we need to cherry-pick it to branch-2.8?

BewareMyPower added a commit that referenced this pull request Aug 5, 2021
### Motivation

In K8s environment, the start of a KoP node might depend on a successful start of another KoP node. It's caused by the lookup request in `loadOffsetTopics`, here're the logs:

```
13:33:11.584 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 88
13:33:11.593 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 8
13:33:11.673 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 78
13:33:11.685 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 9
13:33:11.767 [pulsar-2-1] INFO  org.eclipse.jetty.server.RequestLog - 10.8.5.72 - - [08/Jul/2021:13:33:11 +0000] "GET /lookup/v2/topic/persistent/public/__kafka/__consumer_offsets-partition-0 HTTP/1.1" 307 0 "-" "Pulsar-Java-v2.8.0-rc-202106071430" 80
13:33:11.771 [main] ERROR io.streamnative.pulsar.handlers.kop.KafkaProtocolHandler - initGroupCoordinator failed with
org.apache.pulsar.client.admin.PulsarAdminException: java.util.concurrent.CompletionException: org.apache.pulsar.client.admin.internal.http.AsyncHttpConnector$RetryException: Could not complete the operation. Number of retries has been exhausted. Failed reason: test-broker-1.test-broker-headless.kop-test.svc.cluster.local
reason: test-broker-1.test-broker-headless.kop-test.svc.cluster.local
	at org.apache.pulsar.client.admin.internal.BaseResource.getApiException(BaseResource.java:247) ~[io.streamnative-pulsar-client-admin-original-2.8.0-rc-202106071430.jar:2.8.0-rc-202106071430]
	at org.apache.pulsar.client.admin.internal.LookupImpl$1.failed(LookupImpl.java:85) ~[io.streamnative-pulsar-client-admin-original-2.8.0-rc-202106071430.jar:2.8.0-rc-202106071430]
```

10.8.5.72 is the IP of another KoP node that had not started completely.

Actually `loadOffsetTopics` is not necessary, when a bundle load event happened, the offsets will be loaded in the bundle ownership listener. There's no reason to load the offsets in `start()` method, also when Kafka starts, it doesn't load offsets.

### Modifications

The most important change is to remove the `loadOffsetTopics` method.

In addition, this PR does some code factors to `KafkaProtocolHandler`:
- Remove unused fields
- Don't catch unchecked exceptions to make `start` fail fast
- Adjust the order of fields and methods to meet Codacy's requirement
BewareMyPower pushed a commit that referenced this pull request Nov 3, 2021
Fixes: #761

### Motivation

In K8S environment, load transaction topics when start KafkaProtocolHandler might crash, because to start a KoP node might depend on a successful start of another KoP node.

See Related PR : #627

### Modification

The most important change is to remove the `loadTxnLogTopics ` method.
Addition change is add a method to wait txnLog load in units test.
BewareMyPower pushed a commit that referenced this pull request Nov 5, 2021
Fixes: #761

### Motivation

In K8S environment, load transaction topics when start KafkaProtocolHandler might crash, because to start a KoP node might depend on a successful start of another KoP node.

See Related PR : #627

### Modification

The most important change is to remove the `loadTxnLogTopics ` method.
Addition change is add a method to wait txnLog load in units test.
BewareMyPower pushed a commit that referenced this pull request Nov 5, 2021
Fixes: #761

### Motivation

In K8S environment, load transaction topics when start KafkaProtocolHandler might crash, because to start a KoP node might depend on a successful start of another KoP node.

See Related PR : #627

### Modification

The most important change is to remove the `loadTxnLogTopics ` method.
Addition change is add a method to wait txnLog load in units test.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

type/enhancement Indicates an improvement to an existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants