Skip to content

KAFKA-3645: Fix ConsumerGroupCommand and ConsumerOffsetChecker to correctly read endpoint info from ZK#1301

Closed
arunmahadevan wants to merge 2 commits intoapache:trunkfrom
arunmahadevan:cg_kerb_fix
Closed

KAFKA-3645: Fix ConsumerGroupCommand and ConsumerOffsetChecker to correctly read endpoint info from ZK#1301
arunmahadevan wants to merge 2 commits intoapache:trunkfrom
arunmahadevan:cg_kerb_fix

Conversation

@arunmahadevan
Copy link
Copy Markdown
Contributor

The host and port entries under /brokers/ids/ gets filled only for PLAINTEXT security protocol. For other protocols the host is null and the actual endpoint is under "endpoints". This causes NPE when running the consumer group and offset checker scripts in a kerberized env. By always reading the host and port values from the "endpoint", a more meaningful exception would be thrown rather than a NPE.

The host and port entries under /brokers/ids/<bid> gets filled only for PLAINTEXT security protocol. For other protocols the host is null
and the actual endpoint is under "endpoints". This causes NPE when running the consumer group and offset checker scripts in a
kerberized env. By always reading the host and port values from the "endpoint", a more meaningful exception would be thrown rather than a NPE.
@harshach
Copy link
Copy Markdown

harshach commented May 2, 2016

+1

brokerInfo.getBrokerEndPoint(SecurityProtocol.PLAINTEXT).port,
10000, 100000, "ConsumerGroupCommand"))
case None =>
throw new BrokerNotAvailableException("Broker id %d does not exist".format(brokerId))
Copy link
Copy Markdown
Member

@ijuma ijuma May 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be written a bit more nicely along these lines:

zkUtils.getBrokerInfo(brokerId).map(_.getBrokerEndPoint(SecurityProtocol.PLAINTEXT)).map { endPoint =>
  new SimpleConsumer(...)
}.getOrElse(throw new ....)

Same applies to the other class.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented May 3, 2016

Thanks for the PR, I left a minor comment, looks good otherwise. Can you please update the PR description to mention that this only happens with the old consumer (the PR description becomes the commit message)? With the new consumer (which only ConsumerGroupCommand supports), the NPE should not happen.

@arunmahadevan arunmahadevan changed the title KAFKA-3645: Fix NPE in ConsumerGroupCommand and ConsumerOffsetChecker KAFKA-3645: Fix NPE in ConsumerGroupCommand, ConsumerOffsetChecker with old consumer May 4, 2016
@arunmahadevan
Copy link
Copy Markdown
Contributor Author

@ijuma thanks for reviewing. Changed PR title and refactored to use map and orElse.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented May 4, 2016

LGTM

@ijuma
Copy link
Copy Markdown
Member

ijuma commented May 9, 2016

@arunmahadevan, because our test coverage for these tools is not good, we need to manually test changes like this. Can you please describe the steps to reproduce the NPE?

@arunmahadevan
Copy link
Copy Markdown
Contributor Author

arunmahadevan commented May 23, 2016

@ijuma sorry for not responding earlier, somehow I missed the email notification.

I was testing with a repo which had diverged from trunk and had some other fixes. The NPE does not seem to occur with the current code in trunk since the channel to the offset manager is opened first where it specifically requests for a PLAINTEXT endpoint - link. It then exits with a message "Error while executing consumer group command End point PLAINTEXT not found for broker 0". The method getZkConsumer does not get invoked at all.

To test this I ran the below command after enabling kerberos with SASL_PLAINTEXT listener.
bin/kafka-consumer-groups.sh --zookeeper hostname:2181 --describe --group test-consumer-group --command-config config/consumer.properties.

The proposed patch fixes getZkConsumer to read the endpoint info correctly which could potentially cause NPE if invoked from a different context in future, but right now there is no NPE since the code that invokes this method fails fast.

@ijuma
Copy link
Copy Markdown
Member

ijuma commented May 28, 2016

@arunmahadevan, thanks for the explanation. Can you update the PR description to be accurate then? The PR description becomes the commit message in the merged commit, so its contents are important.

@arunmahadevan arunmahadevan changed the title KAFKA-3645: Fix NPE in ConsumerGroupCommand, ConsumerOffsetChecker with old consumer KAFKA-3645: Fix ConsumerGroupCommand and ConsumerOffsetChecker to correctly read endpoint info from ZK Jun 2, 2016
@arunmahadevan
Copy link
Copy Markdown
Contributor Author

@ijuma Updated the PR title. Let me know if it makes sense.

@asfgit asfgit closed this in ff300c9 Jun 4, 2016
@ijuma
Copy link
Copy Markdown
Member

ijuma commented Jun 4, 2016

Thanks for the PR, LGTM. Merged to trunk.

jsancio pushed a commit to jsancio/kafka that referenced this pull request Aug 6, 2019
Conflicts:
GroupCoordinatorIntegrationTest.scala - line 50
SocketServerTest.scala - TestableSocketServer class, removed verifyAcceptorIdlePercent() method, apache#1301 removed pollBlockMs var used in apache#1344 poll() method; testConnectionRateLimit() method changed
SocketServer.scala - apache#82, removed SocketServerMetricsGroup var, apache#455 removed idlePercentMeter
KafkaApis.scala - apache#50 imports
RequestQuotaTest.scala - apache#28 imports
mumrah pushed a commit to mumrah/kafka that referenced this pull request Aug 14, 2024
* Override doWork to poll requests every 10 ms
* Needs more analysis for large scale deployment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants