KAFKA-7655 Metadata spamming requests from Kafka Streams under some circumstances, potential DOS by Pasvaz · Pull Request #5929 · apache/kafka

Pasvaz · 2018-11-19T15:32:51Z

Re-validate and make sure the topic either exists or it's gone by using a delay.

There is a bug in the InternalTopicManager that makes the client believe that a topic exists even though it doesn't, it occurs mostly in those few seconds between when a topic is marked for deletion and when it is actually deleted. In that timespan, the Broker gives inconsistent information, first it hides the topic but then it refuses to create a new one therefore the client believes the topic was existing already and it starts polling for metadata.

The consequence is that the client goes into a loop where it polls for topic metadata and if this is done by many threads it can take down a small cluster or degrade greatly its performances.

The real life scenario is probably a reset gone wrong. Reproducing the issue is fairly simple, these are the steps:

Stop a Kafka streams application
Delete one of its changelog and the local store
Restart the application immediately after the topic delete
You will see the Kafka streams application hanging after the bootstrap saying something like: INFO Metadata - Cluster ID: xxxx

I am attaching a patch that fixes the issue client side but my personal opinion is that this should be tackled on the broker as well, metadata requests seem expensive and it would be easy to craft a DDOS that can potentially take down an entire cluster in seconds just by flooding the brokers with metadata requests.

The patch kicks in only when a topic that wasn't existing in the first call to getNumPartitions triggers a TopicExistsException. When this happens it forces the re-validation of the topic and if it still looks like doesn't exists plan a retry with some delay, to give the broker the necessary time to sort it out.

I think this patch makes sense beside the above mentioned use case where a topic it's not existing, because, even if the topic was actually created, the client should not blindly trust it and should still re-validate it by checking the number of partitions. IE: a topic can be created automatically by the first request and then it would have the default partitions rather than the expected ones.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

…ng a delay.

mjsax

Thanks for the patch.

I think it would be great to get a system test for this, too. Not sure, how easy it is to reproduce the issue with a system test.

guozhangwang

There's a correlated JIRA (https://issues.apache.org/jira/browse/KAFKA-6928) for removing the retry logic completely but delayed due to admin client's own gap on the retries: https://issues.apache.org/jira/browse/KAFKA-6928

I've synced with @hachikuji and I'd like to suggest we merge this PR as-is while I can work on KAFKA-6928 right away while keeping this fix in mind.

mjsax · 2018-11-26T23:44:55Z

Retest this please

dguy · 2018-12-13T14:49:04Z

@mjsax @guozhangwang any reason why this hasn't been merged?

…ircumstances, potential DOS (#5929) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>

mjsax · 2018-12-13T16:32:26Z

@dguy No reason -- thanks for the reminder!

@Pasvaz Thanks for the patch! Merged to trunk and cherry-picked to 2.1 and 2.0 branches.

miguno · 2018-12-14T09:11:11Z

Thanks for the patch, @Pasvaz!

…ircumstances, potential DOS (apache#5929) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>

Re-validate and make sure the topic either exists or it's gone by usi…

0b3ae81

…ng a delay.

mjsax reviewed Nov 19, 2018

View reviewed changes

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java Outdated

pqlmwam added 2 commits November 20, 2018 09:35

Add RETRY_BACKOFF_MS_CONFIG to the admin prefixed properties

c33757f

Removed tab

21b3def

mjsax reviewed Nov 20, 2018

View reviewed changes

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java Outdated

guozhangwang approved these changes Nov 21, 2018

View reviewed changes

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java Outdated

Better error message

e341d4c

mjsax reviewed Nov 23, 2018

View reviewed changes

Comment thread streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java

Comment thread streams/src/main/java/org/apache/kafka/streams/processor/internals/InternalTopicManager.java

Updated tests and renamed createTopicNames to createdTopicNames

23dfca5

mjsax added the streams label Dec 13, 2018

mjsax merged commit dffce6e into apache:trunk Dec 13, 2018

mjsax pushed a commit that referenced this pull request Dec 13, 2018

KAFKA-7655 Metadata spamming requests from Kafka Streams under some c…

71dc4b7

…ircumstances, potential DOS (#5929) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>

mjsax pushed a commit that referenced this pull request Dec 13, 2018

KAFKA-7655 Metadata spamming requests from Kafka Streams under some c…

46a2e50

…ircumstances, potential DOS (#5929) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <guozhang@confluent.io>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KAFKA-7655 Metadata spamming requests from Kafka Streams under some circumstances, potential DOS#5929

KAFKA-7655 Metadata spamming requests from Kafka Streams under some circumstances, potential DOS#5929
mjsax merged 5 commits intoapache:trunkfrom
Pasvaz:KAFKA-7655

Pasvaz commented Nov 19, 2018

Uh oh!

mjsax left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guozhangwang left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mjsax commented Nov 26, 2018

Uh oh!

dguy commented Dec 13, 2018

Uh oh!

mjsax commented Dec 13, 2018

Uh oh!

miguno commented Dec 14, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

Pasvaz commented Nov 19, 2018

Committer Checklist (excluded from commit message)

Uh oh!

mjsax left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guozhangwang left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mjsax commented Nov 26, 2018

Uh oh!

dguy commented Dec 13, 2018

Uh oh!

mjsax commented Dec 13, 2018

Uh oh!

miguno commented Dec 14, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants