KAFKA-9751: Forward CreateTopicsRequest for FindCoordinator/Metadata when topic creation is needed by abbccdda · Pull Request #9579 · apache/kafka

abbccdda · 2020-11-09T17:52:50Z

This PR forward the entire FindCoordinator request to the active controller when the internal topic being queried is not ready to be served yet.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

hachikuji

Did a quick overview and just want to make sure I understand the high-level details. It looks like this patch is handling two cases:

automatic topic creation through Metadata
automatic internal topic creation through FindCoordinator

When the broker encounters one of these cases, it sends a CreateTopic request to the controller. However, the broker does not wait for the response before returning the client response. In the case of FindCoordinator, we return COORDINATOR_NOT_AVAILABLE; for Metadata, we return LEADER_NOT_AVAILABLE.

Do I have that right?

One question that occurred to me is whether we need logic to avoid spamming the controller with CreateTopic requests while we are waiting for the initial request to go through? For example, after seeing COORDINATOR_NOT_AVAILABLE, the consumer will backoff a few ms and then retry. Do we need somewhere to track the topics that are already awaiting creation?

hachikuji · 2021-01-21T00:59:34Z

An alternative that we have done elsewhere would be to introduce an AbstractMetadataRequestTest which we can pull the common cases up to. A more elegant option might be to figure out how to use @ParameterizedTest so that we can provide config overrides. This would be a little difficult at the moment because we initialize brokers in a @Before method. Probably means we need to move away from this approach long term. For now, the abstract class seems preferable. Similar for CreateTopicsRequestWithForwardingTest.

hachikuji · 2021-01-21T01:16:13Z

The controller should have these configurations as well. Perhaps it is better to use -1 for this and replication factor and let the controller fill them in?

I agree it's equivalent, but I think we could be conservative here to keep the logic on broker side for now, to reduce logical change in this PR.

I don't think it is equivalent, at least not completely. My thought was to reduce the reliance on the broker's configuration, which is more likely to be stale than the controller. This actually raises an interesting question about the CreateTopic API which I had not thought of before. If we receive a CreateTopic request for an internal topic, which configuration should we use? Currently it looks like we will apply the standard topic defaults, but that does not seem right. I filed https://issues.apache.org/jira/browse/KAFKA-12280, so we can consider this later.

hachikuji · 2021-01-21T01:29:31Z

We seem to have lost this handling or am I missing something?

We intentionally avoid using adminZkClient so that we could go through topic creation rules through zkAdminManager. TopicExistsException is handled there.

Inside ZkAdminManager.createTopics, I see that we catch TopicExistsException. However, I do not see any logic to translate it to LEADER_NOT_AVAILABLE. Can you show me where this happens?

The problem we have is that ZkAdminManager.createTopics only takes a callback instead of responding to you in realtime whether we hit TopicExists. Right now we are doing the topic creation async, so unless this is necessary to be fixed (which today we would just return UNKNOWN_PARTITION which seems to be semantically similar to LEADER_NOT_AVAILABLE), I think we could just returning unknown partition immediately without waiting for the async creation?

hachikuji · 2021-01-21T01:31:32Z

Hmm.. In the old logic, we would attempt topic creation through zookeeper first. Then, if the topic was created successfully, we would return LEADER_NOT_AVAILABLE to give time for the controller to elect a leader. Now we return LEADER_NOT_AVAILABLE immediately and we send the CreateTopic request asynchronously. We don't know if the CreateTopic request will ultimately succeed or not. Perhaps it would be better to keep returning UNKNOWN_TOPIC_OR_PARTITION until we see that the topic exists.

That makes sense

hachikuji · 2021-02-03T01:51:59Z

+
+    val channelManager =
+      if (enableForwarding)
+        Some(new BrokerToControllerChannelManager(


Let's leave this for a follow-up, but just want to mention that it is probably better if we can reuse the same BrokerToControllerChannelManager as ForwardingManager. Can you file a JIRA for a follow-up?

Sure, we do have https://issues.apache.org/jira/browse/KAFKA-10348 to track.

hachikuji · 2021-02-03T02:45:46Z

I don't think it is equivalent, at least not completely. My thought was to reduce the reliance on the broker's configuration, which is more likely to be stale than the controller. This actually raises an interesting question about the CreateTopic API which I had not thought of before. If we receive a CreateTopic request for an internal topic, which configuration should we use? Currently it looks like we will apply the standard topic defaults, but that does not seem right. I filed https://issues.apache.org/jira/browse/KAFKA-12280, so we can consider this later.

hachikuji · 2021-02-03T05:47:23Z

+  }
+
+  override def createTopics(topics: Set[CreatableTopic],
+                            controllerMutationQuota: ControllerMutationQuota): Unit = {


Eventually we need to figure out quota behavior for forwarded requests. I am wondering if it makes sense to apply the quota on each broker separately before sending the CreateTopic to the controller or if we rely on the controller exclusively.

cc @dajac

@hachikuji Sorry for my late reply. I've missed the notification. We have to enforce the quota on the controller exclusively. It is a global quota and we can't really distribute it fairly in the cluster. In this case, it would be great if we could propagate the principal and clientId to the controller to enforce the quota. However, I wonder how we could propagate the error and the delay to the client if the topic creation is throttled. Perhaps, we could reply with UNKNOW_TOPIC_OR_PARTITION until the topic can be created.

hachikuji · 2021-02-03T07:00:35Z

@abbccdda I had one additional thought here. When we receive a normal CreateTopic request, we forward it to the controller through the Envelope request. This allows us to preserve the original client principal, which is useful for auditing. We are losing that here for auto-created topics, which is unfortunate. One idea I was thinking about is whether we can wrap the CreateTopics request for the auto-created topic in an Envelope in order to preserve the client principal. So basically we take the implicit topic creation request from the client and turn it into an explicit forwarded request on behalf of the client.

hachikuji · 2021-02-05T22:39:33Z

@abbccdda Thanks for the updates. I opened a PR with a few fixes to speed this along since we're trying to get it checked in today: abbccdda#6. The tests that were previously failing now seem to be passing (at least when testing locally).

always define auto topic creation manager

…tion

hachikuji

LGTM

…)" This reverts commit 0bc394c.

hachikuji · 2021-02-06T05:35:10Z

        }
    }

+    clearInflightRequests(creatableTopics)


Can you use a try/finally here?

ijuma · 2021-02-06T16:02:52Z

I triggered a new build. Is there a reason why you aborted the previous one @abbccdda?

abbccdda · 2021-02-06T21:02:29Z

only one no-related test failure in connect. Verified on local, will merge the PR.

An accidental change of logging level for streams from #9579, correcting it. Reviewers: Bill Bejeck <bbejeck@gmail.com>

An accidental change of logging level for streams from apache#9579, correcting it. Reviewers: Bill Bejeck <bbejeck@gmail.com>

An accidental change of logging level for streams from #9579, correcting it. Reviewers: Bill Bejeck <bbejeck@gmail.com>

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch from c60f843 to b5f0b2f Compare November 10, 2020 20:14

abbccdda mentioned this pull request Nov 10, 2020

KAFKA-10350: add forwarding manager implementation with metrics #9580

Merged

3 tasks

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch from b5f0b2f to 5e0a499 Compare November 10, 2020 21:33

abbccdda changed the title ~~KAFKA-9751: Forward FindCoordinator request when topic creation is needed~~ KAFKA-9751: Forward FindCoordinator/Metadata request when topic creation is needed Nov 10, 2020

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 2 times, most recently from cdc9037 to df0cc62 Compare November 12, 2020 22:04

abbccdda requested review from cmccabe, hachikuji and ijuma November 13, 2020 03:41

abbccdda changed the title ~~KAFKA-9751: Forward FindCoordinator/Metadata request when topic creation is needed~~ KAFKA-9751: Forward CreateTopicsRequest for FindCoordinator/Metadata when topic creation is needed Nov 13, 2020

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 2 times, most recently from 7982773 to 309d45a Compare November 17, 2020 02:48

rondagostino mentioned this pull request Jan 14, 2021

Add support for auto topic creation in KIP-500 confluentinc/kafka#486

Merged

3 tasks

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 3 times, most recently from becebdb to ab42e9f Compare January 19, 2021 19:40

hachikuji reviewed Jan 21, 2021

View reviewed changes

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 2 times, most recently from 0dcd806 to b303f12 Compare January 27, 2021 05:16

abbccdda commented Jan 27, 2021

View reviewed changes

Comment thread core/src/main/scala/kafka/server/BrokerToControllerChannelManager.scala Outdated

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 8 times, most recently from 9b4521a to 2848e8d Compare February 3, 2021 01:26

hachikuji reviewed Feb 3, 2021

View reviewed changes

hachikuji added the kraft label Feb 3, 2021

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 2 times, most recently from 2fc0011 to c68c7d0 Compare February 5, 2021 01:49

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch 3 times, most recently from 7e745da to 1559647 Compare February 6, 2021 01:36

abbccdda and others added 6 commits February 5, 2021 17:46

forward FindCoordinator/Metadata request when topic creation is needed

8ef0b5a

always define auto topic creation manager

Fix a few lingering problems

cb919e9

Still need to do topic validation

3bdba7e

CreatTopics request test should not allow auto-creation during valida…

6a2df64

…tion

test fix and logical fix after rebase

878a4ee

add topic validation

abf30d8

abbccdda force-pushed the find_coordinator_forwarding-KAFKA-9751 branch from 1559647 to abf30d8 Compare February 6, 2021 02:26

hachikuji approved these changes Feb 6, 2021

View reviewed changes

abbccdda added 3 commits February 5, 2021 18:43

fix scala 2.12 missing API

829cec7

Revert "KAFKA-9274: handle TimeoutException on task reset (apache#10000…

65310eb

…)" This reverts commit 0bc394c.

clear inflight requests

e0dedb3

hachikuji reviewed Feb 6, 2021

View reviewed changes

abbccdda added 2 commits February 5, 2021 22:21

try finally for clear topics

0796658

fix seq ordering for scala 2.12

92e5cda

rewrite testAutoCreateOfCollidingTopics to not depend on ordering

1d21fc4

abbccdda merged commit d2cb2dc into apache:trunk Feb 6, 2021

abbccdda mentioned this pull request Mar 2, 2021

KAFKA-12381: remove live broker checks for forwarding topic creation #10240

Merged

3 tasks

abbccdda mentioned this pull request Mar 15, 2021

MINOR: revert stream logging level back to ERROR #10320

Merged

3 tasks

abbccdda pushed a commit that referenced this pull request Mar 15, 2021

revert stream logging level back to ERROR (#10320)

8da6593

An accidental change of logging level for streams from #9579, correcting it. Reviewers: Bill Bejeck <bbejeck@gmail.com>

lct45 pushed a commit to confluentinc/kafka that referenced this pull request Mar 15, 2021

revert stream logging level back to ERROR (apache#10320)

10f892a

An accidental change of logging level for streams from apache#9579, correcting it. Reviewers: Bill Bejeck <bbejeck@gmail.com>

abbccdda pushed a commit that referenced this pull request Mar 15, 2021

revert stream logging level back to ERROR (#10320)

f4e7898

An accidental change of logging level for streams from #9579, correcting it. Reviewers: Bill Bejeck <bbejeck@gmail.com>

Conversation

abbccdda commented Nov 9, 2020

Committer Checklist (excluded from commit message)

Uh oh!

hachikuji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abbccdda Jan 21, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hachikuji Feb 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

hachikuji commented Feb 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hachikuji commented Feb 5, 2021

Uh oh!

hachikuji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ijuma commented Feb 6, 2021

Uh oh!

abbccdda commented Feb 6, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

abbccdda Jan 21, 2021 •

edited

Loading

hachikuji Feb 3, 2021 •

edited

Loading

hachikuji commented Feb 3, 2021 •

edited

Loading