
KAFKA-13836: Improve KRaft broker heartbeat logic#11951

Closed
dengziming wants to merge 1 commit into apache:trunk from dengziming:KAFKA-13737-postpone-unfence-broker

Conversation


@dengziming dengziming commented Mar 25, 2022

More detailed description of your change

Don't advertise an offset to the controller until it has been published

Summary of testing strategy (including rationale)
This was really arduous work; I added a test in BrokerLifecycleManagerTest to exercise the whole state transition.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dengziming dengziming changed the title Broker unfence [WIP] KAFKA-13737: Fix flaky test in LeaderElectionCommandTest Mar 26, 2022
@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch 5 times, most recently from 14e6332 to 6b55ded Compare March 28, 2022 11:05
@dengziming (Member Author)

Changing the heartbeat request to advertise an offset only after it has been published would make the test less flaky, but it can't completely remove the flakiness: the QuorumController marks a broker as unfenced as soon as the broker first catches up to the LEO, and at that point the UnfenceBrokerRecord has not yet been generated.

This can't be perfectly solved, since we run into a "chicken and egg" problem here: the QuorumController waits for the broker to publish the UnfenceBrokerRecord before unfencing it, while the broker waits for the QuorumController to unfence it and generate the UnfenceBrokerRecord.

So we'd better try both solutions here: postpone advertising an offset until it has been published, and wait until the MetadataCache has acknowledged all brokers.
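To make the race concrete, here is a minimal Python simulation of the sequence described above. All class and variable names are illustrative, not Kafka's real ones: the point is only that the controller unfences the broker the moment its advertised offset reaches the LEO, and the UnfenceBrokerRecord it then appends necessarily sits past everything the broker has published.

```python
# Hypothetical simulation of the race described above -- the controller decides
# to unfence as soon as the advertised offset reaches the LEO, but the
# UnfenceBrokerRecord is appended *after* that offset, so the broker's
# published metadata cannot yet contain its own unfencing.
class QuorumControllerSim:
    def __init__(self):
        self.log = []          # metadata log records
        self.unfenced = set()

    def leo(self):
        return len(self.log)

    def handle_heartbeat(self, broker_id, advertised_offset):
        # Unfence once the broker has caught up to the current LEO.
        if advertised_offset >= self.leo() and broker_id not in self.unfenced:
            self.unfenced.add(broker_id)
            self.log.append(("UnfenceBrokerRecord", broker_id))
        return broker_id in self.unfenced

ctrl = QuorumControllerSim()
ctrl.log.append(("RegisterBrokerRecord", 1))

# Broker has published up to offset 1 (its own registration) and heartbeats:
unfenced = ctrl.handle_heartbeat(1, advertised_offset=1)
assert unfenced
# But the UnfenceBrokerRecord now sits at offset 1, past what the broker
# has published -- it cannot have seen its own unfencing yet.
assert ctrl.log[1] == ("UnfenceBrokerRecord", 1)
```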

cc @showuon @hachikuji

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from 6b55ded to 5480126 Compare March 28, 2022 23:49
@showuon (Member) commented Mar 29, 2022

I'm not sure whether letting the heartbeat request advertise an offset only after it has been published is just a workaround for these flaky tests, or actually the expected behavior we want. Might need @hachikuji's comment here. Thanks.

@hachikuji (Contributor) left a comment

@dengziming Thanks for the patch. If I understand correctly, the issue you're addressing is the race between the time an offset is received in the metadata listener and the time it gets published. Basically you're saying that this is not enough. There will always be some latency between the time a broker gets unfenced and the time that unfencing can be reflected in the metadata cache. However, by waiting for the broker's own registration to be received, at least we can reduce the window. Do I have that right?

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from 5480126 to 418e9d5 Compare April 19, 2022 13:10
@dengziming (Member Author)

Yeah @hachikuji, it's impossible to remove the gap, but we can still minimize it by:

  1. Don't advertise an offset to the controller until it has been published
  2. Only unfence a broker when it has seen its own registration

I also added a test case for these 2 pieces of logic in BrokerLifecycleManagerTest; it seems a little messy, PTAL.

@dengziming dengziming changed the title KAFKA-13737: Fix flaky test in LeaderElectionCommandTest KAFKA-13836: Improve KRaft broker heartbeat logic Apr 19, 2022
@cmccabe (Contributor) commented Apr 21, 2022

Thanks for the PR, @dengziming .

I don't think we need to add the custom logic to look for the broker's own registration record. That should be covered by the check done on the controller side, right? We know that the broker will register prior to sending any heartbeats, and we know that the controller will not allow the broker to join the cluster until its metadata offset has caught up to the high water mark. So those two things, taken together, mean that the existing code is already verifying that the broker has read past its own registration.

The only hole is the bug where we reported the offset from the raft layer rather than from the broker metadata publisher. So I'd say, let's fix that hole and leave the other stuff the same.
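A small sketch of this argument, with hypothetical names: because the registration record is already in the log (below the high watermark) before any heartbeat is accepted, "caught up to the high watermark" already implies "read past the registration record", so no extra broker-side check is needed.

```python
# Illustrative sketch of the controller-side reasoning above.
# Names are hypothetical, not Kafka's real classes or fields.
def may_unfence(broker_offset, high_watermark, registration_offset):
    # The registration record is in the committed log, so it sits
    # strictly below the high watermark.
    assert registration_offset < high_watermark
    caught_up = broker_offset >= high_watermark
    # caught_up therefore implies the broker has read past its own
    # registration -- the property the custom logic tried to re-check.
    if caught_up:
        assert broker_offset > registration_offset
    return caught_up

assert may_unfence(broker_offset=10, high_watermark=10, registration_offset=3)
assert not may_unfence(broker_offset=5, high_watermark=10, registration_offset=3)
```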

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from e30e7ff to 2921e25 Compare April 22, 2022 03:10
@dengziming (Member Author)

I don't think we need to add the custom logic to look for the broker's own registration record. That should be covered by the check done on the controller side, right?

Yes, @cmccabe. 👍 We can be sure the broker has seen its own registration record if it has been unfenced by the controller. I reverted the redundant logic; you can take a look again.

Contributor

Why not use the published offset consistently in heartbeats?

Member Author

Following our current design in BrokerServer, when the BrokerServer restarts it does not start publishing metadata until it has caught up with the current high watermark, so the published offset will always be zero at startup.
This is necessary because at startup there may be some outdated metadata: for example, if we created a topic and then deleted it, but the BrokerServer has only read part of the metadata log, the ReplicaManager would initialize some unnecessary replicas if it published at that point.
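The hazard can be illustrated with a small sketch (record names here are illustrative, not Kafka's actual metadata record schemas): replaying only a prefix of the log makes a topic that was later deleted look live.

```python
# Illustrative sketch of the stale-metadata hazard described above:
# publishing after a partial read of the log exposes a deleted topic.
log = [("TopicRecord", "foo"), ("RemoveTopicRecord", "foo")]

def topics_visible(read_up_to):
    live = set()
    for kind, name in log[:read_up_to]:
        if kind == "TopicRecord":
            live.add(name)
        elif kind == "RemoveTopicRecord":
            live.discard(name)
    return live

# Publishing after reading only the first record exposes the deleted topic,
# so replicas would be created for it:
assert topics_visible(read_up_to=1) == {"foo"}
# Publishing only after catching up does not:
assert topics_visible(read_up_to=2) == set()
```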

Contributor

I might still be missing something, but it seems ok if the published offset remains 0 initially and we send that in the heartbeat until the broker is ready to begin publishing. What do you think?

Member Author

This seems like the "chicken and egg" problem: if we advertised 0 to the controller, the controller would always set isCaughtUp=false in the HeartbeatResponse, and the broker would never be ready to begin publishing.

The time sequence currently is:

  1. LifecycleManager repeatedly sends HeartbeatRequests
  2. The controller sets isCaughtUp=true once the advertised offset reaches the high watermark
  3. LifecycleManager completes initialCatchUpFuture when it sees isCaughtUp=true
  4. The BrokerServer starts its initial publishing after initialCatchUpFuture is completed
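A toy simulation of this sequence, under the assumption that only the published offset were advertised (all names are illustrative, not Kafka's real classes): the broker would stall at step 2 forever, which is exactly the chicken-and-egg deadlock.

```python
# Illustrative simulation of the startup sequence above. If the broker
# advertised only its *published* offset, that offset would stay 0 until
# publishing starts, but publishing starts only after isCaughtUp=true.
def startup_with(advertise_published_only):
    high_watermark = 100
    fetched, published = 100, 0   # broker has fetched the log but not published
    for _ in range(3):            # a few heartbeat rounds
        advertised = published if advertise_published_only else fetched
        is_caught_up = advertised >= high_watermark
        if is_caught_up:
            published = fetched   # initialCatchUpFuture completes; publishing starts
            return True
    return False                  # never caught up -> never begins publishing

assert startup_with(advertise_published_only=False)      # current behavior: starts up
assert not startup_with(advertise_published_only=True)   # the deadlock described above
```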

Contributor

Ok, I think I get it now. We don't begin publishing until the controller lets us know that we are caught up. That's the part I was missing. We might need to step back and reconsider this. Letting the reported offset in BrokerHeartbeat mean different things depending on the context seems undesirable. Does it give us any additional guarantees? Even after the controller marks the follower as "caught up," there is still a delay before the broker can publish the data. Additionally, would it be possible for the offset to go backwards when we switch to the published offset? It would be odd if we could end up going back to an offset which was not considered caught up. What do you think?

Member Author

Currently, this offset is only used when starting the broker: first to ensure the broker has caught up so we can start initialization, and second to ensure the broker has caught up so it can be unfenced. If we use the same metadata offset for both, it seems unnecessary to check twice. But it is indeed peculiar to have the offset go backwards; maybe we should add a new field to the HeartbeatRequest, or just leave it as it is now?

Contributor

@dengziming Yeah, I was considering a second field as well. That would allow a startup transition looking something like this:

  1. Broker catches up: fetch_offset=N, publish_offset=0
  2. Controller returns is_caught_up=true, is_fenced=true
  3. Broker begins publishing: fetch_offset=N, publish_offset=N
  4. Controller returns is_caught_up=true, is_fenced=false

Basically the controller uses fetch_offset to tell whether the broker is caught up and it uses publish_offset to tell whether it should be unfenced. I guess the main advantage is that it gives us some more protection from exposing old metadata when brokers rejoin the cluster. Or at least we can put a time bound on the staleness of the exposed metadata. The value of this may be a bit limited though since we don't do self-fencing.
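A hedged sketch of how the controller could evaluate the two proposed fields (fetch_offset and publish_offset are the names from this thread; the function itself is illustrative, not the actual QuorumController logic). Note that neither field ever needs to move backwards:

```python
# Illustrative controller-side decision under the two-field proposal above.
def controller_decision(fetch_offset, publish_offset, high_watermark):
    # fetch_offset decides whether the broker is caught up;
    # publish_offset decides whether it can be unfenced.
    is_caught_up = fetch_offset >= high_watermark
    is_fenced = not (is_caught_up and publish_offset >= high_watermark)
    return is_caught_up, is_fenced

hwm = 100
# Steps 1-2: broker caught up on fetch, not yet publishing.
assert controller_decision(100, 0, hwm) == (True, True)
# Steps 3-4: broker begins publishing and catches up on publish_offset.
assert controller_decision(100, 100, hwm) == (True, False)
```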

@cmccabe What do you think?

Member Author

This would need us to bump MetadataVersion so I will wait for KIP-778 first.

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from 2921e25 to d5b0a2c Compare May 20, 2022 07:25
@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from d5b0a2c to cec75f2 Compare May 20, 2022 07:58
@dengziming (Member Author)

Bumped BrokerHeartbeat and used version 2 if MetadataVersion is above IBP_3_3_IV1; this is the first use case of FeatureVersion. PTAL @hachikuji @mumrah

@dengziming (Member Author)

This has been fixed in #12983; we use sharedServer.loader.lastAppliedOffset(), which is the published offset.

@dengziming dengziming closed this Aug 14, 2023