
KAFKA-13836: Improve KRaft broker heartbeat logic#11951

Closed
dengziming wants to merge 1 commit into apache:trunk from dengziming:KAFKA-13737-postpone-unfence-broker

Conversation


@dengziming dengziming commented Mar 25, 2022

More detailed description of your change

Don't advertise an offset to the controller until it has been published

Summary of testing strategy (including rationale)
This was really arduous work; I added a test in BrokerLifecycleManagerTest to exercise the whole state transition.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dengziming dengziming changed the title Broker unfence [WIP] KAFKA-13737: Fix flaky test in LeaderElectionCommandTest Mar 26, 2022
@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch 5 times, most recently from 14e6332 to 6b55ded Compare March 28, 2022 11:05
@dengziming (Member Author)

Changing the heartbeat request to advertise an offset only after it has been published would make the test less flaky, but it can't completely remove the flakiness: the QuorumController marks a broker as unfenced as soon as the broker first catches up to the LEO, and at that point the UnfenceBrokerRecord has not yet been generated.

This can't be perfectly solved, since we run into a "chicken and egg" problem here: the QuorumController waits for the broker to publish the UnfenceBrokerRecord before unfencing it, while the broker waits for the QuorumController to unfence it and generate the UnfenceBrokerRecord.

So we'd better try both solutions here: postpone advertising an offset until it has been published, and wait until the MetadataCache has acknowledged all brokers.
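To make the race concrete, here is a minimal Python simulation of the sequence described above. All class and variable names are illustrative, not Kafka's real ones: the point is only that the controller unfences the broker the moment its advertised offset reaches the LEO, and the UnfenceBrokerRecord it then appends necessarily sits past everything the broker has published.

```python
# Hypothetical simulation of the race described above -- the controller decides
# to unfence as soon as the advertised offset reaches the LEO, but the
# UnfenceBrokerRecord is appended *after* that offset, so the broker's
# published metadata cannot yet contain its own unfencing.
class QuorumControllerSim:
    def __init__(self):
        self.log = []          # metadata log records
        self.unfenced = set()

    def leo(self):
        return len(self.log)

    def handle_heartbeat(self, broker_id, advertised_offset):
        # Unfence once the broker has caught up to the current LEO.
        if advertised_offset >= self.leo() and broker_id not in self.unfenced:
            self.unfenced.add(broker_id)
            self.log.append(("UnfenceBrokerRecord", broker_id))
        return broker_id in self.unfenced

ctrl = QuorumControllerSim()
ctrl.log.append(("RegisterBrokerRecord", 1))

# Broker has published up to offset 1 (its own registration) and heartbeats:
unfenced = ctrl.handle_heartbeat(1, advertised_offset=1)
assert unfenced
# But the UnfenceBrokerRecord now sits at offset 1, past what the broker
# has published -- it cannot have seen its own unfencing yet.
assert ctrl.log[1] == ("UnfenceBrokerRecord", 1)
```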

cc @showuon @hachikuji

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from 6b55ded to 5480126 Compare March 28, 2022 23:49
@showuon (Member) commented Mar 29, 2022

I'm not sure whether letting the heartbeat request advertise an offset only after it has been published is just a workaround for these flaky tests, or actually the expected behavior we want. Might need @hachikuji's comment here. Thanks.

@hachikuji (Contributor) left a comment

@dengziming Thanks for the patch. If I understand correctly, the issue you're addressing is the race between the time an offset is received in the metadata listener and the time it gets published. Basically you're saying that this is not enough. There will always be some latency between the time a broker gets unfenced and the time that unfencing can be reflected in the metadata cache. However, by waiting for the broker's own registration to be received, at least we can reduce the window. Do I have that right?

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from 5480126 to 418e9d5 Compare April 19, 2022 13:10
@dengziming (Member Author)

Yeah @hachikuji, it's impossible to remove the gap, but we can still minimize it by:

  1. Don't advertise an offset to the controller until it has been published
  2. Only unfence a broker when it has seen its own registration

I also added a test case for these 2 pieces of logic in BrokerLifecycleManagerTest; it seems a little messy, PTAL.

@dengziming dengziming changed the title KAFKA-13737: Fix flaky test in LeaderElectionCommandTest KAFKA-13836: Improve KRaft broker heartbeat logic Apr 19, 2022
@cmccabe (Contributor) commented Apr 21, 2022

Thanks for the PR, @dengziming .

I don't think we need to add the custom logic to look for the broker's own registration record. That should be covered by the check done on the controller side, right? We know that the broker will register prior to sending any heartbeats, and we know that the controller will not allow the broker to join the cluster until its metadata offset has caught up to the high water mark. So those two things, taken together, mean that the existing code is already verifying that the broker has read past its own registration.

The only hole is the bug where we reported the offset from the raft layer rather than from the broker metadata publisher. So I'd say, let's fix that hole and leave the other stuff the same.
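A small sketch of this argument, with hypothetical names: because the registration record is already in the log (below the high watermark) before any heartbeat is accepted, "caught up to the high watermark" already implies "read past the registration record", so no extra broker-side check is needed.

```python
# Illustrative sketch of the controller-side reasoning above.
# Names are hypothetical, not Kafka's real classes or fields.
def may_unfence(broker_offset, high_watermark, registration_offset):
    # The registration record is in the committed log, so it sits
    # strictly below the high watermark.
    assert registration_offset < high_watermark
    caught_up = broker_offset >= high_watermark
    # caught_up therefore implies the broker has read past its own
    # registration -- the property the custom logic tried to re-check.
    if caught_up:
        assert broker_offset > registration_offset
    return caught_up

assert may_unfence(broker_offset=10, high_watermark=10, registration_offset=3)
assert not may_unfence(broker_offset=5, high_watermark=10, registration_offset=3)
```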

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from e30e7ff to 2921e25 Compare April 22, 2022 03:10
@dengziming (Member Author)

I don't think we need to add the custom logic to look for the broker's own registration record. That should be covered by the check done on the controller side, right?

Yes, @cmccabe. 👍 We can be sure the broker has seen its own registration record if it has been unfenced by the controller. I reverted the redundant logic; you can take a look again.

Contributor

Why not use the published offset consistently in heartbeats?

Member Author

Following our current design in BrokerServer, when the BrokerServer restarts it does not start publishing metadata until it has caught up with the current high watermark, so the published offset will always be zero at startup.
This is necessary because at startup there may be some outdated metadata: for example, if we created a topic and then deleted it, but the BrokerServer has only read part of the metadata log, the ReplicaManager would initialize some unnecessary replicas if it published at that point.
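The hazard can be illustrated with a small sketch (record names here are illustrative, not Kafka's actual metadata record schemas): replaying only a prefix of the log makes a topic that was later deleted look live.

```python
# Illustrative sketch of the stale-metadata hazard described above:
# publishing after a partial read of the log exposes a deleted topic.
log = [("TopicRecord", "foo"), ("RemoveTopicRecord", "foo")]

def topics_visible(read_up_to):
    live = set()
    for kind, name in log[:read_up_to]:
        if kind == "TopicRecord":
            live.add(name)
        elif kind == "RemoveTopicRecord":
            live.discard(name)
    return live

# Publishing after reading only the first record exposes the deleted topic,
# so replicas would be created for it:
assert topics_visible(read_up_to=1) == {"foo"}
# Publishing only after catching up does not:
assert topics_visible(read_up_to=2) == set()
```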

Contributor

I might still be missing something, but it seems ok if the published offset remains 0 initially and we send that in the heartbeat until the broker is ready to begin publishing. What do you think?

Member Author

This seems like the "chicken and egg" problem: if we advertised 0 to the controller, the controller would always set isCaughtUp=false in the HeartbeatResponse, and the broker would never be ready to begin publishing.

The time sequence currently is:

  1. LifecycleManager repeatedly sends HeartbeatRequests
  2. The controller sets isCaughtUp=true once the advertised offset reaches the high watermark
  3. LifecycleManager completes initialCatchUpFuture when it sees isCaughtUp=true
  4. The BrokerServer starts its initial publishing after initialCatchUpFuture is completed
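A toy simulation of this sequence, under the assumption that only the published offset were advertised (all names are illustrative, not Kafka's real classes): the broker would stall at step 2 forever, which is exactly the chicken-and-egg deadlock.

```python
# Illustrative simulation of the startup sequence above. If the broker
# advertised only its *published* offset, that offset would stay 0 until
# publishing starts, but publishing starts only after isCaughtUp=true.
def startup_with(advertise_published_only):
    high_watermark = 100
    fetched, published = 100, 0   # broker has fetched the log but not published
    for _ in range(3):            # a few heartbeat rounds
        advertised = published if advertise_published_only else fetched
        is_caught_up = advertised >= high_watermark
        if is_caught_up:
            published = fetched   # initialCatchUpFuture completes; publishing starts
            return True
    return False                  # never caught up -> never begins publishing

assert startup_with(advertise_published_only=False)      # current behavior: starts up
assert not startup_with(advertise_published_only=True)   # the deadlock described above
```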

Contributor

Ok, I think I get it now. We don't begin publishing until the controller lets us know that we are caught up. That's the part I was missing. We might need to step back and reconsider this. Letting the reported offset in BrokerHeartbeat mean different things depending on the context seems undesirable. Does it give us any additional guarantees? Even after the controller marks the follower as "caught up," there is still a delay before the broker can publish the data. Additionally, would it be possible for the offset to go backwards when we switch to the published offset? It would be odd if we could end up going back to an offset which was not considered caught up. What do you think?

Member Author

Currently, this offset is only used when starting the broker: first to ensure the broker has caught up so we can start initialization, and second to ensure the broker has caught up so it can be unfenced. If we use the same metadata offset for both, it seems unnecessary to check twice. But it is indeed peculiar to have the offset go backwards; maybe we should add a new field to the HeartbeatRequest, or just leave it as it is now?

Contributor

@dengziming Yeah, I was considering a second field as well. That would allow a startup transition looking something like this:

  1. Broker catches up: fetch_offset=N, publish_offset=0
  2. Controller returns is_caught_up=true, is_fenced=true
  3. Broker begins publishing: fetch_offset=N, publish_offset=N
  4. Controller returns is_caught_up=true, is_fenced=false

Basically the controller uses fetch_offset to tell whether the broker is caught up and it uses publish_offset to tell whether it should be unfenced. I guess the main advantage is that it gives us some more protection from exposing old metadata when brokers rejoin the cluster. Or at least we can put a time bound on the staleness of the exposed metadata. The value of this may be a bit limited though since we don't do self-fencing.
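A hedged sketch of how the controller could evaluate the two proposed fields (fetch_offset and publish_offset are the names from this thread; the function itself is illustrative, not the actual QuorumController logic). Note that neither field ever needs to move backwards:

```python
# Illustrative controller-side decision under the two-field proposal above.
def controller_decision(fetch_offset, publish_offset, high_watermark):
    # fetch_offset decides whether the broker is caught up;
    # publish_offset decides whether it can be unfenced.
    is_caught_up = fetch_offset >= high_watermark
    is_fenced = not (is_caught_up and publish_offset >= high_watermark)
    return is_caught_up, is_fenced

hwm = 100
# Steps 1-2: broker caught up on fetch, not yet publishing.
assert controller_decision(100, 0, hwm) == (True, True)
# Steps 3-4: broker begins publishing and catches up on publish_offset.
assert controller_decision(100, 100, hwm) == (True, False)
```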

@cmccabe What do you think?

Member Author

This would need us to bump MetadataVersion so I will wait for KIP-778 first.

@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from 2921e25 to d5b0a2c Compare May 20, 2022 07:25
@dengziming dengziming force-pushed the KAFKA-13737-postpone-unfence-broker branch from d5b0a2c to cec75f2 Compare May 20, 2022 07:58
@dengziming (Member Author)

Bumped BrokerHeartbeat and used version 2 if MetadataVersion is above IBP_3_3_IV1; this is the first use case of FeatureVersion. PTAL @hachikuji @mumrah

@dengziming (Member Author)

This has been fixed in #12983; we use sharedServer.loader.lastAppliedOffset(), which is the published offset.

@dengziming dengziming closed this Aug 14, 2023