KAFKA-7395; Add fencing to replication protocol (KIP-320) #5661

Merged: hachikuji merged 10 commits into apache:trunk from hachikuji:KAFKA-7395 on Oct 5, 2018
Conversation

@hachikuji (Contributor) commented Sep 18, 2018:

This patch contains the broker-side support for the fencing improvements from KIP-320. This includes the leader epoch validation in the ListOffsets, OffsetsForLeaderEpoch, and Fetch APIs as well as the changes needed in the fetcher threads to maintain and use the current leader epoch. The client changes from KIP-320 will be left for a follow-up.

One notable change worth mentioning is that we now require the read lock in Partition in order to read from the log or to query offsets. This is necessary to ensure the safety of the leader epoch validation. Additionally, we forward all leader epoch changes to the replica fetcher thread and go through the truncation phase. This is needed to ensure the fetcher always has the latest epoch and to guarantee that we cannot miss needed truncation if we missed an epoch change.
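The leader epoch validation described above can be sketched roughly as follows. This is a minimal, hypothetical model, not the actual `Partition` code; the error names follow the KIP-320 protocol errors, but the class and method names here are invented for illustration:

```java
import java.util.Optional;

public class EpochFencing {
    enum Error { NONE, FENCED_LEADER_EPOCH, UNKNOWN_LEADER_EPOCH }

    // currentEpoch: the broker's leader epoch for the partition.
    // requestedEpoch: the epoch the client/follower believes is current
    // (empty when an older client omits the field, in which case no fencing is possible).
    static Error validate(int currentEpoch, Optional<Integer> requestedEpoch) {
        if (!requestedEpoch.isPresent())
            return Error.NONE;
        if (requestedEpoch.get() < currentEpoch)
            return Error.FENCED_LEADER_EPOCH;   // caller has stale metadata: refresh and retry
        if (requestedEpoch.get() > currentEpoch)
            return Error.UNKNOWN_LEADER_EPOCH;  // this broker is behind: retry later
        return Error.NONE;
    }
}
```

A zombie leader that missed an epoch bump would see its fetches and offset queries rejected with FENCED_LEADER_EPOCH rather than silently serving stale data.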

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@junrao (Contributor) left a comment:

@hachikuji : Thanks for the patch. Looks good overall. A few comments below.

Also, is it true that this PR hasn't added the logic for the consumer to use OffsetForLeaderEpoch yet?

Contributor:

This is an existing issue, but readOnlyCommitted is a bit confusing given the read_committed isolation level for transactions. Perhaps rename it readUpToHighWatermark?

Comment thread on core/src/main/scala/kafka/cluster/Partition.scala (outdated)
Comment thread on core/src/main/scala/kafka/server/AbstractFetcherManager.scala (outdated)
Contributor:

The comment doesn't read right with two "either"?

Contributor:

Hmm, the leader epoch will be 1, which doesn't match the leader epoch of 0 set in line 180. Do we need to extend MockFetcherThread to support currentLeaderEpoch?

Contributor (Author):

Thanks, good catch. I meant to do this, but forgot about it.

Contributor:

Should we also assert that the offset is back to 3 now?

Contributor:

leaderEpoch of 0 doesn't seem to be consistent with the leader epoch in the log.

Contributor:

The error message should say that the leader hasn't changed yet?

@hachikuji (Contributor, Author) commented:

retest this please

@hachikuji (Contributor, Author) commented:

@junrao This is ready for another look. To answer your question, it was indeed my intention to do the client-side implementation in a separate PR. For now, I have removed the error handling in Fetcher since we are not sending the current leader epoch anyway.

@hachikuji (Contributor, Author) commented:

@lindong28 I think this PR will be wrapped up soon. Do you think we can still get it into 2.1?

@junrao (Contributor) left a comment:

@hachikuji : Thanks for the updated patch. A few more comments below.

Contributor:

This doesn't seem to be accurate. We only go to the prior epoch if the requested epoch is not present. Otherwise, the requested epoch and its last offset will be returned.
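The lookup rule described here can be modeled with a sorted map of epoch start offsets. This is a hypothetical sketch (not the actual LeaderEpochFileCache code): if the requested epoch is present, return it with its end offset (the start offset of the next epoch, or the log end offset); fall back to the prior epoch only when the requested epoch is absent.

```java
import java.util.Map;
import java.util.TreeMap;

public class EpochEndOffsetLookup {
    // epochStarts: (epoch -> start offset) entries; logEndOffset caps the last epoch.
    // Returns {epoch, endOffset}, or {-1, -1} when no epoch <= requestedEpoch exists.
    static long[] endOffsetFor(int requestedEpoch, TreeMap<Integer, Long> epochStarts,
                               long logEndOffset) {
        // Largest known epoch <= the requested one; equals requestedEpoch when present.
        Map.Entry<Integer, Long> floor = epochStarts.floorEntry(requestedEpoch);
        if (floor == null)
            return new long[]{-1L, -1L};
        // The epoch's end offset is where the next epoch begins, else the log end offset.
        Integer next = epochStarts.higherKey(floor.getKey());
        long endOffset = (next == null) ? logEndOffset : epochStarts.get(next);
        return new long[]{floor.getKey(), endOffset};
    }
}
```

With epoch starts {0 -> 0, 2 -> 10} and log end offset 20, a request for epoch 2 returns (2, 20), while a request for the absent epoch 1 falls back to (0, 10).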

Comment thread on core/src/main/scala/kafka/log/Log.scala (outdated)
Contributor:

The default for minOneMessage has changed to true. This means that the caller convertToOffsetMetadata() will fetch one message even though it doesn't need to.

Contributor:

Should leaderEpoch be 0 to match what's in line 163?

Contributor:

Hmm, not sure that I understand the "unless" part.

Contributor:

Should we verify that tp doesn't exist in firstLeaderFetcher? Also, should we call EasyMock.verify(fetcher) at the end?

val partitionsToMakeFollowerWithLeaderAndOffset = partitionsToMakeFollower.map { partition =>
    metadataCache.getAliveBrokers.find(_.id == partition.leaderReplicaIdOpt.get).get.brokerEndPoint(config.interBrokerListenerName),
    partition.getReplica().get.highWatermark.messageOffset)).toMap
replicaFetcherManager.addFetcherForPartitions(partitionsToMakeFollowerWithLeaderAndOffset)
Contributor:

The issue of not calling replicaFetcherManager.removeFetcherForPartitions() in line 1278 is that in the case of controlled shutdown, we avoid adding the partitions to the fetcher. This means that existing partitions won't be removed from the fetcher. This may cause replicas removed from ISR during controlled shutdown to be added back to ISR again.

Also, if we do this, the state-change logging after replicaFetcherManager.removeFetcherForPartitions() probably needs to be moved too.
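The hazard described here can be illustrated with a toy model (hypothetical names, not the ReplicaManager code): if removal is skipped and controlled shutdown suppresses the re-add, the old partition stays registered in the fetcher.

```java
import java.util.HashSet;
import java.util.Set;

public class FetcherStateModel {
    // Toy model of a fetcher's partition set during a makeFollower transition.
    // removeFirst models calling removeFetcherForPartitions() unconditionally;
    // leaderAvailable is false when the new leader is shutting down (controlled shutdown),
    // in which case the partitions are not re-added to the fetcher.
    static Set<String> transition(Set<String> fetching, Set<String> partitionsToMakeFollower,
                                  boolean leaderAvailable, boolean removeFirst) {
        Set<String> result = new HashSet<>(fetching);
        if (removeFirst)
            result.removeAll(partitionsToMakeFollower); // always drop old fetch state
        if (leaderAvailable)
            result.addAll(partitionsToMakeFollower);    // re-add only if the leader is alive
        return result;
    }
}
```

With leaderAvailable = false and removeFirst = false, a partition already in the fetching set is never dropped, modeling how a replica removed from the ISR during controlled shutdown could keep fetching and be added back to the ISR.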

Contributor (Author):

Ok, let's revert this change. I think it was not strictly needed and I was not too happy about the additional bookkeeping in AbstractFetcherManager.

@lindong28 (Member) commented:

@hachikuji Would users be able to benefit from this PR if we do not implement the client-side part of KIP-320? Strictly speaking, we are not supposed to commit a large PR after the feature freeze date of Oct 1st. I am trying to understand whether the benefit of this PR is worth breaking this plan.

@hachikuji (Contributor, Author) commented:

@lindong28 The main benefit is the improved fencing on the brokers. Without it, we will still have the possibility of data loss when brokers turn zombie. These edge cases are rare in practice, so probably not too much damage if it slips, though it would be kind of a pity since we already bumped the protocols.

@lindong28 (Member) commented:

@hachikuji I see. Sure, if you and @junrao are confident in this PR, please feel free to commit it into 2.1 branch as well :)

@hachikuji (Contributor, Author) commented:

@lindong28 Sounds good. I feel pretty good about it, but let's see what Jun thinks tomorrow after he's had a chance to see the latest updates.

@junrao (Contributor) left a comment:

@hachikuji : Thanks for the latest patch. LGTM. Just a few minor comments below.

Since we already bumped up the request version and this patch seems less risky, I am fine with merging this to 2.1.

expectDeletedFiles)
}

def readLog(log: Log, startOffset: Long, maxLength: Int,
Contributor:

Could this be private?

}

fetcherThread.addPartitions(initialOffsetAndEpochs)
info(s"Added fetcher to broker ${brokerAndFetcherId.broker} for partitions $initialOffsetAndEpochs")
Contributor:

The original version logs one partition per line. Perhaps that's easier to parse when debugging?

Contributor (Author):

Hmm.. I've found the original message to be too big to be useful in practice. This was an attempt to break it down so that at least there would be a separate message per broker. It was also a tad annoying that we had to build a whole new collection just to print a log message.

Contributor:

Ok, we can keep it this way then.

partition.makeLeader(controllerId, new LeaderAndIsrRequest.PartitionState(controllerEpoch, leader, leaderEpoch,
isr, 1, replicas, true), 0))

val partition = setupPartitionWithMocks(leaderEpoch = leaderEpoch, isLeader = true)
Contributor:

Hmm, I am not sure how this test works now. setupPartitionWithMocks() creates a Log that's different from the one created in line 105. So not sure how they have the same log end offset. Also, do we need both Log?

Contributor (Author):

Hmm yeah, this is strange. I think it works because even though they are separate Log instances, they use the same directory and log files. Let me try to fix.

@hachikuji (Contributor, Author) commented:

@junrao Thanks for reviewing. I will plan to merge to trunk and 2.1 once the build completes.

@hachikuji (Contributor, Author) commented:

The SuppressionIntegrationTest failure is known to be flaky. I will go ahead and merge to trunk and 2.1.

@hachikuji hachikuji merged commit ed3bd79 into apache:trunk Oct 5, 2018
hachikuji added a commit that referenced this pull request Oct 5, 2018
This patch contains the broker-side support for the fencing improvements from KIP-320. This includes the leader epoch validation in the ListOffsets, OffsetsForLeaderEpoch, and Fetch APIs as well as the changes needed in the fetcher threads to maintain and use the current leader epoch. The client changes from KIP-320 will be left for a follow-up.

One notable change worth mentioning is that we now require the read lock in `Partition` in order to read from the log or to query offsets. This is necessary to ensure the safety of the leader epoch validation. Additionally, we forward all leader epoch changes to the replica fetcher thread and go through the truncation phase. This is needed to ensure the fetcher always has the latest epoch and to guarantee that we cannot miss needed truncation if we missed an epoch change.

Reviewers: Jun Rao <junrao@gmail.com>
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019