KAFKA-6361: Fix log divergence between leader and follower after fast leader fail over#4882
junrao merged 24 commits into apache:trunk
Conversation
@lindong28 and @junrao Regarding the truncation logic for the future replica (ReplicaAlterLogDirsThread): I thought about this a bit more, and I think we don't need the same truncation logic as in the replica fetcher. We can implement much simpler logic based on the truncation offset of the local replica. Here are the main points:
Based on the above, unless I missed something or my assumptions are wrong, I think we should truncate the future replica to the truncation offset of the local replica. This will result in simpler code/logic that is easier to reason about and debug in the future. The PR is also ready to review.
@apovzner Thanks for the patch! I will finish the first round of review this week.
lindong28
left a comment
Thanks for the patch! LGTM. I only have some minor comments regarding the Java doc and the consistency between ReplicaFetcherThread and ReplicaAlterDirThread.
nits: leader replied with an offset $offsetToTruncateTo not (logEndOffset)
Oh, that's a good catch. It also made me realize the log message is not completely correct for all cases; will fix.
Since we are changing this Java doc, can we make it a bit more consistent with the actual implementation? For example, "If the leader replied with undefined epoch" can probably be the first case.
nits: This is not related to this patch. Can we add one comment saying that the initial offset in this case will be the high watermark, so that it is more consistent with the Java doc?
Added comment above the logging
nits: Would it be a bit more accurate to replace <= with <?
nits: It probably does not affect the correctness of the code. But I am wondering if we can make ReplicaFetcherThread.maybeTruncate() a bit more consistent with the ReplicaAlterDirThread.maybeTruncate(). Currently ReplicaFetcherThread.maybeTruncate() will specifically handle the scenario that epochOffset.leaderEpoch == UNDEFINED_EPOCH whereas ReplicaAlterDirThread.maybeTruncate() handles the scenario through futureEndOffset == UNDEFINED_EPOCH_OFFSET.
Also, should we also consider taking the min with the initial offset here, just like this patch does in ReplicaAlterDirThread.maybeTruncate()? I am wondering whether we can make them more consistent, or whether there is a reason that the initial offset is needed in only one of them.
About the question in the second paragraph, it would be incorrect to take the min with the initial offset (high watermark) here, because this falls back to the pre-KIP-101 implementation and we can actually lose a committed message (see scenario 1 in KIP-101). This particular case can happen if the leader is on a protocol version pre this KIP but post-KIP-101, so it replies with a valid offset but an invalid leader epoch. In this case, we want the KIP-101 implementation of truncating to the leader's offset, rather than falling back to the pre-KIP-101 implementation.
Regarding the bigger question of making ReplicaAlterLogDirsThread more consistent with ReplicaFetcherThread, I wanted to discuss the possibility of ReplicaAlterLogDirsThread using only the initial offset (which is the truncation offset of the main replica) for truncation, instead of following the offset-for-leader-epoch logic. I left the comment earlier in this PR and wanted to get your opinion.
-- The initial offset in ReplicaAlterLogDirsThread is the main replica's truncation offset (if the main replica is a follower) or the main replica's high watermark (if the main replica is the leader), which is different from the initial offset in ReplicaFetcherThread, which is the high watermark.
-- There is no way the future replica needs to truncate further back than the initial offset, because it is always a follower of the main replica; if the main replica truncated and re-fetched offsets from the leader, causing temporary log divergence with the future replica, we already force truncation on the future replica by setting the future replica's initial offset to the main replica's truncation offset.
Otherwise, I will change ReplicaAlterDirThread.maybeTruncate() to be more consistent with ReplicaFetcherThread.maybeTruncate() in the case mentioned above. The current implementation is still correct, because falling back to the initial offset is safe for the future replica (vs. for a follower replica), since the future replica is always a follower. The whole reason for KIP-101 and KIP-279 is to deal with replicas changing their leader/follower status, while the future replica is always a follower.
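To make the truncation decision being debated here concrete, the cases can be sketched roughly as follows. This is a simplified, hypothetical Java sketch, not the actual Kafka code: the method name `decide`, the parameter names, and the flattening of the epoch-cache lookup into a single `followerEndOffsetFor` value are all stand-ins.

```java
// Simplified sketch of the KIP-279/KIP-101 truncation decision discussed above.
// All names are illustrative stand-ins, not the actual Kafka implementation.
final class TruncationSketch {
    static final int UNDEFINED_EPOCH = -1;
    static final long UNDEFINED_EPOCH_OFFSET = -1L;

    record OffsetTruncationState(long offset, boolean truncationCompleted) {}

    // leaderEpoch/leaderEndOffset: the (epoch, end offset) pair the leader replied with.
    // followerEndOffsetFor: end offset in the follower's epoch cache for the largest
    // local epoch <= leaderEpoch (UNDEFINED_EPOCH_OFFSET if the follower has none).
    static OffsetTruncationState decide(int leaderEpoch, long leaderEndOffset,
                                        long followerEndOffsetFor, long followerLogEndOffset,
                                        long initialFetchOffset) {
        if (leaderEndOffset == UNDEFINED_EPOCH_OFFSET) {
            // Leader could not answer at all: fall back to the initial fetch offset.
            return new OffsetTruncationState(initialFetchOffset, true);
        } else if (leaderEpoch == UNDEFINED_EPOCH) {
            // Leader on a pre-KIP-279 protocol: KIP-101 behavior, truncate to its offset.
            return new OffsetTruncationState(Math.min(leaderEndOffset, followerLogEndOffset), true);
        } else if (followerEndOffsetFor == UNDEFINED_EPOCH_OFFSET) {
            // Follower has no epoch <= leaderEpoch: truncate and send another request.
            return new OffsetTruncationState(Math.min(leaderEndOffset, followerLogEndOffset), false);
        } else {
            // Both sides know the epoch: truncate to the smaller end offset,
            // always bounded by the follower's log end offset.
            long offsetToTruncateTo = Math.min(followerEndOffsetFor, leaderEndOffset);
            return new OffsetTruncationState(Math.min(offsetToTruncateTo, followerLogEndOffset), true);
        }
    }
}
```

Note how the fallback to `initialFetchOffset` only happens when the leader's offset is undefined, matching the argument above that an undefined epoch with a valid offset should still use the KIP-101 path.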
@apovzner Thanks for the explanation. The case that the leader is on a protocol version pre this KIP but post-KIP-101 AND this patch is used on some broker can only happen while the Kafka cluster is being upgraded to use this patch. The time window of this state is very small, so maybe we do not need to take care of this scenario.
Regarding the possibility of ReplicaAlterLogDirThread using only initial offset, Jun has provided a very good example. Basically this approach is not reliable if the future replica is offline when leader replica is truncated.
Actually, to be more precise, both leader and follower could be on a pre-this-KIP protocol version if the user upgrades the brokers but does not bump the protocol version. So I think we want the post-KIP-101 behavior, which is what's implemented, vs. going back to pre-KIP-101.
OK, offline future replica is a good example I did not consider. I agree we should use the same algorithm in ReplicaAlterLogDirThread. I will make it more consistent.
Since this affects inter broker protocol, we need to (1) document this api change for "2.0-IV0" in ApiVersion.scala, (2) update the upgrade section in the doc, (3) only use the new protocol if the inter broker protocol is 2.0-IV0 or above.
Regarding (3), the fetcher falls back to KIP-101 logic if inter-broker protocol version < KAFKA_2_0_IV0 (ignores leader epoch returned in the response and uses end offset).
To be more precise, 1) should be "the leader is still using message format older than KAFKA_0_11_0_IV2".
KAFKA_1_1_IV0 should be KAFKA_2_0_IV0
I think in this case, it's probably better to fall back to high watermark. That way, if the leader epoch logic doesn't apply, we always consistently fall back to the old method.
My concern about falling back to the high watermark in this particular case is that post-KIP-101 (and pre-2.0) code behaves exactly as described: the leader does not send a leader epoch, so we don't check it and use the leader's offset to truncate. The same applies if brokers upgrade to 2.0 but do not upgrade the protocol version. Then, once we upgrade to the 2.0 protocol version, we are back to the high watermark in this case.
The above logic may still be needed in the following sequence: (1) future replica copies data above HW from the current replica; (2) future replica goes offline (e.g. disk failure); (3) current replica truncates data above HW and re-replicates new data from the leader at the truncated offsets. To avoid code duplication, perhaps we can share the code between ReplicaFetcherThread and here?
That's a good example I did not consider. In that case, I agree, we need the leader epoch logic. I will try to move this code out into a common method that both fetchers re-use.
typo "the the". Also, we are now returning the leader epoch and the end offset.
Would it be simpler to just initialize updatedOffsetsOpt to offsets and not make it an Option?
This is probably because the broker port changes on restart?
Change -1 to UNDEFINED_EPOCH_OFFSET?
@junrao and @lindong28 Thanks a lot for your comments. I addressed all of them. Based on the use case of the future replica being offline and missing the "mark for truncation" event, I agree that
junrao
left a comment
@apovzner : Thanks for the updated patch. A few more comments below. A couple of other things.
- Could you also run the system tests?
- This is not an issue directly related to this patch. But I noticed that in Log.truncateTo(), if the truncation point is in the middle of a message set, we will actually be truncating to the first offset of the message set. In that case, the replica fetcher thread should adjust the fetch offset to the actual truncated offset. Typically, the truncation point should never be in the middle of a message set. However, this could potentially happen during message format upgrade. We can tighten this up in a separate jira.
Hmm, in ReplicaManager.alterReplicaLogDirs(), the initial offset for the future replica is also set to its HW. We update the future replica's HW in ReplicaAlterLogDirsThread.processPartitionData(). We probably want to bound it by the future replica's log end offset.
I fixed the comment to say it is either high watermark for future replica, or current replica's truncation offset.
Also, changed ReplicaAlterLogDirsThread.processPartitionData() to bound future replica's high watermark to its log end offset.
Do we need to test useLeaderEpochInResponse? The only case we want to cover here is that the follower uses version 0 of OffsetForLeaderEpoch.
I am testing it in ReplicaFetcherThreadTest.shouldUseLeaderEndOffsetIfInterBrokerVersionBelow20
not tracking offsets => not tracking leader epochs ?
This is normal behavior. So the logging should probably be info.
Now that we can truncate in more than 1 step, it's probably useful to always bound the truncation point by the replica's log end offset.
We want to mention that we are returning both the epoch and the offset.
This exists in line 23 already.
removed dup from line 23
This test should actually fail right now since the version of the leaderEpochRequest is always version 1. So, we probably want to check the latestAllowedVersion() in the builder in ReplicaFetcherMockBlockingSend.
Right, this test ended up testing the local broker on 0.11 and the remote broker on the latest version, which actually does not fail because we don't check the leader epoch in leaderEpochResponse, and the truncation is done using the KIP-101 approach (which is what this test verifies). I will update the test to use an undefined leader epoch in the response, to simulate the other broker also being on the older protocol version.
replicaLeaderEpoch and leaderEpochOffset may be confusing. How about followerEpoch and leaderEpochOffset?
KAFKA_0_11_0_IV2 => KAFKA_0_11_0
lindong28
left a comment
Thanks for the update! Left a few comments
private static final Schema OFFSET_FOR_LEADER_EPOCH_REQUEST_V0 = new Schema(
    new Field(TOPICS_KEY_NAME, new ArrayOf(OFFSET_FOR_LEADER_EPOCH_REQUEST_TOPIC_V0), "An array of topics to get epochs for"));

/* v2 request is the same as v1. Per-partition leader epoch has been added to response */
typo. Probably should be v1 instead of v2.
// OFFSET_FOR_LEADER_EPOCH_RESPONSE_PARTITION_V1 added a per-partition leader epoch field,
// which specifies which leader epoch the end offset belongs to
private static final Schema OFFSET_FOR_LEADER_EPOCH_RESPONSE_PARTITION_V1 = new Schema(
    ERROR_CODE,
nits: can we make the indentation the same as the existing indentation in this file?
// and KafkaStorageException for fetch requests.
"1.1-IV0" -> KAFKA_1_1_IV0,
"1.1" -> KAFKA_1_1_IV0,
// Introduced OffsetsForLeaderEpochRequest/OffsetsForLeaderEpochResponse V1 via KIP-279
nits: to be more consistent with the existing comment, we can just say Introduced OffsetsForLeaderEpochRequest V1 via KIP-279
/**
 * @param leaderEpoch Requested leader epoch
 * @return The last offset of messages published under this leader epoch.
 * @return The requested leader epoch and the last offset of messages published under this
It looks like the existing Java doc (prior to this patch) of this method is not correct. According to Java doc of LeaderEpochFileCache.endOffsetFor(...), it says The End Offset is the end offset of this epoch, which is defined as the start offset of the first Leader Epoch larger than the Leader Epoch requested, or else the Log End Offset if the latest epoch was requested.
Yes, I think the prior description is more of a shortcut, which is actually not correct. I just realized that we should use "end offset" instead of "the last offset of messages published" here -- the description in LeaderEpochFileCache is more precise. I will update this comment accordingly.
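The endOffsetFor semantics quoted from LeaderEpochFileCache (end offset of an epoch = start offset of the first larger epoch, or the log end offset for the latest epoch) can be illustrated with a small sketch. This is hypothetical code, not the Kafka class; the class name, fields, and methods are stand-ins.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the endOffsetFor(...) semantics discussed above.
// Hypothetical code, not Kafka's LeaderEpochFileCache.
final class EpochCacheSketch {
    // epoch -> start offset of that epoch
    private final TreeMap<Integer, Long> epochStarts = new TreeMap<>();
    private final long logEndOffset;

    EpochCacheSketch(long logEndOffset) { this.logEndOffset = logEndOffset; }

    void assign(int epoch, long startOffset) { epochStarts.put(epoch, startOffset); }

    // End offset for the largest tracked epoch <= requestedEpoch: the start
    // offset of the next larger epoch, or the log end offset if that epoch is
    // the latest. Returns -1 (undefined) if no epoch <= requestedEpoch exists.
    long endOffsetFor(int requestedEpoch) {
        Map.Entry<Integer, Long> floor = epochStarts.floorEntry(requestedEpoch);
        if (floor == null) return -1L;
        Map.Entry<Integer, Long> next = epochStarts.higherEntry(floor.getKey());
        return next == null ? logEndOffset : next.getValue();
    }
}
```

For example, with epochs starting at {0 -> 0, 2 -> 40, 5 -> 70} and a log end offset of 100, a request for epoch 3 resolves to epoch 2, whose end offset is 70 (the start of epoch 5), while a request for epoch 5 returns the log end offset.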
val followerName = if (isFutureReplica) "future replica" else "follower"

// Called when 'offsetToTruncateTo' is the final offset to truncate to.
def finalFetchLeaderEpochOffset(offsetToTruncateTo: Long, offsetFromLeader: Long): OffsetTruncationState = {
nits: now that we don't have any logging in finalFetchLeaderEpochOffset(), we can probably remove this method and replace its usage with one line. For example, finalFetchLeaderEpochOffset(leaderEpochOffset.endOffset, leaderEpochOffset.endOffset) is equivalent to OffsetTruncationState(math.min(offsetToTruncateTo, replica.logEndOffset.messageOffset), truncationCompleted = true)
isInterruptible = false,
includeLogTruncation = true) {
includeLogTruncation = true,
useLeaderEpochInResponse = brokerConfig.interBrokerProtocolVersion >= KAFKA_2_0_IV0) {
In ReplicaFetcherThread.fetchEpochsFromLeader(), the version of OffsetsForLeaderEpochRequest should probably be determined based on the interBrokerProtocolVersion. We can use OffsetsForLeaderEpochRequest V1 only if the interBrokerProtocolVersion >= KAFKA_2_0_IV0. Otherwise, while we rolling bounce the cluster to upgrade the code, the leader may still be running the old code and not recognize OffsetsForLeaderEpochRequest V1.
The OffsetsForLeaderEpoch request is exactly the same in v0 and v1, so we don't need to explicitly check the protocol version when building the requests. If the leader is on an older version, it will send a v0 response, which will not include the leader epoch field; this is handled in the OffsetsForLeaderEpochResponse constructor by setting the leader epoch field to undefined. In the fetcher thread, we handle this case (where the leader epoch is undefined) in maybeTruncate() and fall back to KIP-101 behavior, the same as when this broker is on an older protocol version.
Hmm.. my understanding is that the version of the response should always match the version of the request. Thus in order to receive OffsetsForLeaderEpochResponse V1, the broker needs to send OffsetsForLeaderEpochRequest V1. And the broker should reject the request if the version of the request is not recognized. Did I miss something?
Yes, correct. I meant there is nothing different to do in the fetcher thread. If I understood the code correctly, ReplicaFetcherThread.fetchEpochsFromLeader() passes the OffsetsForLeaderEpochRequest.Builder to sendRequest(), and then build() is called on that builder with a version in NetworkClient.doSend. It looks like the proper version will be used in that case.
Currently, if we do not explicitly specify the version for AbstractRequest.Builder(), the latest version of this request, as determined by ApiKeys.latestVersion(), will be used. The latest version for OffsetsForLeaderEpochRequest will be V1 after this patch. We probably need to explicitly pass the version (determined by the IBP) to OffsetsForLeaderEpochRequest.Builder, similar to what we do for UpdateMetadataRequest.Builder() in ControllerBrokerRequestBatch.sendRequestsToBrokers().
Oh I see, thank you, let me take a look.
Thanks a lot for your help, I updated the code to use the OffsetsForLeaderEpochRequest version when building a request.
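The version-selection idea agreed on here (pick the request version from the inter-broker protocol version rather than defaulting to the latest) can be sketched as follows. This is a hypothetical sketch: the enum, its subset of version constants, and the helper method are illustrative, not the actual Kafka API.

```java
// Sketch of choosing the OffsetsForLeaderEpoch request version from the
// inter-broker protocol version (IBP), as discussed above. Illustrative only.
final class EpochRequestVersionSketch {
    // Ordered inter-broker protocol versions (illustrative subset; enum order
    // stands in for version comparison).
    enum ApiVersion { KAFKA_0_11_0, KAFKA_1_1_IV0, KAFKA_2_0_IV0 }

    static short offsetsForLeaderEpochVersion(ApiVersion interBrokerProtocolVersion) {
        // Use V1 (whose response carries the per-partition leader epoch) only
        // once the whole cluster is known to understand it; otherwise stay on V0.
        return interBrokerProtocolVersion.compareTo(ApiVersion.KAFKA_2_0_IV0) >= 0
                ? (short) 1 : (short) 0;
    }
}
```

The point of explicitly passing this version to the builder is that the response version always matches the request version, so a broker that asks with V0 never has to parse a V1 response during a rolling upgrade.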
 */
def getOffsetTruncationState(tp: TopicPartition, leaderEpochOffset: EpochEndOffset, replica: Replica, isFutureReplica: Boolean = false): OffsetTruncationState = {
  // to make sure we can distinguish log output for fetching from remote leader or local replica
  val followerName = if (isFutureReplica) "future replica" else "follower"
nits: this replica can be either follower or future replica. Maybe the variable can be named replicaName?
Yeah, I already went back and forth a couple of times regarding "replica" vs. "follower" (also re: your comment below). Jun commented (in this PR) that "replica" is also confusing, in that the leader is also a replica. And in the case of the future replica, it is also a follower, but of a different type. I propose to keep this name as is, but replace replicaEndOffset with followerEndOffset re: your comment below.
} else {
  // get (leader epoch, end offset) pair that corresponds to the largest leader epoch
  // less than or equal to the requested epoch.
  val (followerEpoch, replicaEndOffset) = replica.epochs.get.endOffsetFor(leaderEpochOffset.leaderEpoch)
nits: would the name replicaEpoch be more consistent with the name replicaEndOffset?
case class OffsetTruncationState(offset: Long, truncationCompleted: Boolean) {

  def this (offset: Long) = this(offset, true)
nits: can we remove the space after this?
isInterruptible: Boolean = true,
includeLogTruncation: Boolean)
includeLogTruncation: Boolean,
useLeaderEpochInResponse: Boolean = true)
My personal opinion is that it may be more general to just pass the interBrokerProtocolVersion to the constructor of AbstractFetcherThread, and use this variable to determine the version of OffsetsForLeaderEpochRequest when we actually generate the builder for OffsetsForLeaderEpochRequest. It is more consistent with the existing usage of KafkaConfig.interBrokerProtocolVersion in the code base. And if in the future some other logic relies on the inter-broker protocol in AbstractFetcherThread, we won't need to add more variables to the constructor.
I agree. However, I just tried it, and it requires changes to the ConsumerFetcherThread constructor, and then ConsumerFetcherManager, and so on. I think it will be easy to change later when we need more logic dependent on the inter-broker protocol version, and especially once we remove the old consumer fetcher code.
 * @param fetchOffsets the partitions to mark truncation complete
 */
private def markTruncationCompleteAndUpdateFetchOffset(fetchOffsets: Map[TopicPartition, Long]) {
private def markTruncationCompleteAndUpdateFetchOffset(fetchOffsets: Map[TopicPartition, OffsetTruncationState]) {
The method name now is not very accurate. It doesn't always mark truncation as completed.
warn(s"Based on $followerName's leader epoch, leader replied with an unknown offset in ${replica.topicPartition}. " +
  s"The initial fetch offset ${partitionStates.stateValue(tp).fetchOffset} will be used for truncation.")
OffsetTruncationState(partitionStates.stateValue(tp).fetchOffset, truncationCompleted = true)
} else if (leaderEpochOffset.leaderEpoch == UNDEFINED_EPOCH || !useLeaderEpochInResponse) {
It seems that we don't really need the flag useLeaderEpochInResponse. If interBrokerProtocolVersion < KAFKA_2_0_IV0, it's guaranteed that leaderEpochOffset.leaderEpoch is UNDEFINED_EPOCH.
@junrao I have thought about this as well. But for future replica, even if interBrokerProtocolVersion < KAFKA_2_0_IV0, ReplicaAlterDirThread.fetchEpochsFromLeader() may still return EpochEndOffset whose leaderEpoch is not UNDEFINED_EPOCH. Maybe this method should also return UNDEFINED_EPOCH if interBrokerProtocolVersion < KAFKA_2_0_IV0?
Oh, thanks for raising this, Dong. I think we should then make ReplicaAlterDirThread.fetchEpochsFromLeader() return a response with UNDEFINED_EPOCH to match the "older protocol response".
 * truncate the leader's offset (and do not send any more leader epoch requests).
 * -- Otherwise, truncate to min(leader's offset, end offset on the follower for epoch that
 *    leader replied with, follower's Log End Offset).
 */
It seems that this comment is really for AbstractFetcherThread.getOffsetTruncationState(). If we move the comment there, we can also simplify the comment in ReplicaAlterLogDirsThread.maybeTruncate().
// less than or equal to the requested epoch.
val (followerEpoch, followerEndOffset) = replica.epochs.get.endOffsetFor(leaderEpochOffset.leaderEpoch)
if (followerEndOffset == UNDEFINED_EPOCH_OFFSET) {
  // This can happen if replica was not tracking leader epochs at that point (before the
Since the code uses follower, perhaps we can say "if follower was not"
//We should have truncated to the offsets in the response
assertTrue(truncateToCapture.getValues.asScala.contains(156))
assertTrue(truncateToCapture.getValues.asScala.contains(172))
assertTrue("Expected offset 156 in captured truncation offsets " + truncateToCapture.getValues,
Perhaps we can change the text to sth like "Expect partition t1p0 to truncate to offset 156".
//We should have truncated to the offsets in the first response
assertTrue("Expected offset 155 in captured truncation offsets " + truncateToCapture.getValues,
Should we further assert that the builder for OFFSET_FOR_LEADER_EPOCH in ReplicaFetcherMockBlockingSend.sendRequest() is set with the right version?
I modified ReplicaFetcherMockBlockingSend to save the version of OffsetsForLeaderEpochRequest and added a couple of checks in the test.
tp -> new EpochEndOffset(Errors.NONE, replicaMgr.getReplicaOrException(tp).epochs.get.endOffsetFor(epoch))
val (leaderEpoch, leaderOffset) = replicaMgr.getReplicaOrException(tp).epochs.get.endOffsetFor(epoch)
val leaderEpochInResponse: Int =
  if (brokerConfig.interBrokerProtocolVersion >= KAFKA_2_0_IV0) leaderEpoch
Do we need this check? Since we are getting the leader epoch from the current replica's log directly, even when IBP < KAFKA_2_0_IV0, it seems that we can just return leaderEpoch.
If we are on protocol < 2.0, the local replica will be fetching from the leader based on the older protocol (not using leader epochs). If we don't check here, the future replica will be fetching from the local replica based on leader epochs. Seems inconsistent? On the other hand, it should still work for the future replica to truncate using the leader epoch in that case too.
Yes, preserving the leader epoch always gives a better outcome. So, if we can do it, there is no reason to switch to a worse method. We have no choice between follower and leader because of the IBP. However, here, everything is local, so there is no need to be constrained by the IBP.
If we keep the leader epoch here for better outcome, should we still check useLeaderEpochInResponse in getOffsetTruncationState() so that it returns OffsetTruncationState(min(leaderEpochOffset.endOffset, replica.logEndOffset.messageOffset), truncationCompleted = true) if useLeaderEpochInResponse is false?
If we use leader epoch, then we should go all the way using the new protocol, i.e., continue truncating until finding the consistent point.
Ok, I will change back to using leader epoch if available for future replica.
OffsetTruncationState(intermediateOffsetToTruncateTo, truncationCompleted = false)
} else {
  val offsetToTruncateTo = min(followerEndOffset, leaderEpochOffset.endOffset)
  OffsetTruncationState(min(offsetToTruncateTo, replica.logEndOffset.messageOffset), truncationCompleted = true)
In general, we don't expect the truncation point to be < local HW. So, it would be useful to log a warning when this happens. Not sure what's the easiest way since now we can have intermediate truncation point.
@junrao I added the warning about truncating below HW to ReplicaFetcherThread.maybeTruncate. I explicitly compare replica.highWatermark to the offset we are truncating to. If we truncate several times, and more than once below HW, we will output the warning multiple times, which I think is ok. I ran system tests yesterday (https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1746/) and there was only one failure, in kafkatest.benchmarks.streams.streams_simple_benchmark_test.StreamsSimpleBenchmarkTest.test_simple_benchmark.test=streams-join.scale=1, which was due to the streams test process taking too long to exit. I don't think it is related to any changes in this PR.
<script id="upgrade-template" type="text/x-handlebars-template">

<h4><a id="upgrade_2_0_0" href="#upgrade_2_0_0">Upgrading from 0.8.x, 0.9.x, 0.10.0.x, 0.10.1.x, 0.10.2.x, 0.11.0.x, 1.0.x, 1.1.x, or 1.2.x to 2.0.0</a></h4>
KAFKA-6361: Fix log divergence between leader and follower after fast leader fail over (apache#4882). Implementation of KIP-279 as described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-279%3A+Fix+log+divergence+between+leader+and+follower+after+fast+leader+fail+over
In summary:
- Added leader_epoch to OFFSET_FOR_LEADER_EPOCH_RESPONSE
- Leader replies with the pair (largest epoch less than or equal to the requested epoch, the end offset of this epoch)
- If the follower does not know about the leader epoch that the leader replies with, it truncates to the end offset of the largest leader epoch less than the leader epoch that the leader replied with, and sends another OffsetForLeaderEpoch request. That request contains the largest leader epoch less than the leader epoch that the leader replied with.
Reviewers: Dong Lin <lindong28@gmail.com>, Jun Rao <junrao@gmail.com>
Implementation of KIP-279 as described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-279%3A+Fix+log+divergence+between+leader+and+follower+after+fast+leader+fail+over
In summary:
Added integration test EpochDrivenReplicationProtocolAcceptanceTest.logsShouldNotDivergeOnUncleanLeaderElections that does 3 fast leader changes where unclean leader election is enabled and min isr is 1. The test failed before the fix was implemented.
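The back-and-forth between leader and follower summarized above (ask for an epoch's end offset, truncate, ask again with a smaller epoch until both sides agree) can be sketched as a small simulation. This is a hypothetical sketch under simplifying assumptions (both epoch caches share epoch 0, caches are plain maps of epoch to start offset), not the actual Kafka protocol code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the KIP-279 convergence loop summarized above:
// the follower repeatedly asks the leader for the end offset of an epoch and
// truncates until both sides agree on an epoch. Not the actual Kafka code.
// Assumes both caches contain epoch 0, so floorEntry never returns null.
final class ConvergenceSketch {
    // Maps are epoch -> start offset of that epoch; leo = log end offset.
    // Returns the offset the follower finally truncates to.
    static long converge(TreeMap<Integer, Long> leader, long leaderLeo,
                         TreeMap<Integer, Long> follower, long followerLeo) {
        int requestEpoch = follower.lastKey();
        while (true) {
            // Leader: largest epoch <= requestEpoch, and that epoch's end offset
            // (start of the next epoch, or the leader's log end offset).
            Map.Entry<Integer, Long> le = leader.floorEntry(requestEpoch);
            int leaderEpoch = le.getKey();
            Map.Entry<Integer, Long> next = leader.higherEntry(leaderEpoch);
            long leaderEnd = next == null ? leaderLeo : next.getValue();

            Map.Entry<Integer, Long> fe = follower.floorEntry(leaderEpoch);
            if (fe.getKey() == leaderEpoch) {
                // Both sides know this epoch: truncate to the smaller end offset.
                Map.Entry<Integer, Long> fnext = follower.higherEntry(leaderEpoch);
                long followerEnd = fnext == null ? followerLeo : fnext.getValue();
                return Math.min(leaderEnd, followerEnd);
            }
            // Follower does not know leaderEpoch: drop epochs above its largest
            // epoch below leaderEpoch, truncate, and ask again with that epoch.
            follower.tailMap(fe.getKey(), false).clear();
            followerLeo = Math.min(followerLeo, leaderEnd);
            requestEpoch = fe.getKey();
        }
    }
}
```

For example, with leader epochs {0 -> 0, 3 -> 5, 5 -> 12} (LEO 20) and follower epochs {0 -> 0, 2 -> 5, 4 -> 12} (LEO 18), the first round resolves leader epoch 3 (end offset 12), which the follower does not know, so it asks again with epoch 2; the second round resolves leader epoch 0 (end offset 5), and both logs agree up to offset 5.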
Committer Checklist (excluded from commit message)