KAFKA-5758: Don't fail fetch request if replica is no longer a follower for a partition #3954
Conversation
@junrao, does this look reasonable?
Force-pushed from 909c689 to 6c73665
retest this please
The line probably should be above line 792 now?
Good catch that this comment is in the wrong place. I updated the code a little and I think the comment doesn't add much after that, so I removed it. Let me know if you disagree.
This seems to be an existing problem, but we likely only need to call tryCompleteDelayedRequests() when the HW increments.
I think tryCompleteDelayedDeleteRecords is affected by leaderLWIncremented:

```scala
private def tryCompleteDelayedRequests() {
  val requestKey = new TopicPartitionOperationKey(topicPartition)
  replicaManager.tryCompleteDelayedFetch(requestKey)
  replicaManager.tryCompleteDelayedProduce(requestKey)
  replicaManager.tryCompleteDelayedDeleteRecords(requestKey)
}
```

We could make the tryComplete calls more specific depending on what changed, if you think that's worth it.
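As a sketch of what "more specific" completion could look like, the dispatch could be keyed on which watermark moved: an HW increment matters to delayed fetches and produces, while an LW (log start offset) increment matters to delayed DeleteRecords. The types below are simplified stand-ins for illustration, not Kafka's actual classes.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-ins for Kafka's purgatory machinery (hypothetical,
// for illustration only).
case class TopicPartitionOperationKey(topic: String, partition: Int)

trait Purgatory {
  def tryComplete(key: TopicPartitionOperationKey): Unit
}

// Records which purgatories were poked, so the dispatch can be observed.
class RecordingPurgatory(name: String, completed: ArrayBuffer[String]) extends Purgatory {
  def tryComplete(key: TopicPartitionOperationKey): Unit = completed += name
}

// Complete only the purgatories affected by what actually changed:
// HW movement can unblock fetches/produces; LW movement can unblock
// DeleteRecords.
def tryCompleteDelayedRequests(key: TopicPartitionOperationKey,
                               leaderHWIncremented: Boolean,
                               leaderLWIncremented: Boolean,
                               fetch: Purgatory,
                               produce: Purgatory,
                               deleteRecords: Purgatory): Unit = {
  if (leaderHWIncremented) {
    fetch.tryComplete(key)
    produce.tryComplete(key)
  }
  if (leaderLWIncremented)
    deleteRecords.tryComplete(key)
}
```

The trade-off discussed above is real: the combined version is simpler and tryComplete calls on an unaffected purgatory are cheap no-ops, which is presumably why the patch kept them together.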
Shouldn't we set the broker.id property since ReplicaManager needs config.brokerId?
Yes, createBrokerConfig does it:

```scala
if (nodeId >= 0) props.put(KafkaConfig.BrokerIdProp, nodeId.toString)
```
If the replica is not in the assigned replica list, I am wondering if we should just send empty fetched data back.
Yes, I agree that it would be better; I was a bit unhappy about the wastefulness of sending a non-empty fetch response in that case. Will fix.
…er for a partition

We log a warning instead, which is what we also do if the partition hasn't been created yet.
Force-pushed from 84ffc4b to 45f52b3
retest this please
@junrao, I've addressed your comments.
@ijuma: Thanks for the updated patch. LGTM. I will let you merge it.
```scala
val logStartOffsets = assignedReplicas.collect {
  case replica if replicaManager.metadataCache.isBrokerAlive(replica.brokerId) => replica.logStartOffset
}
CoreUtils.min(logStartOffsets, 0L)
```
Wouldn't this return 0 in most cases?
No, it only returns 0 if the collection is empty.
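For context, `CoreUtils.min` is assumed here to mean "minimum of the collection, or the supplied default when the collection is empty", which is why an empty `logStartOffsets` yields 0. A minimal re-implementation of that assumed semantics:

```scala
// Hypothetical re-implementation of the assumed CoreUtils.min semantics:
// return the fallback only when the collection is empty, otherwise the
// true minimum of its elements.
def min(iterable: Iterable[Long], ifEmpty: Long): Long =
  if (iterable.isEmpty) ifEmpty else iterable.min
```

So with at least one live replica the result is the smallest live log start offset, regardless of the 0L fallback.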
Do not update LogReadResult after it is initially populated when returning fetches immediately (i.e. without hitting the purgatory). This was done in #3954 as an optimization so that the followers get the potentially updated high watermark. However, since many things can happen (like deleting old segments and advancing log start offset) between initial creation of LogReadResult and the update, we can hit issues like log start offset in fetch response being higher than the last offset in fetched records. Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
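The race described in that follow-up fix can be sketched as follows: if the response snapshots every field once at read time, a later advance of the log start offset (for example from old-segment deletion) cannot make the response internally inconsistent. The types below are simplified stand-ins, not Kafka's actual `LogReadResult`.

```scala
// Simplified stand-ins (not Kafka's actual classes) showing why a read
// result should be treated as an immutable snapshot.
case class FetchedRecords(lastOffset: Long)
case class LogReadResult(records: FetchedRecords, logStartOffset: Long, highWatermark: Long)

class Log(var logStartOffset: Long, var highWatermark: Long) {
  // Snapshot every field once, at read time; never re-read mutable log
  // state into an already-populated result.
  def read(): LogReadResult =
    LogReadResult(FetchedRecords(lastOffset = highWatermark - 1), logStartOffset, highWatermark)
}
```

Usage: after `val result = log.read()`, mutating `log.logStartOffset` (segments deleted) leaves `result` untouched, so `result.logStartOffset` can never exceed `result.records.lastOffset` the way a post-hoc update could.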
KAFKA-5758; Don't fail fetch request if replica is no longer a follower for a partition

We log a warning instead, which is what we also do if the partition hasn't been created yet.

A few other improvements:
- Return updated high watermark if fetch is returned immediately. This seems to be more intuitive and is consistent with the case where the fetch request is served from the purgatory.
- Centralise offline partition handling
- Remove unnecessary `tryCompleteDelayedProduce` that would already have been done by the called method
- A few other minor clean-ups

Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>

Closes apache#3954 from ijuma/kafka-5758-dont-fail-fetch-request-if-replica-is-not-follower