
KAFKA-7104: Handle leader's log start offset beyond last fetched offset#5302

Closed
apovzner wants to merge 1 commit into apache:trunk from apovzner:kafka-7104

Conversation

@apovzner
Contributor

The leader replica may return a log start offset in the fetch response that is higher than the last fetched offset. This may happen if the log start offset on the leader moves while the fetch response is being built (e.g., due to rolling and deleting old segments, or deleting records). This PR limits setting the log start offset on the follower to its LEO.
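The follower-side bound described above can be sketched as a simple clamp. This is an illustrative sketch, not the actual patch; the object and method names are hypothetical:

```scala
// Sketch of the follower-side bound this PR describes: never let the
// leader-reported log start offset move past the follower's own LEO.
object LogStartOffsetBound {
  def boundLogStartOffset(leaderLogStartOffset: Long, followerLogEndOffset: Long): Long =
    math.min(leaderLogStartOffset, followerLogEndOffset)

  def main(args: Array[String]): Unit = {
    // Normal case: the leader's start offset is behind the follower's LEO.
    println(boundLogStartOffset(10L, 100L))  // 10
    // Race case: the leader's start offset has moved past the follower's LEO,
    // so the follower pins its log start offset at its LEO (an empty log).
    println(boundLogStartOffset(150L, 100L)) // 100
  }
}
```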

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Member

@ijuma ijuma left a comment


Can we not do this check on the leader instead? Seems like clients could also be exposed to this broken invariant.

@apovzner
Contributor Author

@ijuma clients will not have that issue -- the race happens because the leader first reads the log and creates a LogReadResult, but then updates the LogReadResult only for the follower. The log start offset can change before that second update happens. For clients, the log read result is created once and not updated.

Specifically, in ReplicaManager.fetchMessages():

    def readFromLog(): Seq[(TopicPartition, LogReadResult)] = {
      // First step: build the LogReadResult from the local log.
      val result = readFromLocalLog(.....)
      // Second step, followers only: update the result after the read.
      // The log start offset can move between these two steps.
      if (isFromFollower) updateFollowerLogReadResults(replicaId, result)
      else result
    }

@apovzner
Contributor Author

apovzner commented Jun 27, 2018

Build failed: kafka.api.SaslSslAdminClientIntegrationTest > testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords FAILED, which looks unrelated:

org.apache.kafka.common.errors.SaslAuthenticationException: An error:
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Request is a replay (34) - Request is a replay)]) occurred when evaluating SASL token received from the Kafka Broker. Kafka Client will go to AUTHENTICATION_FAILED state.
Caused by:
16:03:46      KrbException: Identifier doesn't match expected value (906)

@hachikuji
Contributor

The java consumer doesn't really use the start offset, but that may not be true for all clients? I was wondering if we should just use min(fetchOffset, startOffset) as the start offset in the fetch response.
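The leader-side suggestion above could be sketched roughly as follows (a hypothetical helper for illustration, not Kafka code):

```scala
// Sketch of the suggestion to cap the start offset reported in the fetch
// response so it never exceeds the offset the fetch actually started from.
object ResponseStartOffset {
  def startOffsetForResponse(fetchOffset: Long, logStartOffset: Long): Long =
    math.min(fetchOffset, logStartOffset)

  def main(args: Array[String]): Unit = {
    // If the log start offset raced ahead of the fetch offset while the
    // response was being built, report the fetch offset instead.
    println(startOffsetForResponse(50L, 80L)) // 50
    println(startOffsetForResponse(50L, 20L)) // 20
  }
}
```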

@hachikuji
Contributor

Sorry, that's not quite what I meant. The problem is that it is difficult to determine the first offset of the fetched data in the general case. It is easy with the new message format, but there is also the edge case where the start offset falls in the middle of a batch to worry about... Maybe the requirement should be that the start offset cannot be larger than the end offset of the first batch, or something like that.

@apovzner
Contributor Author

@hachikuji as I mentioned in the earlier comment, the inconsistency in the fetch result happens only for fetches from the follower. For the client, the fetch result is created once when the log is read and represents the state at that time, which means the log start offset cannot be higher than the first fetched offset returned to the client.

Are you suggesting bounding the leader's log start offset in the fetch response on the leader instead of the follower? Or both?
I am a bit hesitant to bound the log start offset on the leader, because I am not sure yet whether there would be side effects. In this PR, we effectively set log start offset = LEO on the follower, which means an empty log, which I think reasonably follows the leader's state. Another option may be not to update the leader's log start offset in the fetch response once the data has been read, but I have not yet investigated why we had to do that in the first place.

Another option on the follower would be to treat the out-of-range offset from maybeIncrementLogStartOffset the same way we handle an OFFSET_OUT_OF_RANGE fetch response, and reset the fetch offset to the leader's log start offset. That would match the behavior when the leader's log start offset moves beyond the follower's LEO --- if the fetch happened just after incrementing the log start offset on the leader instead of during it, the follower would get OFFSET_OUT_OF_RANGE and reset its fetch offset to the leader's log start offset.
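That alternative could be sketched roughly like this (names are illustrative, not Kafka code):

```scala
// Hypothetical sketch: when the leader's log start offset is already past
// the follower's current fetch offset, reset the fetch offset to the
// leader's log start offset, mirroring OFFSET_OUT_OF_RANGE handling.
object FollowerFetchReset {
  def nextFetchOffset(currentFetchOffset: Long, leaderLogStartOffset: Long): Long =
    if (leaderLogStartOffset > currentFetchOffset) leaderLogStartOffset
    else currentFetchOffset

  def main(args: Array[String]): Unit = {
    // Leader's start offset raced ahead: restart fetching from it.
    println(nextFetchOffset(100L, 150L)) // 150
    // Normal case: keep fetching from where we were.
    println(nextFetchOffset(100L, 40L))  // 100
  }
}
```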

@hachikuji
Contributor

@apovzner Yeah, I was suggesting to fix the problem on the leader. It seems more ideal if the fetch response is at least internally consistent. Out of curiosity, why are the behaviors different for the follower and the consumer?

@apovzner
Contributor Author

I opened another PR to fix the problem on the leader instead of follower: #5305

@apovzner apovzner closed this Jun 28, 2018
