KAFKA-12465: Logic about inconsistent cluster id #11209
dengziming wants to merge 3 commits into apache:trunk
Hi @hachikuji @jsancio, PTAL. I also moved the logic for
jsancio left a comment:
@dengziming, Thanks for the changes and apologies for the delayed comment.
Can you update the description so that it contains all of the information needed to understand this PR without having to read the comment you linked in the description?
How about changing this to return an Errors?
- INVALID_REQUEST if there is more than one topic partition
- UNKNOWN_TOPIC_OR_PARTITION if the topic partition doesn't match the log's name and partition
- NONE otherwise
I think this is a viable improvement. There are 11 similar methods here, so I changed them all and added a unit test for them.
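The validation proposed in this thread could be sketched roughly as follows. This is only an illustration, not the code from the PR: `TopicPartitionValidation`, its nested `Errors` and `TopicPartition` types, and the `validate` helper are hypothetical stand-ins (Kafka's real types are `org.apache.kafka.common.protocol.Errors` and `org.apache.kafka.common.TopicPartition`), and treating an empty request as `INVALID_REQUEST` is an assumption.

```java
import java.util.List;

// Illustrative stand-ins only. Kafka's real types are
// org.apache.kafka.common.protocol.Errors and org.apache.kafka.common.TopicPartition.
final class TopicPartitionValidation {
    enum Errors { NONE, INVALID_REQUEST, UNKNOWN_TOPIC_OR_PARTITION }

    static final class TopicPartition {
        final String topic;
        final int partition;

        TopicPartition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        // True when both the topic name and the partition number are equal.
        boolean matches(TopicPartition other) {
            return topic.equals(other.topic) && partition == other.partition;
        }
    }

    // Hypothetical helper: maps the partitions named in a request to one error code.
    static Errors validate(List<TopicPartition> requested, TopicPartition log) {
        // Assumption: an empty request is rejected the same way as one
        // that names more than one topic partition.
        if (requested.size() != 1) {
            return Errors.INVALID_REQUEST;
        }
        if (!requested.get(0).matches(log)) {
            return Errors.UNKNOWN_TOPIC_OR_PARTITION;
        }
        return Errors.NONE;
    }
}
```

Returning an error code instead of a boolean lets the caller distinguish a malformed request from a request for the wrong partition and put the matching code in the response.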
I am trying to understand this comment. Can you please explain why this is true? And why do you think that this comment is important in this test?
This comment applies to a few places.
Sorry, I made a wrong comment here. I tested 2 cases:
- Receive INCONSISTENT_CLUSTER_ID in the first response after starting, which is fatal.
- Receive INCONSISTENT_CLUSTER_ID in the second response after starting, which is not fatal.
However, the first response after starting can't be a FetchSnapshotResponse, so I added a comment here. The same applies to BeginQuorumResponse and EndQuorumResponse.
b2806a3 to 67edb7e
This PR is being marked as stale since it has not had any activity in 90 days. If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.
This PR has been closed since it has not had any activity in 120 days. If you feel like this
More detailed description of your change
When handling a response, we treat INCONSISTENT_CLUSTER_ID as a fatal error unless a previous response contained a valid cluster id. This catches misconfiguration as early as possible and also avoids cases where a misconfigured node kills a stable cluster. However, the node will continue executing with the misconfiguration in some edge cases, for example, 3 nodes with 3 different cluster ids.
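The rule described above might be sketched as a small piece of per-node state. This is a minimal illustration, not the actual raft client code: the class `ClusterIdResponseCheck`, its `Outcome` enum, and `onResponse` are hypothetical names, and it assumes that any response without INCONSISTENT_CLUSTER_ID counts as one that "contained a valid cluster id".

```java
// Hypothetical sketch of the rule in the description; not the code from this PR.
final class ClusterIdResponseCheck {
    enum Outcome { OK, NON_FATAL, FATAL }

    // Set once any response has arrived without INCONSISTENT_CLUSTER_ID.
    // Assumption: such a response proves that this node's cluster id is valid.
    private boolean clusterIdValidated = false;

    // inconsistentClusterId is true when the response carried INCONSISTENT_CLUSTER_ID.
    Outcome onResponse(boolean inconsistentClusterId) {
        if (!inconsistentClusterId) {
            clusterIdValidated = true;
            return Outcome.OK;
        }
        // A validated node assumes the peer is the misconfigured one and keeps
        // running; an unvalidated node assumes it is misconfigured itself.
        return clusterIdValidated ? Outcome.NON_FATAL : Outcome.FATAL;
    }
}
```

Under this sketch, a node that sees the error in its first response fails fast, while a node that has already exchanged one valid response survives the same error from a later, misconfigured peer.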