Skip to content

KAFKA-12465: Logic about inconsistent cluster id#11209

Closed
dengziming wants to merge 3 commits intoapache:trunkfrom
dengziming:KAFKA-12465-invalid-cluster-id
Closed

KAFKA-12465: Logic about inconsistent cluster id#11209
dengziming wants to merge 3 commits intoapache:trunkfrom
dengziming:KAFKA-12465-invalid-cluster-id

Conversation

@dengziming
Copy link
Copy Markdown
Member

@dengziming dengziming commented Aug 13, 2021

More detailed description of your change
When handling a response, we treat INCONSISTENT_CLUSTER_ID as a fatal error unless a previous response contained a valid cluster id, this solution can catch misconfiguration as early as possible and also avoid cases when a misconfigured node kills a stable cluster. However, the node will continue executing with the misconfiguration in some edge cases, for example, 3 nodes with 3 different cluster ids.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@dengziming
Copy link
Copy Markdown
Member Author

Hi @hachikuji @jsancio , PTAL. I also moved the logic for UNKNOWN_TOPIC_OR_PARTITION in handleFetchSnapshot to RaftUtil.java.

@ijuma ijuma requested a review from jsancio February 5, 2022 05:50
Copy link
Copy Markdown
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dengziming, Thanks for the changes and apologies for the delayed comment.

Can you update the description so that it contains all of the information needed to understand this PR without having to read the comment you linked in the description?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing this to return an Errors?

  1. INVALID_REQUEST if there is more than one topic partition
  2. UNKNOWN_TOPIC_OR_PARTITION if the topic partition doesn't match the log's name and partition
  3. NONE otherwise

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a viable improvement, there are 11 similar methods here, so I changed them all and add a unit test for them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to understand this comment. Can you please explain why this is true? And why do you think that this comment is important in this test?

This comment applies to a few places.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I made a wrong comment here, I tested 2 cases here:

  1. Receive INCONSISTENT_CLUSTER_ID in the first response after starting, which is fatal.
  2. Receive INCONSISTENT_CLUSTER_ID in the second response after starting, which is not fatal.

However, the first response after starting can't be FetchSnapshotResponse, so I added a comment here, so do BeginQuorumResponse and EndQuorumResponse.

@dengziming dengziming force-pushed the KAFKA-12465-invalid-cluster-id branch from b2806a3 to 67edb7e Compare February 7, 2022 12:53
@github-actions
Copy link
Copy Markdown

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions github-actions Bot added the stale Stale PRs label Dec 24, 2024
@github-actions
Copy link
Copy Markdown

This PR has been closed since it has not had any activity in 120 days. If you feel like this
was a mistake, or you would like to continue working on it, please feel free to re-open the
PR and ask for a review.

@github-actions github-actions Bot added the closed-stale PRs that were closed due to inactivity label Jan 24, 2025
@github-actions github-actions Bot closed this Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

closed-stale PRs that were closed due to inactivity stale Stale PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants