Backport of docs - remove use of consul leave during upgrade instructions as it caused leadership changes into release/1.15.x #17772
Merged
hc-github-team-consul-core merged 1 commit on Jun 15, 2023
github-team-consul-core-pr-approver
approved these changes
Jun 15, 2023
Collaborator
github-team-consul-core-pr-approver
left a comment
Auto approved Consul Bot automated PR
Backport
This PR is auto-generated from #17758 to be assessed for backporting due to the inclusion of the label backport/1.15.
The below text is copied from the body of the original PR.
Note: this needs to be updated on all versions of the docs, so backport labels for 1.13 - 1.16 are in place. I will manually cherry-pick this into PRs to the docs release branches prior to 1.13.
Description
A customer ran into an issue where multiple leadership elections occurred for each server they upgraded, even though the goal of the process is to ensure the leader is upgraded last. This was caused by the use of consul leave during the upgrade process as they upgraded from Consul to Consul Enterprise.
When upgrading, it is important that the leader goes last, so that the leader replicates raft logs from the lower Consul version to servers that are either at the same version or at a higher one and are therefore aware of all fields within the raft log.
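The "leader goes last" rule can be sketched as a small ordering helper. This is only an illustration: the function name upgrade_order, the server names, and the name-to-is-leader mapping are hypothetical and not part of Consul or of this PR. How you determine which server is the leader is left to your own tooling.

```python
from typing import Dict, List


def upgrade_order(servers: Dict[str, bool]) -> List[str]:
    """Return server names with followers first and the leader last.

    `servers` maps a (hypothetical) server name to True if that server
    is currently the leader, False otherwise.
    """
    leaders = [name for name, is_leader in servers.items() if is_leader]
    if len(leaders) != 1:
        # Refuse to guess an order when leadership is unclear.
        raise ValueError(f"expected exactly one leader, found {len(leaders)}")
    followers = [name for name, is_leader in servers.items() if not is_leader]
    return followers + leaders


if __name__ == "__main__":
    # Hypothetical three-server cluster where server-b is the leader.
    print(upgrade_order({"server-a": False, "server-b": True, "server-c": False}))
```

Leadership can change between upgrades, so in practice the leader should be re-checked before each server is touched instead of being computed once up front.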
When using consul leave during the upgrade process, the following was observed.
Observed when shutting down
The following occurred when consul leave was issued:
... term index (ex: cluster has a term of 100 and server being upgraded has a term of 104) until it shuts down
This happened on multiple servers, and the server being upgraded had a term that was several greater than that of the leader and the rest of the cluster.
At this point the server is shut down and has the new consul binary.
Observed when restarting
The instructions then have the user start the server using something like systemctl start. At this point, the following was observed:
This loop of losing leadership / starting new elections / electing a new leader will continue until the term of the cluster matches the term of the upgraded server. In the example previously mentioned, where the cluster had a term of 100 and the upgraded server had a term of 104, this loop would occur 4 times.
At this point, the upgrade process has encountered multiple leader elections and has been destabilized: it is highly probable that your leader is now different, so the leader-last ordering of the upgrade can no longer be relied on.
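The arithmetic in the example above can be sketched as a toy model. It assumes, as the description states, that each pass through the loop raises the cluster's term by one until it matches the upgraded server's term. It is not Consul's Raft implementation, and the function name is hypothetical.

```python
def elections_until_converged(cluster_term: int, upgraded_server_term: int) -> int:
    """Count election rounds under the simplified model in the description.

    Assumption: every lose-leadership / new-election / new-leader cycle
    advances the cluster term by exactly one.
    """
    rounds = 0
    while cluster_term < upgraded_server_term:
        cluster_term += 1  # one election cycle bumps the cluster term
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Cluster at term 100, upgraded server at term 104.
    print(elections_until_converged(100, 104))  # prints 4
```

Under this model, a server that rejoins with the same term as the cluster triggers zero extra rounds.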
Overview of commits