Bug 1797244: test/extended/dr: update test to reflect new DR process #25071
openshift-merge-robot merged 1 commit into openshift:master
|
@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test e2e-aws-disruptive |
b197579 to 7b7a905
|
/test e2e-aws-disruptive |
|
/test e2e-aws-disruptive |
|
/test e2e-aws-disruptive |
|
flake |
Passed \o/ test failed because of what I thought was #25087 but CI merged already |
|
/test e2e-aws-disruptive |
|
@hexfusion: This pull request references Bugzilla bug 1797244, which is invalid:
|
/test e2e-aws-disruptive |
|
/test e2e-aws-disruptive |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: hexfusion, smarterclayton |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/bugzilla refresh |
|
@hexfusion: This pull request references Bugzilla bug 1797244, which is invalid:
|
@hexfusion: This pull request references Bugzilla bug 1797244, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
|
/refresh |
|
/cherry-pick release-4.5 |
|
@hexfusion: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/test e2e-cmd |
|
@hexfusion: The following tests failed, say
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
/retest Please review the full test history for this PR and help us cut down flakes. |
|
@hexfusion: All pull requests linked via external trackers have merged: openshift/origin#25071. Bugzilla bug 1797244 has been moved to the MODIFIED state. |
|
@hexfusion: new pull request created: #25147 |
|
/cherry-pick release-4.4 |
|
@retroflexer: #25071 failed to apply on top of branch "release-4.4": |
The addition of the cluster-etcd-operator introduced a new disaster recovery (DR) workflow that broke the disruptive tests. The previous workflow involved a snapshot restore of the etcd data file. The advantage of that approach is that the process is complete as soon as etcd starts. The downside is that a restore action must take place on each master node.
In 4.4+ we automate this process and require only a single invocation each of the cluster-backup.sh and cluster-restore.sh scripts. Given new machines, the operator then scales etcd onto them, resulting in a 3-node etcd cluster.
The downside of this process is that it is more disruptive while etcd and the rest of the control plane scale. For this reason, we needed not only to adjust some of the timings of the old test so that certain steps can take longer, but also to allow for transient etcd timeouts. Scaling from 1 to 2 etcd members necessarily loses quorum, and the resulting leader election causes client timeouts.
This is a starting point from which we can begin to harden the requirements and improve. For now, let's make sure the workflow itself completes as expected.