
Bug 1797244: test/extended/dr: update test to reflect new DR process #25071

Merged
openshift-merge-robot merged 1 commit into openshift:master from hexfusion:fix-dr-tests on Jun 18, 2020

@hexfusion
Contributor

@hexfusion hexfusion commented Jun 5, 2020

The addition of the cluster etcd operator introduced a new disaster recovery workflow that broke the disruptive tests. The previous workflow performed a snapshot restore of the etcd data file. The advantage of this approach is that once etcd starts, the process is complete. The downside is that a restore action must take place on each master node.

In 4.4+ we automate this process, requiring only a single invocation of the cluster-backup.sh script and the cluster-restore.sh script. Given new machines, the operator then scales etcd onto them, resulting in a three-node etcd cluster.

The downside of this process is that it is more disruptive, because etcd and the rest of the control plane must scale back up. For this reason, we needed not only to adjust some of the old test's timings, allowing certain steps to take longer, but also to allow for transient etcd timeouts. When scaling from 1 to 2 etcd members we must lose quorum: a two-member cluster needs both members to agree, so nothing commits until the new member is up. The resulting leader election causes client timeouts.

This is a starting point from which we can harden the requirements and improve. For now, let's make sure the workflow itself completes as expected.

@openshift-ci-robot

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

[wip] test/extended/dr: update test to reflect new DR process

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 5, 2020
@hexfusion
Contributor Author

/test e2e-aws-disruptive

@hexfusion hexfusion changed the base branch from release-4.5 to release-4.6 June 8, 2020 21:40
@hexfusion hexfusion force-pushed the fix-dr-tests branch 3 times, most recently from b197579 to 7b7a905 June 10, 2020 00:59
@hexfusion
Contributor Author

/test e2e-aws-disruptive

1 similar comment
@hexfusion
Contributor Author

/test e2e-aws-disruptive

@hexfusion
Contributor Author

/test e2e-aws-disruptive

@hexfusion
Contributor Author

flake
/test e2e-aws-disruptive

@hexfusion
Contributor Author

[sig-etcd][Feature:DisasterRecovery][Disruptive] [Feature:EtcdRecovery] Cluster should restore itself after quorum loss [Disabled:Broken] [Serial] [Suite:openshift]

Passed \o/

The test failed because of what I thought was #25087, but that had already merged in CI.

@hexfusion
Contributor Author

/test e2e-aws-disruptive

@hexfusion hexfusion changed the title [wip] test/extended/dr: update test to reflect new DR process Bug 1797244: [wip] test/extended/dr: update test to reflect new DR process Jun 11, 2020
@openshift-ci-robot openshift-ci-robot added bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 11, 2020
@openshift-ci-robot

@hexfusion: This pull request references Bugzilla bug 1797244, which is invalid:

  • expected Bugzilla bug 1797244 to depend on a bug in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1797244: [wip] test/extended/dr: update test to reflect new DR process

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 11, 2020
@hexfusion hexfusion changed the title Bug 1797244: [wip] test/extended/dr: update test to reflect new DR process [wip] Bug 1797244: test/extended/dr: update test to reflect new DR process Jun 11, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 11, 2020
@hexfusion
Contributor Author

/test e2e-aws-disruptive

@hexfusion
Contributor Author

/test e2e-aws-disruptive

Review comment threads on test/extended/dr/quorum_restore.go (four threads, three marked Outdated)
@smarterclayton
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 17, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hexfusion, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Contributor Author

/bugzilla refresh

@openshift-ci-robot

@hexfusion: This pull request references Bugzilla bug 1797244, which is invalid:

  • expected Bugzilla bug 1797244 to depend on a bug in one of the following states: VERIFIED, RELEASE_PENDING, CLOSED (ERRATA), but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@hexfusion hexfusion changed the base branch from release-4.6 to master June 17, 2020 17:10
@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jun 17, 2020
@openshift-ci-robot

@hexfusion: This pull request references Bugzilla bug 1797244, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
In response to this:

Bug 1797244: test/extended/dr: update test to reflect new DR process

@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Jun 17, 2020
@hexfusion
Contributor Author

/refresh

@hexfusion
Contributor Author

/cherry-pick release-4.5

@openshift-cherrypick-robot

@hexfusion: once the present PR merges, I will cherry-pick it on top of release-4.5 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.5

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Contributor Author

/test e2e-cmd

@openshift-ci-robot

openshift-ci-robot commented Jun 17, 2020

@hexfusion: The following tests failed, say /retest to rerun all failed tests:

Test name                      Commit   Rerun command
ci/prow/e2e-conformance-k8s    051ca88  /test e2e-conformance-k8s
ci/prow/e2e-aws-jenkins        051ca88  /test e2e-aws-jenkins
ci/prow/e2e-aws                051ca88  /test e2e-aws
ci/prow/e2e-vsphere            051ca88  /test e2e-vsphere
ci/prow/launch-vsphere         051ca88  /test launch-vsphere
ci/prow/e2e-gcp-builds         69ee373  /test e2e-gcp-builds

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit c5076c9 into openshift:master Jun 18, 2020
@openshift-ci-robot

@hexfusion: All pull requests linked via external trackers have merged: openshift/origin#25071. Bugzilla bug 1797244 has been moved to the MODIFIED state.

In response to this:

Bug 1797244: test/extended/dr: update test to reflect new DR process

@openshift-cherrypick-robot

@hexfusion: new pull request created: #25147

In response to this:

/cherry-pick release-4.5

@retroflexer

/cherry-pick release-4.4

@openshift-cherrypick-robot

@retroflexer: #25071 failed to apply on top of branch "release-4.4":

error: Failed to merge in the changes.
Using index info to reconstruct a base tree...
M	test/extended/dr/common.go
M	test/extended/dr/quorum_restore.go
Falling back to patching base and 3-way merge...
Auto-merging test/extended/dr/quorum_restore.go
CONFLICT (content): Merge conflict in test/extended/dr/quorum_restore.go
Auto-merging test/extended/dr/common.go
CONFLICT (content): Merge conflict in test/extended/dr/common.go
Patch failed at 0001 test/extended/dr: update test to reflect new DR process

In response to this:

/cherry-pick release-4.4

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/severity-medium: Referenced Bugzilla bug's severity is medium for the branch this PR is targeting.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
vendor-update: Touching vendor dir or related files

7 participants