Conversation

@petr-muller
Member

This is a cherry-pick of #27645 and #27678

  • upgrade/adminack: guarantee one admin ack check post-upgrade
  • upgrade/adminack: optimize the post-upgrade check
  • upgrade/adminack: simplify polling and unblock "guaranteed" post-upgrade check
  • upgrade/adminack: wait up to 4m until gate propagates to upgradeable

/hold

While looking into OCPBUGS-5505 I discovered that some 4.10->4.11
upgrade job runs perform an Admin Ack check, while others do not. 4.11 has
an `ack-4.11-kube-1.25-api-removals-in-4.12` gate, so these upgrade jobs
sometimes test that `Upgradeable` goes `false` after the upgrade, and
sometimes they do not. This is determined solely by a polling race
condition: the check is executed once every 10 minutes, and we cancel the
polling after the upgrade is completed. This means that in some cases we are
lucky and manage to run one check before the cancel, and in others we
are not and only check while still on the base version.

Add a guaranteed single check execution after the upgrade, so that admin
ack is always checked at least once with the upgrade target version.
Doing checks after `done` is signalled has prior art in the alert test.
The `done` signal is either a timeout or "upgrade finished, stop testing". We do not need to perform the last check in the former case. Track versions that we check and when we get the signal, check whether the current version was checked at least once, and if not, check it before terminating.
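
The version tracking described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the code from the PR: `versionChecker`, `simulate`, the `started` channel and the version strings are all hypothetical, and for brevity the sketch does not distinguish a timeout from an "upgrade finished" signal.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// versionChecker remembers which versions the check has already covered.
// Every name in this sketch is hypothetical.
type versionChecker struct {
	mu      sync.Mutex
	version string         // stand-in for the cluster's current version
	checked map[string]int // number of checks run against each version
}

func (v *versionChecker) setVersion(s string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.version = s
}

// check stands in for the real admin ack check and records the version it ran against.
func (v *versionChecker) check() {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.checked[v.version]++
}

func (v *versionChecker) currentVersionChecked() bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.checked[v.version] > 0
}

// run checks once immediately, then once per interval until ctx is done.
// On the done signal it runs one last check if the current version was never checked.
// started is closed after the first check so that the demo below stays deterministic.
func (v *versionChecker) run(ctx context.Context, interval time.Duration, started chan<- struct{}) {
	v.check()
	close(started)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			if !v.currentVersionChecked() {
				v.check()
			}
			return
		case <-ticker.C:
			v.check()
		}
	}
}

// simulate finishes the "upgrade" before the second periodic check gets its turn.
func simulate() map[string]int {
	v := &versionChecker{version: "4.10", checked: map[string]int{}}
	ctx, cancel := context.WithCancel(context.Background())
	started := make(chan struct{})
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		v.run(ctx, 10*time.Minute, started)
	}()
	<-started            // the first check ran against the base version
	v.setVersion("4.11") // the upgrade completes long before the next tick
	cancel()             // the framework signals done
	<-finished
	return v.checked
}

func main() {
	fmt.Println(simulate())
}
```

In this simulation the periodic loop only ever sees the base version, so the target version is covered solely by the final check that runs on the done signal.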
…ade check

openshift#27645 intended to add a guaranteed post-upgrade check, but I overlooked how exactly the polling is implemented and terminated, so the post-upgrade check never actually executed.

Previously the test used `PollImmediateWithContext` for the check that runs every 10 minutes. The `ConditionFunc` never actually returned `true` or a non-nil `err`, so `PollImmediateWithContext` never terminated by means of the `ConditionFunc`: it was always terminated by the `ctx.Done()` signal that the framework sends when the upgrade finishes (or on a test timeout). This means that `PollImmediateWithContext` always terminated with `err=wait.ErrWaitTimeout` and the `Test` method immediately returned, so the "guaranteed" check code was never reached.

Given our `ConditionFunc` never terminates the polling, we can simplify and use the `wait.UntilWithContext` instead, which is a simpler version that precisely implements the desired loop (poll until context is done).
During testing of OCPBUGS-5505, it was discovered that even with
a shortened CVO cache TTL, CVO may still only update `Upgradeable`
on its sync interval, which may be as long as 4 minutes. Hence the
test needs to wait for that long (I added a 5-second buffer on top of
that).
@openshift-ci
Contributor

openshift-ci bot commented Jan 26, 2023

@petr-muller: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Details

In response to this:

upgrade/adminack: guarantee one admin ack check post-upgrade

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 26, 2023
@petr-muller petr-muller changed the title upgrade/adminack: guarantee one admin ack check post-upgrade [release-4.11] upgrade/adminack: guarantee one admin ack check post-upgrade Jan 26, 2023
@petr-muller
Member Author

/retest

@openshift-ci
Contributor

openshift-ci bot commented Jan 27, 2023

@petr-muller: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-single-node-upgrade | 0b633b2 | link | false | `/test e2e-aws-single-node-upgrade` |

Full PR test history. Your PR dashboard.

@petr-muller
Member Author

/jira cherrypick OCPBUGS-6850

@openshift-ci-robot

@petr-muller: Jira Issue OCPBUGS-6850 has been cloned as Jira Issue OCPBUGS-6851. Retitling PR to link against new bug.
/retitle OCPBUGS-6851: [release-4.11] upgrade/adminack: guarantee one admin ack check post-upgrade

@petr-muller
Member Author

/skip

@petr-muller
Member Author

/retitle OCPBUGS-6851: [release-4.11] upgrade/adminack: guarantee one admin ack check post-upgrade

@openshift-ci openshift-ci bot changed the title [release-4.11] upgrade/adminack: guarantee one admin ack check post-upgrade OCPBUGS-6851: [release-4.11] upgrade/adminack: guarantee one admin ack check post-upgrade Jan 31, 2023
@openshift-ci-robot openshift-ci-robot added jira/severity-moderate Referenced Jira bug's severity is moderate for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jan 31, 2023
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6851, which is invalid:

  • expected dependent Jira Issue OCPBUGS-6850 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but it is New instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Member

@LalatenduMohanty LalatenduMohanty left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Feb 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, petr-muller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2023
@stbenjam
Member

stbenjam commented Feb 7, 2023

/label backport-risk-assessed
/label cherry-pick-approved

@openshift-ci openshift-ci bot added the backport-risk-assessed Indicates a PR to a release branch has been evaluated and considered safe to accept. label Feb 7, 2023
@openshift-ci openshift-ci bot added the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Feb 7, 2023
@petr-muller
Member Author

/jira refresh

@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6851, which is invalid:

  • expected dependent Jira Issue OCPBUGS-6850 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

@petr-muller
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Feb 9, 2023
@openshift-ci-robot

@petr-muller: This pull request references Jira Issue OCPBUGS-6851, which is valid. The bug has been moved to the POST state.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.11.z) matches configured target version for branch (4.11.z)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-6850 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE))
  • dependent Jira Issue OCPBUGS-6850 targets the "4.12.z" version, which is one of the valid target versions: 4.12.0, 4.12.z
  • bug has dependents

Requesting review from QA contact:
/cc @jiajliu

@openshift-ci openshift-ci bot requested a review from jiajliu February 9, 2023 13:10
@petr-muller
Member Author

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 9, 2023
@openshift-merge-robot openshift-merge-robot merged commit ac2791c into openshift:release-4.11 Feb 9, 2023
@openshift-ci-robot

@petr-muller: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-6851 has been moved to the MODIFIED state.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • backport-risk-assessed: Indicates a PR to a release branch has been evaluated and considered safe to accept.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • cherry-pick-approved: Indicates a cherry-pick PR into a release branch has been approved by the release branch manager.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
