OCPBUGS-13547: Promote Azure CCM from TPNU to default #307

Merged
openshift-merge-robot merged 1 commit into openshift:master from JoelSpeed:promote-azure-ccm-feature-gate
Jul 7, 2023

@JoelSpeed
Contributor

This promotes the Azure CCM from TechPreviewNoUpgrade to default.

/hold

We need to test this thoroughly and make sure all known prior fixes have merged before this can be merged.

@openshift-ci-robot added labels on May 17, 2023:
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
@openshift-ci-robot

@JoelSpeed: This pull request references Jira Issue OCPBUGS-13547, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sunzhaohua2

The bug has been updated to refer to the pull request using the external bug tracker.
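
The three validations listed in the bot comment above can be pictured roughly as follows. This is a minimal sketch for illustration only, not the bot's actual implementation: the `Bug` class, its field names, and the `validate` function are hypothetical, and only the valid states and target version come from the comment itself.

```python
from dataclasses import dataclass

# Hypothetical model of a Jira bug; the field names are assumptions.
@dataclass
class Bug:
    is_open: bool
    target_version: str
    state: str

# Valid states as listed in the bot's comment.
VALID_STATES = ("NEW", "ASSIGNED", "POST")

def validate(bug, branch_target_version):
    """Return a list of failure messages; an empty list means the bug is valid."""
    failures = []
    if not bug.is_open:
        failures.append("bug is not open")
    if bug.target_version != branch_target_version:
        failures.append(
            f"bug target version ({bug.target_version}) does not match "
            f"branch target version ({branch_target_version})"
        )
    if bug.state not in VALID_STATES:
        failures.append(f"bug is in the state {bug.state}, which is not valid")
    return failures

# The bug as described in the comment: open, targeting 4.14.0, in state POST.
print(validate(Bug(True, "4.14.0", "POST"), "4.14.0"))
```

Under this sketch, a bug in a state outside the list (such as ON_QA) would produce one failure message rather than an empty list.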

@openshift-ci bot requested a review from sunzhaohua2 on May 17, 2023 10:34
@openshift-ci bot added the do-not-merge/hold label on May 17, 2023 (Indicates that a PR should not merge because someone has issued a /hold command.)
@openshift-ci bot requested reviews from mfojtik and tkashem on May 17, 2023 10:34
@openshift-ci bot added the approved label on May 17, 2023 (Indicates a PR has been approved by an approver from all required OWNERS files.)
@JoelSpeed
Contributor Author

/test ?

@openshift-ci
Contributor

openshift-ci Bot commented May 17, 2023

@JoelSpeed: The following commands are available to trigger required jobs:

  • /test e2e-aws-ovn
  • /test e2e-upgrade
  • /test images
  • /test unit
  • /test verify
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test e2e-azure
  • /test e2e-gcp

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-cluster-config-operator-master-e2e-aws-ovn
  • pull-ci-openshift-cluster-config-operator-master-e2e-upgrade
  • pull-ci-openshift-cluster-config-operator-master-images
  • pull-ci-openshift-cluster-config-operator-master-unit
  • pull-ci-openshift-cluster-config-operator-master-verify
  • pull-ci-openshift-cluster-config-operator-master-verify-deps

@JoelSpeed
Contributor Author

/test e2e-azure

@JoelSpeed changed the title from "OCPBUGS-13547: Promote Azure CCM from TPNU to default" to "[WIP] OCPBUGS-13547: Promote Azure CCM from TPNU to default" on May 17, 2023
@openshift-ci bot added the do-not-merge/work-in-progress label on May 17, 2023 (Indicates that a PR should not merge because it is a work in progress.)
@openshift-merge-robot added the needs-rebase label on May 24, 2023 (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
@JoelSpeed force-pushed the promote-azure-ccm-feature-gate branch from c7dbdea to 9f129da on June 6, 2023 11:47
@openshift-merge-robot removed the needs-rebase label on Jun 6, 2023
@JoelSpeed
Contributor Author

/test e2e-azure

@JoelSpeed
Contributor Author

/test e2e-azure

@JoelSpeed
Contributor Author

/test e2e-azure

@JoelSpeed
Contributor Author

/test e2e-azure

@JoelSpeed
Contributor Author

/test e2e-azure

Seems there's some infrastructure-related issue causing the failures.

@JoelSpeed
Contributor Author

In the last run, the network operator went degraded because of a panic in the internals of the HTTP server. That doesn't look desirable.
/test e2e-azure

@JoelSpeed
Contributor Author

Azure E2E failed in cluster teardown this time, but the tests themselves passed. Let's try some payload tests.
/payload-job periodic-ci-openshift-cluster-control-plane-machine-set-operator-release-4.14-periodics-e2e-azure periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn periodic-ci-openshift-release-master-nightly-4.14-e2e-azure-sdn periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-upgrade periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade

@openshift-ci
Contributor

openshift-ci Bot commented Jun 7, 2023

@JoelSpeed: trigger 4 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-azure-sdn
  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/ed5779e0-0511-11ee-85e1-27a2f38f2b56-0

@JoelSpeed
Contributor Author

Having reviewed the previous payload jobs, it looks like a couple of minor, unrelated flakes caused the failures. Let's run some payload blocking tests to see if they find any issues.
/payload 4.14 nightly blocking
/payload 4.14 ci blocking

@openshift-ci
Contributor

openshift-ci Bot commented Jun 7, 2023

@JoelSpeed: trigger 7 job(s) of type blocking for the nightly release of OCP 4.14

  • periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-serial
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-ovn-ipv6
  • periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/477145d0-0537-11ee-8da9-2869c796f213-0

trigger 4 job(s) of type blocking for the ci release of OCP 4.14

  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/477145d0-0537-11ee-8da9-2869c796f213-1

@openshift-merge-robot added the needs-rebase label on Jun 15, 2023 (Indicates a PR cannot be merged because it has merge conflicts with HEAD.)
@JoelSpeed force-pushed the promote-azure-ccm-feature-gate branch from 9f129da to 0e78099 on June 16, 2023 16:46
@JoelSpeed
Contributor Author

The upgrade aggregation went green!
The regular one did not: 6 jobs passed and 4 failed:

  • 1 run failed after 10 minutes because it tried to install to a region unavailable for the account, an infrastructure issue
  • 1 run had issues with Prometheus by the looks of it, something about reporting telemetry late; I suspect it's unrelated
  • 1 run had only 5 of 6 machines become nodes. The 6th machine failed to boot with the following error, which looks like it could be a MAPI issue:
"failed to reconcile machine \"ci-op-m5y1z9gy-37915-km9cv-worker-westus-9d4gk\": failed to create vm ci-op-m5y1z9gy-37915-km9cv-worker-westus-9d4gk: failure sending request for machine ci-op-m5y1z9gy-37915-km9cv-worker-westus-9d4gk: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status=\u003cnil\u003e Code=\"ConflictingUserInput\" Message=\"Disk ci-op-m5y1z9gy-37915-km9cv-worker-westus-9d4gk_OSDisk already exists in resource group CI-OP-M5Y1Z9GY-37915-KM9CV-RG. Only CreateOption.Attach is supported.\" Target=\"/subscriptions/72e3a972-58b0-4afc-bd4f-da89b39ccebd/resourceGroups/ci-op-m5y1z9gy-37915-km9cv-rg/providers/Microsoft.Compute/disks/ci-op-m5y1z9gy-37915-km9cv-worker-westus-9d4gk_OSDisk\"
  • The final failed job seemed to fail spectacularly. It failed with 101 errors, most of which look like networking errors related to cancellation. It failed after 90 minutes, which seems a lot shorter than the other runs. Perhaps the test pod got cancelled?

Going to try that again
/payload-aggregate periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn 10
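
When triaging failures like the ConflictingUserInput error quoted above, it can help to pull the error code and message out of the escaped log line. The following is a minimal sketch under stated assumptions: `parse_azure_error` is a hypothetical helper, not part of any OpenShift or Azure tooling, and it only assumes the `Code=\"...\" Message=\"...\"` layout visible in the quoted error.

```python
import re

def parse_azure_error(message):
    """Extract the Code and Message fields from an escaped error string.

    Hypothetical helper: assumes the Code=\\"...\\" Message=\\"...\\" layout
    seen in the quoted log line. Returns (code, message); either may be None.
    """
    # The log line is JSON-escaped, so turn \" back into plain quotes first.
    unescaped = message.replace('\\"', '"')
    code = re.search(r'Code="([^"]*)"', unescaped)
    text = re.search(r'Message="([^"]*)"', unescaped)
    return (
        code.group(1) if code else None,
        text.group(1) if text else None,
    )

# Shortened excerpt of the error quoted in the comment above.
sample = (
    r'Status=\u003cnil\u003e Code=\"ConflictingUserInput\" '
    r'Message=\"Disk ci-op-m5y1z9gy-37915-km9cv-worker-westus-9d4gk_OSDisk '
    r'already exists in resource group CI-OP-M5Y1Z9GY-37915-KM9CV-RG. '
    r'Only CreateOption.Attach is supported.\"'
)
error_code, error_text = parse_azure_error(sample)
print(error_code)
print(error_text)
```

The regular expression requires a quote directly after `Code=`, so the unquoted `StatusCode=0` earlier in the full log line would not be matched by mistake.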

@openshift-ci
Contributor

openshift-ci Bot commented Jun 29, 2023

@JoelSpeed: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/5659ebc0-165c-11ee-9fae-e683f19d8364-0

@JoelSpeed
Contributor Author

Everything went green this time. I think this is good to go.

@sunzhaohua2

I tested this PR, enabled TPNU, then ran e2e; all tests passed.
Upgrading a cluster from 4.13.4 to the image built from #307 succeeded.
On another Windows cluster, the windows-machine-config-operator pod was in ImagePullBackOff, which seems unrelated to this PR.

@JoelSpeed
Contributor Author

Spoke to TRT. They would like us to hold this until the next EC build has been cut, so we will merge next week.

@JoelSpeed force-pushed the promote-azure-ccm-feature-gate branch from c398578 to c850a0d on July 6, 2023 15:16
@JoelSpeed
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn

@openshift-ci
Contributor

openshift-ci Bot commented Jul 6, 2023

@JoelSpeed: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/8ac5b010-1c10-11ee-91ec-cac1a43eded5-0

@openshift-ci
Contributor

openshift-ci Bot commented Jul 6, 2023

@JoelSpeed: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure | 9f129da | link | false | /test e2e-azure |

@JoelSpeed
Contributor Author

/payload-job periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn

Test pod failed to schedule

@openshift-ci
Contributor

openshift-ci Bot commented Jul 6, 2023

@JoelSpeed: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/60578bc0-1c22-11ee-85af-1380f3933a85-0

@JoelSpeed
Contributor Author

/payload-aggregate periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn 10

The canary I ran is using the latest CCM, so let's do this one more time to double-check that my latest fix hasn't broken anything.

@openshift-ci
Contributor

openshift-ci Bot commented Jul 6, 2023

@JoelSpeed: trigger 1 job(s) for the /payload-(job|aggregate) command

  • periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/49610930-1c33-11ee-80af-91c4c0cbd798-0

@JoelSpeed force-pushed the promote-azure-ccm-feature-gate branch from c850a0d to f93b1a7 on July 7, 2023 07:37
@JoelSpeed changed the title from "[WIP] OCPBUGS-13547: Promote Azure CCM from TPNU to default" to "OCPBUGS-13547: Promote Azure CCM from TPNU to default" on Jul 7, 2023
@openshift-ci bot removed the do-not-merge/work-in-progress label on Jul 7, 2023
@damdo
Member

damdo left a comment:

/lgtm

@openshift-ci bot added the lgtm label on Jul 7, 2023 (Indicates that a PR is ready to be merged.)
@openshift-ci
Contributor

openshift-ci Bot commented Jul 7, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damdo, JoelSpeed

@JoelSpeed
Contributor Author

/hold cancel

The API PR has merged, and test runs from the previous job show that the CCM is working as expected and is not causing the disruption that previously caused it to be reverted.

@openshift-ci bot removed the do-not-merge/hold label on Jul 7, 2023
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 2a00cff and 2 for PR HEAD f93b1a7 in total

@JoelSpeed
Contributor Author

/override ci/prow/e2e-aws-ovn-techpreview

This PR does not affect AWS in any way, and especially not in techpreview. The failures were minor and related to OLM, so I'm confident they were not caused by this PR.

@openshift-ci
Contributor

openshift-ci Bot commented Jul 7, 2023

@JoelSpeed: Overrode contexts on behalf of JoelSpeed: ci/prow/e2e-aws-ovn-techpreview

@openshift-merge-robot merged commit 0451db0 into openshift:master on Jul 7, 2023
@openshift-ci-robot

@JoelSpeed: Jira Issue OCPBUGS-13547 is in an unrecognized state (ON_QA) and will not be moved to the MODIFIED state.

@JoelSpeed deleted the promote-azure-ccm-feature-gate branch on July 7, 2023 11:54
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
