@qiliRedHat
Contributor

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2024
@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-24nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-24nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@qiliRedHat
Contributor Author

qiliRedHat commented Nov 21, 2024

The first 24-node job failed.
The gather-extra step did not finish within the 8-hour timeout:

INFO[2024-11-20T11:14:13Z] Running step loaded-upgrade-416to418-24nodes-gather-extra. 
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 8h0m0s timeout","severity":"error","time":"2024-11-20T15:26:22Z"}

The chainupgrade-toimage step also failed because of a timeout:

Upgrade checking timeout at 2024-11-20 11:08:42

Eclipsed Time: 122m

INFO[2024-11-05T08:47:14Z] Step loaded-upgrade-cpou-aws-416to418-24nodes-cucushift-chainupgrade-toimage succeeded after 2h58m3s. 

The TIMEOUT env in the cucushift-chainupgrade-toimage ref defaults to 120:

  - name: TIMEOUT
    default: "120"
    documentation: Time to wait for upgrade finish
  - name: UPGRADE_CCO_MANUAL_MODE
    default: ""
    documentation: |-
      Detemine what's the cco manual mode of the cluster to be upgraded

Before the timeout, the upgrade was 97% completed.

Completion:      97% (32 operators updated, 1 updating, 0 waiting)
....
Upgrade checking timeout at 2024-11-20 11:08:42

Eclipsed Time: 122m

Increase the TIMEOUT to 150 and retry


TIMEOUT = 150 test also failed
job

Completion:      97% (32 operators updated, 1 updating, 0 waiting)
....
Upgrade checking timeout at 2024-11-21 12:23:20
Eclipsed Time: 152m

Increase the TIMEOUT to 240 and retry


TIMEOUT = 240 test also failed
job

Completion:      97% (32 operators updated, 1 updating, 0 waiting)
....
Upgrade checking timeout at 2024-11-22 17:10:36
Eclipsed Time: 244m
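
The three runs above stop at the same completion figure, which points to a stall rather than a slow upgrade. A minimal sketch of how that could be detected from the logs, assuming status lines in the `Completion:` format quoted above; `parse_completion` and `is_stalled` are hypothetical helper names and are not part of the cucushift step.

```python
import re

# Matches lines such as:
#   Completion:      97% (32 operators updated, 1 updating, 0 waiting)
COMPLETION_RE = re.compile(
    r"Completion:\s+(\d+)% \((\d+) operators updated, (\d+) updating, (\d+) waiting\)"
)


def parse_completion(line):
    """Return (percent, updated, updating, waiting), or None if the line does not match."""
    match = COMPLETION_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def is_stalled(samples):
    """True when at least two samples report identical progress below 100%.

    `samples` is a list of tuples from parse_completion, one per run or poll.
    """
    if len(samples) < 2:
        return False
    first = samples[0]
    return first[0] < 100 and all(sample == first for sample in samples)
```

If `is_stalled` returns True for samples taken with different TIMEOUT values, raising the timeout again is unlikely to help and the operator that is still updating is worth inspecting instead.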

@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-24nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

@qiliRedHat, pj-rehearse: unable to determine affected jobs ERROR:

could not load configuration from candidate revision of release repo: failed to load ci-operator configuration from release repo: invalid ci-operator config: configuration has 4 errors:

 * tests[2]: job timeout is limited to 8h0m0s
 * tests[3]: job timeout is limited to 8h0m0s
 * tests[4]: job timeout is limited to 8h0m0s
 * tests[5]: job timeout is limited to 8h0m0s

If the problem persists, please contact Test Platform.
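
The error above says each test's job timeout may not exceed 8h0m0s. A rough local pre-check is sketched below, assuming timeouts are written as Go-style duration strings such as `8h0m0s` or `2h30m`. The helper names are hypothetical, and this is not ci-operator's own validation code.

```python
import re
from datetime import timedelta

MAX_JOB_TIMEOUT = timedelta(hours=8)  # limit quoted in the error message

_UNIT_RE = re.compile(r"(\d+)([hms])")
_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_duration(text):
    """Parse a subset of Go duration syntax (integer h/m/s parts only)."""
    if not re.fullmatch(r"(\d+[hms])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {}
    for value, unit in _UNIT_RE.findall(text):
        key = _UNITS[unit]
        parts[key] = parts.get(key, 0) + int(value)
    return timedelta(**parts)


def over_limit(timeouts):
    """Return the indexes of timeouts that exceed MAX_JOB_TIMEOUT."""
    return [
        index
        for index, text in enumerate(timeouts)
        if parse_duration(text) > MAX_JOB_TIMEOUT
    ]
```

The returned indexes play the same role as the `tests[N]` positions in the error message: they show which entries need a shorter timeout.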

@openshift-ci-robot
Contributor

@qiliRedHat, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not load configuration from candidate revision of release repo: failed to load ci-operator configuration from release repo: invalid ci-operator config: configuration has 4 errors:

 * tests[2]: job timeout is limited to 8h0m0s
 * tests[3]: job timeout is limited to 8h0m0s
 * tests[4]: job timeout is limited to 8h0m0s
 * tests[5]: job timeout is limited to 8h0m0s

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-24nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-24nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@qiliRedHat
Contributor Author

The failure is not related to the TIMEOUT. The machine-config cluster operator is degraded because 'MachineConfigPool infra has not progressed to latest configuration'.

= Update Health =
Message: Cluster Operator machine-config is degraded (RequiredPoolsFailed)
  Since:       58m9s
  Level:       Error
  Impact:      API Availability
  Reference:   https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusteroperators.config.openshift.io: machine-config
  Description: Unable to apply 4.17.0-0.nightly-2024-11-21-052346: error during syncRequiredMachineConfigPools: [context deadline exceeded, MachineConfigPool infra has not progressed to latest configuration: controller version mismatch for rendered-infra-6c171d9d397c09f3d4b0b81d46df2c05 expected 39e1cd3c3b04229c48988be1fb7f99b95856aff3 has 4bb3364914c4dbcdfcc08b0914f402cdd38f014f: <unknown>, retrying]
Message: Cluster Version version is failing to proceed with the update (ClusterOperatorDegraded)
  Since:       3m58s
  Level:       Warning
  Impact:      Update Stalled
  Reference:   https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/ClusterOperatorDegraded.md
  Resources:
    clusterversions.config.openshift.io: version
  Description: Cluster operator machine-config is degraded
Message: Outdated nodes in a paused pool 'infra' will not be updated
  Since:       -
  Level:       Warning
  Impact:      Update Stalled
  Reference:   https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-operator-issues.html#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues
  Resources:
    machineconfigpools.machineconfiguration.openshift.io: infra
  Description: Pool is paused, which stops all changes to the nodes in the pool, including updates. The nodes will not be updated until the pool is unpaused by the administrator.
Message: Outdated nodes in a paused pool 'worker' will not be updated
  Since:       -
  Level:       Warning
  Impact:      Update Stalled
  Reference:   https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-operator-issues.html#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues
  Resources:
    machineconfigpools.machineconfiguration.openshift.io: worker
  Description: Pool is paused, which stops all changes to the nodes in the pool, including updates. The nodes will not be updated until the pool is unpaused by the administrator.

@qiliRedHat qiliRedHat marked this pull request as draft November 26, 2024 13:57
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2024
@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-24nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@qiliRedHat qiliRedHat marked this pull request as ready for review November 28, 2024 09:37
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 28, 2024
@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-120nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci openshift-ci bot requested a review from shahsahil264 November 28, 2024 09:38
@qiliRedHat
Contributor Author

@jiajliu Hi Jia, please help to review the change to 'extend the timeout of cucushift-upgrade-cpou-unpause-worker-mcp from 1h10m to 2h to support 120 worker nodes'. Thanks. Your previous PR extended it to 1h10m.

ITERATION_MULTIPLIER_ENV: "6"
MAX_UNAVAILABLE_WORKER: "3"
MCO_CONF_DAY2_CUSTOM_MCP: '[{"mcp_name": "infra"}]'
PAUSED_MCP_NAME: worker
Contributor

I see that only the worker nodes are paused, so this does not seem to be a control-plane-only update, right?

Contributor Author

@jiajliu Yes, the job runs control-plane-only update.

Contributor

So both the worker and infra MCPs should be paused?

Contributor Author

@qiliRedHat Dec 5, 2024

@jiajliu I only paused worker mcp PAUSED_MCP_NAME: worker
MCO_CONF_DAY2_CUSTOM_MCP: '[{"mcp_name": "infra"}]' is there to override expected_mcp, which by default allows only the master and worker MCPs: https://github.com/openshift/release/blob/master/ci-operator/step-registry/cucushift/upgrade/cpou/pause-worker-mcp/cucushift-upgrade-cpou-pause-worker-mcp-commands.sh#L45-L47
That step will 'check all actual mcp, if any of them unknown then break the job.'
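
A minimal sketch of the check described above, assuming the default expected pools are master and worker and that MCO_CONF_DAY2_CUSTOM_MCP holds a JSON list of objects with an `mcp_name` key, as in the config quoted in this thread. `unknown_pools` is a hypothetical name; the real step is a shell script and may differ in detail.

```python
import json

# Assumed defaults, per the comment above: only master and worker are expected.
DEFAULT_EXPECTED_MCPS = ("master", "worker")


def unknown_pools(actual_mcps, custom_mcp_json=""):
    """Return pools found on the cluster that are neither default nor declared custom."""
    expected = set(DEFAULT_EXPECTED_MCPS)
    if custom_mcp_json:
        # Each entry looks like {"mcp_name": "infra"}.
        expected.update(entry["mcp_name"] for entry in json.loads(custom_mcp_json))
    return sorted(set(actual_mcps) - expected)
```

With an infra pool on the cluster and no custom declaration, the result is non-empty, which corresponds to the step breaking the job. Declaring infra as a custom pool makes the result empty.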

Contributor

I only paused worker mcp PAUSED_MCP_NAME: worker

Hmm, for a control-plane-only update, all non-master MCPs should be paused before the upgrade. If your test requirement is a cpou update, I guess both infra and worker are expected to be paused.
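
The rule stated above can be sketched as a small pre-upgrade check. This is an illustration only: `unpaused_non_master` is a hypothetical helper, and the input mapping of pool name to paused flag is assumed to have been collected beforehand.

```python
def unpaused_non_master(pools):
    """Return the non-master pools that are not paused.

    `pools` maps a MachineConfigPool name to its paused flag.
    An empty result means the cluster is set up for a control-plane-only update.
    """
    return sorted(
        name for name, paused in pools.items() if name != "master" and not paused
    )
```

For the configuration under review (worker paused, infra not paused) the helper would report infra as the pool that still needs pausing.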

Contributor Author

@qiliRedHat Dec 6, 2024

@jiajliu From the perfscale test team's point of view, either is OK for us. It would be good to get some guidance on what is officially recommended.
The current approach (infra mcp not paused) is based on PR #21175 and the related Jira OTA-448: Add upgrade tests for a cluster with infra nodes. From the description of PR #21175, my understanding is that the infra mcp is expected to be unpaused.
I will start a Slack thread with the PR owner to clarify it.

@jiajliu
Contributor

jiajliu commented Dec 6, 2024

please help to review the change to 'extend the timeout of cucushift-upgrade-cpou-unpause-worker-mcp from 1h10m to 2h to support 120 worker nodes'.

this part lgtm

@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-120nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@qiliRedHat
Contributor Author

qiliRedHat commented Dec 9, 2024

In the new test job with the infra mcp paused as well, the 2h timeout is not enough. I'll extend it to 2h30m.

Unpause the paused mcp...
machineconfigpool.machineconfiguration.openshift.io/infra patched
No resources found
Checking mcp infra, expected status True...
Checking infra pool status #0...
infra rendered-infra-0e86f002c0c08c002681211cb4b548de False True False 3 0 0 0 6h20m
Checking infra pool status #1...
infra rendered-infra-0e86f002c0c08c002681211cb4b548de False True False 3 1 1 0 6h25m
Checking infra pool status #2...
infra rendered-infra-0e86f002c0c08c002681211cb4b548de False True False 3 2 2 0 6h30m
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2024-12-06T13:28:39Z"}
Checking infra pool status #3...
infra rendered-infra-4d03293c4ee7921f26dceb71f3501751 True False False 3 3 3 0 6h35m
infra pool status check passed
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:264","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process gracefully exited before 10m0s grace period","severity":"error","time":"2024-12-06T13:28:48Z"}
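
The pool status lines in the log above can be read programmatically. The sketch below assumes they follow the default column order of `oc get mcp` (NAME, CONFIG, UPDATED, UPDATING, DEGRADED, MACHINECOUNT, READYMACHINECOUNT, UPDATEDMACHINECOUNT, DEGRADEDMACHINECOUNT, AGE); `PoolStatus`, `parse_pool_status`, and `pool_is_done` are hypothetical names, not code from the unpause step.

```python
from typing import NamedTuple


class PoolStatus(NamedTuple):
    name: str
    config: str
    updated: bool
    updating: bool
    degraded: bool
    machine_count: int
    ready_count: int
    updated_count: int
    degraded_count: int
    age: str


def parse_pool_status(line):
    """Split one whitespace-separated status line into a PoolStatus."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
    name, config, updated, updating, degraded, total, ready, done, bad, age = fields
    return PoolStatus(
        name, config,
        updated == "True", updating == "True", degraded == "True",
        int(total), int(ready), int(done), int(bad), age,
    )


def pool_is_done(status):
    """True once the pool is updated, idle, healthy, and every machine is updated."""
    return (
        status.updated
        and not status.updating
        and not status.degraded
        and status.updated_count == status.machine_count
    )
```

In the log above, check #2 reports two of three machines updated and the pool still updating, while check #3 reports all three updated, which is the state the step waits for.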

extend the timeout of cucushift-upgrade-cpou-unpause-worker-mcp from 1h10m to 2h to support 120 worker nodes
@qiliRedHat
Contributor Author

/pj-rehearse pull-ci-openshift-qe-ocp-qe-perfscale-ci-main-aws-4.18-nightly-x86-cpou-loaded-upgrade-from-4.16-loaded-upgrade-416to418-120nodes

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot
Contributor

[REHEARSALNOTIFIER]
@qiliRedHat: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-medium-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-small-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-medium-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-small-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-small-udn-density-l3 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-master-qe-perfscale-aws-ovn-small-udn-density-l2 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.20-qe-perfscale-aws-ovn-medium-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.20-qe-perfscale-aws-ovn-small-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.20-qe-perfscale-aws-ovn-medium-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.20-qe-perfscale-aws-ovn-small-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.20-qe-perfscale-aws-ovn-small-udn-density-l3 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.20-qe-perfscale-aws-ovn-small-udn-density-l2 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.19-qe-perfscale-aws-ovn-medium-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.19-qe-perfscale-aws-ovn-small-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.19-qe-perfscale-aws-ovn-medium-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.19-qe-perfscale-aws-ovn-small-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.19-qe-perfscale-aws-ovn-small-udn-density-l3 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.19-qe-perfscale-aws-ovn-small-udn-density-l2 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.18-qe-perfscale-aws-ovn-medium-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.18-qe-perfscale-aws-ovn-small-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.18-qe-perfscale-aws-ovn-medium-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.18-qe-perfscale-aws-ovn-small-node-density-cni | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.18-qe-perfscale-aws-ovn-small-udn-density-l3 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.18-qe-perfscale-aws-ovn-small-udn-density-l2 | openshift/ovn-kubernetes | presubmit | Registry content changed |
| pull-ci-openshift-ovn-kubernetes-release-4.17-qe-perfscale-aws-ovn-medium-cluster-density | openshift/ovn-kubernetes | presubmit | Registry content changed |

A total of 529 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

@qiliRedHat
Contributor Author

@jiajliu I took your suggestion to pause the infra mcp as well to make it a real 'cpou' upgrade. With the infra nodes added, the unpause step needs more time to complete, so I updated the timeout to 2h30m.
The new test job passed.
Please help to review again. TIA!

@qiliRedHat
Contributor Author

/pj-rehearse ack

@openshift-ci-robot
Contributor

@qiliRedHat: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Dec 9, 2024
@jiajliu
Contributor

jiajliu commented Dec 12, 2024

lgtm

@jiajliu
Contributor

jiajliu commented Dec 12, 2024

/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 12, 2024
@liqcui
Contributor

liqcui commented Dec 12, 2024

/lgtm

@openshift-ci
Contributor

openshift-ci bot commented Dec 12, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jiajliu, liqcui, qiliRedHat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Dec 12, 2024

@qiliRedHat: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 234a28f into openshift:master Dec 12, 2024
yingzhanredhat pushed a commit to yingzhanredhat/release that referenced this pull request Dec 18, 2024
* add infra to cpou upgrade

extend the timeout of cucushift-upgrade-cpou-unpause-worker-mcp from 1h10m to 2h to support 120 worker nodes

* remove required-for-upgrade and pause infra mcp

* update the timeout to 2h30m for 120 workers and 3 infra nodes cpou upgrade
yingzhanredhat pushed a commit to yingzhanredhat/release that referenced this pull request Dec 24, 2024
krishvoor pushed a commit to krishvoor/release that referenced this pull request Jan 29, 2025