MCO-1652: Add MCO disruptive suite #29776

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:main from yuqi-zhang:add-mco-disruptive-suite
Jun 9, 2025

@yuqi-zhang
Contributor

Many MCO tests require node disruption, so it was determined that these would best live as a separate suite.

Create the MCO suite and move existing MCO tests in origin to it. The next goal is to add On Cluster Layering tests to this as well, potentially via OTE.

@openshift-ci-robot

openshift-ci-robot commented May 9, 2025

@yuqi-zhang: This pull request references MCO-1652 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 9, 2025
@openshift-ci openshift-ci Bot requested review from p0lyn0mial and umohnani8 May 9, 2025 20:17
@yuqi-zhang
Contributor Author

/hold

Not 100% decided yet on whether we'd like to move existing tests here, or whether we should just set this up for layering.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 9, 2025
@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from ebeaa62 to f67ce4d Compare May 9, 2025 20:47
@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from f67ce4d to edf191f Compare May 12, 2025 13:53
@openshift-trt

openshift-trt Bot commented May 12, 2025

Job Failure Risk Analysis for sha: edf191f

Job Name | Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-disruptive | Medium
  [bz-Etcd] clusteroperator/etcd should not change condition/Available
  Potential external regression detected for High Risk Test analysis
  ---
  [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
  This test has passed 96.30% of 5760 runs on release 4.20 [Overall] in the last week.
  ---
  [sig-node] static pods should start after being created
  Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | IncompleteTests
  Tests for this run (106) are below the historical average (1705): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

)

-var _ = g.Describe("[sig-mco][OCPFeatureGate:MachineConfigNodes]", func() {
+var _ = g.Describe("[Suite:openshift/machine-config-operator/disruptive][sig-mco][OCPFeatureGate:MachineConfigNodes]", func() {
Contributor Author

As far as we can tell, this would cause the test to only run in the new MCO suite, and not for serial runs anymore.

We're depending on this for signal, so while we're transitioning between the tests, would it be possible to run the test on both Serial and the new MCO suite? Would we have to define this test twice with different labels?
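
For illustration: the bot reports later in this thread list test names that carry both suite tags at once, which suggests a single test definition can be tagged for both suites. Below is a minimal Go sketch of reading those tags out of a test name. The helpers `suitesOf` and `inSuite` are hypothetical and only do string handling; they are not how openshift-tests selects tests.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagPattern matches any bracketed annotation, such as [Serial] or [Suite:foo/bar].
var tagPattern = regexp.MustCompile(`\[([^\[\]]+)\]`)

// suitesOf returns the value of every [Suite:...] annotation found in a test name.
// Hypothetical helper for illustration only.
func suitesOf(testName string) []string {
	var suites []string
	for _, m := range tagPattern.FindAllStringSubmatch(testName, -1) {
		if strings.HasPrefix(m[1], "Suite:") {
			suites = append(suites, strings.TrimPrefix(m[1], "Suite:"))
		}
	}
	return suites
}

// inSuite reports whether the test name carries an explicit tag for the given suite.
// Hypothetical helper for illustration only.
func inSuite(testName, suite string) bool {
	return strings.Contains(testName, "[Suite:"+suite+"]")
}

func main() {
	// Test name copied from a bot report in this thread.
	name := "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]"

	fmt.Println(suitesOf(name))
	fmt.Println(inSuite(name, "openshift/conformance/serial"))
	fmt.Println(inSuite(name, "openshift/machine-config-operator/disruptive"))
}
```

Whether a tagged test actually runs in a given job still has to be confirmed against a real run, since other annotations may affect selection.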

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2025
@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from edf191f to 41dc8ba Compare May 15, 2025 01:13
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 15, 2025
@openshift-trt

openshift-trt Bot commented May 15, 2025

Job Failure Risk Analysis for sha: 41dc8ba

Job Name | Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive | High
  [sig-arch][Late] operators should not create watch channels very often
  This test has passed 99.73% of 5277 runs on release 4.20 [Overall] in the last week.
  Open Bugs:
    operators should not create watch channels very often regression
    operators should not create watch channels very often regression
    ResilientWatchCacheInitialization (Re)enablement - operator watch counts from component readiness
  ---
  [sig-node] node-lifecycle detects unexpected not ready node
  This test has passed 99.80% of 5501 runs on release 4.20 [Overall] in the last week.
  Open Bugs:
    node-lifecycle detects unexpected not ready node failing on azure serial and upgrade jobs
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling | Low
  [bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
  This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
  ---
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling | Low
  [bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.
  ---
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-disruptive | Medium
  [sig-node] static pods should start after being created
  Potential external regression detected for High Risk Test analysis
  ---
  [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
  This test has passed 96.18% of 5501 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.
  ---
  [bz-kube-storage-version-migrator] clusteroperator/kube-storage-version-migrator should not change condition/Available
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

@openshift-trt

openshift-trt Bot commented May 16, 2025

Job Failure Risk Analysis for sha: 8cefbf9

Job Name | Failure Risk
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-disruptive | Medium
  [sig-node] static pods should start after being created
  Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling | Medium
  [sig-api-machinery] disruption/oauth-api apiserver/oauth-apiserver connection/new should be available throughout the test
  Potential external regression detected for High Risk Test analysis
  ---
  [sig-api-machinery] disruption/openshift-api apiserver/openshift-apiserver connection/new should be available throughout the test
  Potential external regression detected for High Risk Test analysis
  ---
  [sig-api-machinery] disruption/kube-api apiserver/kube-apiserver connection/new should be available throughout the test
  Potential external regression detected for High Risk Test analysis
  ---
  [sig-api-machinery] disruption/cache-openshift-api apiserver/openshift-apiserver connection/new should be available throughout the test
  Potential external regression detected for High Risk Test analysis

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 8cefbf9

Job Name | New Test Risk
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: 8cefbf9

  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from 8cefbf9 to b625c64 Compare May 16, 2025 15:45
@openshift-trt

openshift-trt Bot commented May 16, 2025

Job Failure Risk Analysis for sha: b625c64

Job Name | Failure Risk
pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | MissingData
pull-ci-openshift-origin-main-e2e-aws-disruptive | IncompleteTests
  Tests for this run (104) are below the historical average (1440): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:azure SecurityMode:default Topology:ha Upgrade:none] in the last week.
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: b625c64

Job Name | New Test Risk
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: b625c64

  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

Comment on lines +84 to +92
g.It("[Suite:openshift/conformance/serial][Serial][Slow]Should properly report MCN conditions on node degrade [apigroup:machineconfiguration.openshift.io]", func() {
	if IsSingleNode(oc) { // handle SNO clusters
		ValidateMCNConditionOnNodeDegrade(oc, invalidMasterMCFixture, true)
	} else { // handle standard, non-SNO clusters
		ValidateMCNConditionOnNodeDegrade(oc, invalidWorkerMCFixture, false)
	}
})

-g.It("[Serial][Slow]Should properly create and remove MCN on node creation and deletion [apigroup:machineconfiguration.openshift.io]", func() {
+g.It("[Suite:openshift/conformance/serial][Serial][Slow]Should properly create and remove MCN on node creation and deletion [apigroup:machineconfiguration.openshift.io]", func() {
Member

These two tests don't currently run in the default serial suite (or any default suite) since they are labeled [Slow], as can be seen here. So if it's possible to run some tests in the new suite and not others, these would be good candidates, since they are not currently used for our component readiness signal.

Contributor Author

Ack, thanks! Should I remove the serial suite label entirely then? Theoretically it shouldn't make a difference? Or would this actually cause it to run in serial instead?

Member

> Should I remove the serial suite label entirely then? Theoretically it shouldn't make a difference?

I don't think the serial suite label would do anything here, but I'm not 100% sure. We can maybe do a rehearsal payload job and see if this test shows up.

Contributor

Have you tried building locally and using --dry-run to see what tests are included with which suites?

I feel like if you don't expect it to be included in the serial suite due to the [Slow] annotation (which I believe is correct), you shouldn't add [Suite:openshift/conformance/serial]. Adding an extra annotation now will just make it harder to rename later if you go back to clean up and want to match the old tests with the new names.

Also I'm wondering if you need both [Suite:openshift/conformance/serial] and [Serial] or [Serial] alone will do it.

It's an open question since you are adding [Suite:openshift/machine-config-operator/disruptive], but you can test locally with something like

./openshift-tests run "openshift/conformance/serial" -o "${ARTIFACT_DIR}/e2e.log" --junit-dir "${ARTIFACT_DIR}/junit" --dry-run

and

./openshift-tests run "openshift/machine-config-operator/disruptive" -o "${ARTIFACT_DIR}/e2e.log" --junit-dir "${ARTIFACT_DIR}/junit" --dry-run

to see what tests would run. You can also skip --dry-run and actually run the tests if you like. Unfortunately, you do need a cluster even for --dry-run.
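
Once the two dry-run listings are saved, comparing them by hand gets tedious. Here is a rough Go sketch of one way to do it, assuming each listing is plain text with one test name per line, optionally wrapped in double quotes. That format is an assumption, not something confirmed in this thread. `parseListing` and `missingFrom` are hypothetical helpers, and the test names in `main` are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// parseListing turns a listing (assumed: one test name per line) into a set of names.
// Surrounding whitespace and double quotes are stripped; blank lines are ignored.
func parseListing(text string) map[string]bool {
	names := make(map[string]bool)
	for _, line := range strings.Split(text, "\n") {
		name := strings.Trim(strings.TrimSpace(line), `"`)
		if name != "" {
			names[name] = true
		}
	}
	return names
}

// missingFrom returns, in sorted order, every name in want that is absent from got.
func missingFrom(want, got map[string]bool) []string {
	var missing []string
	for name := range want {
		if !got[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Placeholder listings; real ones would come from the two dry-run invocations.
	serial := parseListing(`"[Serial] placeholder test A"
"[Serial] placeholder test B"`)
	disruptive := parseListing(`"[Serial] placeholder test B"
"[Serial][Slow] placeholder test C"`)

	fmt.Println("only in serial:", missingFrom(serial, disruptive))
	fmt.Println("only in disruptive:", missingFrom(disruptive, serial))
}
```

The same comparison can be used to check that a test expected in a suite is actually present in a rehearsal run's test list.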

Contributor Author

> Also I'm wondering if you need both [Suite:openshift/conformance/serial] and [Serial] or [Serial] alone will do it.

I've tried with the existing tests via e2e-aws-ovn-serial-1of2/e2e-aws-ovn-serial-2of2 below: [Serial] alone does not make the tests show up, whereas adding the full suite tag does. (This is with the boot image tests we already have; all the other tests are TechPreview-only so far.)

And yes, let me remove the serial suite tag from the slow test.

@isabella-janssen
Member

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial

@openshift-ci
Contributor

openshift-ci Bot commented May 20, 2025

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/7ab6ea20-35a1-11f0-8eda-2866fda3ec57-0

@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from b625c64 to e10d904 Compare May 21, 2025 00:42
@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from e10d904 to c2bcd59 Compare May 28, 2025 01:25
@openshift-trt

openshift-trt Bot commented May 28, 2025

Job Failure Risk Analysis for sha: c2bcd59

Job Name | Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive | Medium
  [bz-Etcd] clusteroperator/etcd should not change condition/Available
  Potential external regression detected for High Risk Test analysis
  ---
  [sig-node] static pods should start after being created
  Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling | Low
  [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
  This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: c2bcd59

Job Name | New Test Risk
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job.

New tests seen in this PR at sha: c2bcd59

  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImagesAWS][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 2, Pass: 2, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@yuqi-zhang
Contributor Author

/hold cancel

Let's try adding these tests to the MCO suite. It should be fine since they will still run via serial/parallel, so we shouldn't lose any signal.

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 5, 2025
@isabella-janssen
Member

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

Running these payloads to confirm that the same MCO tests continue running in the same test suites as before.

@openshift-ci
Contributor

openshift-ci Bot commented Jun 5, 2025

@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/323a05b0-4222-11f0-8517-40a4de105581-0

@yuqi-zhang
Contributor Author

Looks like the list of tests is correct although the run didn't pass

@isabella-janssen
Member

> Looks like the list of tests is correct although the run didn't pass

I am only seeing one of the MCN parallel tests in the payload rehearsal (only "Should have MCN properties matching associated node properties for nodes in default MCPs"). Can you try adding the [Suite:openshift/conformance/parallel] tag to the three other MCN parallel tests, please, @yuqi-zhang?

Otherwise, the test failure does not look related to this work, so hopefully that was just a bad run.

@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from c2bcd59 to c4a38fc Compare June 6, 2025 13:03
@yuqi-zhang
Contributor Author

Ack, sorry, missed those for some reason, should be fixed now, thanks!

@yuqi-zhang yuqi-zhang force-pushed the add-mco-disruptive-suite branch from c4a38fc to 5c9e36d Compare June 6, 2025 13:05
@isabella-janssen
Member

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

Running a rehearsal with the MCN parallel tests tagged.

@openshift-ci
Contributor

openshift-ci Bot commented Jun 6, 2025

@isabella-janssen: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-ovn-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f74bf430-42d6-11f0-8593-bfa7613291a3-0

@isabella-janssen
Member

/lgtm

Looks good to me from the MCO side of things! All tests seem to be running in the same suites as previously.

@openshift-ci openshift-ci Bot added the lgtm label Jun 6, 2025
@neisw
Contributor

neisw commented Jun 6, 2025

/approve
/retest-required

@openshift-ci
Contributor

openshift-ci Bot commented Jun 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen, neisw, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved label Jun 6, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 5b76b54 and 2 for PR HEAD 5c9e36d in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 869f237 and 1 for PR HEAD 5c9e36d in total

@openshift-ci
Contributor

openshift-ci Bot commented Jun 7, 2025

@yuqi-zhang: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn-serial-publicnet | b625c64 | link | false | /test e2e-aws-ovn-serial-publicnet |
| ci/prow/e2e-gcp-fips-serial | b625c64 | link | false | /test e2e-gcp-fips-serial |
| ci/prow/e2e-metal-ipi-serial | b625c64 | link | false | /test e2e-metal-ipi-serial |
| ci/prow/e2e-aws-ovn-single-node-serial | 5c9e36d | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-gcp-csi | 5c9e36d | link | false | /test e2e-gcp-csi |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 5c9e36d | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 5c9e36d | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-azure-ovn-etcd-scaling | 5c9e36d | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-serial-publicnet-1of2 | 5c9e36d | link | false | /test e2e-aws-ovn-serial-publicnet-1of2 |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 5c9e36d | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 5c9e36d | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback |
| ci/prow/e2e-gcp-fips-serial-1of2 | 5c9e36d | link | false | /test e2e-gcp-fips-serial-1of2 |
| ci/prow/e2e-openstack-ovn | 5c9e36d | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-azure-ovn-upgrade | 5c9e36d | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-aws-ovn-etcd-scaling | 5c9e36d | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 5c9e36d | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-aws-disruptive | 5c9e36d | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-gcp-fips-serial-2of2 | 5c9e36d | link | false | /test e2e-gcp-fips-serial-2of2 |
| ci/prow/okd-e2e-gcp | 5c9e36d | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-aws-ovn-single-node | 5c9e36d | link | false | /test e2e-aws-ovn-single-node |
| ci/prow/e2e-gcp-disruptive | 5c9e36d | link | false | /test e2e-gcp-disruptive |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt

openshift-trt Bot commented Jun 7, 2025

Job Failure Risk Analysis for sha: 5c9e36d

Job Name: Failure Risk

  • pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback: IncompleteTests
    Tests for this run (13) are below the historical average (205): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

  • pull-ci-openshift-origin-main-e2e-aws-disruptive: Medium
    [bz-Etcd] clusteroperator/etcd should not change condition/Available
    Potential external regression detected for High Risk Test analysis
    ---
    [sig-node] static pods should start after being created
    This test has passed 97.32% of 4329 runs on release 4.20 [Overall] in the last week.

    Open Bugs
    [sig-node] static pods should start after being created

  • pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling: Low
    [bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
    This test has passed 50.00% of 2 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.

  • pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling: Medium
    [bz-etcd][invariant] alert/etcdMembersDown should not be at or above info
    Potential external regression detected for High Risk Test analysis
    ---
    [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
    This test has passed 97.74% of 4329 runs on release 4.20 [Overall] in the last week.

    Open Bugs
    CI: API is broken in periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-single-node-techpreview-serial

Risk analysis has seen new tests most likely introduced by this PR.
Please ensure that new tests meet guidelines for naming and stability.

New Test Risks for sha: 5c9e36d

| Job Name | New Test Risk |
| --- | --- |
| pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2 | Medium - "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" is a new test, and was only seen in one job. |
| pull-ci-openshift-origin-main-e2e-openstack-serial | Medium - "[sig-installer][Suite:openshift/openstack] The OpenShift cluster should allow the manual setting of enable_topology to false [Serial]" is a new test, and was only seen in one job. |

New tests seen in this PR at sha: 5c9e36d

  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should not update boot images on any MachineSet when not configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should stamp coreos-bootimages configmap with current MCO hash and release version [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images on all MachineSets when configured [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[Suite:openshift/machine-config-operator/disruptive][Suite:openshift/conformance/serial][sig-mco][OCPFeatureGate:ManagedBootImages][Serial] Should update boot images only on MachineSets that are opted in [apigroup:machineconfiguration.openshift.io]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]
  • "[sig-installer][Suite:openshift/openstack] The OpenShift cluster should allow the manual setting of enable_topology to false [Serial]" [Total: 1, Pass: 1, Fail: 0, Flake: 0]

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 869f237 and 2 for PR HEAD 5c9e36d in total

@openshift-merge-bot openshift-merge-bot Bot merged commit 1cef4af into openshift:main Jun 9, 2025
41 of 59 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.20.0-202506091515.p0.g1cef4af.assembly.stream.el9.
All builds following this will include this PR.

@yuqi-zhang
Contributor Author

/cherry-pick release-4.19

@openshift-cherrypick-robot

@yuqi-zhang: #29776 failed to apply on top of branch "release-4.19":

Applying: Add MCO disruptive suite
Using index info to reconstruct a base tree...
M	test/extended/machine_config/machine_config_node.go
M	test/extended/machine_config/pinnedimages.go
M	test/extended/util/annotate/generated/zz_generated.annotations.go
M	zz_generated.manifests/test-reporting.yaml
Falling back to patching base and 3-way merge...
Auto-merging zz_generated.manifests/test-reporting.yaml
Auto-merging test/extended/util/annotate/generated/zz_generated.annotations.go
CONFLICT (content): Merge conflict in test/extended/util/annotate/generated/zz_generated.annotations.go
Auto-merging test/extended/machine_config/pinnedimages.go
Auto-merging test/extended/machine_config/machine_config_node.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Add MCO disruptive suite

In response to this:

> /cherry-pick release-4.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.

7 participants