Skip to content

machine-config-operator: Drop skip_if_only_changed for all optional: true jobs#27015

Merged
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
cgwalters:mco-trim-ci
Mar 15, 2022
Merged

machine-config-operator: Drop skip_if_only_changed for all optional: true jobs#27015
openshift-merge-robot merged 1 commit intoopenshift:masterfrom
cgwalters:mco-trim-ci

Conversation

@cgwalters
Copy link
Copy Markdown
Member

The machine-config-operator is a key component of OpenShift 4 that
touches on many things, from OKD to single node to bare metal magic
to networking. We even used to handle etcd.

We have accumulated a giant pile of jobs due to this. But
not every job needs to run on every PR - it's at best a waste of money,
and at worst a huge distraction because every pull request is accompanied
by a giant pile of red "X"'s by default.

Now, many of these jobs are actually optional: true - their failure
will not block merges.

And in theory, they shouldn't run by default; but many of them got
a skip_if_only_changed annotation to avoid them only docs changed,
but that annotation isn't implemented in the way one would expect.
It appears to be a simple inverse of run_if_changed. Which
means that all these optional jobs are still triggered for not-documentation
PRs.

So by just dropping this, we should return to not running these jobs
by default. People touching things that might affect bare metal
specifically will need to e.g. invoke /test e2e-metal-ipi explicitly
now, and similar for the other jobs.

…l: true` jobs

The machine-config-operator is a key component of OpenShift 4 that
touches on many things, from OKD to single node to bare metal magic
to networking.  We even used to handle etcd.

We have accumulated a giant pile of jobs due to this.  But
not every job needs to run on every PR - it's at best a waste of money,
and at worst a huge distraction because every pull request is accompanied
by a giant pile of red "X"'s by default.

Now, many of these jobs are actually `optional: true` - their failure
will not block merges.

And in theory, they shouldn't run by default; but many of them got
a `skip_if_only_changed` annotation to avoid them only docs changed,
but that annotation isn't implemented in the way one would expect.
It appears to be a simple *inverse* of `run_if_changed`.  Which
means that all these optional jobs are still triggered for not-documentation
PRs.

So by just dropping this, we should return to not running these jobs
by default.  People touching things that might affect bare metal
specifically will need to e.g. invoke `/test e2e-metal-ipi` explicitly
now, and similar for the other jobs.
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 15, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Mar 15, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 15, 2022
@cgwalters
Copy link
Copy Markdown
Member Author

Or to say it another, simpler way: The goal is to get all these jobs that are optional: true and always_run: false to only be run if explicitly triggered via e.g. /test e2e-metal-ipi and that's it.

(Alternatively in the future, we could add explicit run_if_changed: pkg/operator/metal.go type stuff)

@cgwalters cgwalters marked this pull request as ready for review March 15, 2022 21:29
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 15, 2022
@cgwalters
Copy link
Copy Markdown
Member Author

OK after a lot of private chat, I think this is probably good to go. Though, for some reason make jobs wasn't propagating the skip_if_changed changes (I guess because that is supported as manual configuration?) So I did that manually.

Copying a tidbit from myself for posterity:

Without digging a lot through the history I think what happened is someone was trying to ensure we consistently had skip_if_only_changed for our always_run: true jobs, but didn't realize that adding it everywhere would have this unintended effect for the always_run: false jobs

@openshift-ci openshift-ci Bot requested a review from jkyros March 15, 2022 21:30
@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

ooooh so The goal is to get all these jobs that are optional: true and always_run: false to only be run if explicitly triggered via e.g. /test e2e-metal-ipi and that's it. we had optional:true always_run:false jobs still runnning all the time?

@cgwalters
Copy link
Copy Markdown
Member Author

cgwalters commented Mar 15, 2022

we had optional:true always_run:false jobs still runnning all the time?

Exactly, that's my understanding. I studied openshift/machine-config-operator#3006 as an example - this is a recent PR, targeting git master. No jobs were explicitly invoked or skipped. IOW, this is a demo of what happens on default PRs today. Notice that ci/prow/e2e-vsphere-upgrade ran by default (along with a big pile of other jobs). But, e2e-vsphere didn't! And spot the difference:

- as: e2e-vsphere
  optional: true
  steps:
    cluster_profile: vsphere
    workflow: openshift-e2e-vsphere
- as: e2e-vsphere-upgrade
  optional: true
  skip_if_only_changed: ^docs/|\.md$|^(?:.*/)?(?:\.gitignore|OWNERS|OWNERS_ALIASES|LICENSE)$
  steps:
    cluster_profile: vsphere
    workflow: openshift-upgrade-vsphere

So this will drop out tons of things, notably permafail things like e2e-aws-disruptive.

@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

thanks for walking me through that - got it! 👍

let's give this a whirl!

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 15, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Mar 15, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit a29385e into openshift:master Mar 15, 2022
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Mar 15, 2022

@cgwalters: Updated the following 2 configmaps:

  • job-config-master-presubmits configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-machine-config-operator-master-presubmits.yaml using file ci-operator/jobs/openshift/machine-config-operator/openshift-machine-config-operator-master-presubmits.yaml
  • ci-operator-master-configs configmap in namespace ci at cluster app.ci using the following files:
    • key openshift-machine-config-operator-master.yaml using file ci-operator/config/openshift/machine-config-operator/openshift-machine-config-operator-master.yaml
Details

In response to this:

The machine-config-operator is a key component of OpenShift 4 that
touches on many things, from OKD to single node to bare metal magic
to networking. We even used to handle etcd.

We have accumulated a giant pile of jobs due to this. But
not every job needs to run on every PR - it's at best a waste of money,
and at worst a huge distraction because every pull request is accompanied
by a giant pile of red "X"'s by default.

Now, many of these jobs are actually optional: true - their failure
will not block merges.

And in theory, they shouldn't run by default; but many of them got
a skip_if_only_changed annotation to avoid them only docs changed,
but that annotation isn't implemented in the way one would expect.
It appears to be a simple inverse of run_if_changed. Which
means that all these optional jobs are still triggered for not-documentation
PRs.

So by just dropping this, we should return to not running these jobs
by default. People touching things that might affect bare metal
specifically will need to e.g. invoke /test e2e-metal-ipi explicitly
now, and similar for the other jobs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Mar 15, 2022

@cgwalters: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/openshift/machine-config-operator/master/e2e-aws-serial 197a380 link unknown /test pj-rehearse

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Mar 15, 2022
Right now Fedora doesn't ship Go 1.17, only Go 1.18beta.  That
version emits a different error message for incompatible TLS
versions.  Adjust our unit test to handle both.

(Also, a motivation for me is to cross-check the new CI configuration
 after openshift/release#27015 )
@cgwalters
Copy link
Copy Markdown
Member Author

OK, a note to ourselves and other people who are submitting patches to the MCO:

We now need to police ourselves a bit more and ensure that if a change to the code may affect e.g. bare metal, we'll need to remember to drop a /test e2e-metal-ipi and the like.

A future improvement is to add run_if_changed regexps for these contexts where we can.

@kikisdeliveryservice
Copy link
Copy Markdown
Contributor

Agree! Most of those teams are owners of their changes and dirs and are very good about manually running tests before approving / merging so we're ahead of the game on that. 😺

cgwalters added a commit to cgwalters/release that referenced this pull request Mar 16, 2022
This is like openshift#27015
but for the layering branch; just `e2e-gcp-op-single-node`
was incorrect here though.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants