core-services/release-controller/_releases/release-ocp-4.9-ci: Cred-request freeze informer #24177
Force-pushed from 20f3b35 to 6dfe51d.
Force-pushed from 64a5210 to bc478b9.
Internal discussion with @dgoodwin and the technical release team concluded with "no new blockers at the moment, use an informer", so 64a5210407 -> bc478b9bb9 pivots to that. I've also added Slack notifications to ping the patch manager (who is in charge of monitoring the CI health of released 4.y versions) when these fail, similar to #24387.
Force-pushed from bc478b9 to 488903e.
What is this subteam syntax? Will it be clear that this pings the Hive team?
This is pinging the patch manager. From the 488903ef81 commit message:
Syntax described in [1]. SMZ7PJ1L0 is @Patch-Manager.
...
[1]: https://api.slack.com/reference/surfaces/formatting#mentioning-groups
My concern was about quickly redirecting the patch manager to the Hive team for help figuring out who did what. Do you think that would be a good idea, or can we assume the patch manager can read the failure and quickly know exactly whom to talk to?
I think the patch manager should be able to figure this out, and I've pushed 488903ef81 -> 7e1759c, rebasing on master and printing some context to help with interpreting and acting on failures.
…equest freeze informer

This will increase the odds that we notice the 9e91c0d (ci-operator/config/openshift/release: Add an oldest-supported-credential-request job, 2021-11-30, openshift#24126) periodic dying before shipping a patch release with an accidental credentials change. The technical release team wouldn't be watching this 4.9 job, because they're focused on 4.dev (currently 4.10). But they are being very strict about accepting new blocking jobs today, so I'm adding this as an informer, per [1].

[1]: https://docs.ci.openshift.org/docs/architecture/release-gating/#add-the-job-to-the-release-gating-suite-as-optional
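As a rough illustration of what "informer" means here, the Go sketch below models a verification set in which optional jobs are reported but never block acceptance of a release candidate. The `verifyJob` type, the `acceptable` function, and the job names are hypothetical. They are not the release controller's actual types or logic, and the sketch treats a job with no recorded result as failed for simplicity.

```go
package main

import "fmt"

// verifyJob is a hypothetical, simplified stand-in for one entry in a
// release-controller verification list: a job name and whether the job
// is optional (an informer) rather than blocking.
type verifyJob struct {
	Name     string
	Optional bool
}

// acceptable reports whether a candidate release would be accepted given
// per-job results, plus the names of any failed informer jobs.
// Blocking failures reject the candidate; informer failures are only
// surfaced (for example, via a Slack ping) and never block acceptance.
// A job with no recorded result is treated as failed.
func acceptable(jobs []verifyJob, passed map[string]bool) (bool, []string) {
	accepted := true
	var informerFailures []string
	for _, j := range jobs {
		if passed[j.Name] {
			continue
		}
		if j.Optional {
			informerFailures = append(informerFailures, j.Name)
			continue
		}
		accepted = false
	}
	return accepted, informerFailures
}

func main() {
	// Hypothetical job names, for illustration only.
	jobs := []verifyJob{
		{Name: "example-blocking-e2e"},
		{Name: "example-credentials-request-freeze", Optional: true},
	}
	// The blocking job passed; the freeze informer has no passing result.
	accepted, informers := acceptable(jobs, map[string]bool{"example-blocking-e2e": true})
	fmt.Println("accepted:", accepted)
	fmt.Println("informer failures:", informers)
}
```

Under this model, a failing freeze informer costs only a notification rather than a rejected payload. That is the trade-off the commit message describes accepting while blocking-job additions are restricted.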
…ze failures

Syntax described in [1]. SMZ7PJ1L0 is @Patch-Manager. This should get eyeballs on failures from the patch manager (whose only remaining job is monitoring the health of released 4.y versions), without making the job formally blocking (which the technical release team isn't on board with; see 058214e8ec, core-services/release-controller/_releases/release-ocp-4.9-ci: Cred-request freeze informer, 2021-12-01, openshift#24177).

[1]: https://api.slack.com/reference/surfaces/formatting#mentioning-groups
…de suggested next steps on failure

The folks responding to a failing job may not be familiar with its intended purpose. Give them an overview, and suggest some possible next steps, so they can drive resolution themselves and don't need to track down a step/job expert to interpret for them.
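The freeze check itself is only described at a high level here. A minimal Go sketch can show the kind of comparison a credentials freeze might perform, and the per-item context a failure message could print for responders who do not know the job. It assumes each CredentialsRequest can be reduced to a name plus a canonical string of the permissions it requests. That reduction, the `freezeDiff` helper, and the example names and permissions are all hypothetical. This is not the actual step's implementation, and the sketch treats any addition, removal, or change as a freeze violation.

```go
package main

import (
	"fmt"
	"sort"
)

// freezeDiff compares two hypothetical snapshots of CredentialsRequests,
// each mapping a request name to a canonical permissions string. It returns
// a sorted, human-readable line for every addition, removal, or change.
// An empty result means nothing changed relative to the pinned release.
func freezeDiff(pinned, current map[string]string) []string {
	var diffs []string
	for name, perms := range current {
		old, ok := pinned[name]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("added %s: %s", name, perms))
		case old != perms:
			diffs = append(diffs, fmt.Sprintf("changed %s: %s -> %s", name, old, perms))
		}
	}
	for name, perms := range pinned {
		if _, ok := current[name]; !ok {
			diffs = append(diffs, fmt.Sprintf("removed %s: %s", name, perms))
		}
	}
	sort.Strings(diffs) // stable output regardless of map iteration order
	return diffs
}

func main() {
	// Hypothetical snapshots, for illustration only.
	pinned := map[string]string{
		"example-operator-a": "s3:GetObject",
		"example-operator-b": "ec2:DescribeInstances",
	}
	current := map[string]string{
		"example-operator-a": "s3:GetObject,s3:PutObject",
		"example-operator-b": "ec2:DescribeInstances",
	}
	diffs := freezeDiff(pinned, current)
	if len(diffs) == 0 {
		fmt.Println("credentials unchanged relative to the pinned release")
		return
	}
	// Print context so a responder can see exactly what moved.
	fmt.Println("credentials changed relative to the pinned release:")
	for _, d := range diffs {
		fmt.Println("  " + d)
	}
}
```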
Force-pushed from 488903e to 7e1759c.
/lgtm

/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dgoodwin, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@wking: all tests passed!
@wking: Updated the following 2 configmaps:
…y-4.10: Add oldest-* jobs

Pulling e5e2d16 (ci-operator/config/openshift/release: Add 4.9 nightly to 4.9.0 rollback tests, 2021-10-19, openshift#22854) and 9e91c0d (ci-operator/config/openshift/release: Add an oldest-supported-credential-request job, 2021-11-30, openshift#24126) forward into 4.10, now that we have our first feature candidate to pin them to [1]. We'll keep bumping the pinned version forward until we get to our first GA 4.10 release.

The bulk of the ci-operator/jobs content is from:

  $ make jobs

But then I manually edited to inject reporter_config, as described in 08db24d (ci-operator/jobs/openshift/release: Ping @Patch-Manager for cred-freeze failures, 2021-12-08, openshift#24177).

[1]: openshift/cincinnati-graph-data#1360
…y-4.11: Add oldest-* jobs

Pulling e5e2d16 (ci-operator/config/openshift/release: Add 4.9 nightly to 4.9.0 rollback tests, 2021-10-19, openshift#22854) and 9e91c0d (ci-operator/config/openshift/release: Add an oldest-supported-credential-request job, 2021-11-30, openshift#24126) forward into 4.11, now that we have our first feature candidate to pin them to [1]. We'll keep bumping the pinned version forward until we get to our first GA 4.11 release. This is the 4.11 equivalent of 4.10's 923db4f (ci-operator/config/openshift/release/openshift-release-master__nightly-4.10: Add oldest-* jobs, 2022-01-12, openshift#25213).

The bulk of the ci-operator/jobs content is from:

  $ make jobs

But then I manually edited to inject reporter_config, as described in 08db24d (ci-operator/jobs/openshift/release: Ping @Patch-Manager for cred-freeze failures, 2021-12-08, openshift#24177). I also dropped "the canary", which I'd been copy/pasting around, because this isn't the canary job. Without it, messages will be rendered like:

  @ patch-manager, test job periodic-ci-openshift-release-master-nightly-4.11-credentials-request-freeze failed, see https://prow.ci.openshift.org/...

where the job name is sufficient context without attempting to echo some portion of it in the earlier string.

[1]: openshift/cincinnati-graph-data#2001
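The rendered Slack message quoted in this commit comes from a Prow `reporter_config` Slack template, which Prow renders as a Go template over the ProwJob. The Go sketch below executes a plausible template with `text/template` to show how the message is assembled. The template string is an assumption reconstructed from the rendered text, not the exact configuration committed in these PRs. The `prowJob`, `jobSpec`, and `jobStatus` types are hypothetical, trimmed-down stand-ins modeled on the ProwJob fields `.Spec.Job` and `.Status.URL`. The run URL is left as a placeholder because it is elided in the quoted message.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Trimmed-down, hypothetical stand-ins for the two ProwJob fields the
// template reads: .Spec.Job (the job name) and .Status.URL (the run link).
type jobSpec struct{ Job string }
type jobStatus struct{ URL string }
type prowJob struct {
	Spec   jobSpec
	Status jobStatus
}

// reportTemplate is an assumed reconstruction of the quoted Slack message.
// Slack renders <!subteam^SMZ7PJ1L0> as the @Patch-Manager group mention.
const reportTemplate = `<!subteam^SMZ7PJ1L0>, test job {{.Spec.Job}} failed, see {{.Status.URL}}`

// render executes reportTemplate against a job and returns the message text.
func render(pj prowJob) (string, error) {
	tmpl, err := template.New("report").Parse(reportTemplate)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := tmpl.Execute(&b, pj); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	msg, err := render(prowJob{
		Spec: jobSpec{Job: "periodic-ci-openshift-release-master-nightly-4.11-credentials-request-freeze"},
		// Placeholder: the real run URL is elided in the quoted message.
		Status: jobStatus{URL: "<run-url>"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

Because the job name is interpolated directly, the template needs no hand-written description like "the canary". This matches the reasoning in the commit message for dropping it.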
…y-4.12: Add oldest-* jobs

Like b7da7de (ci-operator/config/openshift/release/openshift-release-master__nightly-4.11: Add oldest-* jobs, 2022-07-29, openshift#30899), but for 4.12. I guess we could have done this back with ec.0, but it's probably good to wait until these later engineering candidates, when the bigger changes have likely already landed. We could have waited until early release candidates, but I don't want to forget ;).

The bulk of the ci-operator/jobs content is from:

  $ make jobs

But then I manually edited to inject reporter_config, as described in 08db24d (ci-operator/jobs/openshift/release: Ping @Patch-Manager for cred-freeze failures, 2021-12-08, openshift#24177). Also, I seem to have neglected to actually add the reporter_config block in 4.11, despite claiming I'd added it in the commit message :/. Luckily, no changes have slipped in yet, and I'm making up for that mistake now.
…y-4.13: Add oldest-* jobs

Like f1e912d (ci-operator/config/openshift/release/openshift-release-master__nightly-4.12: Add oldest-* jobs, 2022-11-29, openshift#33134), but for 4.13. I guess we could have done this back with ec.0, but it's probably good to wait until later engineering candidates, or in this case, later release candidates, when the bigger changes have likely already landed.

The bulk of the ci-operator/jobs content is from:

  $ make jobs

But then I manually edited to inject reporter_config, as described in 08db24d (ci-operator/jobs/openshift/release: Ping @Patch-Manager for cred-freeze failures, 2021-12-08, openshift#24177).
4.14, because we want to freeze these through the life of 4.14, following the existing pattern, most recently d666767 (ci-operator/config/openshift/release/openshift-release-master__nightly-4.13: Add oldest-* jobs, 2023-04-26, openshift#38728). I'm not pulling in the rollback job this time, because that's moving under QE and is in flight separately in [1]. I'm also adding a 4.15 cred-freeze job this time, to catch up with dbcbb85 (add explanation of blocking jobs in master before service streams, 2023-09-18, openshift#43418). As I pointed out in d666767, I'm still concerned about the amount of churn that I expect will land during the engineering-candidate phase, but I'm not on the release-oversight team, and if they prefer having a blocker job in the development branch with occasional pin bumps, that's fine with me.

The bulk of the ci-operator/jobs content is from:

  $ make jobs

But then I manually edited to inject reporter_config, as described in 08db24d (ci-operator/jobs/openshift/release: Ping @Patch-Manager for cred-freeze failures, 2021-12-08, openshift#24177).

[1]: openshift#43401
I'm skipping the cooking and optional phases [edit: now starting with the optional/informer phase], because this job should be very reliable, and reverts are cheap if I'm wrong. This will increase the odds that we notice the #24126 periodic dying before shipping a patch release with an accidental credentials change.