
EUS Upgrades MVP #585

Merged
openshift-merge-robot merged 2 commits into openshift:master from sdodson:eus-mvp on Feb 8, 2021

Conversation

@sdodson
Member

@sdodson sdodson commented Jan 14, 2021

An enhancement that focuses on general upgrade safety, speed, and
a minimal amount of EUS to EUS specific work.

TODO:

  • Find a board / team to take on the disconnected upgrade planning tools

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated

We have pre-existing mechanisms to alert admins to their use of APIs which are
slated for removal in a future release. Currently those mechanisms do not look
forward far enough to span 4.6 to 4.10, we will need to add the ability to

Experience question: do you envision that we know the intention of the user to move from EUS to EUS or simply that we provide a way to see forward looking information if the admin wants to see it?

More concretely, just as an example, is the vision:

  1. showing the API removals, operator versioning (below), timing (below), or other safeguards for the current, explicitly known step prominently in the existing experience with a link to "Upgrading to the next EUS?" that provides all the info we have
  2. that we know more explicitly they've picked a future destination (EUS or not) and we need an experience that customizes to that choice?

or is it a phase 1 then 2? :-)

Member Author

Yes, if we want alerting to be different based on EUS vs not we need to have some mechanism to know that they're EUS. Right now I'd planned on that being a requirement that they upgrade into a 4.6 version that falls after the EUS transition and we'd only backport these alerting rules to those versions.

Requiring them to upgrade deep into 4.6.z may be a tough sell, but I think if we eventually move on to removing reboots it's a decent exchange, because it allows us to assume a higher baseline of maturity: they've got at least all of the quality improvements that happened during the first year of 4.6. This requirement also allows us to limit the upgrade graph, and thus the versions we must test, by forcing them deep into 4.7, 4.8, and 4.9. Only 4.10 will be relatively new at that point, and they'd also have the option of waiting another 3-6 months before completing the 4.9 and 4.10 upgrades if they wish.

Member

Requiring them to upgrade deep into 4.6.z may be a tough sell...

For folks running from 4.6.{shortly-after-EUS-but-before-some-forward-looking-warning}, there's really not all that much difference between the first hop taking them to 4.6.{huge} or 4.7.{huge} and picking up the warning then, if we decide that 4.6.* -> 4.7.{huge} seems safe enough to add an update edge. So this just comes down to our usual "how many update edges are we willing to stretch tests over?", which is a new-release-build-time decision, and doesn't need any enhancement-time decisions.

Member Author

While we know the risk is low it's still a cluster roll and a lot of customers think "EUS" means "I only have to reboot or upgrade once every 18 months."

Based on the call with API Server team it sounds like we're going to pursue only Info level alerts whenever we know of an API removal happening in the next four minor releases. We'll also pursue Upgradeable=False whenever the removal is in the next minor, so an API removed in 4.10 would get Info alerts in 4.6, 4.7, 4.8, 4.9 and Upgradeable=False in 4.9, that way the admin knows while running 4.6 but isn't blocked from forward progress until something would actually break when they take that step.

That means it becomes safe to backport all of the logic related to these alerts to 4.6 regardless of whether or not it's an EUS cluster, therefore we can set a deadline of having this backported to 4.6 when 4.9 goes GA which will be at least 12 weeks before potential 4.10 upgrades. I believe we can reasonably assert that they must be no more than 12 weeks old at the time we start providing EUS upgrades.
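The policy described in this thread (Info-level alerts for any removal within the next four minors, Upgradeable=False only when the removal lands in the very next minor) can be sketched as a small function. This is an illustrative sketch only; the function name and signature are hypothetical, not part of any real operator.

```python
def signals(current_minor: int, removal_minor: int) -> set[str]:
    """Return the signals a cluster at 4.<current_minor> should raise for
    an API slated for removal in 4.<removal_minor>, per the policy above."""
    distance = removal_minor - current_minor
    out = set()
    if 1 <= distance <= 4:
        out.add("Info")            # early heads-up, non-blocking
    if distance == 1:
        out.add("Upgradeable=False")  # the very next hop would break
    return out

# An API removed in 4.10: Info alerts fire in 4.6-4.9,
# and 4.9 additionally sets Upgradeable=False.
assert signals(6, 10) == {"Info"}
assert signals(9, 10) == {"Info", "Upgradeable=False"}
assert signals(10, 10) == set()
```

This matches the example in the comment: the admin learns about the removal while still on 4.6, but forward progress is only inhibited at the step where something would actually break.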

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated
we will require anyone upgrading along an EUS specific upgrade path to update to
a 4.6 version which is specific to EUS. This is necessary because only those who
follow the EUS upgrade path have 4.10 as their next logical target version. We
should not alert non EUS clusters of removals in 4.10 until they've upgraded to

Maybe this is the intention driven experience I was asking about above. It isn't that they've picked 4.10 (they still have to pick 4.7 as the next target) it's that they've upgraded to this 4.6 EUS version that has all the notification items backported to it?

In that case would there be an EUS 4.7, 4.8, and 4.9, or would the backported notifications show for people not on the EUS path too? (Or the alternative above that only shows {next release's notifications} with expansion to {next EUS notifications}.)

Member Author

@sdodson sdodson Jan 14, 2021

Yes, this is the intent signal.

The backporting to 4.7, 4.8, and 4.9 is a good point to raise. I hadn't thought about that; I would assume we leave the alerting rules in those versions at whatever level is appropriate for non-EUS upgrades. We know that 4.6 EUS upgrades will pass through those versions, but if we raise the red flags in 4.6 EUS versions and they start the upgrade anyway, I feel they were given enough warning.

I'm also envisioning an escalation of alerts based on when the API is removed: APIs removed in N+1 are warning level, N+2 or greater are info level only. When running an EUS-specific version, N+1 should be equivalent to the alert set of your eventual target minus one, i.e., 4.9 in the case that 4.6 and 4.10 are the two EUS versions.

Member Author

I think now that we know we'll only use Info level alerts for all removals between 4.6 and 4.10 and then Upgradeable=False for removals between 4.N and 4.N+1 we no longer require this intent signal. It's safe to tell all 4.6 customers about removals between 4.6 and 4.10 and it's safe/required to inhibit upgrades from 4.6 to 4.7 when APIs removed in 4.7 are in use.


#### As an admin preparing for 4.6 to 4.10 EUS upgrade I'm alerted to API removals

We have pre-existing mechanisms to alert admins to their use of APIs which are

I'm assuming this is the notification and alert section.

For UX related teams: I think we need to clarify the experience on if this would be incorporated to part of the upgrade visualization (in addition to alerting) and if so how would that API work?

Member Author

It may be useful if the CVO scraped up alerts of this nature and raised them itself, and the console then displayed only alerts firing from the CVO on the upgrades panel.

Contributor

Do we need to consider using Upgradeable=False if we detect usage of an API that is removed in the next version?

In other words:

4.6
4.7
4.8: API server goes Upgradeable=False if there are CronJobs in the cluster
4.9: CronJobs are gone, but the cluster still has 1000 cron jobs created
4.10:

Contributor

Info alerts are planned for 4.7, showing up in the console notification side-bar. We will have to backport to 4.6.


@sdodson interesting idea from @sferich888 today: if the APIs are deprecated in what we generate our API docs from can we use that info to also generate big, scary release notes?

Another suggestion for collaboration: @jwmatthews and the container migration teams may have thoughts on how we can help admins work through deprecations and other ways to make sure they're front and center during our upgrade process.

Member

@pweil- FYI, we have some beginning validation work in context of Migrations from MTC to warn a user when they will face a problem that a GVK they relied on in the source cluster isn't present in the target cluster (https://github.com/konveyor/mig-controller/blob/master/pkg/gvk/gvk.go#L66) This is a problem space we are interested in, we expect to see it with future migrations, open to collaborating. (cc @djwhatle @eriknelson @shawn-hurley @jortel )


k8s also has https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1693-warnings, which kubectl & oc consume to inform users that they are using deprecated APIs. The question is if/how we could also consume that elsewhere.

Member Author

@pweil- @sferich888 Yeah, I definitely agree that we'll want release notes, if not an upgrade section of the docs, that's unique to 4.6 to 4.10 upgrades. I'll add an epic placeholder to ensure that we pull this together as part of the 4.10 dev cycle.

I'll find a slot next week to sync up with @sanchezl @deads2k and you, @jwmatthews.

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated
- While I believe all of this work to be prerequisite to taking on work to skip
upgrade steps that may not be widely agreed upon.

- Do we need to consider CCO for similar treatment that we're providing for API
Member Author

@dgoodwin @joelddiaz Your thoughts here please

Contributor

As long as we visit each minor release between 4.6->4.10, I think we're okay. We just need to make sure the minor release visited is a release where CCO knows about the next minor release requirements. So for example: you can't go from 4.6.10 (which can alert about upgradeability to 4.7) to 4.7.2 (which doesn't know about upgradeability to 4.8 b/c 4.7.2 was too early to know about what was coming in 4.8).

Member Author

We'd always be respecting minimum version requirements; we'll be upgrading along very late 4.6.z, 4.7.z, and 4.8.z. While 4.9.z will be relatively young, it must meet the minimum version requirements to upgrade to 4.10.

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated

### User Stories

#### As an admin preparing for 4.6 to 4.10 EUS upgrade I'm alerted to API removals
Member Author

@deads2k @sttts PTAL at this section as well as the enhancement in general.

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated
backport that API metadata to 4.6 EUS versions once we have a firm understanding
of all removals between 4.6 and 4.10.

#### As an admin I cannot violate OpenShift's defined Kubelet to API version skew
Member Author

@darkmuggle PTAL at this proposed epic as well as the enhancement in full

@sdodson
Member Author

sdodson commented Jan 15, 2021

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated
node_count_per_failure_domain) percentage of pods being rescheduled more than
once during a rolling restart of a given failure domain.

We should make the MCO topology aware so that it upgrades all of the hosts in one
Contributor

The MCO should not be topology-aware, IMHO. Topology is very much platform-specific. Really, what is being asked is that the MCO would have to infer sub-pools based on cloud regions. The understanding of cloud-fault domains is well above the level of the MCO. The MCO would need a platform-agnostic means of enumerating the fault-domain and upgrade priorities.

Member Author

I'm no longer certain this is necessary but I'm going to leave this in until I hear back from the workloads team that this isn't necessary.

Contributor

If this is not necessary would prefer to skip as this feels like a separate enhancement altogether.

Member Author

This is really a rollup of many different enhancements across many teams, each likely to lead to an enhancement of their own if sufficiently complex to require one. I believe we should try to minimize pod rescheduling, and I'm currently asking the workloads team to drive that investigation; we'll see what they come up with. I'll change this to just mention that this may require coordination between the MCO and scheduler in some form.

Comment thread enhancements/update/eus-upgrades-mvp.md
only core platform content?

- Should we externalize the API removal bits? Perhaps via a marketplace operator
that's broadly scoped as "EUS 4.6 to EUS 4.10 Validator"?
Contributor

@crawford crawford Jan 15, 2021

If we've written our operators correctly, an EUS Validator would be entirely redundant. All operators must be able to reason about (and take action against!) the validity of their operands in the context of the larger system. This is what makes them an operator, as opposed to a controller.


I think the idea here is to inform about user workloads wrt deprecated/removed APIs between 4.6 and 4.10, not with our own.

Member Author

Correct, OLM is just the delivery mechanism to install an optional component which could decouple the EUS specific API removal notification from the release payload. I don't think we want to attempt to remediate any issues identified but just inform the admin.

@russellb
Contributor

/priority important-soon

@openshift-ci-robot openshift-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 22, 2021
Comment thread enhancements/update/eus-upgrades-mvp.md Outdated
failure domain because anti-affinity scheduling is used in most scenarios. We
should provide some mechanism to rank upgraded nodes as slightly higher priority
in scheduler configuration. Perhaps a timestamp of configuration generation that
the scheduler then biases toward nodes with the greatest timestamps as a last in
Member

This seems achievable and super-useful. Currently the MCO annotates nodes with current and target MachineConfig references. It could also annotate them with a rank or timestamp of the current MachineConfig, and the scheduler could use that rank when breaking ties to increase the chances that drained workloads land on the already-updated nodes, with anti-affinity rules and the usual protections keeping workloads from getting overly wedged into a particular failure domain, even if that means temporarily scheduling some workload on still-to-be-updated nodes.

Contributor

@darkmuggle darkmuggle Feb 3, 2021

I think using a timestamp, while easy enough, is likely to cause a problem, as it does not indicate whether the nodes have the preferred state. I think having the MCO introduce the idea of an "update-epoch" would be clearer and would allow the scheduler to send workloads to nodes with the highest epoch. Something like machineconfiguration.openshift.io/update-epoch: 20 (this would be interesting for telemetry, too).

However, if someone has paused their machine config pools, it would have the effect of preferring to run workloads on unpaused pools (since they would have a higher epoch) during normal operations.
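The tie-breaking idea above can be sketched as a node ordering that prefers the highest update epoch. The annotation key follows darkmuggle's example; the function and data shapes are hypothetical, and a real implementation would live in a scheduler score plugin rather than a sort.

```python
# Hypothetical annotation key, as suggested in the comment above.
EPOCH_ANNOTATION = "machineconfiguration.openshift.io/update-epoch"

def prefer_updated(nodes: list[dict]) -> list[dict]:
    """Order candidate nodes so drained workloads land on nodes with the
    highest update epoch (i.e. already-updated nodes) first. Nodes without
    the annotation sort last."""
    def epoch(node: dict) -> int:
        return int(node.get("annotations", {}).get(EPOCH_ANNOTATION, 0))
    return sorted(nodes, key=epoch, reverse=True)

nodes = [
    {"name": "worker-a", "annotations": {EPOCH_ANNOTATION: "19"}},
    {"name": "worker-b", "annotations": {EPOCH_ANNOTATION: "20"}},  # updated
    {"name": "worker-c", "annotations": {}},                        # unknown
]
assert [n["name"] for n in prefer_updated(nodes)] == ["worker-b", "worker-a", "worker-c"]
```

Note this also illustrates the paused-pool caveat raised above: a paused pool's nodes keep a lower epoch and would be deprioritized during normal operation, not just during an upgrade.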

upgrade completes we rebalance so that the last node ends up with proportionate
workload.

#### ??? - Provide a hosted graph viewer which will assist disconnected admins in mirroring all content between 4.6 and 4.10
Member Author

I'm still unsure which board this needs to land on, trying to track that down.


If the policy were to change to allow for a version skew of N-2, v1.18 would be
added to the list of acceptable matches. The MCO is not responsible for defining
Contributor

How will these constraints be defined, and how does the MCO retroactively apply them to prevent the upgrade in the first place?

It sounds like the 4.6 MCO would first have to understand what the constraints are to prevent the upgrade to 4.7, despite the restrictions defined in the 4.7 MCO?

Contributor

Actually on second thought, there is no real need for the MCO to expand on its current block on pools, since a previous upgrade not finishing would block further upgrades anyways. This means we would never have a kubelet version skew > 1, making this check pointless (unless you wanted to initiate a direct update from 4.6->4.8, which as I understand, shouldn't happen regardless).

Member Author

The specific implementation needs to be hashed out, but it would be expected to be defined in the release payload of the running MCO's version, most likely as a ConfigMap.


### MCO - Enforce OpenShift's defined host component version skew policies

The MCO will set Upgradeable=False whenever any MachineConfigPool has one more
Contributor

This feels like a bit of an overreach onto the MCO. We should be setting upgradeable=false bc an upgrade isn't finished/a pool is degraded and reporting our general status but we shouldn't have to read into kubelet versions etc... to set anything. This feels like overcomplicating the matter? Why can the CVO not make the decision about the next acceptable upgrade based on operator statuses and known upgrade graphs?

Contributor

@kikisdeliveryservice kikisdeliveryservice Feb 3, 2021

Also for this enhancement, we should have the narrowest path possible for these upgrades which would make it safer and easier to support.

Member Author

@sdodson sdodson Feb 4, 2021

The MCO manages the host OS, and the host OS "manages" the kubelet, so the MCO seems like the right component to enforce constraints on host OS components; it's likely that the kernel will be an additional component with constraints.

If an MCP were paused before an upgrade is initiated what happens in your scenario where we set Upgradeable=False when a pool isn't finished or degraded?

The CVO already defines cluster version upgrade paths, but the long term goal is to introduce a decoupling of cluster version from host components while still enforcing version skew policies. This enhancement doesn't propose that, just systems to enable us to change that policy in the future. The future would look like this

  1. Start with 4.6 cluster, Pause Worker MCP
  2. Upgrade 4.6 to 4.7
  3. Upgrade 4.7 to 4.8
  4. MCO is now Upgradeable=False because constraints are violated, kubelet is 1.19 but we're only upgradeable in 1.20 and 1.21, can't upgrade to 4.9.
  5. Worker MCP is unpaused
  6. When all hosts in MCPs now meet the constraints Upgradeable=True
  7. Pause Worker MCP again
  8. Upgrade to 4.9
  9. Upgrade to 4.10
  10. MCO is now Upgradeable=False because constraints are violated, kubelet is 1.21 but we're looking for 1.22 or 1.23.
  11. Worker MCP is unpaused
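The skew check driving steps 4 and 10 above can be sketched as follows. This is a minimal sketch under an assumption from the thread: that the running release ships (perhaps via a ConfigMap) the set of kubelet minor versions acceptable for proceeding to the next cluster minor. The function name and shapes are illustrative.

```python
def upgradeable(node_kubelet_minors: set[int], acceptable: set[int]) -> bool:
    """MCO reports Upgradeable=True only when every node's kubelet minor
    version is in the acceptable set shipped with the running release."""
    return node_kubelet_minors <= acceptable  # subset check

# Step 4: cluster at 4.8, paused workers still run kubelet 1.19, but only
# 1.20/1.21 kubelets may proceed to 4.9 -> Upgradeable=False.
assert upgradeable({19, 21}, {20, 21}) is False
# Step 6: workers unpaused and rolled to 1.21 -> Upgradeable=True.
assert upgradeable({21}, {20, 21}) is True
```

The point of expressing the constraint this way is that it gates on actual kubelet versions rather than on whether the pool is fully rolled out, which (as noted below) keeps cosmetic MachineConfig changes like /etc/motd from blocking a minor upgrade.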

Contributor

Is the plan then to skip versions for the worker nodes (e.g. 4.6->4.8->4.10 directly), and that is what will be in the separate proposal you mentioned? And this is setting up for that eventuality?

Put another way, this implementation is for safety only and has nothing to do with EUS. Today you could screw up your cluster by pausing the worker pool, upgrading your masters from 4.4->4.7, and unpausing the worker pool only to have it blow up since it can't make that jump directly. In which case, perhaps we should consider adding some sort of additional check to the wait-for-all-pools work, where paused worker pools block upgrades too?

Member Author

@yuqi-zhang Yes, exactly, this closes that gap by defining constraints that all nodes in an MCP must meet before MCO agrees that the cluster can be upgraded to the next minor. We want to decouple that decision from whether or not the pool is up to date in order to provide more flexibility. If someone changed /etc/motd we wouldn't want that to block them from upgrading to the next minor while that rollout is in progress.

Contributor

I think my question is why does the MCO have to even look at kubelet versions to determine this?

Member Author

I'm open to other suggestions. We had thought of maintaining a list of machine-os-content pullspecs, or however that's represented, but that leaves RHEL workers out of the picture, and they are managed via MCPs even if that comes via assistance of openshift-ansible. I'm also not sure how we'd maintain that list, whereas simple string matching or semantic version comparison would more accurately express the constraints.

Another item I hadn't mentioned is there may become a time when we need to express these constraints on kernel versions, specifically for RHEL Workers.

Is the concern the cluster load required to examine node objects, growth in RBAC necessary to do so, or being able to implement these rules effectively? Anything else?

Contributor

MCO is now Upgradeable=False because constraints are violated, kubelet is 1.19 but we're only upgradeable in 1.20 and 1.21, can't upgrade to 4.9.

@sdodson Just to add a question: if we add something like this for the general case, what are we doing to determine that the next upgrade will be 4.9 and not another 4.8 version (which should be allowed)? I.e., when your control plane is on 4.8 and your workers are on 4.6, you should block upgrades to 4.9 but not to another 4.8 version, correct? Without pre-flight checks (maybe in the CVO itself) for the overall upgrade, we won't know in the MCO what the next version would be (until an upgrade gets to the MCO, and it's already too late to block by then).

Member Author

@yuqi-zhang The CVO only inhibits minor version upgrades when Upgradeable=False is set; it does not inhibit z-stream/patch upgrades. So an upgrade into a later 4.8.z would be allowed.

We do this so that if we screw up setting Upgradeable=False you're still allowed to upgrade into a later z-stream where we may fix that.
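That gating rule — Upgradeable=False blocks minor bumps but never z-stream updates — can be sketched as a tiny check. Version parsing is simplified to plain `major.minor.patch` strings; this is a hypothetical sketch, not the actual CVO logic.

```python
def update_permitted(current: str, target: str, upgradeable: bool) -> bool:
    """Allow any z-stream update; allow a minor-version bump only when the
    Upgradeable condition is not False."""
    cur_major, cur_minor, _ = (int(p) for p in current.split("."))
    tgt_major, tgt_minor, _ = (int(p) for p in target.split("."))
    minor_bump = (tgt_major, tgt_minor) > (cur_major, cur_minor)
    return upgradeable or not minor_bump

assert update_permitted("4.8.14", "4.8.20", upgradeable=False) is True   # z-stream allowed
assert update_permitted("4.8.14", "4.9.0", upgradeable=False) is False   # minor blocked
assert update_permitted("4.8.14", "4.9.0", upgradeable=True) is True
```

The first assertion captures the escape hatch sdodson describes: even when Upgradeable=False is set in error, admins can still reach a later z-stream that fixes it.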

Comment thread enhancements/update/eus-upgrades-mvp.md Outdated
node_count_per_failure_domain) percentage of pods being rescheduled more than
once during a rolling restart of a given failure domain.

We should make the MCO topology aware so that it upgrades all of the hosts in one
Contributor

If this is not necessary would prefer to skip as this feels like a separate enhancement altogether.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 5, 2021
We have pre-existing mechanisms to alert admins to their use of APIs which are
slated for removal in a future release. Currently those mechanisms do not look
forward far enough to span 4.6 to 4.10, we will need to add the ability to
backport that API metadata to 4.6 once we have a firm understanding of all

How does an OLM operator backport the same knowledge so that we can also alert customers to breaking APIs?

Member Author

@shawn-hurley Do you mean CRDs installed and managed via OLM or OLM managed operators which use deprecated/removed core APIs? I would expect the latter to generate alerts already once this work lands, but the former would not be covered.

@pweil-

pweil- commented Feb 8, 2021

Based on current unresolved comments, I believe this PR is in a good state to merge as the umbrella proposal, with subsequent iterations on any additional guidance this proposal needs to provide, and with the following sub-topic implementations continuing as discussions for the owning teams:

MCO safety checks:
https://github.com/openshift/enhancements/pull/585/files#r569816300
https://github.com/openshift/enhancements/pull/585/files#r569802804

cluster update-retarget: https://github.com/openshift/enhancements/pull/585/files#r569160529

Hosted graph viewer: https://github.com/openshift/enhancements/pull/585/files#r569688098

Eliminate reboots (separate enhancement): https://github.com/openshift/enhancements/pull/585/files#r558461059

User workload alerting: https://github.com/openshift/enhancements/pull/585/files#r558612148

OLM Backports: #585 (comment)

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2021
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit be4bd7b into openshift:master Feb 8, 2021