EUS Upgrades MVP #585
Conversation
> We have pre-existing mechanisms to alert admins to their use of APIs which are
> slated for removal in a future release. Currently those mechanisms do not look
> forward far enough to span 4.6 to 4.10, we will need to add the ability to
Experience question: do you envision that we know the intention of the user to move from EUS to EUS, or simply that we provide a way to see forward-looking information if the admin wants to see it?
More concretely, just as an example, is the vision:
- showing the API removals, operator versioning (below), timing (below), or other safeguards for the current, explicitly known step prominently in the existing experience with a link to "Upgrading to the next EUS?" that provides all the info we have
- that we know more explicitly they've picked a future destination (EUS or not) and we need an experience that customizes to that choice?
or is it a phase 1 then 2? :-)
---
Yes, if we want alerting to be different based on EUS vs not, we need to have some mechanism to know that they're EUS. Right now I'd planned on that being a requirement that they upgrade into a 4.6 version that falls after the EUS transition, and we'd only backport these alerting rules to those versions.
Requiring them to upgrade deep into 4.6.z may be a tough sell, but I think if we move on to eventually removing reboots it's a decent exchange because it allows us to assume a higher baseline of maturity: they've got at least all of the quality improvements that happened during the first year of 4.6. This requirement also allows us to limit the upgrade graph, and thus the versions we must test, by forcing them deep into 4.7, 4.8, and 4.9. Only 4.10 will be relatively new at that point, and they'd also have the option of waiting another 3-6 months before completing the 4.9 and 4.10 upgrades if they wish.
---
> Requiring them to upgrade deep into 4.6.z may be a tough sell...
For folks running from 4.6.{shortly-after-EUS-but-before-some-forward-looking-warning}, there's really not all that much difference between the first hop taking them to 4.6.{huge} or 4.7.{huge} and picking up the warning then, if we decide that 4.6.* -> 4.7.{huge} seems safe enough to add an update edge. So this just comes down to our usual "how many update edges are we willing to stretch tests over?", which is a new-release-build-time decision, and doesn't need any enhancement-time decisions.
---
While we know the risk is low, it's still a cluster roll and a lot of customers think "EUS" means "I only have to reboot or upgrade once every 18 months."
Based on the call with the API Server team it sounds like we're going to pursue only Info level alerts whenever we know of an API removal happening in the next four minor releases. We'll also pursue Upgradeable=False whenever the removal is in the next minor, so an API removed in 4.10 would get Info alerts in 4.6, 4.7, 4.8, and 4.9, and Upgradeable=False in 4.9. That way the admin knows while running 4.6 but isn't blocked from forward progress until something would actually break when they take that step.
That means it becomes safe to backport all of the logic related to these alerts to 4.6 regardless of whether or not it's an EUS cluster, therefore we can set a deadline of having this backported to 4.6 when 4.9 goes GA, which will be at least 12 weeks before potential 4.10 upgrades. I believe we can reasonably assert that they must be no more than 12 weeks old at the time we start providing EUS upgrades.
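To make that concrete, here is a minimal sketch of what such an Info-level rule could look like, assuming the kube-apiserver's `apiserver_requested_deprecated_apis` metric is the signal; the alert name, namespace, and release mapping below are illustrative, not the shipped rule:

```yaml
# Hypothetical PrometheusRule; 4.7-4.10 correspond to Kubernetes 1.20-1.23 here.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-removals-before-4-10   # illustrative name
  namespace: openshift-kube-apiserver
spec:
  groups:
  - name: api-usage
    rules:
    - alert: APIRemovedBeforeNextEUSInUse
      # Info only: something in the cluster has requested an API that is gone
      # by 4.10; the admin learns about it on 4.6 without being blocked.
      expr: |
        group by (group, version, resource) (
          apiserver_requested_deprecated_apis{removed_release=~"1.20|1.21|1.22|1.23"}
        )
      for: 1h
      labels:
        severity: info
      annotations:
        summary: An API slated for removal before 4.10 is in use.
```

The Upgradeable=False half of the plan would then key off the same signal, but only for removals landing in the very next minor's Kubernetes version.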
> we will require anyone upgrading along an EUS specific upgrade path to update to
> a 4.6 version which is specific to EUS. This is necessary because only those who
> follow the EUS upgrade path have 4.10 as their next logical target version. We
> should not alert non EUS clusters of removals in 4.10 until they've upgraded to
Maybe this is the intention-driven experience I was asking about above. It isn't that they've picked 4.10 (they still have to pick 4.7 as the next target); it's that they've upgraded to this 4.6 EUS version that has all the notification items backported to it?
In that case would there be an EUS 4.7, 4.8, and 4.9, or would the backported notifications show for people not on the EUS path too? (Or the alternative above that only shows {next release's notifications} with expansion to {next EUS notifications}.)
---
Yes, this is the intent signal.
The backporting to 4.7, 4.8, and 4.9 is a good point to raise. I hadn't thought about that; I would assume we leave the alerting rules in those versions at whatever is appropriate for non-EUS upgrades. We know that 4.6 EUS upgrades will be passing through those versions, but if we raise the red flags in 4.6 EUS versions and they start the upgrade anyway, I feel like they were given enough warning.
I'm also envisioning an escalation of alerts based on when the API is removed: APIs removed in N+1 are warning level, N+2 or greater are info level only. When running an EUS-specific version, N+1 should be treated as equivalent to the alert set of your eventual target minus one, i.e. 4.9 in the case where 4.6 and 4.10 are the two EUS versions.
---
I think now that we know we'll only use Info level alerts for all removals between 4.6 and 4.10 and then Upgradeable=False for removals between 4.N and 4.N+1 we no longer require this intent signal. It's safe to tell all 4.6 customers about removals between 4.6 and 4.10 and it's safe/required to inhibit upgrades from 4.6 to 4.7 when APIs removed in 4.7 are in use.
> #### As an admin preparing for 4.6 to 4.10 EUS upgrade I'm alerted to API removals
>
> We have pre-existing mechanisms to alert admins to their use of APIs which are
I'm assuming this is the notification and alert section.
For UX-related teams: I think we need to clarify whether this experience would be incorporated into the upgrade visualization (in addition to alerting) and, if so, how that API would work.
---
It may be useful if the CVO scraped up alerts of this nature and raised them itself; the console could then display only the alerts firing from the CVO on the upgrades panel.
---
Do we need to consider using Upgradeable=False if we detect usage of an API that is removed in the next version? In other words:
- 4.6
- 4.7
- 4.8: API server goes Upgradeable=False if there are CronJobs in the cluster
- 4.9: CronJobs are gone, but the cluster still has 1000 cron jobs created
- 4.10:
---
Info alerts are planned for 4.7, showing up in the console notification side-bar. We will have to backport to 4.6.
---
@sdodson interesting idea from @sferich888 today: if the APIs are deprecated in what we generate our API docs from, can we use that info to also generate big, scary release notes?
Another suggestion for collaboration: @jwmatthews and the container migration teams may have thoughts on how we can help admins work through deprecations and other ways to make sure they're front and center during our upgrade process.
---
@pweil- FYI, we have some beginning validation work in the context of Migrations from MTC to warn a user when they will face a problem because a GVK they relied on in the source cluster isn't present in the target cluster (https://github.com/konveyor/mig-controller/blob/master/pkg/gvk/gvk.go#L66). This is a problem space we are interested in, we expect to see it with future migrations, and we're open to collaborating. (cc @djwhatle @eriknelson @shawn-hurley @jortel)
---
k8s also has https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1693-warnings, which kubectl & oc consume to inform users that they are using deprecated APIs. The question is if/how we could also consume that elsewhere.
---
@pweil- @sferich888 Yeah, I definitely agree that we'll want release notes, if not an upgrade section of the docs, that's unique to 4.6 to 4.10 upgrades. I'll add an epic placeholder to ensure that as part of the 4.10 dev cycle we pull this together.
I'll find a slot next week to sync up with @sanchezl @deads2k and you, @jwmatthews.
> - While I believe all of this work to be prerequisite to taking on work to skip
>   upgrade steps that may not be widely agreed upon.
>
> - Do we need to consider CCO for similar treatment that we're providing for API
@dgoodwin @joelddiaz Your thoughts here please
---
As long as we visit each minor release between 4.6->4.10, I think we're okay. We just need to make sure the minor release visited is a release where the CCO knows about the next minor release's requirements. So for example: you can't go from 4.6.10 (which can alert about upgradeability to 4.7) to 4.7.2 (which doesn't know about upgradeability to 4.8 because 4.7.2 was too early to know about what was coming in 4.8).
---
We'd always be respecting minimum version requirements; we'll be upgrading along very late 4.6.z, 4.7.z, and 4.8.z. While 4.9.z will be relatively young, it must meet the minimum version requirements to upgrade to 4.10.
> ### User Stories
>
> #### As an admin preparing for 4.6 to 4.10 EUS upgrade I'm alerted to API removals
>
> backport that API metadata to 4.6 EUS versions once we have a firm understanding
> of all removals between 4.6 and 4.10.
>
> #### As an admin I cannot violate OpenShift's defined Kubelet to API version skew
@darkmuggle PTAL at this proposed epic as well as the enhancement in full
> node_count_per_failure_domain) percentage of pods being rescheduled more than
> once during a rolling restart of a given failure domain.
>
> We should make the MCO topology aware so that it upgrades all of the host in one
The MCO should not be topology-aware, IMHO. Topology is very much platform-specific. Really, what is being asked is that the MCO would have to infer sub-pools based on cloud regions. The understanding of cloud-fault domains is well above the level of the MCO. The MCO would need a platform-agnostic means of enumerating the fault-domain and upgrade priorities.
---
I'm no longer certain this is necessary, but I'm going to leave it in until I hear back from the workloads team that it isn't.
---
If this is not necessary I would prefer to skip it, as this feels like a separate enhancement altogether.
---
This is really a rollup of many different enhancements across many teams, each likely to lead to an enhancement of its own if sufficiently complex to require one. I believe we should try to minimize pod rescheduling and I'm currently asking that the workloads team drive that investigation; we'll see what they come up with. I'll change this just to mention that it may require coordination between the MCO and the scheduler in some form.
> only core platform content?
>
> - Should we externalize the API removal bits? Perhaps via a marketplace operator
>   that's broadly scoped as "EUS 4.6 to EUS 4.10 Validator"?
If we've written our operators correctly, an EUS Validator would be entirely redundant. All operators must be able to reason about (and take action against!) the validity of their operands in the context of the larger system. This is what makes them an operator, as opposed to a controller.
---
I think the idea here is to inform about user workloads' use of deprecated/removed APIs between 4.6 and 4.10, not our own.
---
Correct, OLM is just the delivery mechanism to install an optional component which could decouple the EUS-specific API removal notification from the release payload. I don't think we want to attempt to remediate any issues identified, just inform the admin.
/priority important-soon
> failure domain because anti-affinity scheduling is used in most scenarios. We
> should provide some mechanism to rank upgraded nodes as slightly higher priority
> in scheduler configuration. Perhaps a timestamp of configuration generation that
> the scheduler then biases toward nodes with the greatest timestamps as a last in
This seems achievable and super-useful. Currently the MCO annotates nodes with current and target MachineConfig references. It could also annotate them with a rank or timestamp or some such of the current MachineConfig, and the scheduler could use that rank when breaking ties to increase the chances that drained workloads land on the already-updated nodes. With anti-affinity rules and all that usual stuff protecting workloads from getting overly wedged into a particular failure domain, even if that means temporarily scheduling some workload on still-to-be-updated nodes.
---
I think using a timestamp, while easy enough, is likely going to result in a problem as it does not indicate whether the nodes have the preferred state. I think having the MCO introduce the idea of an "update-epoch" would be clearer and would allow the scheduler to send workloads to the nodes with the highest epoch. Something like `machineconfiguration.openshift.io/update-epoch: 20` (this would be interesting for telemetry, too).
However, if someone has paused their machine config pools, it would have the effect of preferring to run workloads on unpaused pools (since they would have a higher epoch) during normal operations.
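For illustration, a node in an already-updated pool might then look something like the following; the current/desired config annotations exist today, while `update-epoch` is only the hypothetical addition being discussed here:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1a-example
  annotations:
    machineconfiguration.openshift.io/currentConfig: rendered-worker-0f3a9c1d
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-0f3a9c1d
    # Hypothetical: bumped by the MCO each time the pool converges on a new
    # rendered config; the scheduler would prefer nodes with the highest value
    # when breaking ties for drained workloads.
    machineconfiguration.openshift.io/update-epoch: "20"
```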
> upgrade completes we rebalance so that the last node ends up with proportionate
> workload.
>
> #### ??? - Provide a hosted graph viewer which will assist disconnected admins in mirroring all content between 4.6 and 4.10
I'm still unsure which board this needs to land on, trying to track that down.
> If the policy were to change to allow for a version skew of N-2, v1.18 would be
> added to the list of acceptable matches. The MCO is not responsible for defining
How will these constraints be defined, and how does the MCO retroactively apply them to prevent the upgrade in the first place?
It sounds like the 4.6 MCO would first have to understand what the constraints are in order to prevent the upgrade to 4.7, despite the restrictions being defined in the 4.7 MCO?
---
Actually, on second thought, there is no real need for the MCO to expand on its current block on pools, since a previous upgrade not finishing would block further upgrades anyway. This means we would never have a kubelet version skew > 1, making this check pointless (unless you wanted to initiate a direct update from 4.6->4.8, which, as I understand it, shouldn't happen regardless).
---
The specific implementation needs to be hashed out, but it would be expected to be defined in the release payload of the version of the MCO which is running, most likely as a ConfigMap.
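A rough sketch of what that could look like, assuming a ConfigMap shipped in the payload and read by the running MCO; the name, namespace, and schema are illustrative only, aligned with the quoted example above where v1.18 would only become acceptable under an N-2 policy:

```yaml
# Hypothetical ConfigMap carried in the MCO's release payload; the schema is
# illustrative, not a committed design.
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-version-skew-constraints
  namespace: openshift-machine-config-operator
data:
  constraints.yaml: |
    # Kubelet versions that every node in every MachineConfigPool must be
    # running before the MCO considers the next minor upgrade safe.
    acceptableKubeletVersions:
    - "v1.19"   # N-1
    - "v1.20"   # N
    # A change to an N-2 policy would add v1.18 to this list.
```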
> ### MCO - Enforce OpenShift's defined host component version skew policies
>
> The MCO, will set Upgradeable=False whenever any MachineConfigPool has one more
This feels like a bit of an overreach onto the MCO. We should be setting Upgradeable=False because an upgrade isn't finished or a pool is degraded, and reporting our general status, but we shouldn't have to read into kubelet versions etc. to set anything. This feels like overcomplicating the matter? Why can the CVO not make the decision about the next acceptable upgrade based on operator statuses and known upgrade graphs?
---
Also, for this enhancement we should have the narrowest path possible for these upgrades, which would make them safer and easier to support.
---
The MCO manages the host OS, the host OS "manages" the kubelet, therefore the MCO seems like the right component to enforce constraints on host OS components; it's likely that the kernel will be an additional component with constraints.
If an MCP were paused before an upgrade is initiated, what happens in your scenario where we set Upgradeable=False when a pool isn't finished or is degraded?
The CVO already defines cluster version upgrade paths, but the long term goal is to introduce a decoupling of cluster version from host components while still enforcing version skew policies. This enhancement doesn't propose that, just the systems that enable us to change that policy in the future. The future would look like this (a condition sketch follows the list):
- Start with 4.6 cluster, Pause Worker MCP
- Upgrade 4.6 to 4.7
- Upgrade 4.7 to 4.8
- MCO is now Upgradeable=False because constraints are violated: the kubelet is 1.19 but we're only upgradeable with kubelets at 1.20 or 1.21, so we can't upgrade to 4.9.
- Worker MCP is unpaused
- When all hosts in MCPs now meet the constraints Upgradeable=True
- Pause Worker MCP again
- Upgrade to 4.9
- Upgrade to 4.10
- MCO is now Upgradeable=False because constraints are violated: the kubelet is 1.21 but we're looking for 1.22 or 1.23.
- Worker MCP is unpaused
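A rough sketch of what the MCO's ClusterOperator might report at the Upgradeable=False steps above; the Upgradeable condition is the existing mechanism, while the reason and message wording here are invented for illustration:

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  name: machine-config
status:
  conditions:
  - type: Upgradeable
    status: "False"
    reason: KubeletVersionSkew          # hypothetical reason string
    message: >-
      MachineConfigPool "worker" is paused and its nodes still run kubelet
      1.19; upgrading the control plane past 4.8 would exceed the supported
      version skew. Unpause the pool and let it reach 1.20 or 1.21 first.
```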
---
Is the plan then to skip versions for the worker nodes (e.g. 4.6->4.8->4.10 directly), and that is what will be in the separate proposal you mentioned? And this is setting up for that eventuality?
Put another way, this implementation is for safety only and has nothing to do with EUS. Today you could screw up your cluster by pausing the worker pool, upgrading your masters from 4.4->4.7, and unpausing the worker pool only to have it blow up since it can't make that jump directly. In which case, perhaps we should consider adding some sort of additional check to the wait-for-all-pools work, where paused worker pools block upgrades too?
---
@yuqi-zhang Yes, exactly, this closes that gap by defining constraints that all nodes in an MCP must meet before MCO agrees that the cluster can be upgraded to the next minor. We want to decouple that decision from whether or not the pool is up to date in order to provide more flexibility. If someone changed /etc/motd we wouldn't want that to block them from upgrading to the next minor while that rollout is in progress.
---
I think my question is why does the MCO have to even look at kubelet versions to determine this?
---
I'm open to other suggestions. We had thought of maintaining a list of machine-os-content pullspecs, or however that's represented, but that leaves RHEL workers out of the picture, and they are managed via MCPs even if that comes via the assistance of openshift-ansible. I'm also not sure how we'd maintain that list, whereas simple string matching or semantic version comparison would more accurately express the constraints.
Another item I hadn't mentioned: there may come a time when we need to express these constraints on kernel versions, specifically for RHEL workers.
Is the concern the cluster load required to examine node objects, the growth in RBAC necessary to do so, or being able to implement these rules effectively? Anything else?
---
> MCO is now Upgradeable=False because constraints are violated, kubelet is 1.19 but we're only upgradeable in 1.20 and 1.21, can't upgrade to 4.9.

@sdodson Just to add a question: if we add something like this for the general case, what are we doing to determine that the next upgrade will be 4.9 and not another 4.8 version (which should be allowed)? I.e. when your control plane is on 4.8 and your workers are on 4.6, you should block upgrades to 4.9 but not to another 4.8 version, correct? Without pre-flight checks (maybe in the CVO itself) for the overall upgrade, we won't know in the MCO what the next version would be (until an upgrade gets to the MCO, and it's already too late to block by then).
---
@yuqi-zhang The CVO only inhibits minor version upgrades when Upgradeable=False is set; it does not inhibit z-stream/patch upgrades. So an upgrade into a later 4.8.z would be allowed.
We do this so that if we screw up setting Upgradeable=False you're still allowed to upgrade into a later z-stream where we may fix that.
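As a rough illustration of that behavior (the Upgradeable condition on ClusterVersion exists; the version numbers and message wording are invented), the admin in this scenario might see something like:

```yaml
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
status:
  desired:
    version: 4.8.12            # illustrative
  conditions:
  - type: Upgradeable
    status: "False"
    # Wording invented for illustration.
    message: >-
      Cluster operator machine-config cannot be upgraded: worker nodes are
      still on kubelet 1.19.
  # A request to update to a later 4.8.z is still accepted; a request for a
  # 4.9 payload would be refused by the CVO's Upgradeable precondition.
```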
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: crawford.
> We have pre-existing mechanisms to alert admins to their use of APIs which are
> slated for removal in a future release. Currently those mechanisms do not look
> forward far enough to span 4.6 to 4.10, we will need to add the ability to
> backport that API metadata to 4.6 once we have a firm understanding of all
How does an OLM operator backport the same knowledge so that we can also alert customers to breaking APIs?
---
@shawn-hurley Do you mean CRDs installed and managed via OLM, or OLM-managed operators which use deprecated/removed core APIs? I would expect the latter to generate alerts already once this work lands, but the former would not be covered.
Based on current unresolved comments I believe that this PR is in a good state to merge as the umbrella proposal, with subsequent iterations on any additional guidance this proposal needs to provide and the following sub-topic implementations being continued discussions for owning teams:
- MCO safety checks:
- cluster update-retarget: https://github.com/openshift/enhancements/pull/585/files#r569160529
- Hosted graph viewer: https://github.com/openshift/enhancements/pull/585/files#r569688098
- Eliminate reboots (separate enhancement): https://github.com/openshift/enhancements/pull/585/files#r558461059
- User workload alerting: https://github.com/openshift/enhancements/pull/585/files#r558612148
- OLM Backports: #585 (comment)

/lgtm
/retest

Please review the full test history for this PR and help us cut down flakes.
An enhancement that focuses on general upgrade safety, speed, and a minimal amount of EUS to EUS specific work.
TODO: