Skip to content

Conversation

@wking
Copy link
Member

@wking wking commented Aug 29, 2021

"administrators should not upgrade" is way too strict. This commit relaxes that to point out that the message will tell you what's off limits, and the CVO will enforce that (but you can force past the CVO's guard). The focus on minor updates is from rhbz#1797624. Subsequently, the CVO grew an internal check that even blocks patch bumps when they are unlikely to succeed in the face of unsupported overrides rhbz#1822844.

@openshift-ci openshift-ci bot requested review from jwforres and knobunc August 29, 2021 03:50
Copy link

@dhellmann dhellmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit

@LalatenduMohanty
Copy link
Member

LalatenduMohanty commented Sep 9, 2021

#995 (comment) is a nit pick but it looks lgtm overall.

@bparees
Copy link
Contributor

bparees commented Sep 9, 2021

the CVO will enforce that (but you can force past the CVO's guard).

my new understanding after this morning is that the CVO doesn't actually enforce anything. The client tools enforce it (don't let you attempt to set a new target minor if upgradeable=false), but if you edit the targetVersion manually, the CVO is happy to proceed w/ attempting to upgrade w/o you having do to anything special to "force" past it.

cc @sdodson

@wking
Copy link
Member Author

wking commented Sep 10, 2021

The client tools enforce it (don't let you attempt to set a new target minor if upgradeable=false), but if you edit the targetVersion manually, the CVO is happy to proceed w/ attempting to upgrade w/o you having do to anything special to "force" past it.

The CVO does block updates (unless you Force through) with a CVO-side guard for Upgradeable=False. The CVO does not enforce any guards around availableUpdates membership, though, so you could have:

  • Update from 4.y.z to 4.y.z', Upgradeable=False because of some ClusterOperator -> CVO says "no problem, it's a patch bump, off we go".
  • Update from 4.y.z to 4.(y+1).z', Upgradeable=False because of some ClusterOperator -> CVO says "blocking, because this is a minor bump".
  • Update from 4.y.z to 4.(y+1).z', Force=True, Upgradeable=False because of some ClusterOperator -> CVO says "I wish I could block this minor bump, but you're forcing me to take it".
  • Update from 4.y.z to 4.y.z', Upgradeable=False because of ClusterVersion spec.overrides -> CVO says "blocking, because you have spec.overrides".

And if you ask for an update that doesn't trip one of those Upgradeable guards:

@wking wking force-pushed the tighter-upgradeable-docs branch from 5de73ed to 7ef0a53 Compare September 10, 2021 18:59
@bparees
Copy link
Contributor

bparees commented Sep 10, 2021

So not specifically related to this PR, @sdodson and @wking and I talked about our upgrade behavior and we have a pretty complicated interaction between apis and clients, and special cases.

We can't do a whole lot about the api, but we need some better documentation that explains the different implications of upgradeable=false, as well as which upgrades are blocked by the client (it won't update the clusterversion resource) and which things are blocked by the cluster (even if you manually set the clusterversion resource, it will refuse to take action). And then what things override those behaviors.

things that contribute to upgradeability decisions:

  1. image validation
  2. edges existing in your channel
  3. CVO upgradeable=false (can have two different meanings: blocking Zs vs blocking Ys)
  4. upgrade already in progress
  5. ????

and then on oc adm upgrade we offer:

      --allow-explicit-upgrade=false: Upgrade even if the upgrade target is not
listed in the available versions list.
      --allow-upgrade-with-warnings=false: Upgrade even if an upgrade is in
process or a cluster error is blocking the update.
      --force=false: Forcefully upgrade the cluster even when upgrade release image validation fails and the cluster is reporting errors.

aiui, the first 2 of those are client behaviors. but --force is the client applying an api setting to the cluster.

and it's not even entirely clear what "cluster is reporting errors" means. or how that interacts with --allow-upgrade-with-warnings which also claims to allow you to proceed in the face of cluster errors.

So something that walks through all the factors that go into evaluating upgrade acceptability/readiness, and then how the client options and api fields interact with/override those factors would be, imho, really useful.

@wking
Copy link
Member Author

wking commented Sep 10, 2021

#885 should have tightened up the Force docs a bit, and it's currently "verification or upgradeable checks" there. We can become more specific, but as we add more CVO-side gates, I expect Force to continue to be the sledgehammer that smashes through all gates. There currently aren't many CVO-side gates, but... it's not clear to me how much we want oc adm upgrade --help guessing about CVO-side details. Would adding a new CVO-side gate count as a breaking API change?

Anyhow, all of that is scoped a bit higher than what we say about Upgradeable and its gate specifically, which is what this PR is trying to focus on.

@bparees
Copy link
Contributor

bparees commented Sep 11, 2021

Would adding a new CVO-side gate count as a breaking API change?

not if we do it as follows:

  1. add a CVO gate field "BlockUpgradeToNonEdgeTargets" (work on the naming)
  2. default the field to false (unset==false)
  3. Unless oc adm upgrade --alow-explicit-upgrade is passed, client sets BlockUpgradeToNonEdgeTargets=true (older clients will not set this, and thus they will get Block=false behavior on the server/CVO side, which is the current behavior today)

Now, that doesn't buy us a whole lot in terms of protecting users, but it does allow us to move the check from being a client-side decision to being an api-side decision, while not breaking the existing behavior, which I think is a worthy goal, though not an urgent one.

We can also introduce a cluster config setting where an admin can specify the behavior they want (block or not block) generally, and then for any given upgrade request, that is the behavior they get unless the client explicitly overrides the cluster setting. While it still doesn't let us default into blocking those edges, it does allow admins to opt into always blocking them by default.

Again, not sure any of that is worth the effort at this point, i'd prefer to start w/ better documentation on how the current system works (what decisions are enforced by the api/cluster/cvo and what decisions are enforced by the client, and what each "decision" actually means (e.g. what obstacles does "force" overcome and what obstacles does it not overcome).

But after we finish that documentation, if we want to improve the experience i think that is a way to do it.

@sdodson
Copy link
Member

sdodson commented Sep 13, 2021

I agree with a docs first approach to cleaning up this mess we've created. Where do we want those docs though? I'd prefer cluster-version-operator/docs but it also seems like enhancements repo is starting to contain these sort of design decisions which are broadly applicable to the platform. Regardless of which of those we choose we should make sure than an admin friendly version makes it into openshift-docs as well.

@bparees
Copy link
Contributor

bparees commented Sep 13, 2021

Where do we want those docs though?

openshift-docs (product docs) is what i had in my head.

It's pulling together apis/implementation details from a variety of components (oc adm upgrade, cvo, clusterversion api, cincinatti) so i dunno that there's any component repo that can provide a holistic summary of the interactions and overall behavior.

An EP that covers the current behavior might be good also since that would be the starting point if we want to introduce any changes or additions to the behavior.

"administrators should not upgrade" is way too strict.  This commit
relaxes that to point out that the message will tell you what's off
limits, and the CVO will enforce that (but you can force past the
CVO's guard).  The focus on minor updates is from [1].  Subsequently,
the CVO grew an internal check that even blocks patch bumps when they
are unlikely to succeed in the face of unsupported overrides [2].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1797624
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=1822844
@wking wking force-pushed the tighter-upgradeable-docs branch from 7ef0a53 to 52ebfaf Compare September 13, 2021 22:44
@bparees
Copy link
Contributor

bparees commented Sep 13, 2021

/lgtm

(technically I am not an apireviewer but this is just updating docs so i don't think it needs a deep level of oversight)

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 13, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 13, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bparees, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 13, 2021
@openshift-merge-robot openshift-merge-robot merged commit 74913c0 into openshift:master Sep 13, 2021
@wking wking deleted the tighter-upgradeable-docs branch September 14, 2021 17:35
// current cluster state. When status is False, the cluster-version operator
// will prevent the cluster from performing impacted updates unless forced.
// When set on ClusterVersion, the message will explain which updates (minor
// or patch) are impacted. When set on ClusterOperator, False will block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an implicit hierarchy here that needs to be called out? For example, if minor version updates are not allowed, are patch version updates also automatically disallowed ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if minor version updates are not allowed, are patch version updates also automatically disallowed ?

SemVer is MAJOR.MINOR.PATCH, so the two cases we have today are "blocks minor bumps, patch bumps are fine" and "blocks all bumps". So could have been minor or all here. The .spec.overrides-is-set message is:

Disabling ownership via cluster version overrides prevents upgrades. Please remove overrides before continuing.

So the actual user experience should be unambiguous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants