Skip to content

Conversation

@abhinavdahiya
Copy link
Contributor

@openshift-ci-robot openshift-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 26, 2019
pkg/cvo/cvo.go Outdated
func (optr *Operator) defaultPreflightChecks() preflight.List {
return []preflight.Preflight{
preflightcv.NewComponentOverrides(optr.cvLister),
preflightfeatures.New(optr.featuresLister, []configv1.FeatureSet{configv1.TechPreviewNoUpgrade, configv1.CustomNoUpgrade}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many preflight types can't / shouldn't be handled by putting code in the CVO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep this list small to only necessary. because these checks only help if the current knows how to provide safety for next upgrade, while a better and long term solution should be the desired knows how to check those on current's state..

But i also see that the we don't provide any mechanism for clusteroperators to define their own preflight in any other location.. And if we end up being in a situation where we aren't at a place where these are defined by operators in their/some other repo, and we need to add a check to provide safety for an upgrade.. we might have to make more room.

So i don't have a great differentiating line for you currently..

// cache the payload until the release image changes
validPayload := w.payload
if validPayload == nil || !equalUpdate(configv1.Update{Image: validPayload.ReleaseImage}, update) {
if validPayload == nil || !equalUpdate(configv1.Update{Image: validPayload.ReleaseImage}, configv1.Update{Image: update.Image}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here? equalUpdate should already ignore version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the force. this check !equalUpdate is always true if the update has force: true even if the release image is the same.

So this function conditional is always activated even in case of reconcile.

Seems like the reason we were seeing the Downloading update during CI runs, in the middle of upgrade.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton added the test case in f0a07b8

here's how the test fails:

[11:57:05] ➜  cluster-version-operator git:(feature-prevent-upgrade) ✗ git --no-pager diff
diff --git a/pkg/cvo/sync_worker.go b/pkg/cvo/sync_worker.go
index e27ffc2..a863743 100644
--- a/pkg/cvo/sync_worker.go
+++ b/pkg/cvo/sync_worker.go
@@ -466,7 +466,7 @@ func (w *SyncWorker) syncOnce(ctx context.Context, work *SyncWork, maxWorkers in

        // cache the payload until the release image changes
        validPayload := w.payload
-       if validPayload == nil || !equalUpdate(configv1.Update{Image: validPayload.ReleaseImage}, configv1.Update{Image: update.Image}) {
+       if validPayload == nil || !equalUpdate(configv1.Update{Image: validPayload.ReleaseImage}, update) {
                klog.V(4).Infof("Loading payload")
                reporter.Report(SyncWorkerStatus{
                        Generation:  work.Generation,
[11:57:13] ➜  cluster-version-operator git:(feature-prevent-upgrade) ✗ go test ./pkg/cvo/... -v -run "^TestCVO_UpgradeUnverifiedPayloadRetriveOnce"
=== RUN   TestCVO_UpgradeUnverifiedPayloadRetriveOnce
E0829 11:57:19.271703    7584 sync_worker.go:322] unable to synchronize image (waiting 62.5ms): The update cannot be verified: some random error
--- FAIL: TestCVO_UpgradeUnverifiedPayloadRetriveOnce (0.50s)
    cvo_scenarios_test.go:1335: unexpected status item 0:
        object.Step:
          a: "ApplyResources"
          b: "RetrievePayload"
        object.VersionHash:
          a: "6GC9TkkG9PA="
          b: ""
FAIL
FAIL    github.com/openshift/cluster-version-operator/pkg/cvo   0.514s
testing: warning: no tests to run
PASS
ok      github.com/openshift/cluster-version-operator/pkg/cvo/internal  (cached) [no tests to run]
?       github.com/openshift/cluster-version-operator/pkg/cvo/internal/dynamicclient    [no test files]

Actual: update,
Verified: info.Verified,
})
if err := preflight.Summarize(w.preflights.RunAll(ctx)); err != nil && !update.Force {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't think force should be used in this case. --force bypasses security checks on content, I think we'd need to do a new force type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/openshift/api/blob/58aab2885e38fd7f4fd57f2615f5f9fe38039e22/config/v1/types_cluster_version.go#L209-L218

// force allows an administrator to update to an image that has failed
// verification, does not appear in the availableUpdates list, or otherwise
// would be blocked by normal protections on update. This option should only
// be used when the authenticity of the provided image has been verified out
// of band because the provided image will run with full administrative access
// to the cluster. Do not use this flag with images that comes from unknown
// or potentially malicious sources.
//
// This flag does not override other forms of consistency checking that are
// required before a new update is deployed.

not appear in the availableUpdates list, or otherwise would be blocked by normal protections on update.
This flag does not override other forms of consistency checking that are required before a new update is deployed.

Sooo, with our current documentation, the following line allows admins to use force to override any protections on update and not consistency checks..

I think the preflight checks like the override protection,techpreview and custom noupgrade feature fall more in to protections bucket rather than the consistencies.

And I agree force is a big hammer, but https://bugzilla.redhat.com/show_bug.cgi?id=1730401 is a 4.2 blocker and i would not like to introduce any additions this late in the cycle. So putting this into force for now seems an option.

But i agree that we should 4.2+ think about allowing users to skip only some protections these become more important in auto upgrade. like disconnected people would maybe like to switch off only verify content and while others might want to only skip feature-flags..

but the current proposal of force being the big hammer today does not block moving to future with smaller hammers..?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the size of the hammer is mostly concerned with encouraging use of the hammer. Since this is going to be rare, I think it's ok to allow it without an API change in 4.2.

However, we may need to record in some form whether the hammer was used to assess whether an upgrade was bypassed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A way to record the state of all COs at the time that --force was used would be nice to have in any future work.

@sdodson
Copy link
Member

sdodson commented Aug 28, 2019

/retitle Bug 1730401: WIP: cvo: prevent automatic upgrades for clusters that have un-supported feature gates set

@openshift-ci-robot openshift-ci-robot changed the title WIP: cvo: prevent automatic upgrades for clusters that have un-supported feature gates set Bug 1730401: WIP: cvo: prevent automatic upgrades for clusters that have un-supported feature gates set Aug 28, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 28, 2019
@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: This pull request references Bugzilla bug 1730401, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Bug 1730401: WIP: cvo: prevent automatic upgrades for clusters that have un-supported feature gates set

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 28, 2019
@abhinavdahiya abhinavdahiya force-pushed the feature-prevent-upgrade branch from 3fd0ee8 to f3ce4e8 Compare August 28, 2019 19:30
@abhinavdahiya abhinavdahiya changed the title Bug 1730401: WIP: cvo: prevent automatic upgrades for clusters that have un-supported feature gates set Bug 1730401: prevent automatic upgrades for clusters that have un-supported feature gates set Aug 28, 2019
@abhinavdahiya
Copy link
Contributor Author

I think I have all the tests I thought are important added.
ping @smarterclayton and @wking as it should be ready for review.

@smarterclayton
Copy link
Contributor

I'm somewhat confused because I thought we said this week in various forums we weren't going to block upgrades on the server, but maybe I missed the discussion where this became critical.

@@ -0,0 +1,64 @@
package features
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected a preflight checking cluster operator condition Upgradeable == false. This one actually probably already covers features, although having a features message is... ok, I guess. If KAO is already setting Upgradeable false on feature set we don't need anything feature specific and so CVO is only driven by CO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the goal is to prevent upgrades when features are set..

the precondition upgradeable == false preventing upgrades personally I think needs more thought and testing to make sure the operators are no missusing before we add is important. So I don't think we should be doing it.

KAO already setting a uprageable = false on features is great, but I think depending on that wouldn't be ideal as mentioned above, and also depends on KAO always respecting that :P I think it's better if CVO does it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ ping @smarterclayton

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of our components ALREADY report Upgradeable false

pkg/cvo/cvo.go Outdated
"github.com/openshift/cluster-version-operator/pkg/cvo/internal/dynamicclient"
"github.com/openshift/cluster-version-operator/pkg/internal"
"github.com/openshift/cluster-version-operator/pkg/payload"
"github.com/openshift/cluster-version-operator/pkg/payload/preflight"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected this PR to summarize an Upgradeable condition - ideally before a user ever upgrades. The current structure can't easily do that (and there are advantages to having preflight check coming on the new code), but for instance we could summarize all CO Upgradeable conditions and display a message, so someone knows where to look.

That simplifies CLI and web console experience (sam doesn't have to go read all CO, we get a consistent message and experience).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expected this PR to summarize an Upgradeable condition - ideally before a user ever upgrades.

That would be a long running syncer, and I don't think it's great to add that at this time.

The current structure can't easily do that (and there are advantages to having preflight check coming on the new code),

The current structures goal is to prevent upgrades, the long running syncer doesn't do that.. even if we add the long running syncer, we still needs this code to check the precondition before cvo accepts the upgrade. So I think the syncer to provide the Upgradeable condition is orthogonal to this, and perhaps when we add that later on, the precondition could use that as another check??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ ping @smarterclayton

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a long running syncer, and I don't think it's great to add that at this time.

I don't understand why. You just have to summarize the state.

Blocking without reporting seems unacceptable to me. I should know up top whether upgrades should even work.

I think the miscommunication / gap here is that we already have an implemented mechanism for reporting "Don't upgrade me" as Upgradable false since 4.1 GA, but this is targeting other checks which are much more specific / tied to implementation. We may need to have a higher bandwidth conversation on that, but the generic condition is better at isolating CVO from details, and behaves exactly like degraded in that sense.

}
return &payload.UpdateError{
Nested: nil,
Reason: "PreflightChecksFailed",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another word we would use besides Preflight? UpgradePreconditionCheckFailed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done with a5500d3

@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 29, 2019
@abhinavdahiya abhinavdahiya force-pushed the feature-prevent-upgrade branch from a5500d3 to ab89eae Compare August 31, 2019 01:49
@openshift-ci-robot openshift-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 31, 2019
Copy link
Contributor

@smarterclayton smarterclayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this didn't get submitted earlier

@abhinavdahiya abhinavdahiya force-pushed the feature-prevent-upgrade branch from ab89eae to 3ac6f78 Compare September 6, 2019 16:17
@abhinavdahiya
Copy link
Contributor Author

@smarterclayton added the long running upgradeable reporting sync 35c4299

and updated the preflights to single check if upgradeable...

PTAL if this looks better.

…ing Update

This adds an interface `Precondition` that defines preflight checks for an Update.

It adds one types of checks:
1) ClusterVersion upgradeable
* If the upgradeable condition is False, the update should be marked as not safe to proceed.

Also adds a Summarize function, that can summarize the precondition checks failures to a single error that is more understandable by cvo sync_loops

Adds `UpgradePreconditionCheckFailed` as a known reason for summary of UpdateErrors.

Similar to the release-image verification, when an Update is required,

* precondition checks are executed using `precondition.List.RunAll`, which runs all the precondition checks in order and returns errors.
* If any of the precondition checks fails, the cluster version operators does not start the upgrade to the requested Update
* similar to release-image-verification, the user will have to force an upgrade when the Update is rejected due to precondition check failure.

Piggybacking on the `force`:

> https://github.com/openshift/api/blob/58aab2885e38fd7f4fd57f2615f5f9fe38039e22/config/v1/types_cluster_version.go#L209-L218
> // force allows an administrator to update to an image that has failed
> // verification, does not appear in the availableUpdates list, or otherwise
> // would be blocked by normal protections on update. This option should only
> // be used when the authenticity of the provided image has been verified out
> // of band because the provided image will run with full administrative access
> // to the cluster. Do not use this flag with images that comes from unknown
> // or potentially malicious sources.
> //
> // This flag does not override other forms of consistency checking that are
> // required before a new update is deployed.

> not appear in the availableUpdates list, or otherwise would be blocked by normal protections on update.
> This flag does not override other forms of consistency checking that are required before a new update is deployed.

Sooo, with our current documentation, the following line allows admins to use force to override any protections on update and not consistency checks.. and I think the precondition checks like the `override protection`,`techpreview and custom noupgrade feature` fall more in to protections bucket rather than the consistencies. Also I would agree that force is a big hammer, but https://bugzilla.redhat.com/show_bug.cgi?id=1730401 is a 4.2 blocker and i would to like to **not** introduce any additions this late in the cycle. So putting this into force for now seems **an** option.

But i agree that we should 4.2+ think about allowing users to skip only some protections _these become more important in auto upgrade_. like disconnected people would maybe like to switch off only verify content and while others might want to only skip feature-flags.. but the current proposal of force being the big hammer today does not block moving to future with smaller hammers..
@abhinavdahiya abhinavdahiya force-pushed the feature-prevent-upgrade branch from 64249a4 to aceb5bc Compare October 28, 2019 21:51
@abhinavdahiya
Copy link
Contributor Author

@smarterclayton updated the PR to address your comments.

@smarterclayton
Copy link
Contributor

/retest

@smarterclayton
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 28, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abhinavdahiya, smarterclayton

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [abhinavdahiya,smarterclayton]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 01fad60 into openshift:master Oct 29, 2019
@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: All pull requests linked via external trackers have merged. Bugzilla bug 1730401 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1730401: prevent automatic upgrades for clusters with cluster operators reporting Upgradeable false

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 13, 2020
Typo from aceb5bc (pkg: add precondition handlers and perform these
checks before accepting Update, 2019-08-26, openshift#243).
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 24, 2020
Typo from aceb5bc (pkg: add precondition handlers and perform these
checks before accepting Update, 2019-08-26, openshift#243).
wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 24, 2020
Typo from aceb5bc (pkg: add precondition handlers and perform these
checks before accepting Update, 2019-08-26, openshift#243).
wking added a commit to wking/cluster-version-operator that referenced this pull request Jul 29, 2020
…cases

The three cases landed with the same name in 0452814 (cvo: long
running upgradeable sync worker, 2019-09-05, openshift#243).  The new names
make the differences, which all involve the presence and state of
default-operator-2, more obvious.
wking added a commit to wking/cluster-version-operator that referenced this pull request Sep 17, 2020
Fixing a typo from 0452814 (cvo: long running upgradeable sync
worker, 2019-09-05, openshift#243).

Generated with:

  $ sed -i 's/Upgradebale/Upgradeable/g' $(git grep -l Upgradebale)
wking added a commit to wking/openshift-api that referenced this pull request Apr 7, 2021
These docs have not been adjusted since the properties landed in
ab4ff93 (Update ClusterVersion to have a 'force' update flag and
track verified, 2019-04-22, openshift#293).  The initial cluster-version
operator implementation only used them for signature checks,
openshift/cluster-version-operator@f8eff25191 (sync: Report whether
updates are verified and allow admin override, 2019-04-20,
openshift/cluster-version-operator#170).  At that time,
RetrievePayload would always attempt to verify the proposed release
image, including both signature validation and "does this look like a
real release image?" checks.  If that verification succeeded, it would
set 'verified: true'.  If that verification failed, it would look at
'force', and if 'force' was true, it would set 'verified: false' and
proceed with the update, while if 'force' was false, it would reject
the update.

Despite the API docs, availableUpdates never came into this in the CVO
code.  The only availableUpdates guards are client-side in 'oc', and
the API docs are discussion the ClusterVersion API, not 'oc adm
upgrade's --force API.  In openshift/oc@0501d04ff1 (upgrade: Separate
flags for safety instead of abusing force, 2019-09-27,
openshift/oc#109), oc's client-side guards split
--allow-explicit-upgrade (for updating to something not in
availableUpdates) and --allow-upgrade-with-warnings (for updating
despite client-side guards against Progressing and Degraded
ClusterVersion conditions) away from --force, but again, never part of
the ClusterVersion force property's CVO-side handling.

Since then, the CVO grew precondition handlers in
openshift/cluster-version-operator@aceb5bc66f (pkg: add precondition
handlers and perform these checks before accepting Update, 2019-08-26,
openshift/cluster-version-operator#243).  syncOnce ran all the
precondition checks, which at that point was just the Upgradeable
ClusterOperator and ClusterVersion condition check.  If that check
failed, but 'force' was true, it would proceed with the update, while
if 'force' was false, it would reject the update.  In neither case
would 'verified' be altered, it continued to only track the signature
and "does this look like a real release image?" checks.  The idea at
the time was that these would fall under the "normal protections"
phrasing (overriden by force) and not the "other forms of consistency
checking" phrasing (whatever those were supposed to be).
wking added a commit to wking/openshift-api that referenced this pull request Apr 7, 2021
These docs have not been adjusted since the properties landed in
ab4ff93 (Update ClusterVersion to have a 'force' update flag and
track verified, 2019-04-22, openshift#293).  The initial cluster-version
operator implementation only used them for signature checks,
openshift/cluster-version-operator@f8eff25191 (sync: Report whether
updates are verified and allow admin override, 2019-04-20,
openshift/cluster-version-operator#170).  At that time,
RetrievePayload would always attempt to verify the proposed release
image, including both signature validation and "does this look like a
real release image?" checks.  If that verification succeeded, it would
set 'verified: true'.  If that verification failed, it would look at
'force', and if 'force' was true, it would set 'verified: false' and
proceed with the update, while if 'force' was false, it would reject
the update.

Despite the API docs, availableUpdates never came into this in the CVO
code.  The only availableUpdates guards are client-side in 'oc', and
the API docs are discussion the ClusterVersion API, not 'oc adm
upgrade's --force API.  In openshift/oc@0501d04ff1 (upgrade: Separate
flags for safety instead of abusing force, 2019-09-27,
openshift/oc#109), oc's client-side guards split
--allow-explicit-upgrade (for updating to something not in
availableUpdates) and --allow-upgrade-with-warnings (for updating
despite client-side guards against Progressing and Degraded
ClusterVersion conditions) away from --force, but again, never part of
the ClusterVersion force property's CVO-side handling.

Since then, the CVO grew precondition handlers in
openshift/cluster-version-operator@aceb5bc66f (pkg: add precondition
handlers and perform these checks before accepting Update, 2019-08-26,
openshift/cluster-version-operator#243).  syncOnce ran all the
precondition checks, which at that point was just the Upgradeable
ClusterOperator and ClusterVersion condition check.  If that check
failed, but 'force' was true, it would proceed with the update, while
if 'force' was false, it would reject the update.  In neither case
would 'verified' be altered; it continued to only track the signature
and "does this look like a real release image?" checks.  The idea at
the time was that these would fall under the "normal protections"
phrasing (overriden by force) and not the "other forms of consistency
checking" phrasing (whatever those were supposed to be).
wking added a commit to wking/cluster-version-operator that referenced this pull request Jun 17, 2021
…graded"

Because folks can force to override the CVO's precondition guards,
including the Upgradeable=False guard.  The "cannot" wording is from
way back in 0452814 (cvo: long running upgradeable sync worker,
2019-09-05, openshift#243).

Generated with:

  $ sed -i 's/cannot be upgraded/should not be upgraded/g' $(git grep -l 'cannot be upgraded')
wking added a commit to wking/openshift-api that referenced this pull request Jul 14, 2021
These docs have not been adjusted since the properties landed in
ab4ff93 (Update ClusterVersion to have a 'force' update flag and
track verified, 2019-04-22, openshift#293).  The initial cluster-version
operator implementation only used them for signature checks,
openshift/cluster-version-operator@f8eff25191 (sync: Report whether
updates are verified and allow admin override, 2019-04-20,
openshift/cluster-version-operator#170).  At that time,
RetrievePayload would always attempt to verify the proposed release
image, including both signature validation and "does this look like a
real release image?" checks.  If that verification succeeded, it would
set 'verified: true'.  If that verification failed, it would look at
'force', and if 'force' was true, it would set 'verified: false' and
proceed with the update, while if 'force' was false, it would reject
the update.

Despite the API docs, availableUpdates never came into this in the CVO
code.  The only availableUpdates guards are client-side in 'oc', and
the API docs are discussion the ClusterVersion API, not 'oc adm
upgrade's --force API.  In openshift/oc@0501d04ff1 (upgrade: Separate
flags for safety instead of abusing force, 2019-09-27,
openshift/oc#109), oc's client-side guards split
--allow-explicit-upgrade (for updating to something not in
availableUpdates) and --allow-upgrade-with-warnings (for updating
despite client-side guards against Progressing and Degraded
ClusterVersion conditions) away from --force, but again, never part of
the ClusterVersion force property's CVO-side handling.

Since then, the CVO grew precondition handlers in
openshift/cluster-version-operator@aceb5bc66f (pkg: add precondition
handlers and perform these checks before accepting Update, 2019-08-26,
openshift/cluster-version-operator#243).  syncOnce ran all the
precondition checks, which at that point was just the Upgradeable
ClusterOperator and ClusterVersion condition check.  If that check
failed, but 'force' was true, it would proceed with the update, while
if 'force' was false, it would reject the update.  In neither case
would 'verified' be altered; it continued to only track the signature
and "does this look like a real release image?" checks.  The idea at
the time was that these would fall under the "normal protections"
phrasing (overriden by force) and not the "other forms of consistency
checking" phrasing (whatever those were supposed to be).
wking added a commit to wking/openshift-api that referenced this pull request Jul 14, 2021
These docs have not been adjusted since the properties landed in
ab4ff93 (Update ClusterVersion to have a 'force' update flag and
track verified, 2019-04-22, openshift#293).  The initial cluster-version
operator implementation only used them for signature checks,
openshift/cluster-version-operator@f8eff25191 (sync: Report whether
updates are verified and allow admin override, 2019-04-20,
openshift/cluster-version-operator#170).  At that time,
RetrievePayload would always attempt to verify the proposed release
image, including both signature validation and "does this look like a
real release image?" checks.  If that verification succeeded, it would
set 'verified: true'.  If that verification failed, it would look at
'force', and if 'force' was true, it would set 'verified: false' and
proceed with the update, while if 'force' was false, it would reject
the update.

Despite the API docs, availableUpdates never came into this in the CVO
code.  The only availableUpdates guards are client-side in 'oc', and
the API docs are discussion the ClusterVersion API, not 'oc adm
upgrade's --force API.  In openshift/oc@0501d04ff1 (upgrade: Separate
flags for safety instead of abusing force, 2019-09-27,
openshift/oc#109), oc's client-side guards split
--allow-explicit-upgrade (for updating to something not in
availableUpdates) and --allow-upgrade-with-warnings (for updating
despite client-side guards against Progressing and Degraded
ClusterVersion conditions) away from --force, but again, never part of
the ClusterVersion force property's CVO-side handling.

Since then, the CVO grew precondition handlers in
openshift/cluster-version-operator@aceb5bc66f (pkg: add precondition
handlers and perform these checks before accepting Update, 2019-08-26,
openshift/cluster-version-operator#243).  syncOnce ran all the
precondition checks, which at that point was just the Upgradeable
ClusterOperator and ClusterVersion condition check.  If that check
failed, but 'force' was true, it would proceed with the update, while
if 'force' was false, it would reject the update.  In neither case
would 'verified' be altered; it continued to only track the signature
and "does this look like a real release image?" checks.  The idea at
the time was that these would fall under the "normal protections"
phrasing (overriden by force) and not the "other forms of consistency
checking" phrasing (whatever those were supposed to be).
wking added a commit to wking/openshift-api that referenced this pull request Jul 14, 2021
These docs have not been adjusted since the properties landed in
ab4ff93 (Update ClusterVersion to have a 'force' update flag and
track verified, 2019-04-22, openshift#293).  The initial cluster-version
operator implementation only used them for signature checks,
openshift/cluster-version-operator@f8eff25191 (sync: Report whether
updates are verified and allow admin override, 2019-04-20,
openshift/cluster-version-operator#170).  At that time,
RetrievePayload would always attempt to verify the proposed release
image, including both signature validation and "does this look like a
real release image?" checks.  If that verification succeeded, it would
set 'verified: true'.  If that verification failed, it would look at
'force', and if 'force' was true, it would set 'verified: false' and
proceed with the update, while if 'force' was false, it would reject
the update.

Despite the API docs, availableUpdates never came into this in the CVO
code.  The only availableUpdates guards are client-side in 'oc', and
the API docs are discussion the ClusterVersion API, not 'oc adm
upgrade's --force API.  In openshift/oc@0501d04ff1 (upgrade: Separate
flags for safety instead of abusing force, 2019-09-27,
openshift/oc#109), oc's client-side guards split
--allow-explicit-upgrade (for updating to something not in
availableUpdates) and --allow-upgrade-with-warnings (for updating
despite client-side guards against Progressing and Degraded
ClusterVersion conditions) away from --force, but again, never part of
the ClusterVersion force property's CVO-side handling.

Since then, the CVO grew precondition handlers in
openshift/cluster-version-operator@aceb5bc66f (pkg: add precondition
handlers and perform these checks before accepting Update, 2019-08-26,
openshift/cluster-version-operator#243).  syncOnce ran all the
precondition checks, which at that point was just the Upgradeable
ClusterOperator and ClusterVersion condition check.  If that check
failed, but 'force' was true, it would proceed with the update, while
if 'force' was false, it would reject the update.  In neither case
would 'verified' be altered; it continued to only track the signature
and "does this look like a real release image?" checks.  The idea at
the time was that these would fall under the "normal protections"
phrasing (overriden by force) and not the "other forms of consistency
checking" phrasing (whatever those were supposed to be).
wking added a commit to wking/cluster-version-operator that referenced this pull request Oct 7, 2021
…graded"

Because folks can force to override the CVO's precondition guards,
including the Upgradeable=False guard.  The "cannot" wording is from
way back in 0452814 (cvo: long running upgradeable sync worker,
2019-09-05, openshift#243).

Generated with:

  $ sed -i 's/cannot be upgraded/should not be upgraded/g' $(git grep -l 'cannot be upgraded')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants