
OCPBUGS#9448: Add a note regarding checking the cluster version #57136

Merged
jboxman-rh merged 1 commit into openshift:main from xenolinux:clusterversion
Mar 17, 2023

Conversation

@xenolinux
Contributor

@xenolinux xenolinux commented Mar 14, 2023

@openshift-ci openshift-ci Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Mar 14, 2023
@ocpdocs-previewbot

ocpdocs-previewbot commented Mar 14, 2023

🤖 Updated build preview is available at:
https://57136--docspreview.netlify.app

Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/11401

@xenolinux xenolinux force-pushed the clusterversion branch 2 times, most recently from 944de6a to ff043df on March 16, 2023
@achuzhoy

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 16, 2023
@xenolinux
Contributor Author

/label peer-review-needed

@openshift-ci openshift-ci Bot added the peer-review-needed Signifies that the peer review team needs to review this PR label Mar 17, 2023
@lahinson
Contributor

/label peer-review-in-progress
/remove-label peer-review-needed

@openshift-ci openshift-ci Bot added the peer-review-in-progress label (Signifies that the peer review team is reviewing this PR) and removed the peer-review-needed label (Signifies that the peer review team needs to review this PR) Mar 17, 2023
Contributor

@lahinson lahinson left a comment


I suggested a slight revision for clarity, but otherwise, LGTM!

Comment thread on modules/update-upgrading-cli.adoc (outdated):
+
[NOTE]
====
You can ignore if the `oc get clusterversion` command gives the following error while `PROGRESSING` is `True`:
Contributor


Suggested change
You can ignore if the `oc get clusterversion` command gives the following error while `PROGRESSING` is `True`:
If the `oc get clusterversion` command displays the following error while the `PROGRESSING` status is `True`, you can ignore the error.
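The check the suggested note describes can be sketched in shell. This is a hypothetical, offline illustration: the sample line is the output quoted in OCPBUGS-9448, standing in for a live `oc get clusterversion --no-headers` call, and the field position assumes the default column order (NAME VERSION AVAILABLE PROGRESSING SINCE STATUS).

```shell
# Hypothetical sketch: decide whether an error in `oc get clusterversion`
# output can be disregarded because the PROGRESSING status is True.
# The sample line below is the output quoted in OCPBUGS-9448; on a live
# cluster you would pipe `oc get clusterversion --no-headers` instead.
sample='version   4.10.26   True   True   24m   Unable to apply 4.11.0-rc.7: an unknown error has occurred: MultipleErrors'
# Column 4 is PROGRESSING in the default output layout.
progressing=$(printf '%s\n' "$sample" | awk '{print $4}')
if [ "$progressing" = "True" ]; then
  echo "update still in progress; this error can be disregarded for now"
else
  echo "update is not progressing; investigate the error"
fi
```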

Contributor Author


Addressed

@lahinson
Contributor

/remove-label peer-review-in-progress
/label peer-review-done

@openshift-ci openshift-ci Bot added the peer-review-done label (Signifies that the peer review team has reviewed this PR) and removed the peer-review-in-progress label (Signifies that the peer review team is reviewing this PR) Mar 17, 2023
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Mar 17, 2023
@openshift-ci

openshift-ci Bot commented Mar 17, 2023

New changes are detected. LGTM label has been removed.

@xenolinux
Contributor Author

/label merge-review-needed

@openshift-ci openshift-ci Bot added the merge-review-needed Signifies that the merge review team needs to review this PR label Mar 17, 2023
@jboxman-rh jboxman-rh added the merge-review-in-progress label (Signifies that the merge review team is reviewing this PR) and the branch/enterprise-4.9, branch/enterprise-4.10, branch/enterprise-4.11, branch/enterprise-4.12, and branch/enterprise-4.13 labels, and removed the merge-review-needed label (Signifies that the merge review team needs to review this PR) Mar 17, 2023
@jboxman-rh jboxman-rh added this to the Continuous Release milestone Mar 17, 2023
@jboxman-rh jboxman-rh merged commit 358b3cc into openshift:main Mar 17, 2023
@jboxman-rh

/cherry-pick enterprise-4.13

@openshift-cherrypick-robot

@jboxman-rh: new pull request created: #57356


In response to this:

/cherry-pick enterprise-4.13

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jboxman-rh

/cherry-pick enterprise-4.12

@jboxman-rh

/cherry-pick enterprise-4.11

@jboxman-rh

/cherry-pick enterprise-4.10

@jboxman-rh

/cherry-pick enterprise-4.9

@openshift-cherrypick-robot

@jboxman-rh: new pull request created: #57361


In response to this:

/cherry-pick enterprise-4.12

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@jboxman-rh: new pull request created: #57362


In response to this:

/cherry-pick enterprise-4.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@jboxman-rh: new pull request created: #57363


In response to this:

/cherry-pick enterprise-4.10

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-cherrypick-robot

@jboxman-rh: #57136 failed to apply on top of branch "enterprise-4.9":

Applying: OCPBUGS#9448: Add a note regarding checking clusterversion
Using index info to reconstruct a base tree...
M	modules/update-upgrading-cli.adoc
Falling back to patching base and 3-way merge...
Auto-merging modules/update-upgrading-cli.adoc
CONFLICT (content): Merge conflict in modules/update-upgrading-cli.adoc
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 OCPBUGS#9448: Add a note regarding checking clusterversion
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


In response to this:

/cherry-pick enterprise-4.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jboxman-rh

@xenolinux, it looks like this doesn't merge cleanly into 4.9. Is this something that you can fix? Thanks!

@xenolinux
Contributor Author

@xenolinux, it looks like this doesn't merge cleanly into 4.9. Is this something that you can fix? Thanks!

Yes. Opened a manual cherry-pick PR #57454
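The `git am` failure the cherry-pick robot reported can be reproduced and resolved end to end in a throwaway repository. The sketch below is hypothetical: the file, branch, and commit names only mimic this PR (the real conflict was in modules/update-upgrading-cli.adoc on enterprise-4.9), and it follows the same resolve/stage/`git am --continue` steps the robot's error message suggests.

```shell
# Hypothetical reproduction of the cherry-pick conflict workflow.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/main
git config user.email docs@example.com
git config user.name "Docs Demo"

printf 'original paragraph\n' > update-upgrading-cli.adoc
git add update-upgrading-cli.adoc
git commit -qm 'base'

# The change we want to cherry-pick, exported as a mailbox patch.
git checkout -qb clusterversion
printf 'original paragraph, plus a clusterversion note\n' > update-upgrading-cli.adoc
git commit -qam 'OCPBUGS#9448: Add a note regarding checking clusterversion'
git format-patch -1 --stdout > ../note.patch

# The target branch rewrote the same paragraph, so the patch conflicts.
git checkout -q main
printf 'rewritten paragraph\n' > update-upgrading-cli.adoc
git commit -qam 'diverge'

if ! git am -3 ../note.patch >/dev/null 2>&1; then
  # Resolve the conflict by hand, stage the result, and continue --
  # the steps the robot's "git am" error message walks through.
  printf 'rewritten paragraph, plus a clusterversion note\n' > update-upgrading-cli.adoc
  git add update-upgrading-cli.adoc
  git am --continue >/dev/null 2>&1
fi

top=$(git log --format=%s -1)
echo "$top"
```

Either way the branch ends with the cherry-picked subject on top, which is what a manual cherry-pick PR delivers.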

@jboxman-rh jboxman-rh removed the merge-review-in-progress Signifies that the merge review team is reviewing this PR label Apr 3, 2023
wking added a commit to wking/openshift-docs that referenced this pull request Oct 16, 2023
This reverts commit fb0d246, openshift#57136.

The commit message did not explain the motivation, but [1] has:

  One note is:
  "
  After the update completes, you can confirm that the cluster version has updated to the new version:
  $ oc get clusterversion
  "

  The thing is that if the upgrade didn't complete, this output may include errors, for example:

  " oc get clusterversion
  NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
  version 4.10.26 True True 24m Unable to apply 4.11.0-rc.7: an unknown error has occurred: MultipleErrors
  "

  We should include a note to disregard these errors as long as the "Progressing" is in True.

But 1143cbe (Removed jq commands and replaced with oc adm upgrade,
2023-01-20, openshift#55005) was moving us from 'oc get clusterversion' towards
'oc adm upgrade' for "show my current context".  Both get
ClusterVersion and print a summary of its current status to the
console.  'oc get ...' is generic CustomResourceDefinition rendering
and the only knobs are things like additionalPrinterColumns, because
'get' needs to work if there are multiple matching resources and fit
them each into one line of text.  We have more control over the
rendering in 'oc adm upgrade', where we know we only care about the
ClusterVersion named 'cluster', and can take as many lines as we need
to explain the details of the current cluster state.  So basically
there's no upside to using 'oc get clusterversion', unless you are
using '-o json' or something and piping the results into a robot for
structured consumption.  For direct human consumption, 'oc adm
upgrade' will always be better, and in this commit, I'm moving us back
to using 'oc adm upgrade'.  This also gets the intervening 'Cluster
version is <version>' example output back to making sense, since
fb0d246 failed to update that example output when it pivoted
the suggested command.

I'm also dropping the "you can ignore the error" line.  While there
are certainly cases when cluster components ask for admin intervention
(e.g. by setting Available=False in a ClusterOperator) despite admin
intervention not actually being needed, it is absolutely not a good
idea to ignore those in general.  Cases:

* Cluster claims it is happy...
  * ... and it actually is happy.  Hooray!

  * ... but it is actually unhealthy.  Worth a bug to the overly
    optimistic component about more accurately reporting that it is
    unhealthy, and should be calling for admin intervention.

* Cluster claims it is unhealthy...
  * ... and it actually is unhealthy.  Sad.  But at least it is
    appropriately calling for help.  Error messages should be
    actionable, so the admin can quickly identify and resolve the
    issue, and if not, it is worth a bug to the opaquely-failing
    component about more helpfully guiding the admin through
    triage and resolution.

  * ... but it is actually pretty healthy.  This is the case where the
    admin could ignore the error message.  But it is hard to
    distinguish from the "actually unhealthy" cases where the admin
    should not ignore the error message.  Certainly:

      an unknown error has occurred: MultipleErrors

    is insufficient data to decide that the error report is
    false-positive noise.  Worth digging into the source of the error
    report, and a bug to the noisy component about not calling for
    admin assistance when no intervention is actually needed.

There will probably always be some corner cases where the cluster does
not ask for help despite needing help, or the cluster asks for help
despite not needing help, but as the controllers get smarter, those
cases should become more and more rare, to the point where we should
not need to discuss them in docs beyond generic comments pointing out
that no real-world diagnostic system is completely free of
false-positive and false-negative risk.

[1]: https://issues.redhat.com/browse/OCPBUGS-9448
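The commit message above reserves `oc get clusterversion` for structured consumption by robots. A hypothetical offline sketch of that case: on a live cluster the jsonpath query in the comment would read the Progressing condition directly, while here a canned list of conditions stands in so the extraction runs without a cluster.

```shell
# Structured consumption sketch. On a live cluster you would run:
#   oc get clusterversion version \
#     -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}'
# The canned conditions below stand in for that query's inputs.
conditions='Available=True
Progressing=True
Failing=False'
progressing=$(printf '%s\n' "$conditions" | sed -n 's/^Progressing=//p')
echo "Progressing=$progressing"
```

For a human at a terminal, `oc adm upgrade` remains the better summary, as the commit message argues.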
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Nov 1, 2023
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request Nov 1, 2023
skopacz1 pushed a commit to skopacz1/openshift-docs that referenced this pull request Nov 1, 2023
skopacz1 pushed a commit to skopacz1/openshift-docs that referenced this pull request Nov 1, 2023
michaelryanpeter pushed a commit that referenced this pull request Nov 1, 2023
GroceryBoyJr pushed a commit to GroceryBoyJr/openshift-docs that referenced this pull request Nov 17, 2023
This reverts commit fb0d246, openshift#57136.
DCChadwick pushed a commit to DCChadwick/openshift-docs that referenced this pull request Nov 28, 2023
This reverts commit fb0d246, openshift#57136.
skrthomas pushed a commit to skrthomas/openshift-docs that referenced this pull request Dec 6, 2023
This reverts commit fb0d246, openshift#57136.

Labels

branch/enterprise-4.9 branch/enterprise-4.10 branch/enterprise-4.11 branch/enterprise-4.12 branch/enterprise-4.13 peer-review-done Signifies that the peer review team has reviewed this PR size/S Denotes a PR that changes 10-29 lines, ignoring generated files.


6 participants