GH#53054 modules/update-upgrading-cli: Use 'oc adm upgrade' for current status (#55005)

Merged: mburke5678 merged 1 commit into openshift:main (Feb 7, 2023)
Conversation
Force-pushed from e3c2ada to 891cda7
🤖 Updated build preview is available. Build log: https://circleci.com/gh/ocpdocs-previewbot/openshift-docs/8423
wking reviewed three times (Jan 20, 2023)
Force-pushed from 795ed6c to cdf2fcd
wking reviewed twice (Jan 25, 2023)

LalatenduMohanty suggested changes (Jan 26, 2023)
Force-pushed from cdf2fcd to 1fe7de5
Force-pushed from 4c3c60c to 97dc60b
wking reviewed (Jan 27, 2023)

Force-pushed from 29d2313 to f225aa9
Contributor (Author): /label peer-review-needed

mburke5678 reviewed (Jan 31, 2023)

Contributor: @obrown1205 A few nits. Otherwise LGTM

Contributor: /lgtm

Contributor (Author): /label merge-review-needed
This was referenced Feb 7, 2023
wking added a commit to wking/openshift-docs that referenced this pull request (Oct 16, 2023):
This reverts commit fb0d246, openshift#57136. The commit message did not explain the motivation, but [1] has:

    One note is: "After the update completes, you can confirm that the cluster version has updated to the new version:

        $ oc get clusterversion

    " The thing is that if the upgrade didn't complete, this output may include errors, for example:

        $ oc get clusterversion
        NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
        version   4.10.26   True        True          24m     Unable to apply 4.11.0-rc.7: an unknown error has occurred: MultipleErrors

    We should include a note to disregard these errors as long as "Progressing" is True.

But 1143cbe (Removed jq commands and replaced with oc adm upgrade, 2023-01-20, openshift#55005) was moving us from 'oc get clusterversion' towards 'oc adm upgrade' for "show my current context". Both get ClusterVersion and print a summary of its current status to the console. 'oc get ...' is generic CustomResourceDefinition rendering, and the only knobs are things like additionalPrinterColumns, because 'get' needs to work when there are multiple matching resources and fit each of them into one line of text. We have more control over the rendering in 'oc adm upgrade', where we know we only care about the ClusterVersion named 'cluster' and can take as many lines as we need to explain the details of the current cluster state.

So there is no upside to using 'oc get clusterversion' unless you are using '-o json' or similar and piping the results into a robot for structured consumption. For direct human consumption, 'oc adm upgrade' will always be better, and in this commit I'm moving us back to using 'oc adm upgrade'. This also gets the intervening 'Cluster version is <version>' example output back to making sense, since fb0d246 failed to update that example output when it pivoted the suggested command.

I'm also dropping the "you can ignore the error" line. While there are certainly cases when cluster components ask for admin intervention (e.g. by setting Available=False in a ClusterOperator) despite intervention not actually being needed, it is absolutely not a good idea to ignore those requests in general. Cases:

* Cluster claims it is happy...
  * ... and it actually is happy. Hooray!
  * ... but it is actually unhealthy. Worth a bug to the overly optimistic component about more accurately reporting that it is unhealthy and should be calling for admin intervention.
* Cluster claims it is unhealthy...
  * ... and it actually is unhealthy. Sad, but at least it is appropriately calling for help. Error messages should be actionable, so the admin can quickly identify and resolve the issue; if they are not, it is worth a bug to the opaquely-failing component about more helpfully guiding the admin through triage and resolution.
  * ... but it is actually pretty healthy. This is the case where the admin could ignore the error message, but it is hard to distinguish from the "actually unhealthy" cases where the admin should not ignore it. Certainly "an unknown error has occurred: MultipleErrors" is insufficient data to decide that the error report is false-positive noise. Worth digging into the source of the error report, and a bug to the noisy component about not calling for admin assistance when no intervention is actually needed.

There will probably always be some corner cases where the cluster does not ask for help despite needing it, or asks for help despite not needing it, but as the controllers get smarter those cases should become rare enough that docs need no more than a generic comment that no real-world diagnostic system is completely free of false-positive and false-negative risk.

[1]: https://issues.redhat.com/browse/OCPBUGS-9448
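For the one case the commit message concedes to 'oc get clusterversion' (piping '-o json' output into a robot for structured consumption), a minimal sketch of what such a consumer might look like. The `sample` document below is hand-written for illustration, not captured from a real cluster, and `upgrade_summary` is a hypothetical helper; only the `status.desired.version` and `status.conditions` field paths come from the real ClusterVersion schema.

```python
import json

# Hand-written sample of 'oc get clusterversion version -o json' output,
# heavily trimmed: a real ClusterVersion carries many more fields.
sample = '''
{
  "status": {
    "desired": {"version": "4.11.0-rc.7"},
    "conditions": [
      {"type": "Available", "status": "True"},
      {"type": "Progressing", "status": "True",
       "message": "Unable to apply 4.11.0-rc.7: an unknown error has occurred: MultipleErrors"}
    ]
  }
}
'''

def upgrade_summary(doc: dict) -> str:
    """Summarize ClusterVersion state from its status conditions."""
    conditions = {c["type"]: c for c in doc["status"]["conditions"]}
    progressing = conditions.get("Progressing", {})
    desired = doc["status"]["desired"]["version"]
    if progressing.get("status") == "True":
        # Mid-update: surface the Progressing message rather than hiding it.
        return f"updating to {desired}: {progressing.get('message', '')}"
    return f"at {desired}"

print(upgrade_summary(json.loads(sample)))
```

This is the trade-off the commit message describes: a program reads the conditions list directly, while a human is better served by the multi-line rendering of 'oc adm upgrade'.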
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request (Nov 1, 2023)
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/openshift-docs that referenced this pull request (Nov 1, 2023)
skopacz1 pushed a commit to skopacz1/openshift-docs that referenced this pull request (Nov 1, 2023)
skopacz1 pushed a commit to skopacz1/openshift-docs that referenced this pull request (Nov 1, 2023)
michaelryanpeter pushed a commit that referenced this pull request (Nov 1, 2023)
GroceryBoyJr pushed a commit to GroceryBoyJr/openshift-docs that referenced this pull request (Nov 17, 2023)
DCChadwick pushed a commit to DCChadwick/openshift-docs that referenced this pull request (Nov 28, 2023)
skrthomas pushed a commit to skrthomas/openshift-docs that referenced this pull request on Dec 6, 2023
Version(s): 4.10+
Issue: Resolving conflicts from #53054
Link to docs preview: https://55005--docspreview.netlify.app/openshift-enterprise/latest/updating/updating-cluster-cli.html#update-upgrading-cli_updating-cluster-cli
QE review:
Additional information: