docs: degraded condition #161
| Change | Line |
|---|---|
| | @@ -75,11 +75,35 @@ Here are the guarantees components can get when they follow the rules we define: |
| | There are a set of guarantees components are expected to honor in return: |
| - | 1. A component doesn't report the `Available` status condition the first time until they are completely rolled out (or within some reasonable percentage if the component must be installed to all nodes) |
| - | 2. A component reports `Failing` when it can't accomplish its task. This might be very broad - the API servers are down so I failed to update - or narrow - I couldn't update the last secret. In either case, Failing communicates something is wrong. |
| - | 3. A component reports `Progressing` when it is rolling out new code, propagating config changes, or otherwise moving from one steady state to another. It should not report progressing when it is reconciling a previously known state. If it is progressing to a new version, it should include the version in the message for the condition like "Moving to v1.0.1". |
| - | 4. A component reports `Upgradeable` as `false` when it wishes to prevent an upgrade for an admin-correctable condition. The component should include a message that describes what must be fixed. |
| - | 5. A component reports when it has rolled out the new version of its operands |
| + | 1. An operator doesn't report the `Available` status condition the first time |
| + | until it is completely rolled out (or within some reasonable percentage if |
| + | the component must be installed to all nodes). |
| + | 2. An operator reports `Degraded` when its current state does not match its |
| + | desired state over a period of time resulting in a lower quality of service. |
| + | The period of time may vary by component, but a `Degraded` state represents |
| + | persistent observation of a condition. As a result, a component should not |
| + | oscillate in and out of the `Degraded` state. A service may be `Available` even |
| + | if it is degraded. For example, your service may desire 3 running pods, but 1 |
| + | pod is crash-looping. The service is `Available` but `Degraded` because it |
| + | may have a lower quality of service. A component may be `Progressing` but |
| + | not `Degraded` because the transition from one state to another does not |
| + | persist over a long enough period to report `Degraded`. A service should not |

Member: Why can't an operator be `Progressing=True` and `Degraded=True` at the same time? Today, if progressing towards a new version fails, we flip `Failing=True` but keep `Progressing=True`. For instance, the master pool in the MCO may not become ready after an upgrade; that may be temporary until all nodes roll out, or it may persist over a period of time, which I guess we can try to measure and act on. Besides, why can't we be `Degraded` while `Progressing`?

| Change | Line |
|---|---|
| + | report `Degraded` during the course of a normal upgrade. A service may report |
| + | `Degraded` in response to a persistent infrastructure failure that requires |
| + | administrator intervention, for example when a control plane host is unhealthy |
| + | and must be replaced. An operator should report `Degraded` if unexpected |
| + | errors occur over a period of time, but the expectation is that all unexpected errors |
| + | are handled as operators mature. |
| + | 3. An operator reports `Progressing` when it is rolling out new code, |
| + | propagating config changes, or otherwise moving from one steady state to |
| + | another. It should not report `Progressing` when it is reconciling a previously |
| + | known state. If it is progressing to a new version, it should include the |
| + | version in the message for the condition like "Moving to v1.0.1". |
| + | 4. An operator reports `Upgradeable` as `false` when it wishes to prevent an |
| + | upgrade for an admin-correctable condition. The component should include a |
| + | message that describes what must be fixed. |
| + | 5. An operator reports a new version when it has rolled out the new version to |
| + | all of its operands. |
| | ### Status |
nit:
"An operator reports `Degraded` when its current state does not match its desired state over a period of time resulting in a lower quality of service." makes it much clearer that operators mark `Degraded` when they have been trying to achieve the desired state but haven't achieved it for a period of time.
agreed. updated phrasing to match.
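Items 3 and 5 of the proposed list (report `Progressing` with a message like "Moving to v1.0.1" while rolling out, and report the new version only after every operand has it) can be sketched the same way. This minimal Go sketch assumes hypothetical names (`operand`, `progressingStatus`) that are not part of any OpenShift API, and it only considers version rollouts, not config propagation.

```go
package main

import "fmt"

// operand is a hypothetical record of one managed workload and the
// version it is currently running.
type operand struct {
	Name    string
	Version string
}

// progressingStatus returns whether the operator is Progressing, the
// condition message, and the version the operator should report.
// The reported version stays at the previous one until every operand
// runs the desired version.
func progressingStatus(previous, desired string, operands []operand) (progressing bool, message, reportedVersion string) {
	for _, op := range operands {
		if op.Version != desired {
			// At least one operand is still on another version: keep
			// reporting the previous version and name the target.
			return true, fmt.Sprintf("Moving to %s", desired), previous
		}
	}
	// Every operand is rolled out; reconciling this known state is not Progressing.
	return false, "", desired
}

func main() {
	ops := []operand{{"api", "v1.0.1"}, {"controller", "v1.0.0"}}
	p, msg, v := progressingStatus("v1.0.0", "v1.0.1", ops)
	fmt.Println(p, msg, v) // true Moving to v1.0.1 v1.0.0

	ops[1].Version = "v1.0.1"
	p, msg, v = progressingStatus("v1.0.0", "v1.0.1", ops)
	fmt.Println(p, msg, v) // false  v1.0.1
}
```

Keeping the previous version until the last operand finishes is what lets a consumer of the status treat a version change as "fully rolled out" rather than "started".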