Disable a machine from being fenced #332
Hi @emesika. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test.
[APPROVALNOTIFIER] This PR is NOT APPROVED
Motivation
----------------

The administrator should be able to exclude nodes from being fenced. This allows the administrator to track the machine manually, at their own discretion, without the machine fencing mechanism being applied.
There was a problem hiding this comment.
I am confused by the "fencing" term. In my understanding:
Fencing is the process of isolating a node to protect a cluster and its resources. Without fencing, a faulty node can cause data corruption in a cluster.
Your proposal looks like disabling machine health checking. Are machine health checking and fencing the same thing?
There was a problem hiding this comment.
I find the terminology confusing too.
We talk about health checking (noticing a failure) and remediation (currently just deletion).
I believe the point of both health checking and the remediation is (a) to allow safely moving workloads from unhealthy nodes and (b) attempting to bring the node back into a healthy state. And so we say this is fencing.
But if we're going to use that term at all, we should use it consistently.
e.g. README.md says:
Machine healthcheck controller
Reconciles desired state for MachineHealthChecks by ensuring that machines targeted by machineHealthCheck objects are healthy or remediated otherwise
Would we be happy to change that to:
Machine healthcheck (fencing) controller
Provides node level fencing by checking that machines targeted by MachineHealthCheck objects are healthy (as indicated by NodeConditions) and attempting remediation of unhealthy machines to bring them back to a healthy state
@@ -0,0 +1,31 @@
Disable Machine Fencing
There was a problem hiding this comment.
Why would you need this at all, since the MHC targets only machines of your choice via a label selector?
There was a problem hiding this comment.
We have a maintenance controller that drains all VMs from the respective node. We do not want to fence that node while it is under maintenance. Maintenance is a temporary state, though, and we do not want to update the node or MHC labels each time, because it is easy to forget to revert the change and leave fencing disabled for the machine altogether.
There was a problem hiding this comment.
Isn't this proposing exactly to use a label? Can you please add that use case to https://docs.google.com/document/d/10kauaJiXaWpvmd_qVsgIoZBNQLJMXMsQga3exV04ZTY/edit?ts=5d0b91e6#heading=h.4ifefbk4b4y2 so it can be discussed?
There was a problem hiding this comment.
The node-maintenance controller is the one that will set the label. We just want the MHC to respect this label and not fence the node.
Added to the document.
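To make the proposed behaviour concrete, here is a minimal Go sketch of a remediation check that respects a skip label. It is only an illustration under stated assumptions: the `Machine` type is a simplified stand-in, and the label key `example.com/skip-remediation` and the function `shouldRemediate` are hypothetical names that do not come from this PR or from the machine-api code.

```go
package main

import "fmt"

// skipRemediationLabel is a hypothetical label key used for illustration only.
const skipRemediationLabel = "example.com/skip-remediation"

// Machine is a simplified stand-in for the real Machine object.
type Machine struct {
	Name   string
	Labels map[string]string
}

// shouldRemediate reports whether a machine may be remediated.
// Healthy machines are never remediated, and an unhealthy machine
// that carries the skip label is left alone as well.
func shouldRemediate(m Machine, healthy bool) bool {
	if healthy {
		return false
	}
	// Only the presence of the key matters here; the value is ignored.
	_, skip := m.Labels[skipRemediationLabel]
	return !skip
}

func main() {
	underMaintenance := Machine{
		Name:   "worker-0",
		Labels: map[string]string{skipRemediationLabel: ""},
	}
	regular := Machine{Name: "worker-1"}

	fmt.Println(shouldRemediate(underMaintenance, false)) // false
	fmt.Println(shouldRemediate(regular, false))          // true
}
```

With this shape, a maintenance controller would add the label before draining the node and remove it afterwards, and neither the MHC selector nor the node's other labels would need to change.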
There was a problem hiding this comment.
I think this feature is useful. In some cases, it may be desirable to skip specific machines.
Another approach could be to support a list of labels that mark nodes to be skipped, as a configuration option on the MHC.
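The list-of-labels alternative could look roughly like the Go sketch below. The `HealthCheckConfig` type, its `SkipLabels` field, and the `isSkipped` function are hypothetical and exist only to illustrate the idea; the actual MachineHealthCheck API has no such field as far as this discussion shows.

```go
package main

import "fmt"

// HealthCheckConfig is a hypothetical, simplified health check configuration.
type HealthCheckConfig struct {
	// SkipLabels lists label keys that exclude a machine from remediation.
	SkipLabels []string
}

// isSkipped reports whether the machine carries any of the configured skip labels.
func isSkipped(cfg HealthCheckConfig, machineLabels map[string]string) bool {
	for _, key := range cfg.SkipLabels {
		if _, ok := machineLabels[key]; ok {
			return true
		}
	}
	return false
}

func main() {
	cfg := HealthCheckConfig{
		SkipLabels: []string{"example.com/maintenance", "example.com/manual-tracking"},
	}

	fmt.Println(isSkipped(cfg, map[string]string{"example.com/maintenance": "true"})) // true
	fmt.Println(isSkipped(cfg, map[string]string{"role": "worker"}))                  // false
}
```

Compared with a single hard-coded label, this keeps the set of skip labels under the administrator's control for each health check.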
bc5e772 to ede40bb
Closing due to inactivity. Please create a PR against https://github.com/openshift/enhancements if still relevant.
Documentation for the requirement to disable fencing using a special label set on the machine object