Skip to content

Disable a machine from being fenced#332

Closed
emesika wants to merge 1 commit intoopenshift:masterfrom
emesika:disable-fencing-doc
Closed

Disable a machine from being fenced#332
emesika wants to merge 1 commit intoopenshift:masterfrom
emesika:disable-fencing-doc

Conversation

@emesika
Copy link
Copy Markdown

@emesika emesika commented Jun 24, 2019

Documentation for the requirement to disable fencing using a special label set on the machine object

@openshift-ci-robot openshift-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 24, 2019
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

Hi @emesika. Thanks for your PR.

I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 24, 2019
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ingvagabund
You can assign the PR to them by writing /assign @ingvagabund in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread docs/proposals/disable-fencing.txt Outdated
Motivation
----------------

The administrator should be able to exclude nodes from being fenced.This will allow to track the machine manually without applying the machine fencing mechanism according to the system administrator considerations.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by the "fencing" term. In my understanding:
Fencing is the process of isolating a node to protect a cluster and its resources. Without fencing, a faulty node can cause data corruption in a cluster.

Your proposal looks like disabling machine healthchecking. is machine healthcking and fencing same thing?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the terminology confusing too.

We talk about health checking (noticing a failure) and remediation (currently just deletion).

I believe the point of both health checking and the remediation is (a) to allow safely moving workloads from unhealthy nodes and (b) attempting to bring the node back into a healthy state. And so we say this is fencing.

But if we're going to use that term at all, we should use it consistently.

e.g. README.md says:

Machine healthcheck controller
Reconciles desired state for MachineHealthChecks by ensuring that machines targeted by machineHealthCheck objects are healthy or remediated otherwise

Would we be happy to change that to:

Machine healthcheck (fencing) controller
Provides node level fencing by checking that machines targeted by MachineHealthCheck objects are healthy (as indicated by NodeConditions) and attempting remediation of unhealthy machines to bring them back to a healthy state

Comment thread docs/proposals/disable-fencing.txt Outdated
@@ -0,0 +1,31 @@
Disable Machine Fencing
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would you need this at all since mhc targets only machines of your choice via label selector?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a maintenace controller, that should drain all VM's from the respective node, and also we do not want to fence it, because of it under the maintenance, but its temporary state and we do not want each time update node or MHC labels, because it easy to forgot to do it and disable fencing for the machine at all

Copy link
Copy Markdown
Member

@enxebre enxebre Jul 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this proposing exactly to use a label? can you please include that use casehttps://docs.google.com/document/d/10kauaJiXaWpvmd_qVsgIoZBNQLJMXMsQga3exV04ZTY/edit?ts=5d0b91e6#heading=h.4ifefbk4b4y2 so can be discussed

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one who will set the label is node-maintenance controller, we just want respect this label under MHC and do not fence node
Added under the document

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this feature is useful. In some cases, it maybe desired to skip specific machines.
Another approach could be to support list of labels, that guide nodes to be skipped, as a configuration option on mhc

@vikaschoudhary16
Copy link
Copy Markdown
Contributor

PTAL openshift/cluster-api#48

@emesika emesika force-pushed the disable-fencing-doc branch from bc5e772 to ede40bb Compare July 11, 2019 10:46
@enxebre
Copy link
Copy Markdown
Member

enxebre commented Nov 4, 2019

closing due to inactivity. Please create a PR against https://github.com/openshift/enhancements if still relevant

@enxebre enxebre closed this Nov 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants