-
Notifications
You must be signed in to change notification settings - Fork 231
4019 fencing backport mhc external remediation template #795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -80,6 +80,15 @@ type MachineHealthCheckSpec struct { | |
| // +kubebuilder:validation:Pattern="^([0-9]+(\\.[0-9]+)?(ns|us|µs|ms|s|m|h))+$" | ||
| // +kubebuilder:validation:Type:=string | ||
| NodeStartupTimeout metav1.Duration `json:"nodeStartupTimeout,omitempty"` | ||
|
|
||
| // RemediationTemplate is a reference to a remediation template | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we sure this does not require a whole new set of RBAC permissions on our side?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How will the controller be able to create new objects from the template? Who is going to be adding the appropriate RBAC? Is that going to be part of the default set of RBAC or will another component grant permission to the MHC service account in their own way?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was no change in this logic from upstream.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just because a CRD exists, doesn't mean a user has permission to create resources 😉 Something will need to give the service account for the Machine Health Check controller permissions to be able to create these resources, it could be that we just add those to the defaults (which I think is perfectly acceptable), but to do this we need to know ahead of time all of the types that could possible be referred to here
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
IMO forcing all external remediations into a single API group will limit us severely. This means that only operators built by OpenShift and maintained by OpenShift would be able to be used, ie, no community remediation templates, if that's not an issue then sure, we can go down this route
I think my preference would be this kind of approach, it would be consistent across both openshift and community remediation methods at least
Yes I appreciate that and definitely want it to be generic, but that doesn't fit RBAC very well, so we need to come up with some solution of allowing the MHC permission for the groups that are appropriate.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Good point.
The service account could be different in different distributions, isn't it? for example, MHC SA could be Also, I'm not sure if someone will be able to remember that changing MHC service account breaks other external components. Maybe remediators should expose some config value. it could be either the
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. a new idea came up to my mind - what about just having a non openshift api group which is specifically for external remediation? WDYT?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just wanted to add another idea here. we could create the default build with a tightly focused api group for the external remediation template, but give the user clear instructions on how to override/expand that initial group. basically as a day 2 operation they could create their own special EMRs and load those in, but by default it would work "out of the box" with a reasonable setting.
+1, i think this is saying something similar this situation is also similar to how we have modified the cluster-api provider in the cluster-autoscaler. it's a different problem there, but perhaps a similar solution. we compile the autoscaler to know about our resrouce groups, but it is possible for the autoscaler to be reconfigured by environment variable to use a different api group. this allows us to share code with upstream capi and also keep our openshift types for when we deploy.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @elmiko We discussed this issue with the CAPI community, and CAPI is already using RBAC aggregated label which allows external components to add their RBAC rules. I believe this is an elegant way to solve this problem and it's always good to be aligned with upstream. |
||
| // provided by an infrastructure provider. | ||
| // | ||
| // This field is completely optional, when filled, the MachineHealthCheck controller | ||
| // creates a new object from the template referenced and hands off remediation of the machine to | ||
| // a controller that lives outside of Machine API Operator. | ||
| // +optional | ||
| RemediationTemplate *corev1.ObjectReference `json:"remediationTemplate,omitempty"` | ||
| } | ||
|
|
||
| // UnhealthyCondition represents a Node condition type and value with a timeout | ||
|
|
||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to remove these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
They weren't being used in MAO so I assumed it is a pretty safe bet to remove them.
Of course I could be wrong here - so looking for cloud team input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these changes should at least belong to a separate commit or a different PR. Agree - we don't use these values, we don't have and planning to have cluster resource in MAPI, yet it makes hard to approve such change.