OCPBUGS-43309: The "oc adm ocp-certificates regenerate-machine-config-server-serving-cert" command is failing #1900
Conversation
|
@djoshy: This pull request references Jira Issue OCPBUGS-43309, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. |
|
/retest-required |
ardaguclu
left a comment
Dropped a few comments. I haven't checked whether the CI failures are related or not.
/retest
func (o *RegenerateMCOOptions) ensureMCSSecretType(c *kubernetes.Clientset, ctx context.Context) error {
	// Retrieve the machine-config-server-tls secret
	mcsTLSSecret, err := c.CoreV1().Secrets(mcoNamespace).Get(ctx, mcsTlsSecretName, metav1.GetOptions{})
	if err != nil {
I assume that a NotFound error for this secret is not expected?
It is not expected (nodes cannot join the cluster if this secret doesn't exist in the MCO namespace), but good callout. I think it would be good if the command accounted for this case.
		Data: mcsTLSSecret.Data,
		Type: corev1.SecretTypeTLS,
	}
	if _, err := c.CoreV1().Secrets(mcoNamespace).Create(ctx, newSecret, metav1.CreateOptions{}); err != nil {
What is the consequence if deletion succeeds but creation fails? In that case we won't have this secret anymore. Besides, we are not handling NotFound errors, so won't this command fail on every subsequent run?
Nothing catastrophic. This secret will be created or updated by the cert controller's sync, which is called manually later in this command.
What seems to be problematic is that if the secret fed into the controller is not of the kubernetes.io/tls type, the cert controller sync fails. This wasn't the case before the 1.31 rebase, but it seems the check has been made stricter since.
I have added handling for the NotFound error, but if we are missing this secret, the cluster may have bigger problems. Also, for some additional context, this command is used to manually rotate a cert that has a 10-year lifecycle. We are hoping to automate this rotation in the future, but our use rate for this command is quite low at the moment.
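For readers following the thread, the flow discussed above can be sketched roughly as follows. This is a minimal, stdlib-only Go sketch and not the code in this PR: `Secret`, `SecretStore`, `memStore`, `ErrNotFound`, and `ensureTLSType` are hypothetical stand-ins for the client-go types and calls, introduced only for illustration. It assumes that a missing secret is tolerated because the later controller sync is expected to create it, and it relies on the fact that a Kubernetes secret's type cannot be changed in place, which is why the secret is deleted and recreated.

```go
package main

import (
	"errors"
	"fmt"
)

// SecretTypeTLS mirrors the string value of corev1.SecretTypeTLS.
const SecretTypeTLS = "kubernetes.io/tls"

// ErrNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var ErrNotFound = errors.New("secret not found")

// Secret is a simplified, hypothetical stand-in for corev1.Secret.
type Secret struct {
	Name string
	Type string
	Data map[string][]byte
}

// SecretStore is a hypothetical stand-in for the typed secrets client.
type SecretStore interface {
	Get(name string) (*Secret, error)
	Delete(name string) error
	Create(s *Secret) error
}

// memStore is an in-memory SecretStore used only to exercise the sketch.
type memStore map[string]*Secret

func (m memStore) Get(name string) (*Secret, error) {
	s, ok := m[name]
	if !ok {
		return nil, ErrNotFound
	}
	return s, nil
}

func (m memStore) Delete(name string) error {
	if _, ok := m[name]; !ok {
		return ErrNotFound
	}
	delete(m, name)
	return nil
}

func (m memStore) Create(s *Secret) error {
	if _, ok := m[s.Name]; ok {
		return fmt.Errorf("secret %q already exists", s.Name)
	}
	m[s.Name] = s
	return nil
}

// ensureTLSType makes sure the named secret, if present, has the
// kubernetes.io/tls type, recreating it with the same data when it does not.
func ensureTLSType(store SecretStore, name string) error {
	existing, err := store.Get(name)
	if errors.Is(err, ErrNotFound) {
		// Nothing to convert; the later sync is assumed to create the secret.
		return nil
	}
	if err != nil {
		return fmt.Errorf("getting secret %q: %w", name, err)
	}
	if existing.Type == SecretTypeTLS {
		return nil
	}
	// The type field is immutable, so the secret is deleted and recreated.
	// A NotFound on delete is tolerated in case something removed it already.
	if err := store.Delete(name); err != nil && !errors.Is(err, ErrNotFound) {
		return fmt.Errorf("deleting secret %q: %w", name, err)
	}
	replacement := &Secret{Name: name, Type: SecretTypeTLS, Data: existing.Data}
	if err := store.Create(replacement); err != nil {
		return fmt.Errorf("recreating secret %q: %w", name, err)
	}
	return nil
}
```

If the create step fails after a successful delete, a rerun of this sketch takes the NotFound branch and returns nil, leaving the sync to recreate the secret, which matches the "nothing catastrophic" reasoning above.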
|
/retest-required |
|
Thank you |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: ardaguclu, djoshy |
|
Holding for QE review /hold /retest-required |
|
Verified using IPI on AWS /label cherry-pick-approved |
|
@sergiordlr: Can not set label cherry-pick-approved: Must be member in one of these teams: [] |
|
/unhold |
|
@djoshy: all tests passed! |
|
@djoshy: Jira Issue OCPBUGS-43309: All pull requests linked via external trackers have merged. Jira Issue OCPBUGS-43309 has been moved to the MODIFIED state. |
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-cli |
|
[ART PR BUILD NOTIFIER] Distgit: ose-tools |
|
[ART PR BUILD NOTIFIER] Distgit: openshift-enterprise-deployer |
|
[ART PR BUILD NOTIFIER] Distgit: ose-cli-artifacts |
Closes: OCPBUGS-43309
This ensures that the machine-config-server-tls secret is of the kubernetes.io/tls type before attempting to rotate it. This is only an issue in 4.18+, as the vendored cert controller package was updated during the 1.31.1 rebase. (#1877, diff)
In the current master of library-go, the cert controller only permits secrets of the type kubernetes.io/tls to be rotated, and therefore this "preflight check" is necessary before running the controller sync.