Merge master into layering #3060
Conversation
The call to TempDir a few lines above already created this directory, so this call to MkdirAll is completely unnecessary.
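As a minimal sketch of why the extra call is redundant, assuming the TempDir in question behaves like the standard library's `os.MkdirTemp` (the successor to `ioutil.TempDir`): the returned path already exists as a directory. The helper names `makeScratchDir`, `isDir`, and `removeDir` are hypothetical and not taken from the repository.

```go
package main

import "os"

// makeScratchDir (hypothetical helper) creates a new temporary directory
// and returns its path. The directory already exists when this returns,
// so no follow-up os.MkdirAll is needed.
func makeScratchDir() (string, error) {
	return os.MkdirTemp("", "example-")
}

// isDir reports whether path exists and is a directory.
func isDir(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

// removeDir cleans up the scratch directory after use.
func removeDir(path string) error {
	return os.RemoveAll(path)
}
```

Calling `isDir` on the path returned by `makeScratchDir` reports true without any intervening `MkdirAll`.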
When we added the nodeip-configuration service for None platform deployments, we broke some existing users who were relying on the (largely undefined) previous behavior Kubelet used to select its node ip. While it is possible to work around this by overriding the node ip selection logic, that's very cumbersome and not an acceptable user experience.
This change adds a KUBELET_NODEIP_HINT env variable that can be used to override the default behavior of runtimecfg when selecting a node ip. When the variable is unset, the old behavior of selecting an address on the interface of the default route will take effect. When the variable is set, its value will be passed to runtimecfg like a VIP for the IPI platforms. This will cause runtimecfg to prefer an address in the same subnet as the one provided in KUBELET_NODEIP_HINT. If no such address is found, it will fall back to the default route logic as before.
KUBELET_NODEIP_HINT can be set using a systemd environment file. The file must be named /etc/default/nodeip-configuration with contents such as (replacing the IP as appropriate):
    KUBELET_NODEIP_HINT=192.0.2.1
This file should be created using a machine-config manifest that is passed to the installer so it will take effect on initial deployment. The node ip cannot be changed after the node registers initially so this cannot be done as a day 2 operation.
Note that the IP specified in the hint does not necessarily need to exist in the environment; it just needs to be in the correct subnet. No traffic will be sent to this address.
Co-authored-by: Dan Winship <danwinship@redhat.com>
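The selection rule described in this commit message can be illustrated roughly as follows. This is only a sketch of the idea, not runtimecfg's actual code: the function `pickNodeIP`, its parameters, and the representation of candidates as CIDR strings are all assumptions made for the example.

```go
package main

import "net"

// pickNodeIP (hypothetical) prefers a candidate address whose subnet
// contains the hint. If the hint is unset or invalid, or no candidate
// subnet contains it, the default-route address is returned instead.
func pickNodeIP(candidates []string, hint, defaultRouteIP string) string {
	hintIP := net.ParseIP(hint)
	if hintIP == nil {
		return defaultRouteIP // hint unset: keep the old behavior
	}
	for _, cidr := range candidates {
		ip, subnet, err := net.ParseCIDR(cidr)
		if err != nil {
			continue // skip malformed entries
		}
		if subnet.Contains(hintIP) {
			return ip.String()
		}
	}
	return defaultRouteIP // no address in the hinted subnet
}
```

With candidates `10.0.0.5/24` and `192.0.2.10/24` and the hint `192.0.2.1`, the sketch returns `192.0.2.10` even though `192.0.2.1` itself is not assigned to any interface.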
The machine config controller did not previously have a metrics handler so one must be added in order for us to do any alerting/metrics work. This requires setting up:
- Cluster Roles
- Cluster Role Bindings
- ServiceMonitor for metrics
- Service for metrics
- oauth-proxy sidecar to deployment for machine-config-controller
- mcc-proxy-tls secret for machine-config-controller
- metrics handler function in machine-config-controller common
I cribbed off of: 557303f
And then to add oauth: 3ab692f
Adds certificate helper functions to:
- extract certificates from PEM bundles
- find the certificate that has the latest expiry date when provided a list
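A rough sketch of what such helpers could look like with only the Go standard library is given below. The names `certsFromPEM`, `latestExpiry`, and `selfSignedPEM` are hypothetical and do not come from the commit; `selfSignedPEM` exists only to produce throwaway input for trying the other two.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"math/big"
	"time"
)

// certsFromPEM (hypothetical) parses every CERTIFICATE block in a bundle.
func certsFromPEM(bundle []byte) ([]*x509.Certificate, error) {
	var certs []*x509.Certificate
	for len(bundle) > 0 {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			break // no further PEM data
		}
		if block.Type != "CERTIFICATE" {
			continue // ignore keys and other block types
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, err
		}
		certs = append(certs, cert)
	}
	if len(certs) == 0 {
		return nil, errors.New("no certificates found in bundle")
	}
	return certs, nil
}

// latestExpiry (hypothetical) returns the certificate with the latest
// NotAfter, or nil when the list is empty.
func latestExpiry(certs []*x509.Certificate) *x509.Certificate {
	var latest *x509.Certificate
	for _, c := range certs {
		if latest == nil || c.NotAfter.After(latest.NotAfter) {
			latest = c
		}
	}
	return latest
}

// selfSignedPEM builds a throwaway self-signed certificate that is valid
// for the given (positive) number of days. Demo input only.
func selfSignedPEM(days int) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(int64(days)),
		Subject:      pkix.Name{CommonName: "example"},
		NotBefore:    now,
		NotAfter:     now.AddDate(0, 0, days),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}
```

Concatenating two outputs of `selfSignedPEM` and passing the result through `certsFromPEM` and `latestExpiry` should return the certificate with the longer validity.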
Adds functionality to the node controller such that:
1.) when a paused machine config pool attempts to sync
2.) if the kubelet-ca has been updated in the pool's 'spec' config
3.) the MCC will set a metric to the NotAfter date of the kube-apiserver-to-kubelet-signer certificate
4.) once the pool is unpaused, that metric will be reset to zero
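The steps above can be sketched as a single decision about the metric's value. The function name `pausedCAExpiryMetric` and the choice of Unix seconds as the unit are assumptions for illustration; the commit message does not say how the NotAfter date is encoded.

```go
package main

// pausedCAExpiryMetric (hypothetical) returns the value the gauge would
// carry after a pool sync: the signer certificate's NotAfter (assumed here
// to be Unix seconds) while the pool is paused and its kubelet-ca has
// changed, and zero otherwise, including once the pool is unpaused.
func pausedCAExpiryMetric(poolPaused, kubeletCAChanged bool, notAfterUnix int64) float64 {
	if poolPaused && kubeletCAChanged {
		return float64(notAfterUnix)
	}
	return 0
}
```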
The node_controller tests use the testutil package from the Prometheus client, so it had to be added as a dependency. Commands run:
```
$ go mod tidy
$ go mod vendor
$ make verify
```
Adds an e2e test that steps through the rotation of the kube-apiserver-to-kubelet-signer by:
- pausing a pool
- rotating the certificate
- checking that the proper metric is emitted
- unpausing the pool
- checking that the metric stops being emitted
Node controller now requires a MachineConfigInformer as part of its New() function; update bootstrap_tests to match.
As we now tear down and reconfigure br-ex on every reboot, we must provide a means to stabilize interface selection in scenarios with multiple default route interfaces. Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
Update controllerconfig CRD and relevant switch statements in pkg to handle Nutanix platform. Also update install/0000_80_machine-config-operator_00_namespace.yaml to add `openshift-nutanix-infra` to the list of namespaces.
Right now Fedora doesn't ship Go 1.17, only Go 1.18beta. That version emits a different error message for incompatible TLS versions. Adjust our unit test to handle both. (Also, a motivation for me is to cross-check the new CI configuration after openshift/release#27015 )
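One way a test can tolerate both wordings is to accept any of several expected substrings. The sketch below is an assumption about the approach, not the code from this commit, and `matchesAny` is a hypothetical helper; the real Go 1.17 and Go 1.18 messages are not quoted in the commit message and would be supplied by the caller.

```go
package main

import "strings"

// matchesAny (hypothetical) reports whether msg contains at least one of
// the accepted substrings, so a test can pass against either Go version's
// error text.
func matchesAny(msg string, accepted ...string) bool {
	for _, want := range accepted {
		if strings.Contains(msg, want) {
			return true
		}
	}
	return false
}
```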
server/api_test: Adjust expected error message for Go 1.18
Created MCONamespace constant and used in all *.go files except for test/helpers/utils.go which would create a cyclic import
…-certificate Send alert when MCO can't safely apply updated Kubelet CA on nodes in paused pool
Remove the restriction on the runtime-request-timeout option in the kubeletconfig. Signed-off-by: Urvashi Mohnani <umohnani@redhat.com>
…nodes in paused pool"
…-74-controller-alert-certificate Revert "Send alert when MCO can't safely apply updated Kubelet CA on nodes in paused pool"
Resourcemerge did not previously merge a container's Resources.Requests in ensureContainer(), which meant that during upgrade cases where we update the container object directly with changes (instead of applying/re-applying the manifests), Resources.Requests changes would not propagate to the updated object. This makes ensureContainer update Resources.Requests if it has changed, which keeps that structure from getting scraped off when we update. ( Which will keep us from failing tests, since at least cpu and memory in that structure are required fields )
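The merge behavior described here can be sketched with simplified stand-in types. `resourceList` and `container` below are hypothetical substitutes for the Kubernetes API types, and `ensureRequests` is an illustrative helper rather than the actual resourcemerge code, which this page does not show.

```go
package main

import "reflect"

// resourceList is a simplified stand-in for a Kubernetes resource list.
type resourceList map[string]string

// container is a simplified stand-in holding only what the sketch needs.
type container struct {
	Name     string
	Requests resourceList
}

// ensureRequests (hypothetical) copies required.Requests into existing
// when the two differ and records the change in *modified, so updated
// requests propagate to the object being updated.
func ensureRequests(modified *bool, existing *container, required container) {
	if len(existing.Requests) == 0 && len(required.Requests) == 0 {
		return // treat nil and empty as equal
	}
	if reflect.DeepEqual(existing.Requests, required.Requests) {
		return
	}
	existing.Requests = make(resourceList, len(required.Requests))
	for name, quantity := range required.Requests {
		existing.Requests[name] = quantity
	}
	*modified = true
}
```

A second call with the same required container leaves `modified` untouched, which is what lets a caller skip a needless update.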
Make our resourcemerge fork update a container's Resources.Requests, un-revert openshift#2802
This will keep layered and non-layered update logging consistent
bootstrap_test.go: remove unused constants
The main motivation here is to work around coreos/rpm-ostree#3523 (Which is itself a workaround for a RHEL8 systemd bug) Basically this e2e is invoking `rpm-ostree kargs` in a pretty tight loop which triggers that bug. To read the kernel command line, we can just read `/proc/cmdline` instead. (Now, this is the *actual* cmdline instead of just rpm-ostree's view of it, but it should be fine)
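A minimal sketch of checking for a kernel argument via `/proc/cmdline` follows. The helper names `hasKarg` and `runningKernelHasKarg` are hypothetical, and how the e2e test actually reads the file on a node is not shown on this page.

```go
package main

import (
	"os"
	"strings"
)

// hasKarg (hypothetical) reports whether karg appears as a whole
// whitespace-separated argument in the given kernel command line.
func hasKarg(cmdline, karg string) bool {
	for _, arg := range strings.Fields(cmdline) {
		if arg == karg {
			return true
		}
	}
	return false
}

// runningKernelHasKarg reads the booted kernel's actual command line,
// which may differ from rpm-ostree's recorded view of the kargs.
func runningKernelHasKarg(karg string) (bool, error) {
	data, err := os.ReadFile("/proc/cmdline")
	if err != nil {
		return false, err
	}
	return hasKarg(string(data), karg), nil
}
```

Matching whole fields rather than substrings avoids a false positive when one argument is a prefix of another.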
…latform Add Nutanix Platform to Machine Config Operator
Today, typing `make` does nothing, which is not very useful. By listing this rule first, `make` will default to `make binaries`.
Fix description typo in osImageURL CRD parameter
e2e: Use `/proc/cmdline` instead of `rpm-ostree kargs`
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cgwalters
retesting all of these (single node didn't bootstrap)
ok, ignore the single node comment above because they apparently never worked in this branch, but the gcp-op failures seem... not like flakes? for ex:
also that bot report above seems wrong?
/retest
/retest
That's...weird, it's like that job somehow lost our override adding
/retest
/retest
@cgwalters: The following tests failed, say
OK let's do #3126 first, then I think it may make sense to instead force-rebase layering on master.
Trying a rebase of layering on current master, there are some conflicts to work through. Split out one bit in #3133
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale
@cgwalters: PR needs rebase.
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close
@openshift-bot: Closed this PR.
Keeping up with things