fix: working systemd monitor jobs #3788
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master    #3788     +/-   ##
=========================================
  Coverage   73.19%   73.20%
=========================================
  Files         148      148
  Lines       25394    25403       +9
=========================================
+ Hits        18587    18596       +9
  Misses       5671     5671
  Partials     1136     1136

Continue to review full report at Codecov.
Force-pushed from 089b5ab to 21098b8.
  {
    "name": "kubernetes-dashboard",
-   "enabled": true
+   "enabled": false
these are temporary changes while we work on reducing customData size
  NODE_INDEX=$(hostname | tail -c 2)
  NODE_NAME=$(hostname)
- PRIVATE_IP=$(hostname -I | cut -d' ' -f1)
+ PRIVATE_IP=$(hostname -i | cut -d' ' -f1)
It's unclear to me why we're using -I (get me all interfaces) vs. -i (get me my primary interface)...
Note that the IP address here (-i) is a resolved IP address based on the host name. Is that what you want?
hostname -I returns all IP addresses without DNS resolution. This means you can get multiple addresses (bridge interfaces, secondary addresses, etc.).
Note that I would look into another way to get the right IP address for the master (like, it must be known somewhere already)
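For illustration, here's roughly how the two flags differ on a typical Ubuntu host (the addresses below are hypothetical; the second `-I` address stands in for a docker bridge, and `-i` often returns 127.0.1.1 on Ubuntu because of the /etc/hosts convention):

```console
$ hostname -I
10.240.0.4 172.17.0.1
$ hostname -i
127.0.1.1
```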
Force-pushed from 21098b8 to 134d3f2.
@@ -1,7 +1,7 @@
  [Unit]
I would advocate we eliminate this timer spec. It adds complexity to the overall implementation, and the way we're implementing the docker health check (docker ps) should not be "racy" given that the health check script runs only after the docker systemd service has started (see After=docker.service above)
@Michael-Sinz @mboersma thoughts?
I think what we're implicitly saying by delaying things for 30 mins is that we are tolerant of docker ps repeatedly failing during the first 30 mins of boot time, which I don't think is defensible.
I think providing a warmup period for Docker to get its act together was basically what Azure/acs-engine#4050 was about. I agree simpler is way better when it comes to systemd units especially, so I'm OK with removing it to see if it's unneeded now.
After further investigation, it is a bit tricky to tell a systemd service "wait until this service starts, and also until it is fully activated"; so I've moved the delay into the health script itself. Arguably that is less complicated than maintaining systemd overhead to do the same thing.
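For reference, a minimal sketch of what an in-script delay could look like (the warmup window and the restart action are assumptions for illustration, not the exact change in this PR):

```bash
#!/bin/bash
# Sketch: tolerate an unready docker daemon during a boot warmup window,
# instead of encoding that delay in a systemd timer spec.
WARMUP_SECONDS=300                          # assumed value, for illustration
UPTIME_SECONDS=$(cut -d. -f1 /proc/uptime)  # whole seconds since boot

if (( UPTIME_SECONDS < WARMUP_SECONDS )); then
  exit 0  # still warming up; don't treat a failed check as a real failure
fi

# Past the warmup window: an unresponsive daemon warrants a restart.
docker ps >/dev/null 2>&1 || systemctl restart docker
```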
  {{- end}}
  systemctlEnableAndStart kubelet || exit {{GetCSEErrorCode "ERR_KUBELET_START_FAIL"}}
  wait_for_file 1200 1 /etc/systemd/system/kubelet-monitor.service || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
+ systemctlEnableAndStart kubelet-monitor || exit {{GetCSEErrorCode "ERR_KUBELET_START_FAIL"}}
Also (continuing the systemd relationship conversation): as a rule, we block the startup of each monitor job on the successful startup of the service it monitors, as exemplified by this kubelet-monitor start operation and the sketch below.
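The same convention, sketched generically (the unit names and error-code variables here are illustrative; `systemctlEnableAndStart` and `wait_for_file` are the existing helpers seen in the snippet above):

```bash
# Start the monitored service first; only then enable its monitor unit.
systemctlEnableAndStart docker || exit "${ERR_DOCKER_START_FAIL}"
wait_for_file 1200 1 /etc/systemd/system/docker-monitor.service || exit "${ERR_FILE_WATCH_TIMEOUT}"
systemctlEnableAndStart docker-monitor || exit "${ERR_DOCKER_START_FAIL}"
```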
  DOCKER_VERSION=1.13.1-1
  NVIDIA_CONTAINER_RUNTIME_VER=2.0.0
  NVIDIA_DOCKER_SUFFIX=docker18.09.2-1
+ PRIVATE_IP=$(ip -4 addr show eth0 | grep -Po '(?<=inet )[\d.]+')
This is a new, common "what is my private IP address?" implementation to be shared across the various places where that runtime determination is needed.
eth0 always has this information at first boot; in an Azure CNI configuration, an "azure0" bridge interface is set up with the primary NIC IP address (and the eth0 interface no longer contains it).
In the edge case where neither of those exists, we fall back to the IP address that the hostname resolves to. We don't want that as the primary approach because we don't want to rely upon DNS lookups; see the sketch below.
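Spelled out, the lookup order described above might look like this (a sketch; only the eth0 line appears verbatim in the diff):

```bash
# 1. Prefer the primary NIC address on eth0 (present at first boot).
PRIVATE_IP=$(ip -4 addr show eth0 2>/dev/null | grep -Po '(?<=inet )[\d.]+' | head -n1)

# 2. Under Azure CNI, the primary IP moves to the azure0 bridge interface.
if [[ -z "${PRIVATE_IP}" ]]; then
  PRIVATE_IP=$(ip -4 addr show azure0 2>/dev/null | grep -Po '(?<=inet )[\d.]+' | head -n1)
fi

# 3. Last resort: resolve the hostname (requires functional DNS/hosts).
if [[ -z "${PRIVATE_IP}" ]]; then
  PRIVATE_IP=$(hostname -i | cut -d' ' -f1)
fi
```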
  #!/bin/bash
  NODE_INDEX=$(hostname | tail -c 2)
  NODE_NAME=$(hostname)
- PRIVATE_IP=$(hostname -I | cut -d' ' -f1)
This var assignment has been moved into cse_helpers.sh for more general-purpose usage, and the etcd vars are moved down to local vars inside the funcs that use them (see the sketch below).
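For flavor, a hypothetical before/after of that scoping change (the function and variable names here are made up for illustration):

```bash
# Before: a script-global var, evaluated every time the file is sourced.
# ETCD_PEER_URL="https://${PRIVATE_IP}:2380"

# After: scoped to the one function that actually consumes it.
configureEtcd() {
  local etcd_peer_url="https://${PRIVATE_IP}:2380"
  # ... etcd configuration that uses ${etcd_peer_url} ...
}
```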
  sysctl_reload 10 5 120 || exit {{GetCSEErrorCode "ERR_SYSCTL_RELOAD"}}
  wait_for_file 1200 1 /etc/default/kubelet || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
  wait_for_file 1200 1 /var/lib/kubelet/kubeconfig || exit {{GetCSEErrorCode "ERR_FILE_WATCH_TIMEOUT"}}
+ if [[ -n ${MASTER_NODE} ]]; then
This one-time, VMSS-master-specific setup has been moved into the bootstrap script, and out of the shell script that runs every time the kubelet systemd service starts.
  DOCKER_VERSION=1.13.1-1
  NVIDIA_CONTAINER_RUNTIME_VER=2.0.0
  NVIDIA_DOCKER_SUFFIX=docker18.09.2-1
+ PRIVATE_IP=$( (ip -br -4 addr show eth0 || ip -br -4 addr show azure0) | grep -Po '\d+\.\d+\.\d+\.\d+')
This code does the following:
1. Get the IP addresses on eth0.
2. If there aren't any, get the IP addresses on azure0.
3. If those attempts don't yield exactly one IP address, fall back to the IP address that the hostname DNS entry returns.

Step 3 is undesirable because it relies upon functional DNS, so we only do it as a fallback.
We consider the possibility that eth0 won't have the desired IP address in order to make this solution resilient throughout the lifecycle of the VM: in Azure CNI configurations, the IP address is attached to the "azure0" interface when the first (non-hostNetwork) pod is scheduled onto the node that the VM represents (that's how Azure CNI routes container traffic out of the VM).
Which one takes priority, azure0 or eth0?
If azure0 is present then eth0 should not be, so wouldn't it be better to check azure0 first and then eth0?
(It's unclear from your comment.)
When we define the VM in the ARM template, we declare a primary IP address on a single NIC, so we expect it to be reflected in eth0. Because Azure CNI creates a bridge "azure0" interface and "takes over" the IP address from eth0, that's where we have to look (if we want to look locally and not rely upon DNS) for the primary IP address of the host.
An additional consideration is that Ubuntu 18.04-LTS's cloud-init implementation has a bug: it doesn't respect the ARM template-declared primary IP address when the spec also includes secondary IP addresses, so eth0 will have more than one IP address (those secondary IP addresses are for Azure CNI to use in the container networking layer, not the host layer, so eth0 should only ever have one IP address).

To deal with that edge case (e.g., 18.04-LTS on first boot, before the cloud-init network config override has been applied), we evaluate the number of IP addresses returned (the | grep -c '^' part), and if it's not exactly 1, we just fall back to relying upon DNS.
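Putting the pieces together, the full fallback logic might be sketched like this (only the `ip ... | grep` expression appears verbatim in the diff above; the rest is an assumed shape):

```bash
# Query eth0; if that fails (device missing), query the azure0 bridge instead.
ADDRS=$( (ip -br -4 addr show eth0 || ip -br -4 addr show azure0) 2>/dev/null \
  | grep -Po '\d+\.\d+\.\d+\.\d+')

# Accept the local answer only when it is unambiguous: exactly one address.
if [[ -n "${ADDRS}" ]] && [[ $(echo "${ADDRS}" | grep -c '^') -eq 1 ]]; then
  PRIVATE_IP="${ADDRS}"
else
  # Zero or multiple addresses (e.g., the 18.04-LTS cloud-init bug above):
  # fall back to DNS resolution of the hostname.
  PRIVATE_IP=$(hostname -i | cut -d' ' -f1)
fi
```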
  Restart=always
  RestartSec=10
  RemainAfterExit=yes
+ Environment=CONTAINER_RUNTIME={{GetContainerRuntime}}
This service is now overloaded to support both moby and containerd.
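A hedged sketch of how a monitor script might branch on that variable (the runtime values and the health-check commands are assumptions, not the exact code in this PR):

```bash
#!/bin/bash
# CONTAINER_RUNTIME is injected via the unit's Environment= directive above.
case "${CONTAINER_RUNTIME}" in
  docker)
    docker ps >/dev/null 2>&1 || systemctl restart docker
    ;;
  containerd)
    ctr --namespace k8s.io containers list >/dev/null 2>&1 || systemctl restart containerd
    ;;
  *)
    echo "unknown container runtime: ${CONTAINER_RUNTIME}" >&2
    ;;
esac
```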
| echo "" >> /etc/environment | ||
| fi | ||
| {{- if IsMasterVirtualMachineScaleSets}} | ||
| source {{GetCSEHelpersScriptFilepath}} |
This source statement gets us the $PRIVATE_IP var we need
  env azure.Environment
  azureClient *armhelpers.AzureClient
- firstMasterRegexStr = fmt.Sprintf("^%s-", common.LegacyControlPlaneVMPrefix)
+ firstMasterRegexStr = fmt.Sprintf("^%s-.*-0", common.LegacyControlPlaneVMPrefix)
If you look hard enough you find interesting things
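The fix tightens the first-master regex so it matches only the first control-plane VM rather than all of them. Assuming LegacyControlPlaneVMPrefix is something like k8s-master, with hypothetical VM names:

```bash
# Old pattern "^k8s-master-" matched every control-plane VM;
# new pattern "^k8s-master-.*-0" matches only the first instance.
echo "k8s-master-12345678-0" | grep -E '^k8s-master-.*-0'  # match: the first master
echo "k8s-master-12345678-1" | grep -E '^k8s-master-.*-0'  # no match
echo "k8s-master-12345678-1" | grep -E '^k8s-master-'      # match: old pattern, too broad
```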
Force-pushed from 07b12c2 to eadf7ac.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, mboersma
Reason for Change:
This PR updates the implementation of the various systemd monitor jobs so that the following critical services are monitored for failure and restarted:
Issue Fixed:
Requirements:
Notes: