[OCPCLOUD-1106] Add afterburn task to update AWS hostname to match instance metadata#2401
506bf60 to
8dc8532
|
Hi Danil, What is the context of this PR? I don't believe any of the cloud platforms today use afterburn directly to acquire their hostname (other than GCP to work around GCP hostname truncating issues). Is there a bug you are trying to resolve? |
|
Is this related to the cloud provider plugin requiring that the hostname and the node name match for admission to the cluster? I know this is a problem for VMware and if we have to do this for AWS....I fear we'll be doing it for the other clouds too. |
|
What happened is that I tried an automated reconfiguration for kubelet with |
8dc8532 to
bed87a5
|
/retest |
/proc/sys/kernel/hostname is the default and only place from which golang's os.Hostname, as used in the kubelet, gets its value. The hostnamectl call in node-valid-hostname.service fails to update the current hostname on the running machine, so there is a manual copy.
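As a rough illustration of the point about the kernel hostname, the sketch below reads the value that /proc/sys/kernel/hostname exposes. The function name `kernel_hostname` and its optional path argument are hypothetical, added only so the sketch can be exercised without touching a real node; nothing here is taken from the PR's scripts.

```shell
# Hypothetical helper: print the kernel hostname, i.e. the value exposed at
# /proc/sys/kernel/hostname. The optional path argument is an assumption made
# for testing; it lets the function read from another file instead.
kernel_hostname() (
    path="${1:-/proc/sys/kernel/hostname}"
    name=""
    # read returns non-zero when the file lacks a trailing newline; the value
    # is still captured, so the exit status is ignored.
    IFS= read -r name < "$path" || true
    printf '%s\n' "$name"
)
```

On a node, comparing `kernel_hostname` with `cat /run/afterburn.hostname` would show whether the running hostname has diverged from the metadata value.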
Could not set property: Connection timed out
@rphillips Here is an issue with hostnamectl I observed, re: #2401 (comment)
|
@darkmuggle @yuqi-zhang PTAL. |
|
/retest |
|
We need to fix this in the external cloud providers. There are too many platforms this can affect. |
|
@rphillips I tested it in GCP, Azure, migration works fine on those platforms. vSphere has a fix, same for OpenStack, so it is possible this is the only place left to be changed. We can't really fix it for There is no way for standard |
|
/retest |
@darkmuggle I tested it in other clouds, does not seem to be the case. Only AWS is affected. So there is a fix. |
|
Upstream issue: kubernetes/kubernetes#70897 |
|
We should use the following script/function to set the hostname, which uses hostnamectl: |
|
The script gets installed at |
bed87a5 to
e56f075
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: Danil-Grigorev. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing |
|
@rphillips Your feedback should be addressed now, could you take a look? The script is now using |
|
@darkmuggle @yuqi-zhang PTAL. The AWS provider is the only one affected and, as @rphillips pointed out, it is a CoreOS issue: kubernetes/kubernetes#70897. They propose using afterburn to manage such tasks, which is exactly what this PR does. |
|
Hi, sorry for the late comment, I'm not the most comfortable with hostnames so I defer to @darkmuggle if he has the time to take a look. Just a quick question, was comparing this to: That service appears to not run on firstboot. Should this do the same? |
|
The change looks fine, it's consistent with other platforms. |
|
i was able to manually run the commands from the afterburn-hostname, and those did work:
[root@ip-10-0-152-55 sbin]# source /usr/local/sbin/set-valid-hostname.sh
[root@ip-10-0-152-55 sbin]# set_valid_hostname `cat /run/afterburn.hostname`
exit
[core@ip-10-0-152-55 sbin]$ echo $?
0
[core@ip-10-0-152-55 sbin]$ hostname
ip-10-0-152-55.us-east-2.compute.internal
not sure why this is failing during boot though. perhaps there is another dependency we need to wait on, or add more retries? |
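One way the "add more retries" idea could look is a small generic wrapper around the hostname call. This is only a sketch: the `retry` function, its argument order, and the attempt and delay values are hypothetical and do not come from the PR; `set_valid_hostname` and `/run/afterburn.hostname` are the names used in the session above.

```shell
# Hypothetical retry wrapper: run a command up to N times, sleeping DELAY
# seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    _retry_attempts="$1"
    _retry_delay="$2"
    shift 2
    _retry_i=1
    while [ "$_retry_i" -le "$_retry_attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Sleep only between attempts, not after the last one.
        if [ "$_retry_i" -lt "$_retry_attempts" ]; then
            sleep "$_retry_delay"
        fi
        _retry_i=$((_retry_i + 1))
    done
    return 1
}
```

With the script from the session sourced, a call might read `retry 5 2 set_valid_hostname "$(cat /run/afterburn.hostname)"`. Whether retries alone address the boot-time failure is not established in the thread; a missing unit dependency would need ordering changes instead.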
e56f075 to
debdd84
cgwalters
left a comment
Also, this needs to do the same thing as we're doing on GCP here: https://github.com/openshift/machine-config-operator/blob/master/templates/common/gcp/files/etc-networkmanager-conf.d-hostname.yaml
I guess just copy that file into templates/common/aws, longer term it'd be good to dedup them.
Otherwise the hostname change may get reverted when NM renews the DHCP lease.
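For readers unfamiliar with the mechanism, a NetworkManager drop-in of roughly the following shape is one way to stop NetworkManager from managing the hostname. This sketch assumes the drop-in sets `hostname-mode=none` (a real NetworkManager option); the thread does not show the contents of the GCP template, so the setting, the `hostname.conf` file name, and the `write_nm_hostname_dropin` helper are all assumptions.

```shell
# Hypothetical helper: write a NetworkManager drop-in that tells
# NetworkManager not to manage the system hostname.
# The directory argument defaults to the standard conf.d location and exists
# mainly so the sketch can be tried against a scratch directory.
write_nm_hostname_dropin() (
    conf_dir="${1:-/etc/NetworkManager/conf.d}"
    mkdir -p "$conf_dir"
    # Assumed contents; not copied from the GCP template referenced above.
    cat > "$conf_dir/hostname.conf" <<'EOF'
[main]
hostname-mode=none
EOF
)
```

In the MCO the file would be delivered through a template rather than a script; the function form here only makes the file contents concrete.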
|
i created a small patch on top of this pr based off of @cgwalters suggestions and it seems to be working well for me. when i install a cluster with this patch in place, i see machines joining and nodes created with the proper names. here is an example of the CSRs after bootstrap: |
debdd84 to
a20b5b9
elmiko
left a comment
just a minor nit, but otherwise this looks good and i know it works ;)
cgwalters
left a comment
LGTM.
(At some point I am curious around the larger backstory on why in AWS DHCP doesn't give us the hostname we need)
108035e to
55478e9
55478e9 to
1e2c932
- Add AWS support to usr-local-bin-mco-hostname.yaml based on @elmiko's implementation
Co-authored-by: Michael McCune <msm@opbstudios.com>
1e2c932 to
cef5683
kikisdeliveryservice
left a comment
and adding lgtm for Ben & Colin's reviews
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, Danil-Grigorev, darkmuggle, kikisdeliveryservice. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test e2e-agnostic-upgrade |
|
@Danil-Grigorev: The following tests failed, say
|
/test e2e-agnostic-upgrade |
- What I did
During kubelet migration from --cloud-provider=aws to --cloud-provider=external, the kubelet determines the node name from the hostname of the machine where it is running. The AWS hostname is mismatched: usually it is a reverse DNS mapping of the machine IP, like ip-10-0-195-176. This results in a node name mismatch in the kubelet, and the API server admission plugin rejects changes for the object. This fixes the issue.
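The mismatch described above can be expressed as a simple comparison between the running hostname and the one reported by instance metadata. The helper below is a hypothetical sketch, not the logic in the PR; the name `needs_hostname_update` and the rule of skipping an empty metadata value are assumptions.

```shell
# Hypothetical check: succeed (exit 0) when the current hostname should be
# replaced by the metadata hostname.
# $1 = current hostname, $2 = hostname from instance metadata.
needs_hostname_update() {
    # An empty metadata value gives nothing to update to.
    [ -n "$2" ] || return 1
    # Update only when the two names differ.
    [ "$1" != "$2" ]
}
```

On a node this could be called as `needs_hostname_update "$(hostname)" "$(cat /run/afterburn.hostname)"`, assuming afterburn has already written that file.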
- How to verify it
<region>.compute.internal even when the cloud-provider is set to external.
- Description for the changelog