[skip-ci] Allow make push to replace and relaunch Appliance components#7233
[skip-ci] Allow make push to replace and relaunch Appliance components#7233gigawhitlocks merged 18 commits intovmware:masterfrom
make push to replace and relaunch Appliance components#7233Conversation
make push to replace and relaunch Appliance componentsmake push to replace and relaunch Appliance components
|
I think its [skip ci] for future reference |
|
@gigawhitlocks I think this comment may have been meant for this PR: #7170 (comment) If it's relevant, please note. If not I've not idea who I was writing that for. |
vburenin
left a comment
There was a problem hiding this comment.
Can't wait to get it in!
|
|
||
| # Grab vSphere's thumbprint for calling vic-machine | ||
| function get-thumbprint () { | ||
| govc about.cert | grep "SHA-1 Thumbprint" | awk '{print $NF}' |
There was a problem hiding this comment.
govc about.cert -k -thumbprint gives you the thumbprint undercoated so you don't need the grep/awk.
There was a problem hiding this comment.
still prints the IP so I do need awk, but not grep
|
|
||
| # Run the command given on the VCH instead of locally | ||
| function on-vch() { | ||
| sshpass -ppassword ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no root@$VCH_IP -C $@ 2>/dev/null |
There was a problem hiding this comment.
could we use the vic-machine debug --authorized-key option?
It's a path I don't think we test sufficiently but would also avoid hard coding dev passwords into the scripts.
|
|
||
|
|
||
| if [[ ! -v VIC_ID ]] && [[ $(vch-name-is-ambiguous) ]]; then | ||
| echo "The provided VIC name is ambiguous; please choose the correct VCH ID from the output below and assign it to the environment variable VIC_ID, e.g., export VIC_ID=12" |
There was a problem hiding this comment.
Check if the script is interactive - if it is then use read -p "Enter VCH ID: " VIC_ID to get the variable from the user.
From a consistency perspective I suggest renaming that variable to be VCH_ID instead of VIC_ID
| done | ||
| fi | ||
|
|
||
| on-vch vic-init & |
There was a problem hiding this comment.
This really shouldn't be necessary unless configured at a debug level that causes vic-init to exit instead of relaunching components (think that's debug>=3)
I'm not sure what will happen here, but there is a risk of multiple vic-init's running - I don't know that there's anything that prevents that.
I'd suggest using systemctl restart vic-init.service to perform a restart if necessary.
I'd also ask you to follow up on why the restart is needed.
33b00ca to
770419a
Compare
|
Apologies @hickeng I moved everything around and made a bunch of improvements, in addition to addressing your concerns |
| govc about.cert -k -thumbprint | awk '{print $NF}' | ||
| } | ||
|
|
||
| # SCPs the component in $1 to the VCH, plops it in place, and brutally kills the previous running process |
There was a problem hiding this comment.
This this comment is intended for replace-component
| scp -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no $VIC_DIR/bin/$1 root@$VCH_IP:/tmp/$1 2>/dev/null | ||
| on-vch mv /sbin/$1 /tmp/old-$1 | ||
| on-vch mv /tmp/$1 /sbin/$1 | ||
| on-vch chmod 755 /sbin/$1 |
There was a problem hiding this comment.
suggest doing this first before the mv /sbin/$1 /tmp/old-$1 - that way there's no window in which vic-init will try and fail to execute it.
|
|
||
| function replace-components () { | ||
| echo "Killing vic-init (and not taking no for an answer)..." | ||
| on-vch systemctl kill vic-init |
There was a problem hiding this comment.
still wanting an answer as to why this is necessary - vic-init should be restarting dead components.
If you still observe that not occurring with debug <= 1 please raise an issue for it with /var/log/vic/init.log attached that showcases the failure to restart.
| } | ||
|
|
||
| function get-thumbprint() { | ||
| govc about.cert -k -thumbprint | awk '{print $NF}' |
There was a problem hiding this comment.
might want pipefail and or $? check for govc.
$ govc about.cert -k -thumbprint | awk '{print $NF}'
govc: dial tcp 192.168.78.128:443: getsockopt: no route to host
There was a problem hiding this comment.
strictly speaking govc ls is called first in the script and we'll have failed by the time we do a govc about.cert if govc can't communicate, but I think pipefail should just be on in the whole script, I was already thinking about that (enabled pipefail in the most recent commit as of time of writing)
|
@mhagen-vmware apparently you have dominion over |
| replace-component $x | ||
| done | ||
|
|
||
| # George swears you don't need to do this but if you don't, the new binaries are not launched |
There was a problem hiding this comment.
George still swears you don't need to do this and wants logs from it not happening so we can track down WHY.
As written in this PR it's likely because the component processes haven't been killed after the binary is replaced.
| if [[ $1 == "vic-init" ]]; then | ||
| on-vch systemctl restart vic-init | ||
| else | ||
| on-vch kill -9 $pid |
There was a problem hiding this comment.
When #6943 can you remind me that it should update this to use kill -TERM. That will cause vSphere sessions to be cleaned up properly
| on-vch chmod 755 /tmp/$1 | ||
| on-vch mv /tmp/$1 /sbin/$1 | ||
| if [[ $1 == "vic-init" ]]; then | ||
| on-vch systemctl restart vic-init |
There was a problem hiding this comment.
We might need to look at the effect of this with #6943 - it's possible that this would cause all components to be shutdown as well which is not what we're after.
Continuing from #7164 sorry, comments from that PR should be addressed at this time, except for a small modification to systemd in the appliance, which I'll have to modify and test. Resolves #7167