Skip to content

[skip-ci] Allow make push to replace and relaunch Appliance components#7233

Merged
gigawhitlocks merged 18 commits intovmware:masterfrom
gigawhitlocks:hackathon
Feb 7, 2018
Merged

[skip-ci] Allow make push to replace and relaunch Appliance components#7233
gigawhitlocks merged 18 commits intovmware:masterfrom
gigawhitlocks:hackathon

Conversation

@gigawhitlocks
Copy link
Contributor

Continuing from #7164 sorry, comments from that PR should be addressed at this time, except for a small modification to systemd in the appliance, which I'll have to modify and test. Resolves #7167

@gigawhitlocks gigawhitlocks changed the title Allow make push to replace and relaunch Appliance components [skip-ci] Allow make push to replace and relaunch Appliance components Feb 1, 2018
@sgairo
Copy link
Contributor

sgairo commented Feb 1, 2018

I think its [skip ci] for future reference

@hickeng
Copy link
Contributor

hickeng commented Feb 2, 2018

@gigawhitlocks I think this comment may have been meant for this PR: #7170 (comment)

If it's relevant, please note. If not I've not idea who I was writing that for.

Copy link
Contributor

@vburenin vburenin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't wait to get it in!


# Grab vSphere's thumbprint for calling vic-machine
function get-thumbprint () {
govc about.cert | grep "SHA-1 Thumbprint" | awk '{print $NF}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

govc about.cert -k -thumbprint gives you the thumbprint undercoated so you don't need the grep/awk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh nice!

Copy link
Contributor Author

@gigawhitlocks gigawhitlocks Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still prints the IP so I do need awk, but not grep


# Run the command given on the VCH instead of locally
function on-vch() {
sshpass -ppassword ssh -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no root@$VCH_IP -C $@ 2>/dev/null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use the vic-machine debug --authorized-key option?
It's a path I don't think we test sufficiently but would also avoid hard coding dev passwords into the scripts.



if [[ ! -v VIC_ID ]] && [[ $(vch-name-is-ambiguous) ]]; then
echo "The provided VIC name is ambiguous; please choose the correct VCH ID from the output below and assign it to the environment variable VIC_ID, e.g., export VIC_ID=12"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the script is interactive - if it is then use read -p "Enter VCH ID: " VIC_ID to get the variable from the user.

From a consistency perspective I suggest renaming that variable to be VCH_ID instead of VIC_ID

done
fi

on-vch vic-init &
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really shouldn't be necessary unless configured at a debug level that causes vic-init to exit instead of relaunching components (think that's debug>=3)

I'm not sure what will happen here, but there is a risk of multiple vic-init's running - I don't know that there's anything that prevents that.

I'd suggest using systemctl restart vic-init.service to perform a restart if necessary.
I'd also ask you to follow up on why the restart is needed.

@gigawhitlocks
Copy link
Contributor Author

Apologies @hickeng I moved everything around and made a bunch of improvements, in addition to addressing your concerns

govc about.cert -k -thumbprint | awk '{print $NF}'
}

# SCPs the component in $1 to the VCH, plops it in place, and brutally kills the previous running process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This this comment is intended for replace-component

scp -oUserKnownHostsFile=/dev/null -oStrictHostKeyChecking=no $VIC_DIR/bin/$1 root@$VCH_IP:/tmp/$1 2>/dev/null
on-vch mv /sbin/$1 /tmp/old-$1
on-vch mv /tmp/$1 /sbin/$1
on-vch chmod 755 /sbin/$1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest doing this first before the mv /sbin/$1 /tmp/old-$1 - that way there's no window in which vic-init will try and fail to execute it.


function replace-components () {
echo "Killing vic-init (and not taking no for an answer)..."
on-vch systemctl kill vic-init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still wanting an answer as to why this is necessary - vic-init should be restarting dead components.
If you still observe that not occurring with debug <= 1 please raise an issue for it with /var/log/vic/init.log attached that showcases the failure to restart.

}

function get-thumbprint() {
govc about.cert -k -thumbprint | awk '{print $NF}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want pipefail and or $? check for govc.

$ govc about.cert -k -thumbprint | awk '{print $NF}'
govc: dial tcp 192.168.78.128:443: getsockopt: no route to host

Copy link
Contributor Author

@gigawhitlocks gigawhitlocks Feb 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strictly speaking govc ls is called first in the script and we'll have failed by the time we do a govc about.cert if govc can't communicate, but I think pipefail should just be on in the whole script, I was already thinking about that (enabled pipefail in the most recent commit as of time of writing)

@gigawhitlocks
Copy link
Contributor Author

@mhagen-vmware apparently you have dominion over infra/scripts

Copy link
Contributor

@mhagen-vmware mhagen-vmware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

replace-component $x
done

# George swears you don't need to do this but if you don't, the new binaries are not launched
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

George still swears you don't need to do this and wants logs from it not happening so we can track down WHY.
As written in this PR it's likely because the component processes haven't been killed after the binary is replaced.

if [[ $1 == "vic-init" ]]; then
on-vch systemctl restart vic-init
else
on-vch kill -9 $pid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When #6943 can you remind me that it should update this to use kill -TERM. That will cause vSphere sessions to be cleaned up properly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

roger that

on-vch chmod 755 /tmp/$1
on-vch mv /tmp/$1 /sbin/$1
if [[ $1 == "vic-init" ]]; then
on-vch systemctl restart vic-init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to look at the effect of this with #6943 - it's possible that this would cause all components to be shutdown as well which is not what we're after.

@gigawhitlocks gigawhitlocks merged commit f67dd8b into vmware:master Feb 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants