Skip to content

lxd/instance: use SSH by default, allow switching to exec#114

Merged
OddBloke merged 2 commits into
canonical:masterfrom
OddBloke:lxd_ssh
Feb 10, 2021
Merged

lxd/instance: use SSH by default, allow switching to exec#114
OddBloke merged 2 commits into
canonical:masterfrom
OddBloke:lxd_ssh

Conversation

@OddBloke
Copy link
Copy Markdown
Contributor

@OddBloke OddBloke commented Feb 5, 2021

Analysis performed (and captured in #112) indicates that accessing LXD
instances via SSH rather than via lxc exec leads to faster test
execution, and also brings LXD's default behaviour in line with the
default behaviour of other clouds.

This commit introduces a execute_via_ssh parameter and instance
variable, which are used by execute() to determine whether to connect
to the instance via SSH, or use lxc exec. The default value of
execute_via_ssh is True.

(Prior to this commit, we would execute via SSH but only if we had a key
manually set against the instance, leading to inconsistencies in
behaviour depending on local configuration. This also addresses that.)

Fixes: #112

@OddBloke
Copy link
Copy Markdown
Contributor Author

OddBloke commented Feb 5, 2021

The corresponding cloud-init changes are at canonical/cloud-init#802

@OddBloke
Copy link
Copy Markdown
Contributor Author

OddBloke commented Feb 5, 2021

Looks like I need some test changes; I think the code is good to review while I look at them.

@OddBloke
Copy link
Copy Markdown
Contributor Author

OddBloke commented Feb 5, 2021

Ah, and it also looks like I've introduced an error condition in the "default SSH key" codepath (which I don't take on my local machine), so I'll also need to address that.

@OddBloke OddBloke force-pushed the lxd_ssh branch 3 times, most recently from 613d69c to 55e1162 Compare February 9, 2021 19:51
And remove the parameterisation, as we only have a single parameter set.
@OddBloke OddBloke force-pushed the lxd_ssh branch 5 times, most recently from 051bc2f to 8f1a526 Compare February 9, 2021 20:42
Analysis performed (and captured in canonical#112) indicates that accessing LXD
instances via SSH rather than via `lxc exec` leads to faster test
execution, and also brings LXD's default behaviour in line with the
default behaviour of other clouds.

This commit introduces a `execute_via_ssh` parameter and instance
variable, which are used by `execute()` to determine whether to connect
to the instance via SSH, or use `lxc exec`.  The default value of
`execute_via_ssh` is `True`.

(Prior to this commit, we would execute via SSH but only if we had a key
manually set against the instance, leading to inconsistencies in
behaviour depending on local configuration.  This also addresses that
inconsistency, and removes code that was required to support it.)

Fixes: canonical#112
@OddBloke OddBloke marked this pull request as ready for review February 9, 2021 21:45
Comment thread pycloudlib/lxd/cloud.py
local_path=local_path
)

with open(local_path, "r") as f:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that this code is not optimal at all, but the reason I have added it here is that we could change the ssh keys here, but still have launched instances that are using an old key. In that situation, we would provided an unusable instance.

However, I know that this is not optimal because, for example, if we can't exec into the machine, that pull_file call will not work. So maybe we can drop this check and live with this possibility.

But if you think there is a smarter way to handle that, no problem at all

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment, this is the code I was hoping to discuss! You're correct, the issue I found is that we don't necessarily have exec access when this runs (and, indeed, we definitely don't when this runs in the demo scripts, which was causing CI failure). Adding an instance._wait_for_execute() call in get_instance doesn't seem like the right thing to do: if the instance isn't expected to be accessible, that would cause a long delay.

I think this logic is less necessary than it was previously, however, because instance.key_pair no longer determines our execute behaviour: even if instance.key_pair is None, we will still attempt to SSH in (using any keys available to paramiko).

However, I don't think we can say it's entirely unnecessary: while we don't need it to determine the SSH key to use (and, actually, never did because paramiko will try all keys by default), it did serve a purpose previously: if SSH execution could work (determined via exec), the LXDInstance would be configured to execute via SSH instead of via exec, otherwise it would be configured to use exec. We lose that behaviour here: we will always return a LXDInstance configured to execute via SSH.

cloud-init's tests only have a single place which uses get_instance (when CLOUD_INIT_EXISTING_INSTANCE_ID is set). It actually has a similar issue to the one we're trying to solve here: previously with a local SSH key configured, it would have always configured SSH access (even to instances for which SSH wouldn't work: the existence of an authorized key does not indicate, e.g., working networking), and would otherwise have always configured exec access. I need to update canonical/cloud-init#802 to handle things differently: we should set instance.execute_via_ssh after the instance has been instantiated. We would still not require the "execution method auto-detection" which is being removed here.

Does UA client have test code which would be affected by this removal? Or can we safely remove it (and modify the docstring here to explain the execution behaviour)?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not known about the fact that paramiko would try to access the machine with all of the keys that is has seen. If that is the case, I don't see any problem removing the code and it should not affect uaclient at all.

Thanks for all the provided context @OddBloke :)

@OddBloke OddBloke merged commit d33dc15 into canonical:master Feb 10, 2021
@OddBloke OddBloke deleted the lxd_ssh branch February 10, 2021 19:21
OddBloke added a commit to OddBloke/cloud-init that referenced this pull request Feb 11, 2021
pycloudlib has modified the way LXD executes tests
(canonical/pycloudlib#114): it will always use
SSH to access them by default, instead of using `lxc exec`.  This
behaviour is transparent for them majority of cloud-init's integration
tests, but some currently depend on using `lxc exec` to access instances
with (intentionally) broken networking: obviously these are not
accessible via SSH.

pycloudlib retains support for switching an instance to use `lxc exec`.
This commit introduces the `lxd_use_exec` mark, which tests can use to
indicate to the integration testing framework that they should be so
switched, and applies it to all applicable tests.
OddBloke added a commit to OddBloke/cloud-init that referenced this pull request Feb 18, 2021
pycloudlib has modified the way LXD executes tests
(canonical/pycloudlib#114): it will always use
SSH to access them by default, instead of using `lxc exec`.  This
behaviour is transparent for them majority of cloud-init's integration
tests, but some currently depend on using `lxc exec` to access instances
with (intentionally) broken networking: obviously these are not
accessible via SSH.

pycloudlib retains support for switching an instance to use `lxc exec`.
This commit introduces the `lxd_use_exec` mark, which tests can use to
indicate to the integration testing framework that they should be so
switched, and applies it to all applicable tests.
OddBloke added a commit to canonical/cloud-init that referenced this pull request Feb 22, 2021
pycloudlib has modified the way LXD executes tests
(canonical/pycloudlib#114): it will always use
SSH to access them by default, instead of using `lxc exec`.  This
behaviour is transparent for them majority of cloud-init's integration
tests, but some currently depend on using `lxc exec` to access instances
with (intentionally) broken networking: obviously these are not
accessible via SSH.

pycloudlib retains support for switching an instance to use `lxc exec`.
This commit introduces the `lxd_use_exec` mark, which tests can use to
indicate to the integration testing framework that they should be so
switched, and applies it to all applicable tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LXD connections are attempted via SSH iff a non-default key_pair is specified

3 participants