Add vm support to lxd#40
Conversation
b8d6ddf to
0b2b8b5
Compare
blackboxsw
left a comment
There was a problem hiding this comment.
Thanks @lucasmoura for the first pass here. A number of comments and cleanup to look at
| profile_list = [profile_name] | ||
|
|
||
| if use_ssh: | ||
| error_msg = ( |
There was a problem hiding this comment.
Probably don't need to define this variable in this context. let's move it down inside if user_data_str and search_pattern.match(user_data_str) is None: below
There was a problem hiding this comment.
I have updated that part of the code, since I realized my checks here were not making much sense in the context of an user providing its own ssh keys to issue commands.
| user_data_str = user_data or config_dict.get("user.user-data") | ||
|
|
||
| if user_data_str and search_pattern.match(user_data_str) is None: | ||
| raise Exception(error_msg) |
There was a problem hiding this comment.
We should avoid using generic Exceptions whenever possible. This should be a ValueError instead
| name=None, ephemeral=False, network=None, storage=None, | ||
| profile_list=None, config_dict=None, **kwargs): | ||
| profile_list=None, config_dict=None, is_vm=False, | ||
| use_ssh=False, **kwargs): |
There was a problem hiding this comment.
use_ssh seems a fairly specific thing to surface at launch time with a parameter for a couple of reasons:
- In our FIPS use-case I don't expect for the lifetime of an instance we will exclusively use ssh or `lxc exec. I'd like the ability to switch between access modes for different tests
- I'm not certain we ultimately want to solve this up during launch/init because this is ultimately and LXD/C problem that I think we might have to resolve via either a LXD bug/feature-fix or by providing user-data to the instance at launch time that does this setup for us specifically.
Here are some alternatives:
- allow instance.execute to inspect whether self.key_pair is set on a LXDInstance, and if so always prefer ssh?
- make
instance._ssha public method that callers can use when specific conditions suggest sshing to the container is preferred - Add a
prefer_ssh = Falseattribute to lxd instances which instance.execute could inspect to determine whether to use self._ssh instead oflxc execon every execute call. - Add a
use_ssh=Falseparam to instance.execute which would allow any execute call to prefer ssh to whatever backend was defaulted
There was a problem hiding this comment.
Right now, we are using the first approach. If we have a key pair, lxc will always communicate with the machine through ssh.
But I think refactoring it to support the prefer_ssh approach is an excellent idea. I will refactor the code for that
| raise Exception(error_msg) | ||
|
|
||
| self._manage_ssh_key_pair(name) | ||
| user_data = textwrap.dedent( |
There was a problem hiding this comment.
In this code I believe, you silently ignore any user_data or config_dict["user.user-data"] passed into the lxd.cloud.init() so a user can't specify use_ssh and user_data I don't think. It feels like if we were trying to pre-enable ssh support for launched instances maybe we need an lxd-ssh custom profile that adds the necessary goodness for that case? or the caller just makes sure to provide the necessary ssh setup in user_data to the instance (because it's xenial in the single fips test case we have for instance)
There was a problem hiding this comment.
Yes, I have addressed that in my last code update. What I am doing is appending the ssh_authorized_keys bit into any existing user_data. But maybe we could handle that differently.
fd125b4 to
4ca341e
Compare
| if "16.04" in image_id: | ||
| return "xenial" | ||
|
|
||
| base_release = image_id.split(":")[1] | ||
| if base_release in ["trusty", "xenial", "bionic", "focal"]: | ||
| return base_release |
There was a problem hiding this comment.
It feels like this section could us re.match instead to handle a lot more cases than just 16.04 in the image_id. What do you think about the following? I also want to avoid hardcoding ubuntu series names or versions in more than one place, so we should probably leverage pycloudlib.util.UBUNTU_RELEASE_VERSION_MAP somehow (which should be updated to include groovy and hirsute)
| if "16.04" in image_id: | |
| return "xenial" | |
| base_release = image_id.split(":")[1] | |
| if base_release in ["trusty", "xenial", "bionic", "focal"]: | |
| return base_release | |
| # Check if we are an ubuntu-based image containing codename or version | |
| release_regex = ( | |
| "(.*ubuntu.*(?P<release>(" + | |
| "|".join(UBUNTU_RELEASE_VERSION_MAP) + | |
| "|".join(UBUNTU_RELEASE_VERSION_MAP.values()) + | |
| ")).*)" | |
| ) | |
| ubuntu_match = re.match(release_regex, image_id) | |
| if ubuntu_match: | |
| release = ubuntu_match.groupdict()["release"] | |
| for codename, version in UBUNTU_RELEASE_VERSION_MAP.items(): | |
| if release in (codename, version): | |
| return codename |
67e81fb to
d8ee749
Compare
|
@blackboxsw Code updated.
|
|
Also @blackboxsw this is the script I am using to test this new feature regarding FIPS on xenial machine: import textwrap
import logging
import time
import pycloudlib
RELEASE = 'xenial'
STAGING_TOKEN = "YOUR-TOKEN"
def main():
logging.basicConfig(level=logging.DEBUG)
lxd = pycloudlib.LXD('example-vm')
name = 'pycloudlib-vm'
ssh_user_data = textwrap.dedent(
"""\
#cloud-config
ssh_authorized_keys:
- ssh-rsa blah
"""
)
image_id = lxd.released_image("xenial", is_vm=True)
inst = lxd.launch(name=name, image_id=image_id, is_vm=True, use_ssh=True)
result = inst.execute("lsb_release -a")
print(result)
print(inst.execute("sudo add-apt-repository ppa:canonical-server/ua-client-daily -y"))
print(inst.execute("sudo apt-get update"))
result = inst.execute("sudo apt-get install ubuntu-advantage-tools -y")
print(inst.execute("ua version"))
inst.execute("ssh-import-id lamoura")
inst.execute("sudo sed -i 's/contracts.can/contracts.staging.can/' /etc/ubuntu-advantage/uaclient.conf")
inst.execute("sudo ua attach {}".format(STAGING_TOKEN))
inst.execute("sudo ua disable livepatch")
inst.execute("sudo ua enable fips --beta --assume-yes")
inst.restart(wait=True, force=False)
print(inst.execute("ua status --all"))
print(inst.execute("lsb_release -a"))
print(inst.execute("uname -r"))
inst.shutdown(force=False)
inst.start()
inst.restart(force=False)
inst.delete()
if __name__ == "__main__":
main() |
blackboxsw
left a comment
There was a problem hiding this comment.
just a few nits remaining on this round.
| if is_vm and release == "xenial": | ||
| # xenial needs to launch images:ubuntu/16.04/cloud | ||
| # because it contains the HWE kernel which has vhost-vsock support | ||
| return "images:ubuntu/16.04/cloud" |
There was a problem hiding this comment.
Is it worth a global var for this since it's used in multiple places?
| image_data = self._find_image(release, arch, daily=False) | ||
|
|
||
| if is_vm: | ||
| image_hash_key = "combined_disk1-img_sha256" |
There was a problem hiding this comment.
Is it worth a global var for these two strings as well since they are used in multiple places for filtering?
| 'ftype=lxd.tar.xz', | ||
| 'arch=%s' % arch, | ||
| 'release=%s' % release, | ||
| 'release=%s' % release |
| super()._wait_for_cloudinit( | ||
| raise_on_failure=raise_on_failure) | ||
| break | ||
| except OSError: |
There was a problem hiding this comment.
Do we specifically only want to check for "lxd agent" issues here in the OSError message, or do we prefer to catch any OSerror and retry here?
d8ee749 to
058ac4f
Compare
|
@blackboxsw Code updated |
|
@blackboxsw Just modified how we create the ssh keys. You were right here, it is better to create the ssh keys outside of the launch method |
4b5cf96 to
63ceffe
Compare
| @@ -0,0 +1 @@ | |||
| """Tests related to pycloudlib.lxd module.""" | |||
There was a problem hiding this comment.
I would suggest we align with other projects and have a tests subdirectory below the module we are testing (not a top-level pycloudlib/tests directory that has a full mapping of all submodules.
I think the lxd tests should live under pycloublib/lxd/tests instead of pycloudlib/tests/lxd so devs don't have to cd up and across from a top-level directory.
| subp(['lxc', 'restore', self.name, snapshot_name]) | ||
|
|
||
| def shutdown(self, wait=True): | ||
| def shutdown(self, wait=True, force=True): |
There was a problem hiding this comment.
We should change pycloudlib/instance.py and pycloudlib/*/instance.py:shutdown and restart methods to accept **kwargs
| def shutdown(self, wait=True, force=True): | |
| def shutdown(self, wait=True, force=True, **kwargs): |
| profile_config="profile_config" | ||
| ) | ||
|
|
||
| expected_msg = "The profile named test_profile already exist" |
There was a problem hiding this comment.
| expected_msg = "The profile named test_profile already exist" | |
| expected_msg = "The profile named test_profile already exists" |
|
@blackboxsw Updated |
| When `True`, if the process waiting for cloud-init exits non-zero | ||
| then this method will raise an `OSError`. | ||
| """ | ||
| if self._is_vm and raise_on_failure: |
There was a problem hiding this comment.
when lxd.get_instance is called we don't detect Type: virtual-machine on those returned LXDInstances, as a result self._is_vm is False and we fall through to the setup()._wait_for_cloud_init which fails with
File "/home/csmith/src/pycloudlib/pycloudlib/instance.py", line 392, in _wait_for_cloudinit
raise OSError(
OSError: cloud-init failed to start: out: error: Error: Failed to connect to lxd-agent
An example of that machine type detection could be
+ @property
+ def is_vm(self):
+ """Return boolean if vm type or not
+
+ Will return False if unknown.
+
+ Returns:
+ boolean if virtual-machine
+ """
+ result = subp(['lxc', 'info', self.name])
+
+ try:
+ info_type = re.findall(r'Type: (.*)', result)[0]
+ except IndexError:
+ return False
+
+ return bool(info_type == 'virtual-machine')
| if self.key_pair: | ||
| return super()._run_command(command, stdin) | ||
|
|
||
| base_cmd = ['lxc', 'exec', self.name, '--'] |
There was a problem hiding this comment.
I think we should default to non-root user for lxc exec commands (as we do in ua-tools integration tests)
| base_cmd = ['lxc', 'exec', self.name, '--'] | |
| base_cmd = ['lxc', 'exec', "--user", "1000", self.name, '--'] |
|
@blackboxsw Thanks for the review. I have updated the code |
a50f46d to
153773a
Compare
| super().__init__(key_pair=key_pair) | ||
|
|
||
| self._name = name | ||
| self._is_vm = None |
There was a problem hiding this comment.
This should be a class attr, no need to set in init we can reset that via is_vm() later as you already do
diff --git a/pycloudlib/lxd/instance.py b/pycloudlib/lxd/instance.py
index 20ddf7d..9594f47 100644
--- a/pycloudlib/lxd/instance.py
+++ b/pycloudlib/lxd/instance.py
@@ -11,6 +11,7 @@ class LXDInstance(BaseInstance):
"""LXD backed instance."""
_type = 'lxd'
+ _is_vm = None
def __init__(self, name, key_pair=None):
"""Set up instance.
@@ -22,7 +23,6 @@ class LXDInstance(BaseInstance):
super().__init__(key_pair=key_pair)
self._name = name
- self._is_vm = None
def __repr__(self):
"""Create string representation for class."""
blackboxsw
left a comment
There was a problem hiding this comment.
+1 thanks for this Lucas!
minor nit https://github.com/canonical/pycloudlib/pull/40/files#r521666099 but I can land it with that change
4b8e14c to
1fe170b
Compare
This PR adds LXD support for launching vms. Although this is working version, there are still some aspects of the code that may be updated:
wait_for_cloudinit: Right now, I am looping on anyOSErrorexception raised, meaning that unrelatedlxd-agenterrors will trigger unnecessary sleep calls. I can parse theOSErrorhere and search forlxd-agentbefore looping again.image_id parameter: I am assuming that the user will always pass as image_id the raw release name [xenial,bionic,focal] which may not be always truetrusty: There is no way to launch trusty vm through lxd, but I am not informing the user about that.Although I have those improvements to implement, I think the code is already in a good state for an initial review