Add vm support to lxd by lucasmoura · Pull Request #40 · canonical/pycloudlib

lucasmoura · 2020-10-29T21:16:35Z

This PR adds LXD support for launching vms. Although this is working version, there are still some aspects of the code that may be updated:

wait_for_cloudinit: Right now, I am looping on any OSError exception raised, meaning that unrelated lxd-agent errors will trigger unnecessary sleep calls. I can parse the OSError here and search for lxd-agent before looping again.
image_id parameter: I am assuming that the user will always pass as image_id the raw release name [xenial, bionic, focal] which may not be always true
trusty: There is no way to launch trusty vm through lxd, but I am not informing the user about that.

Although I have those improvements to implement, I think the code is already in a good state for an initial review

blackboxsw

Thanks @lucasmoura for the first pass here. A number of comments and cleanup to look at

blackboxsw · 2020-11-04T21:30:56Z

+                profile_list = [profile_name]
+
+        if use_ssh:
+            error_msg = (


Probably don't need to define this variable in this context. let's move it down inside if user_data_str and search_pattern.match(user_data_str) is None: below

I have updated that part of the code, since I realized my checks here were not making much sense in the context of an user providing its own ssh keys to issue commands.

blackboxsw · 2020-11-04T21:33:05Z

+            user_data_str = user_data or config_dict.get("user.user-data")
+
+            if user_data_str and search_pattern.match(user_data_str) is None:
+                raise Exception(error_msg)


We should avoid using generic Exceptions whenever possible. This should be a ValueError instead

Same thing here

blackboxsw · 2020-11-04T21:48:02Z

               name=None, ephemeral=False, network=None, storage=None,
-               profile_list=None, config_dict=None, **kwargs):
+               profile_list=None, config_dict=None, is_vm=False,
+               use_ssh=False, **kwargs):


use_ssh seems a fairly specific thing to surface at launch time with a parameter for a couple of reasons:

In our FIPS use-case I don't expect for the lifetime of an instance we will exclusively use ssh or `lxc exec. I'd like the ability to switch between access modes for different tests

I'm not certain we ultimately want to solve this up during launch/init because this is ultimately and LXD/C problem that I think we might have to resolve via either a LXD bug/feature-fix or by providing user-data to the instance at launch time that does this setup for us specifically.

Here are some alternatives:

allow instance.execute to inspect whether self.key_pair is set on a LXDInstance, and if so always prefer ssh?

make instance._ssh a public method that callers can use when specific conditions suggest sshing to the container is preferred

Add a prefer_ssh = False attribute to lxd instances which instance.execute could inspect to determine whether to use self._ssh instead of lxc exec on every execute call.

Add a use_ssh=False param to instance.execute which would allow any execute call to prefer ssh to whatever backend was defaulted

Right now, we are using the first approach. If we have a key pair, lxc will always communicate with the machine through ssh.
But I think refactoring it to support the prefer_ssh approach is an excellent idea. I will refactor the code for that

blackboxsw · 2020-11-04T21:56:13Z

+                raise Exception(error_msg)
+
+            self._manage_ssh_key_pair(name)
+            user_data = textwrap.dedent(


In this code I believe, you silently ignore any user_data or config_dict["user.user-data"] passed into the lxd.cloud.init() so a user can't specify use_ssh and user_data I don't think. It feels like if we were trying to pre-enable ssh support for launched instances maybe we need an lxd-ssh custom profile that adds the necessary goodness for that case? or the caller just makes sure to provide the necessary ssh setup in user_data to the instance (because it's xenial in the single fips test case we have for instance)

Yes, I have addressed that in my last code update. What I am doing is appending the ssh_authorized_keys bit into any existing user_data. But maybe we could handle that differently.

blackboxsw

bump regex request.

blackboxsw · 2020-11-05T22:07:44Z

+        if "16.04" in image_id:
+            return "xenial"
+
+        base_release = image_id.split(":")[1]
+        if base_release in ["trusty", "xenial", "bionic", "focal"]:
+            return base_release


It feels like this section could us re.match instead to handle a lot more cases than just 16.04 in the image_id. What do you think about the following? I also want to avoid hardcoding ubuntu series names or versions in more than one place, so we should probably leverage pycloudlib.util.UBUNTU_RELEASE_VERSION_MAP somehow (which should be updated to include groovy and hirsute)

Suggested change

if "16.04" in image_id:

return "xenial"

base_release = image_id.split(":")[1]

if base_release in ["trusty", "xenial", "bionic", "focal"]:

return base_release

# Check if we are an ubuntu-based image containing codename or version

release_regex = (

"(.*ubuntu.*(?P<release>(" +

"|".join(UBUNTU_RELEASE_VERSION_MAP) +

"|".join(UBUNTU_RELEASE_VERSION_MAP.values()) +

")).*)"

)

ubuntu_match = re.match(release_regex, image_id)

if ubuntu_match:

release = ubuntu_match.groupdict()["release"]

for codename, version in UBUNTU_RELEASE_VERSION_MAP.items():

if release in (codename, version):

return codename

lucasmoura · 2020-11-06T20:44:29Z

@blackboxsw Code updated.

It seems that the ssh issue was being caused by using the --force flag on lxc stop
I have kept the the use_ssh parameter for launch. The reason for that is that execute method is being called internally by pycloudlib. Meaning that parameters like force_ssh would only work for directly user call of the execute method. To be consistent, I am always executing commands through ssh when we have ssh keys in the instance. But we can better discuss this.

lucasmoura · 2020-11-06T20:45:46Z

Also @blackboxsw this is the script I am using to test this new feature regarding FIPS on xenial machine:

import textwrap
import logging
import time
import pycloudlib


RELEASE = 'xenial'
STAGING_TOKEN = "YOUR-TOKEN"


def main():
    logging.basicConfig(level=logging.DEBUG)

    lxd = pycloudlib.LXD('example-vm')
    name = 'pycloudlib-vm'

    ssh_user_data = textwrap.dedent(
        """\
        #cloud-config
        ssh_authorized_keys:
            - ssh-rsa blah
        """
    )
    
    image_id = lxd.released_image("xenial", is_vm=True)
    inst = lxd.launch(name=name, image_id=image_id, is_vm=True, use_ssh=True)
    result = inst.execute("lsb_release -a")
    print(result)

    print(inst.execute("sudo add-apt-repository ppa:canonical-server/ua-client-daily -y"))
    print(inst.execute("sudo apt-get update"))
    result = inst.execute("sudo apt-get install ubuntu-advantage-tools -y")

    print(inst.execute("ua version"))
    inst.execute("ssh-import-id lamoura")
    inst.execute("sudo sed -i 's/contracts.can/contracts.staging.can/' /etc/ubuntu-advantage/uaclient.conf")
    inst.execute("sudo ua attach {}".format(STAGING_TOKEN))
    inst.execute("sudo ua disable livepatch")
    inst.execute("sudo ua enable fips --beta --assume-yes")

    inst.restart(wait=True, force=False)
    print(inst.execute("ua status --all"))
    print(inst.execute("lsb_release -a"))
    print(inst.execute("uname -r"))

    inst.shutdown(force=False)
    inst.start()
    inst.restart(force=False)
    inst.delete()


if __name__ == "__main__":
    main()

blackboxsw

just a few nits remaining on this round.

blackboxsw · 2020-11-09T04:17:06Z

+        if is_vm and release == "xenial":
+            # xenial needs to launch images:ubuntu/16.04/cloud
+            # because it contains the HWE kernel which has vhost-vsock support
+            return "images:ubuntu/16.04/cloud"


Is it worth a global var for this since it's used in multiple places?

blackboxsw · 2020-11-09T04:18:04Z

        image_data = self._find_image(release, arch, daily=False)
+
+        if is_vm:
+            image_hash_key = "combined_disk1-img_sha256"


Is it worth a global var for these two strings as well since they are used in multiple places for filtering?

blackboxsw · 2020-11-09T04:18:35Z

            'ftype=lxd.tar.xz',
            'arch=%s' % arch,
-            'release=%s' % release,
+            'release=%s' % release


unintended change?

blackboxsw · 2020-11-09T04:20:27Z

+                    super()._wait_for_cloudinit(
+                        raise_on_failure=raise_on_failure)
+                    break
+                except OSError:


Do we specifically only want to check for "lxd agent" issues here in the OSError message, or do we prefer to catch any OSerror and retry here?

lucasmoura · 2020-11-09T15:36:04Z

@blackboxsw Code updated

lucasmoura · 2020-11-09T20:09:38Z

@blackboxsw Just modified how we create the ssh keys. You were right here, it is better to create the ssh keys outside of the launch method

blackboxsw · 2020-11-10T19:14:31Z

@@ -0,0 +1 @@
+"""Tests related to pycloudlib.lxd module."""


I would suggest we align with other projects and have a tests subdirectory below the module we are testing (not a top-level pycloudlib/tests directory that has a full mapping of all submodules.

I think the lxd tests should live under pycloublib/lxd/tests instead of pycloudlib/tests/lxd so devs don't have to cd up and across from a top-level directory.

blackboxsw · 2020-11-10T19:18:34Z

        subp(['lxc', 'restore', self.name, snapshot_name])

-    def shutdown(self, wait=True):
+    def shutdown(self, wait=True, force=True):


We should change pycloudlib/instance.py and pycloudlib/*/instance.py:shutdown and restart methods to accept **kwargs

Suggested change

def shutdown(self, wait=True, force=True):

def shutdown(self, wait=True, force=True, **kwargs):

blackboxsw · 2020-11-10T19:19:29Z

+                profile_config="profile_config"
+            )
+
+        expected_msg = "The profile named test_profile already exist"


Suggested change

expected_msg = "The profile named test_profile already exist"

expected_msg = "The profile named test_profile already exists"

lucasmoura · 2020-11-10T19:35:24Z

@blackboxsw Updated

blackboxsw · 2020-11-10T20:22:03Z

+            When `True`, if the process waiting for cloud-init exits non-zero
+            then this method will raise an `OSError`.
+        """
+        if self._is_vm and raise_on_failure:


when lxd.get_instance is called we don't detect Type: virtual-machine on those returned LXDInstances, as a result self._is_vm is False and we fall through to the setup()._wait_for_cloud_init which fails with

File "/home/csmith/src/pycloudlib/pycloudlib/instance.py", line 392, in _wait_for_cloudinit raise OSError( OSError: cloud-init failed to start: out: error: Error: Failed to connect to lxd-agent

An example of that machine type detection could be

+ @property + def is_vm(self): + """Return boolean if vm type or not + + Will return False if unknown. + + Returns: + boolean if virtual-machine + """ + result = subp(['lxc', 'info', self.name]) + + try: + info_type = re.findall(r'Type: (.*)', result)[0] + except IndexError: + return False + + return bool(info_type == 'virtual-machine')

blackboxsw · 2020-11-10T20:29:47Z

+        if self.key_pair:
+            return super()._run_command(command, stdin)
+
+        base_cmd = ['lxc', 'exec', self.name, '--']


I think we should default to non-root user for lxc exec commands (as we do in ua-tools integration tests)

Suggested change

base_cmd = ['lxc', 'exec', self.name, '--']

base_cmd = ['lxc', 'exec', "--user", "1000", self.name, '--']

lucasmoura · 2020-11-11T14:19:42Z

@blackboxsw Thanks for the review. I have updated the code

blackboxsw · 2020-11-11T22:04:40Z

+        super().__init__(key_pair=key_pair)

        self._name = name
+        self._is_vm = None


This should be a class attr, no need to set in init we can reset that via is_vm() later as you already do

diff --git a/pycloudlib/lxd/instance.py b/pycloudlib/lxd/instance.py index 20ddf7d..9594f47 100644 --- a/pycloudlib/lxd/instance.py +++ b/pycloudlib/lxd/instance.py @@ -11,6 +11,7 @@ class LXDInstance(BaseInstance): """LXD backed instance.""" _type = 'lxd' + _is_vm = None def __init__(self, name, key_pair=None): """Set up instance. @@ -22,7 +23,6 @@ class LXDInstance(BaseInstance): super().__init__(key_pair=key_pair) self._name = name - self._is_vm = None def __repr__(self): """Create string representation for class."""

blackboxsw

+1 thanks for this Lucas!
minor nit https://github.com/canonical/pycloudlib/pull/40/files#r521666099 but I can land it with that change

Add vm support to lxd

6e84f0b

lucasmoura force-pushed the add-lxd-vm-support branch from b8d6ddf to 0b2b8b5 Compare November 3, 2020 19:17

blackboxsw requested changes Nov 4, 2020

View reviewed changes

lucasmoura force-pushed the add-lxd-vm-support branch 2 times, most recently from fd125b4 to 4ca341e Compare November 5, 2020 21:13

blackboxsw reviewed Nov 6, 2020

View reviewed changes

lucasmoura force-pushed the add-lxd-vm-support branch 2 times, most recently from 67e81fb to d8ee749 Compare November 6, 2020 20:41

blackboxsw approved these changes Nov 9, 2020

View reviewed changes

Add support to run lxd commands using ssh

058ac4f

lucasmoura force-pushed the add-lxd-vm-support branch from d8ee749 to 058ac4f Compare November 9, 2020 15:35

lucasmoura requested a review from blackboxsw November 9, 2020 15:35

Do not create ssh key pair in LXD launch method

63ceffe

lucasmoura force-pushed the add-lxd-vm-support branch from 4b5cf96 to 63ceffe Compare November 9, 2020 21:41

Get trusty image for released or daily images code

37d1f83

blackboxsw requested changes Nov 10, 2020

View reviewed changes

Lucas Moura added 2 commits November 10, 2020 16:33

Move lxd test to lxd folder

06f3ee1

Update restart and shutdown method signature

9dd72a1

blackboxsw reviewed Nov 10, 2020

View reviewed changes

Lucas Moura added 4 commits November 10, 2020 18:29

Do not force shutdown on lxd snapshot

4b6c412

Fix extract release regex

d5193ec

Update is_vm check on LXD instances

d715f7d

Update LXD get_instance to check for ssh keys

9ad610d

Run lxc exec commands as non-root

153773a

lucasmoura force-pushed the add-lxd-vm-support branch from a50f46d to 153773a Compare November 11, 2020 14:24

Lucas Moura added 2 commits November 11, 2020 12:37

Add another check on lxd _wait_for_cloudinit

16da72e

Change LXD force parameter to be False by default

887d1e5

lucasmoura mentioned this pull request Nov 11, 2020

Refactor integration tests to use pycloudlib LXD canonical/ubuntu-pro-client#1241

Merged

blackboxsw reviewed Nov 11, 2020

View reviewed changes

lxdinstance: class attr _is_vm

1fe170b

blackboxsw force-pushed the add-lxd-vm-support branch from 4b8e14c to 1fe170b Compare November 11, 2020 22:12

blackboxsw approved these changes Nov 11, 2020

View reviewed changes

blackboxsw merged commit 0e75410 into canonical:master Nov 11, 2020

lucasmoura mentioned this pull request Nov 13, 2020

Fix lxd related issues #52

Merged

-        if "16.04" in image_id:
-            return "xenial"
-        base_release = image_id.split(":")[1]
-        if base_release in ["trusty", "xenial", "bionic", "focal"]:
-            return base_release
+        # Check if we are an ubuntu-based image containing codename or version
+        release_regex = (
+            "(.*ubuntu.*(?P<release>(" +
+            "|".join(UBUNTU_RELEASE_VERSION_MAP) +
+            "|".join(UBUNTU_RELEASE_VERSION_MAP.values()) +
+            ")).*)"
+        )
+        ubuntu_match = re.match(release_regex, image_id)
+        if ubuntu_match:
+            release = ubuntu_match.groupdict()["release"]
+            for codename, version in UBUNTU_RELEASE_VERSION_MAP.items():
+                if release in (codename, version):
+                    return codename

	def shutdown(self, wait=True, force=True):
	def shutdown(self, wait=True, force=True, **kwargs):

	expected_msg = "The profile named test_profile already exist"
	expected_msg = "The profile named test_profile already exists"

	base_cmd = ['lxc', 'exec', self.name, '--']
	base_cmd = ['lxc', 'exec', "--user", "1000", self.name, '--']

Conversation

lucasmoura commented Oct 29, 2020

Uh oh!

blackboxsw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blackboxsw left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lucasmoura commented Nov 6, 2020

Uh oh!

lucasmoura commented Nov 6, 2020

Uh oh!

blackboxsw left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lucasmoura commented Nov 9, 2020

Uh oh!

lucasmoura commented Nov 9, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lucasmoura commented Nov 10, 2020

Uh oh!

blackboxsw Nov 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lucasmoura commented Nov 11, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

blackboxsw left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

blackboxsw Nov 10, 2020 •

edited

Loading