Skip to content

Add vm support to lxd#40

Merged
blackboxsw merged 14 commits into
canonical:masterfrom
lucasmoura:add-lxd-vm-support
Nov 11, 2020
Merged

Add vm support to lxd#40
blackboxsw merged 14 commits into
canonical:masterfrom
lucasmoura:add-lxd-vm-support

Conversation

@lucasmoura
Copy link
Copy Markdown

This PR adds LXD support for launching vms. Although this is working version, there are still some aspects of the code that may be updated:

  • wait_for_cloudinit: Right now, I am looping on any OSError exception raised, meaning that unrelated lxd-agent errors will trigger unnecessary sleep calls. I can parse the OSError here and search for lxd-agent before looping again.
  • image_id parameter: I am assuming that the user will always pass as image_id the raw release name [xenial, bionic, focal] which may not be always true
  • trusty: There is no way to launch trusty vm through lxd, but I am not informing the user about that.

Although I have those improvements to implement, I think the code is already in a good state for an initial review

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lucasmoura for the first pass here. A number of comments and cleanup to look at

Comment thread pycloudlib/lxd/cloud.py Outdated
Comment thread pycloudlib/lxd/cloud.py Outdated
Comment thread pycloudlib/lxd/cloud.py Outdated
Comment thread pycloudlib/lxd/cloud.py Outdated
Comment thread pycloudlib/lxd/cloud.py Outdated
Comment thread pycloudlib/lxd/cloud.py Outdated
profile_list = [profile_name]

if use_ssh:
error_msg = (
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need to define this variable in this context. let's move it down inside if user_data_str and search_pattern.match(user_data_str) is None: below

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated that part of the code, since I realized my checks here were not making much sense in the context of an user providing its own ssh keys to issue commands.

Comment thread pycloudlib/lxd/cloud.py Outdated
user_data_str = user_data or config_dict.get("user.user-data")

if user_data_str and search_pattern.match(user_data_str) is None:
raise Exception(error_msg)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid using generic Exceptions whenever possible. This should be a ValueError instead

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here

Comment thread pycloudlib/lxd/cloud.py Outdated
Comment thread pycloudlib/lxd/cloud.py Outdated
name=None, ephemeral=False, network=None, storage=None,
profile_list=None, config_dict=None, **kwargs):
profile_list=None, config_dict=None, is_vm=False,
use_ssh=False, **kwargs):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use_ssh seems a fairly specific thing to surface at launch time with a parameter for a couple of reasons:

  1. In our FIPS use-case I don't expect for the lifetime of an instance we will exclusively use ssh or `lxc exec. I'd like the ability to switch between access modes for different tests
  2. I'm not certain we ultimately want to solve this up during launch/init because this is ultimately and LXD/C problem that I think we might have to resolve via either a LXD bug/feature-fix or by providing user-data to the instance at launch time that does this setup for us specifically.

Here are some alternatives:

  • allow instance.execute to inspect whether self.key_pair is set on a LXDInstance, and if so always prefer ssh?
  • make instance._ssh a public method that callers can use when specific conditions suggest sshing to the container is preferred
  • Add a prefer_ssh = False attribute to lxd instances which instance.execute could inspect to determine whether to use self._ssh instead of lxc exec on every execute call.
  • Add a use_ssh=False param to instance.execute which would allow any execute call to prefer ssh to whatever backend was defaulted

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we are using the first approach. If we have a key pair, lxc will always communicate with the machine through ssh.
But I think refactoring it to support the prefer_ssh approach is an excellent idea. I will refactor the code for that

Comment thread pycloudlib/lxd/cloud.py Outdated
raise Exception(error_msg)

self._manage_ssh_key_pair(name)
user_data = textwrap.dedent(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this code I believe, you silently ignore any user_data or config_dict["user.user-data"] passed into the lxd.cloud.init() so a user can't specify use_ssh and user_data I don't think. It feels like if we were trying to pre-enable ssh support for launched instances maybe we need an lxd-ssh custom profile that adds the necessary goodness for that case? or the caller just makes sure to provide the necessary ssh setup in user_data to the instance (because it's xenial in the single fips test case we have for instance)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have addressed that in my last code update. What I am doing is appending the ssh_authorized_keys bit into any existing user_data. But maybe we could handle that differently.

@lucasmoura lucasmoura force-pushed the add-lxd-vm-support branch 2 times, most recently from fd125b4 to 4ca341e Compare November 5, 2020 21:13
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bump regex request.

Comment thread pycloudlib/lxd/cloud.py Outdated
Comment on lines +124 to +129
if "16.04" in image_id:
return "xenial"

base_release = image_id.split(":")[1]
if base_release in ["trusty", "xenial", "bionic", "focal"]:
return base_release
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this section could us re.match instead to handle a lot more cases than just 16.04 in the image_id. What do you think about the following? I also want to avoid hardcoding ubuntu series names or versions in more than one place, so we should probably leverage pycloudlib.util.UBUNTU_RELEASE_VERSION_MAP somehow (which should be updated to include groovy and hirsute)

Suggested change
if "16.04" in image_id:
return "xenial"
base_release = image_id.split(":")[1]
if base_release in ["trusty", "xenial", "bionic", "focal"]:
return base_release
# Check if we are an ubuntu-based image containing codename or version
release_regex = (
"(.*ubuntu.*(?P<release>(" +
"|".join(UBUNTU_RELEASE_VERSION_MAP) +
"|".join(UBUNTU_RELEASE_VERSION_MAP.values()) +
")).*)"
)
ubuntu_match = re.match(release_regex, image_id)
if ubuntu_match:
release = ubuntu_match.groupdict()["release"]
for codename, version in UBUNTU_RELEASE_VERSION_MAP.items():
if release in (codename, version):
return codename

@lucasmoura lucasmoura force-pushed the add-lxd-vm-support branch 2 times, most recently from 67e81fb to d8ee749 Compare November 6, 2020 20:41
@lucasmoura
Copy link
Copy Markdown
Author

@blackboxsw Code updated.

  1. It seems that the ssh issue was being caused by using the --force flag on lxc stop
  2. I have kept the the use_ssh parameter for launch. The reason for that is that execute method is being called internally by pycloudlib. Meaning that parameters like force_ssh would only work for directly user call of the execute method. To be consistent, I am always executing commands through ssh when we have ssh keys in the instance. But we can better discuss this.

@lucasmoura
Copy link
Copy Markdown
Author

Also @blackboxsw this is the script I am using to test this new feature regarding FIPS on xenial machine:

import textwrap
import logging
import time
import pycloudlib


RELEASE = 'xenial'
STAGING_TOKEN = "YOUR-TOKEN"


def main():
    logging.basicConfig(level=logging.DEBUG)

    lxd = pycloudlib.LXD('example-vm')
    name = 'pycloudlib-vm'

    ssh_user_data = textwrap.dedent(
        """\
        #cloud-config
        ssh_authorized_keys:
            - ssh-rsa blah
        """
    )
    
    image_id = lxd.released_image("xenial", is_vm=True)
    inst = lxd.launch(name=name, image_id=image_id, is_vm=True, use_ssh=True)
    result = inst.execute("lsb_release -a")
    print(result)

    print(inst.execute("sudo add-apt-repository ppa:canonical-server/ua-client-daily -y"))
    print(inst.execute("sudo apt-get update"))
    result = inst.execute("sudo apt-get install ubuntu-advantage-tools -y")

    print(inst.execute("ua version"))
    inst.execute("ssh-import-id lamoura")
    inst.execute("sudo sed -i 's/contracts.can/contracts.staging.can/' /etc/ubuntu-advantage/uaclient.conf")
    inst.execute("sudo ua attach {}".format(STAGING_TOKEN))
    inst.execute("sudo ua disable livepatch")
    inst.execute("sudo ua enable fips --beta --assume-yes")

    inst.restart(wait=True, force=False)
    print(inst.execute("ua status --all"))
    print(inst.execute("lsb_release -a"))
    print(inst.execute("uname -r"))

    inst.shutdown(force=False)
    inst.start()
    inst.restart(force=False)
    inst.delete()


if __name__ == "__main__":
    main()

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few nits remaining on this round.

Comment thread pycloudlib/lxd/cloud.py Outdated
if is_vm and release == "xenial":
# xenial needs to launch images:ubuntu/16.04/cloud
# because it contains the HWE kernel which has vhost-vsock support
return "images:ubuntu/16.04/cloud"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a global var for this since it's used in multiple places?

Comment thread pycloudlib/lxd/cloud.py Outdated
image_data = self._find_image(release, arch, daily=False)

if is_vm:
image_hash_key = "combined_disk1-img_sha256"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth a global var for these two strings as well since they are used in multiple places for filtering?

Comment thread pycloudlib/lxd/cloud.py Outdated
'ftype=lxd.tar.xz',
'arch=%s' % arch,
'release=%s' % release,
'release=%s' % release
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unintended change?

Comment thread pycloudlib/lxd/instance.py Outdated
super()._wait_for_cloudinit(
raise_on_failure=raise_on_failure)
break
except OSError:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we specifically only want to check for "lxd agent" issues here in the OSError message, or do we prefer to catch any OSerror and retry here?

@lucasmoura
Copy link
Copy Markdown
Author

@blackboxsw Code updated

@lucasmoura
Copy link
Copy Markdown
Author

@blackboxsw Just modified how we create the ssh keys. You were right here, it is better to create the ssh keys outside of the launch method

@@ -0,0 +1 @@
"""Tests related to pycloudlib.lxd module."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we align with other projects and have a tests subdirectory below the module we are testing (not a top-level pycloudlib/tests directory that has a full mapping of all submodules.

I think the lxd tests should live under pycloublib/lxd/tests instead of pycloudlib/tests/lxd so devs don't have to cd up and across from a top-level directory.

Comment thread pycloudlib/lxd/instance.py Outdated
subp(['lxc', 'restore', self.name, snapshot_name])

def shutdown(self, wait=True):
def shutdown(self, wait=True, force=True):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should change pycloudlib/instance.py and pycloudlib/*/instance.py:shutdown and restart methods to accept **kwargs

Suggested change
def shutdown(self, wait=True, force=True):
def shutdown(self, wait=True, force=True, **kwargs):

Comment thread pycloudlib/tests/lxd/test_cloud.py Outdated
profile_config="profile_config"
)

expected_msg = "The profile named test_profile already exist"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected_msg = "The profile named test_profile already exist"
expected_msg = "The profile named test_profile already exists"

@lucasmoura
Copy link
Copy Markdown
Author

@blackboxsw Updated

Comment thread pycloudlib/lxd/instance.py Outdated
When `True`, if the process waiting for cloud-init exits non-zero
then this method will raise an `OSError`.
"""
if self._is_vm and raise_on_failure:
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when lxd.get_instance is called we don't detect Type: virtual-machine on those returned LXDInstances, as a result self._is_vm is False and we fall through to the setup()._wait_for_cloud_init which fails with

  File "/home/csmith/src/pycloudlib/pycloudlib/instance.py", line 392, in _wait_for_cloudinit
    raise OSError(
OSError: cloud-init failed to start: out:  error: Error: Failed to connect to lxd-agent

An example of that machine type detection could be

+    @property
+    def is_vm(self):
+        """Return boolean if vm type or not
+
+        Will return False if unknown.
+
+        Returns:
+            boolean if virtual-machine
+        """
+        result = subp(['lxc', 'info', self.name])
+
+        try:
+            info_type = re.findall(r'Type: (.*)', result)[0]
+        except IndexError:
+            return False
+
+        return bool(info_type == 'virtual-machine')

Comment thread pycloudlib/lxd/instance.py Outdated
if self.key_pair:
return super()._run_command(command, stdin)

base_cmd = ['lxc', 'exec', self.name, '--']
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should default to non-root user for lxc exec commands (as we do in ua-tools integration tests)

Suggested change
base_cmd = ['lxc', 'exec', self.name, '--']
base_cmd = ['lxc', 'exec', "--user", "1000", self.name, '--']

@lucasmoura
Copy link
Copy Markdown
Author

@blackboxsw Thanks for the review. I have updated the code

Comment thread pycloudlib/lxd/instance.py Outdated
super().__init__(key_pair=key_pair)

self._name = name
self._is_vm = None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a class attr, no need to set in init we can reset that via is_vm() later as you already do

diff --git a/pycloudlib/lxd/instance.py b/pycloudlib/lxd/instance.py
index 20ddf7d..9594f47 100644
--- a/pycloudlib/lxd/instance.py
+++ b/pycloudlib/lxd/instance.py
@@ -11,6 +11,7 @@ class LXDInstance(BaseInstance):
     """LXD backed instance."""
 
     _type = 'lxd'
+    _is_vm = None
 
     def __init__(self, name, key_pair=None):
         """Set up instance.
@@ -22,7 +23,6 @@ class LXDInstance(BaseInstance):
         super().__init__(key_pair=key_pair)
 
         self._name = name
-        self._is_vm = None
 
     def __repr__(self):
         """Create string representation for class."""

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks for this Lucas!
minor nit https://github.com/canonical/pycloudlib/pull/40/files#r521666099 but I can land it with that change

@blackboxsw blackboxsw merged commit 0e75410 into canonical:master Nov 11, 2020
@lucasmoura lucasmoura mentioned this pull request Nov 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants