Skip to content

Fix lxd related issues#52

Merged
lucasmoura merged 8 commits into
canonical:masterfrom
lucasmoura:fix-lxd
Nov 18, 2020
Merged

Fix lxd related issues#52
lucasmoura merged 8 commits into
canonical:masterfrom
lucasmoura:fix-lxd

Conversation

@lucasmoura
Copy link
Copy Markdown

@lucasmoura lucasmoura commented Nov 13, 2020

This PR fixes problems that we are experiencing after merging #40 and #45.

We are fixing:

  • LXD not being able to launch images without name parameter
  • LXD always using images remote for xenial instances
  • Paramiko not adding the ssh key type to the public key content
  • LXD image_serial not being able to verify if a image id is related to a vm or not.

PS: I still need to update our integration tests to avoid mistakes like this in the future. However, since I don't know how fast I will be able to do that, I think there is no need to wait for that to land this PR.

Fixes #46 #47 #48

Comment thread pycloudlib/lxd/cloud.py Outdated
Comment on lines +246 to +250
# When we create keys through paramiko, we end up not
# having the key type on the public key content. Because
# of that, we are manually adding the ssh-rsa type into it
if "ssh-" not in pub_key:
pub_key = "ssh-rsa {}".format(pub_key)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we perform this fixing in the code that creates the keys with paramiko, rather than here? If we ever create non-RSA keys, this code would have to somehow determine the type of the key it is dealing with; it'll be much easier if it can assume good input.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally right, It is definitely better to update the method that creates the key instead

Comment thread pycloudlib/lxd/cloud.py Outdated
return image_info

def image_serial(self, image_id):
def image_serial(self, image_id, is_vm=False, **kwargs):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about adding cloud-specific keyword arguments to these APIs. In particular, doing so with is_vm means that code that's written against the pure BaseCloud interface needs special handling code to deal with LXD VM instances.

Consider this simple launch example (inspired by cloud-init's integration testing):

def do_launch(cloud: BaseCloud, image_id: str) -> BaseInstance:
    image_id = cloud.released_image(image_id)
    instance = cloud.launch(image_id, wait=False)
    instance.wait(raise_on_cloudinit_failure=False)
    return instance

This can be used to launch EC2 or LXD containers:

do_launch(pycloudlib.ec2.cloud.EC2("our_tag"))
do_launch(pycloudlib.lxd.cloud.LXD("our_tag"))

but there's no way to launch a LXD VM using it. It would need modification to take kwargs (because it doesn't make sense to have is_vm on a function which handles non-LXD clouds too), and of course its callers would also need special handling to pass in kwargs={"is_vm": True} iff they were launching a LXD VM. This would propagate all the way out through APIs which otherwise could just be passing around a BaseCloud instance.

It feels to me like we could modify this abstraction to improve matters. We could have is_vm be an instance attribute, and so passed at instantiation time; this would mean that instances of BaseCloud could still be passed around. However, that still doesn't allow us to pass around BaseCloud subclasses and instantiate them at will: you have to carry (BaseCloudSubclass, {"kwargs": "to instantiate it with"}) around to be able to do that.

So I think that perhaps the better abstraction is to have separate classes for LXDContainer and LXDVirtualMachine. This would allow them both to be treated as equal, first-class "clouds", and would enable use cases like those described above. These would share a lot of their implementation, of course. This also follows the abstraction that James settled on in the cloud-init integration testing: we have a LxdContainerCloud (which wraps pycloudlib.lxd.cloud.LXD), not a LxdCloud.

What do folks think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this proposal. Even though much of the implementation is the same, I think it makes sense to think of them as two separate "clouds". It also allows for further divergence in implementation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As further background, LXD has instance_type and type separate in its API: https://linuxcontainers.org/lxd/docs/master/rest-api#post-. This indicates, to me, that container vs VM is a more fundamental difference in their data model than "instance type" is in most other "clouds", which I think supports the use of separate classes in our own data model.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a great idea, since you totally right that the is_vm modification is breaking the main abstraction of pycloudlib (Which I missed when implementing that). I can update the code without an issue

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OddBloke I was originally likening LXD deployment of vms vs containers as sort of like ec2 managing different instance_types. Generic LXD cloud provides an wrapper around the lxc command set which is used by both lxc --vm and container based instance launches.

TLDR: +1 on LXDContainer/LXDVirtualmachine subclasses.

I spent a good hour trying to justify a common single LXD cloud as it is abstracting the common CLI for vm and container deployments, but a couple of things swayed me to subclasses (other than your well thought out explanation)

  • vms vs containers requiring special image filtering, search and launch knowledge,
  • vms require specific lxc profile love during setup and deployment
  • method signature bloat adding in LXCCloud is_vm param into multiple LXDCloud methods such as init, init, launch and image* methods indicating enough of a smell that this is better separated from a common class.

I also thought (incorrectly) that --vm was nearly a simple alias for --type virtual-machine.

But, when looking at lxc --type vs lxc --vm, as Dan pointed out in MM, one is not --vm is not actually an alias --type virtual-machine. The two params provide separate fields to lxd's API that direct the underlying cloud platform chosen for the instance deployment.

Through @lucasmoura's good work, we are seeing enough discrepancies in these underlying platforms and how to query images as well as which profiles to use that probably warrant the subclasses suggestion.
Ultimately, under the hood below lxc's CLI, they are unique cloud platforms and probably should be treated as such.

@lucasmoura
Copy link
Copy Markdown
Author

@OddBloke @blackboxsw @TheRealFalcon I have refactored the code to separate the behavior of LXD containers from virtual machines.

I believe now we can launch LXD vms as we launch any other cloud in the codebase.

However, there are some abstractions that maybe could be improved, such as the use of _get_image_hash_key. But if anyone has any idea for improvements, just let me know and I will code them.

Also @paride we have kind of an exception for image_serial when we are using Xenial vms. If you want to change how to handle that situation, just let me know as well.

@paride
Copy link
Copy Markdown
Collaborator

paride commented Nov 17, 2020

@lucasmoura I think the exception for Xenial is fine. I'm also +1 for the container/vm split, I think it makes sense and helps maintaining the api consistent. I'm going to run a couple of bootspeed jobs against this branch to check that everything works and report back.

@paride
Copy link
Copy Markdown
Collaborator

paride commented Nov 17, 2020

Works nicely!

Copy link
Copy Markdown
Contributor

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lucas, this looks much cleaner and clearer now. I still have a few inline comments/questions, but none of them should require a refactor!

Comment thread pycloudlib/lxd/cloud.py Outdated
return self._streams_query(filters, daily)[0]


class LXD(BaseLXD):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we rename this to LXDContainer for consistency with LXDVirtualMachine. It's clearer that this only handles containers that way.

If we want to maintain backwards-compat we can do something like:

import warnings


class LXD(LXDContainer):

    def __init__(self, *args, **kwargs):
        warnings.warn("LXD class is deprecated; use LXDContainer instead.")
        return super().__init__(*args, **kwargs)

Comment thread .travis.yml
- sudo usermod -a -G lxd $USER
- git clone https://github.com/canonical/cloud-init.git
- cd cloud-init
- sg lxd -c "pytest tests/integration_tests/"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably use tox and only run the CI tests we run in cloud-init: that should comfortably exercise most of the pycloudlib functionality we use. Given that I'm going to be pushing a cloud-init commit to make that easier to do, let's leave this as-is and I'll propose a separate change which does that.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Except tox might have a hard time using the current pycloudlib rather than cloud-init's default one, so I'll give this further thought.)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was exactly why I have not used tox :(

Comment thread pycloudlib/lxd/cloud.py Outdated

class LXD(BaseCloud):
"""LXD Cloud Class."""
class BaseLXD(BaseCloud):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't expect consumers of our API to use this:

Suggested change
class BaseLXD(BaseCloud):
class _BaseLXD(BaseCloud):

Comment thread pycloudlib/lxd/cloud.py Outdated
# If we have a hash in the image_id we need to query simplestreams to
# identify the release.
return self._image_info(image_id, is_vm)[0]["release"]
return "ssh-rsa {}".format(pub_key), priv_str.getvalue()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like we could use key.get_name() here: http://docs.paramiko.org/en/stable/api/keys.html#paramiko.pkey.PKey.get_name

): # pylint: disable=W0212
"""Tests extracting release from non hashed image id."""
cloud = LXD(tag="test")
cloud = LXDVirtualMachine(tag="test")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that we're losing coverage of LXD as a result of this? Or are these tests covering only VM behaviour?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should only be run on LXDVirtualMachine instances. I will better document that on the test

Comment thread pycloudlib/lxd/cloud.py
if release == "xenial":
# xenial needs to launch images:ubuntu/16.04/cloud
# because it contains the HWE kernel which has vhost-vsock support
return self.XENIAL_IMAGE_VSOCK_SUPPORT
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a different image to the one that consumers will expect (it's from images:, instead of from ubuntu:); I'd like to see us log something to give people a better chance of grokking what's going on.

Comment thread pycloudlib/lxd/cloud.py Outdated

"""
if image_id == self.XENIAL_IMAGE_VSOCK_SUPPORT:
return ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "" instead of None?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I thought that maybe one of ours scripts were always expecting a string to be returned here, but I am not sure if that is okay. @paride would it be okay if this method returns None for xenial images on vms ?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucasmoura the bootspeed measurement script does indeed expect a string, but I agree with Dan in that it should handle None, so +1 for Dan's suggestion. (In practice this is a non-issue as the bootspeed jobs do not measure Xenial, but I like to know it's working in case we want to enable it.)

@lucasmoura
Copy link
Copy Markdown
Author

@OddBloke Code Updated. I will just wait on @paride response regarding the image_serial change

@lucasmoura
Copy link
Copy Markdown
Author

@OddBloke Code update with @paride approval on the updated image_serial

@lucasmoura lucasmoura requested a review from OddBloke November 18, 2020 15:55
Copy link
Copy Markdown
Contributor

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Lucas, this LGTM now!

@lucasmoura lucasmoura merged commit ddf1366 into canonical:master Nov 18, 2020
@lucasmoura lucasmoura deleted the fix-lxd branch November 18, 2020 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

lxd launch() fails if name parameter is not specified

5 participants