Fix lxd related issues#52
Conversation
| # When we create keys through paramiko, we end up not | ||
| # having the key type on the public key content. Because | ||
| # of that, we are manually adding the ssh-rsa type into it | ||
| if "ssh-" not in pub_key: | ||
| pub_key = "ssh-rsa {}".format(pub_key) |
There was a problem hiding this comment.
Should we perform this fixing in the code that creates the keys with paramiko, rather than here? If we ever create non-RSA keys, this code would have to somehow determine the type of the key it is dealing with; it'll be much easier if it can assume good input.
There was a problem hiding this comment.
Totally right, It is definitely better to update the method that creates the key instead
| return image_info | ||
|
|
||
| def image_serial(self, image_id): | ||
| def image_serial(self, image_id, is_vm=False, **kwargs): |
There was a problem hiding this comment.
I'm a little concerned about adding cloud-specific keyword arguments to these APIs. In particular, doing so with is_vm means that code that's written against the pure BaseCloud interface needs special handling code to deal with LXD VM instances.
Consider this simple launch example (inspired by cloud-init's integration testing):
def do_launch(cloud: BaseCloud, image_id: str) -> BaseInstance:
image_id = cloud.released_image(image_id)
instance = cloud.launch(image_id, wait=False)
instance.wait(raise_on_cloudinit_failure=False)
return instanceThis can be used to launch EC2 or LXD containers:
do_launch(pycloudlib.ec2.cloud.EC2("our_tag"))
do_launch(pycloudlib.lxd.cloud.LXD("our_tag"))but there's no way to launch a LXD VM using it. It would need modification to take kwargs (because it doesn't make sense to have is_vm on a function which handles non-LXD clouds too), and of course its callers would also need special handling to pass in kwargs={"is_vm": True} iff they were launching a LXD VM. This would propagate all the way out through APIs which otherwise could just be passing around a BaseCloud instance.
It feels to me like we could modify this abstraction to improve matters. We could have is_vm be an instance attribute, and so passed at instantiation time; this would mean that instances of BaseCloud could still be passed around. However, that still doesn't allow us to pass around BaseCloud subclasses and instantiate them at will: you have to carry (BaseCloudSubclass, {"kwargs": "to instantiate it with"}) around to be able to do that.
So I think that perhaps the better abstraction is to have separate classes for LXDContainer and LXDVirtualMachine. This would allow them both to be treated as equal, first-class "clouds", and would enable use cases like those described above. These would share a lot of their implementation, of course. This also follows the abstraction that James settled on in the cloud-init integration testing: we have a LxdContainerCloud (which wraps pycloudlib.lxd.cloud.LXD), not a LxdCloud.
What do folks think?
There was a problem hiding this comment.
I like this proposal. Even though much of the implementation is the same, I think it makes sense to think of them as two separate "clouds". It also allows for further divergence in implementation.
There was a problem hiding this comment.
As further background, LXD has instance_type and type separate in its API: https://linuxcontainers.org/lxd/docs/master/rest-api#post-. This indicates, to me, that container vs VM is a more fundamental difference in their data model than "instance type" is in most other "clouds", which I think supports the use of separate classes in our own data model.
There was a problem hiding this comment.
I think it is a great idea, since you totally right that the is_vm modification is breaking the main abstraction of pycloudlib (Which I missed when implementing that). I can update the code without an issue
There was a problem hiding this comment.
@OddBloke I was originally likening LXD deployment of vms vs containers as sort of like ec2 managing different instance_types. Generic LXD cloud provides an wrapper around the lxc command set which is used by both lxc --vm and container based instance launches.
TLDR: +1 on LXDContainer/LXDVirtualmachine subclasses.
I spent a good hour trying to justify a common single LXD cloud as it is abstracting the common CLI for vm and container deployments, but a couple of things swayed me to subclasses (other than your well thought out explanation)
- vms vs containers requiring special image filtering, search and launch knowledge,
- vms require specific lxc profile love during setup and deployment
- method signature bloat adding in LXCCloud is_vm param into multiple LXDCloud methods such as init, init, launch and image* methods indicating enough of a smell that this is better separated from a common class.
I also thought (incorrectly) that --vm was nearly a simple alias for --type virtual-machine.
But, when looking at lxc --type vs lxc --vm, as Dan pointed out in MM, one is not --vm is not actually an alias --type virtual-machine. The two params provide separate fields to lxd's API that direct the underlying cloud platform chosen for the instance deployment.
Through @lucasmoura's good work, we are seeing enough discrepancies in these underlying platforms and how to query images as well as which profiles to use that probably warrant the subclasses suggestion.
Ultimately, under the hood below lxc's CLI, they are unique cloud platforms and probably should be treated as such.
|
@OddBloke @blackboxsw @TheRealFalcon I have refactored the code to separate the behavior of LXD containers from virtual machines. I believe now we can launch LXD vms as we launch any other cloud in the codebase. However, there are some abstractions that maybe could be improved, such as the use of Also @paride we have kind of an exception for |
|
@lucasmoura I think the exception for Xenial is fine. I'm also +1 for the container/vm split, I think it makes sense and helps maintaining the api consistent. I'm going to run a couple of bootspeed jobs against this branch to check that everything works and report back. |
|
Works nicely! |
OddBloke
left a comment
There was a problem hiding this comment.
Thanks Lucas, this looks much cleaner and clearer now. I still have a few inline comments/questions, but none of them should require a refactor!
| return self._streams_query(filters, daily)[0] | ||
|
|
||
|
|
||
| class LXD(BaseLXD): |
There was a problem hiding this comment.
I suggest that we rename this to LXDContainer for consistency with LXDVirtualMachine. It's clearer that this only handles containers that way.
If we want to maintain backwards-compat we can do something like:
import warnings
class LXD(LXDContainer):
def __init__(self, *args, **kwargs):
warnings.warn("LXD class is deprecated; use LXDContainer instead.")
return super().__init__(*args, **kwargs)| - sudo usermod -a -G lxd $USER | ||
| - git clone https://github.com/canonical/cloud-init.git | ||
| - cd cloud-init | ||
| - sg lxd -c "pytest tests/integration_tests/" |
There was a problem hiding this comment.
This should probably use tox and only run the CI tests we run in cloud-init: that should comfortably exercise most of the pycloudlib functionality we use. Given that I'm going to be pushing a cloud-init commit to make that easier to do, let's leave this as-is and I'll propose a separate change which does that.
There was a problem hiding this comment.
(Except tox might have a hard time using the current pycloudlib rather than cloud-init's default one, so I'll give this further thought.)
There was a problem hiding this comment.
Yes, that was exactly why I have not used tox :(
|
|
||
| class LXD(BaseCloud): | ||
| """LXD Cloud Class.""" | ||
| class BaseLXD(BaseCloud): |
There was a problem hiding this comment.
We don't expect consumers of our API to use this:
| class BaseLXD(BaseCloud): | |
| class _BaseLXD(BaseCloud): |
| # If we have a hash in the image_id we need to query simplestreams to | ||
| # identify the release. | ||
| return self._image_info(image_id, is_vm)[0]["release"] | ||
| return "ssh-rsa {}".format(pub_key), priv_str.getvalue() |
There was a problem hiding this comment.
It looks to me like we could use key.get_name() here: http://docs.paramiko.org/en/stable/api/keys.html#paramiko.pkey.PKey.get_name
| ): # pylint: disable=W0212 | ||
| """Tests extracting release from non hashed image id.""" | ||
| cloud = LXD(tag="test") | ||
| cloud = LXDVirtualMachine(tag="test") |
There was a problem hiding this comment.
Does this mean that we're losing coverage of LXD as a result of this? Or are these tests covering only VM behaviour?
There was a problem hiding this comment.
This function should only be run on LXDVirtualMachine instances. I will better document that on the test
| if release == "xenial": | ||
| # xenial needs to launch images:ubuntu/16.04/cloud | ||
| # because it contains the HWE kernel which has vhost-vsock support | ||
| return self.XENIAL_IMAGE_VSOCK_SUPPORT |
There was a problem hiding this comment.
This is a different image to the one that consumers will expect (it's from images:, instead of from ubuntu:); I'd like to see us log something to give people a better chance of grokking what's going on.
|
|
||
| """ | ||
| if image_id == self.XENIAL_IMAGE_VSOCK_SUPPORT: | ||
| return "" |
There was a problem hiding this comment.
To be honest, I thought that maybe one of ours scripts were always expecting a string to be returned here, but I am not sure if that is okay. @paride would it be okay if this method returns None for xenial images on vms ?
There was a problem hiding this comment.
@lucasmoura the bootspeed measurement script does indeed expect a string, but I agree with Dan in that it should handle None, so +1 for Dan's suggestion. (In practice this is a non-issue as the bootspeed jobs do not measure Xenial, but I like to know it's working in case we want to enable it.)
OddBloke
left a comment
There was a problem hiding this comment.
Thanks Lucas, this LGTM now!
This PR fixes problems that we are experiencing after merging #40 and #45.
We are fixing:
PS: I still need to update our integration tests to avoid mistakes like this in the future. However, since I don't know how fast I will be able to do that, I think there is no need to wait for that to land this PR.
Fixes #46 #47 #48