Ensure proper root permissions in integration tests#664
Conversation
| """Use getent to interrogate the various expected outcomes""" | ||
| result = class_client.execute(["getent"] + getent_args) | ||
| assert re.search(regex, result.stdout) is not None | ||
| assert re.search(regex, result.stdout) is not None, ( |
There was a problem hiding this comment.
Needed some additional debug output for this failure
There was a problem hiding this comment.
The fact that re.* return None for a missing match means that there's no good way to improve this assertion output generically, unfortunately; there's no way to infer the regex or the text it's matching against in pytest_assertrepr_compare.
| ("md5sum </root/file_binary", "3801184b97bb8c6e63fa0e1eae2920d7"), | ||
| ("sha256sum </root/file_binary", ( | ||
| ("md5sum /root/file_binary", "3801184b97bb8c6e63fa0e1eae2920d7"), | ||
| ("sha256sum /root/file_binary", ( |
There was a problem hiding this comment.
Pipes and redirection are still gonna be weird after this change (as is always the case when using sudo).
Pycloudlib currently does a sh -c <command> to any commands passed to it via execute. We could perhaps add a sudo kwarg there, though I'm not sure if that's necessary or better than this change.
a7e62b5 to
e54d7b4
Compare
|
See also, canonical/pycloudlib#54 |
OddBloke
left a comment
There was a problem hiding this comment.
Thanks James! Overall, this looks good, and could probably land as-is (well, with s/mv/cp/ in the place I commented). However, I think we need to bump our pycloudlib dep to at least canonical/pycloudlib@3d02fef to be able to pass use_sudo. I also have some other, less important, thoughts.
| def pull_file(self, remote_path, local_path): | ||
| # First copy to a temporary directory because of permissions issues | ||
| tmp_path = _get_tmp_path() | ||
| self.instance.execute('mv {} {}'.format(remote_path, tmp_path)) |
There was a problem hiding this comment.
| self.instance.execute('mv {} {}'.format(remote_path, tmp_path)) | |
| self.instance.execute('cp {} {}'.format(remote_path, tmp_path)) |
There was a problem hiding this comment.
Not opposed to the change, but curious why cp is better? Is there a reason to keep the temporary file around?
There was a problem hiding this comment.
OH...this is pull. Nevermind 😄
| def execute(self, command): | ||
| return self.instance.execute(command) | ||
| def execute(self, command) -> Result: | ||
| return self.instance.execute(command, use_sudo=self.use_sudo) |
There was a problem hiding this comment.
This means we need to be able to unpin/bump the pin of our pycloudlib dependency to land this, right?
There was a problem hiding this comment.
Yes, but tests will fail if we bump the dependency before canonical/pycloudlib#52 lands.
| return contents | ||
| result = self.execute('/bin/cat {}'.format(remote_path)) | ||
| if result.failed: | ||
| raise CalledProcessException( |
There was a problem hiding this comment.
For read_from_file, it feels a little strange to get a CalledProcessException; the calling of a process is an implementation detail. Ideally, I think we'd fix canonical/pycloudlib#39 and then this could use the common exception class that that (to my mind) implies.
Would we be better off raising a builtin exception of some description in the meantime, and adding a note to that issue (and perhaps a TODO comment) indicating that once we have a better pycloudlib exception we should switch to using that?
|
|
||
| def execute(self, command): | ||
| return self.instance.execute(command) | ||
| def execute(self, command) -> Result: |
There was a problem hiding this comment.
We are likely to want to be able to run commands without privilege as part of testing: should we add use_sudo as a parameter here, defaulting to self.use_sudo?
| """Use getent to interrogate the various expected outcomes""" | ||
| result = class_client.execute(["getent"] + getent_args) | ||
| assert re.search(regex, result.stdout) is not None | ||
| assert re.search(regex, result.stdout) is not None, ( |
There was a problem hiding this comment.
The fact that re.* return None for a missing match means that there's no good way to improve this assertion output generically, unfortunately; there's no way to infer the regex or the text it's matching against in pytest_assertrepr_compare.
|
Moving back to WIP as we need canonical/pycloudlib#52 to land first |
0bae0e2 to
f47d99d
Compare
|
I've removed the WIP flag, addressed the comments, and bumped pycloudlib's version to the latest. |
|
I'm moving a conversation that started in #676 over here.
It feels to me like you're thinking too much about the implementation, and a premature optimization.One of the two "transports" (i'm not sure what they're called) uses ssh as unpriviledged user, and then, yes does have to use sudo to become root. The other (lxd) starts as root. so that seems like a wash. Root is a strict superset of functionality over unprivileged user. What things are you wanting to test that want to run as non-root? It seems like that would be a small subset, and really those would be more OS tests than cloud-init tests.
It seems like the harness has to know what the default username is in the instance, or it would not have gotten in via ssh. So the harness should just make that available to the test if the test needs to know it. You could just do something like: but I really do think that there are probably very few things you want to do that need to be done as unpriviledged user.
Executing the commands over ssh has all sorts of shortcomings compared to some more direct option (like is easily available in LXD).
I had spent some time at one point trying to implement a execution layer for cloud-init tests that did not rely on ssh. My idea was just to set up a service listening on a socket that would provide the necessary functionality (run commands, grab files ... ). The socket would need to be IP on EC2, but it could take advantage of any host<->guest socket (vsock(7)) or the lxd socket. |
|
On Thu, Nov 19, 2020 at 06:34:02AM -0800, Scott Moser wrote:
I'm moving a conversation that started in #676 over here.
> > [smoser]
> > imo the test harness 'exec' should be done as root.
>
> [OddBloke]
> In non-LXD cases (where we SSH into instances as non-root), that
> would mean that running commands as non-root would be going through
> `sudo` twice: once to do the exec as root, and once to get back to
> the default user.
It feels to me like you're thinking too much about the implementation,
and a premature optimization.One of the two "transports" (i'm not sure
what they're called) uses ssh as unpriviledged user, and then, yes
does have to use sudo to become root. The other (lxd) starts as root.
so that seems like a wash.
The key reason for doing this is to maintain consistency between LXD and
other clouds: those will gain access to the machine unprivileged
necessarily. Apologies for not making that clearer.
Root is a strict superset of functionality over unprivileged user.
What things are you wanting to test that want to run as non-root? It
seems like that would be a small subset, and really those would be
more OS tests than cloud-init tests.
I was thinking of tests along the lines of "cloud-init writes things out
in a way that unprivileged users can't read". These could be done as
root with permission checks, of course, so I agree this is a corner
case.
but I really do think that there are probably very few things you want
to do that need to be done as unpriviledged user.
Agreed.
> (Writing this out, I realise there's a gap here if root is the
> default SSH user:
> https://github.com/canonical/cloud-init/pull/664/files#r526453646)
Executing the commands over ssh has all sorts of shortcomings compared
to some more direct option (like is easily available in LXD).
This test framework doesn't only work on LXD, so optimising the way it
interacts with LXD doesn't buy us all that much IMO. It's preferable to
have LXD behave consistently with other clouds (i.e. you get in
unprivileged) so we catch any issues that might be caused by that in our
local/CI testing.
For clarity, though, we don't SSH into LXD hosts, we do use the direct
option and `exec` (but as UID 1000 to match behaviour on other clouds).
This gives us consistent permissions with other clouds, while not losing
the advantages that the more direct option gives us.
* What if default user does not have sudo access?
* What if ssh access is disabled ?
If either of these is true then this is not an appropriate image for use
with the test framework. Do you know of images we might want to use as
a basis for testing for which either of these is true?
* You can't get information at all if ssh doesn't come up?
We can still gather console logs etc., but yeah, if SSH doesn't come up
then we can't gather much. As I mentioned previously, we should still
be able to exec on LXD (and therefore gather full logs).
I had spent some time at one point trying to implement a execution
layer for cloud-init tests that did not rely on ssh. My idea was just
to set up a service listening on a socket that would provide the
necessary functionality (run commands, grab files ... ). The socket
would need to be IP on EC2, but it could take advantage of any
host<->guest socket (vsock(7)) or the lxd socket.
This isn't something that we'll have time to work on, but if you're
interested in continuing to play with it, then patching pycloudlib to
test interacting with it should be fairly simple. (I do have some
broader reservations about such a plan, but we shouldn't get too off
into the weeds here; if you want to discuss it further, perhaps open a
pycloudlib issue?)
|
Consitency is good. Fix the non-LXD scenario to be consistent with LXD, as that I'm voting for both
|
|
On Thu, Nov 19, 2020 at 10:43:18AM -0800, Scott Moser wrote:
> On Thu, Nov 19, 2020 at 06:34:02AM -0800, Scott Moser wrote: I'm
> moving a conversation that started in #676 over here. > > [smoser] >
> > imo the test harness 'exec' should be done as root. > > [OddBloke]
> > > In non-LXD cases (where we SSH into instances as non-root), that
> > > > would mean that running commands as non-root would be going
> > > > through > `sudo` twice: once to do the exec as root, and once
> > > > to get back to > the default user. It feels to me like you're
> > > > thinking too much about the implementation, and a premature
> > > > optimization.One of the two "transports" (i'm not sure what
> > > > they're called) uses ssh as unpriviledged user, and then, yes
> > > > does have to use sudo to become root. The other (lxd) starts
> > > > as root. so that seems like a wash.
> The key reason for doing this is to maintain consistency between LXD
> and other clouds: those will gain access to the machine unprivileged
> necessarily. Apologies for not making that clearer.
Consitency is good. Fix the non-LXD scenario to be consistent with LXD, as that
for this purpose, execution as root is a better default.
I'm voting for both
* consistency: `exec(command)` on all platforms happens with uid=0
It feels like we're conflating two things here: how we execute commands
on the platform at all, and the effective UID with which we execute
commands.
We cannot execute commands directly as uid=0 when we are accessing
systems via SSH. So we default to using `sudo`, meaning the effective
UID we use to execute commands _is_ 0.
So the question is really how to get an effective uid=0 on LXD. We have
two options: either execute `$cmd` as uid=0, or execute `sudo -- $cmd`
as uid=1000.
My position is that we're better off consistently using `sudo` to get
uid=0, because if there are issues with doing so (e.g. escaping,
environment variables, bugs in our sudo code, etc.) then we will catch
those in the LXD testing (which happens in CI and is generally easier
for developers to run themselves) rather than when we first run tests on
a non-LXD platform and go through the sudo codepath.
Am I missing a benefit of using uid=0 directly which outweighs the
benefit of having fewer codepaths in our testing framework?
|
|
Good discussion, but are there any actual disagreements to what's proposed in this PR at this point? If not, can we land it? |
I guess what was confusing me was why are we changing this here. But I think I am understanding now, that pycloudlib's It also says that stdin is bytes, not a string which commit 06fd4fc5d implies. I don't really understand why that was necessary though. that commit though, I don't see where bytes were converted to strings. |
|
|
||
| def execute(self, command): | ||
| return self.instance.execute(command) | ||
| def execute(self, command, *, use_sudo=None) -> Result: |
There was a problem hiding this comment.
What is '*' for ? you dn't use it anywhere.
There was a problem hiding this comment.
Designating keyword only arguments. https://www.python.org/dev/peps/pep-3102/
You can't pass it as a positional arg like this
instance.execute('cat stuff', True)You have to do
instance.execute('cat stuff', use_sudo=True)|
You're correct that pycloudlib isn't consistent and that the docstring you pointed to is wrong. Currently the supported clouds execute as "ubuntu" user while LXD executes as root. canonical/pycloudlib#56 will make everything execute as "ubuntu" user, which I think that's the right decision for pycloudlib generally. The default experience on most clouds is to ssh as non-root, and most don't allow root ssh access. The change in this current PR will allow us to default to running all cloud-init things with root privileges. |
I generally disagree with that change. I do agree that the right user to operate as is root. I misunderstood. I also generally disagree with 'use_sudo' argument to a 'execute' function. So I guess what I suggest should change is that you should not change the function signature. Just invoke pycloudlib's execute with the poorly named 'use_sudo'. |
No they can't . The call is being wrapped. How else do you get root on a cloud instance over ssh than via sudo? What do you mean by having the code "do the right thing"? The point of the change in the cloud-init function signature is to be able to override a default. 99% of the time, I don't expect anybody to specify |
|
The command only gets wrapped if it is a string.
So just using execute(["sudo", "rm", "-rf", "/"]). That will do the right
thing.
…On Thu, Nov 19, 2020, 6:15 PM James Falcon ***@***.***> wrote:
If the caller of 'execute' wants to "use_sudo", they can just use sudo.
No they can't
<https://github.com/canonical/pycloudlib/blob/master/pycloudlib/instance.py#L162>
. The call is being wrapped. How else do you get root on a cloud instance
over ssh than via sudo? What do you mean by having the code "do the right
thing"?
The point of the change in the cloud-init function signature is to be able
to override a default. 99% of the time, I don't expect anybody to specify
use_sudo in the clout-init tests. I can remove it, but I don't see how
having an extra kwarg to override a default is a problem.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#664 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAETMIBSXP2P3SOS3R5ONLSQWRIHANCNFSM4TUVYKPQ>
.
|
smoser
left a comment
There was a problem hiding this comment.
I'm sorry that I commented here without fully understanding what the changes were.
I won't block this, but it really seems like the recent set of changes to pycloudlib made using it more difficult, and we're just inheriting of that here. That is unfortunate. Looking at commit messages and logs there, its not perfectly clear what the problems were, so I can't speak to better ways of having fixed them.
I've left some comments inline that I think you should consider, and I'll add clarification or support if you're interested in such.
| with tmp_file as f: | ||
| contents = f.read() | ||
| return contents | ||
| result = self.execute('/bin/cat {}'.format(remote_path)) |
There was a problem hiding this comment.
don't use full paths to things.
There was a problem hiding this comment.
I can't tell you number of times I've run into some stupid PATH alteration screwing something up because I didn't specify the full path. Admittedly, that's not likely to happen with cat, but I've always thought specifying full paths to things was the best practice. Why do you recommend not doing it?
There was a problem hiding this comment.
Because PATH exists solely for the purpose of allowing you to not specify full PATHs. And if someone has taken the effort of updating PATH so that /mybin/program would be used before /bin/program, then they probably wanted that to happen. Here is an example https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852569.
And specifically with 'cat', '/bin/cat' is now actually a legacy symlink to be pulled out from under you at some point in the future (https://wiki.debian.org/UsrMerge). On a fresh focal container:
root@f2:~# ls -l /bin
lrwxrwxrwx 1 root root 7 Nov 17 21:40 /bin -> usr/bin
root@f2:~# which cat
/usr/bin/cat
Its really a very similar argument to not "fixing" the broken-ness of 'push_file' or 'pull_file' here, but rather fixing pycloudlib. Stop fighting the system, and fix the system. If PATH is wrong, then figure out why and fix it.
| # First copy to a temporary directory because of permissions issues | ||
| tmp_path = _get_tmp_path() | ||
| self.instance.execute('cp {} {}'.format(remote_path, tmp_path)) | ||
| self.instance.pull_file(tmp_path, local_path) |
There was a problem hiding this comment.
You're not cleaning up the temp file on the other side.
And it seems like if 'pull_file' is broken, then you should fix it in pycloudlib rather than working around with a temp file. Now, instead of just using 'pull_file', you'll have:
- a race condition or failure path based on your _get_tmp_path() colliding with existing files
- an extra process 'cp'
- an extra process for 'rm'
There was a problem hiding this comment.
Changed my tmp file to use a uuid instead so there's no possibility of collisions. I don't think I care about cleaning up the remote end. These are relatively short-lived tests, so I see no harm and it saves us an ssh call.
| # First push to a temporary directory because of permissions issues | ||
| tmp_path = _get_tmp_path() | ||
| self.instance.push_file(local_path, tmp_path) | ||
| self.execute('mv {} {}'.format(tmp_path, remote_path)) |
There was a problem hiding this comment.
Perhaps 'push_file' needs perms and 'owner' arguments ? using 'mv' like this will get you some permissions, but it seems arbitrary. What permissions and ownership should the user expect after calling push_file ?
There was a problem hiding this comment.
I think the conclusion Dan and I came to is, if somebody needs specific permissions they can implement it.
|
@smoser I really think you just have different expectations of what pycloudlib is supposed to be than we do. It's a light wrapper around some of the most common cloud operations. We're not going to add some implicit "execute as root" or "transfer as root" facilities there because that would involve altering the image more than we're (at least me) comfortable with. Especially since we have a number of tests that launch an instance, login via SSH, and then verify the expected ssh settings or /etc/passwd settings. The "right" way to do something as root is login as ubuntu and run sudo. |
Tests previously assumed that when executing commands and transferring files that user will have root permissions. This change updated integration testing infrastructure so that is true.
b1914ae to
fa640f5
Compare
OddBloke
left a comment
There was a problem hiding this comment.
Thanks for this, James. I think we came to the conclusion that the pycloudlib API could probably take some improvement here, but landing this gives us internal consistency that allows us to proceed with writing integration tests.
Proposed Commit Message
Ensure proper root permissions in integration tests
Tests previously assumed that when executing commands and transferring
files that user will have root permissions. This change updated
integration testing infrastructure so that is true.
Additional Context
n/a
Test Steps
pytest tests/integration_tests/
Checklist: