Skip to content

Ensure proper root permissions in integration tests#664

Merged
OddBloke merged 8 commits into
canonical:masterfrom
TheRealFalcon:sudo-tests
Nov 23, 2020
Merged

Ensure proper root permissions in integration tests#664
OddBloke merged 8 commits into
canonical:masterfrom
TheRealFalcon:sudo-tests

Conversation

@TheRealFalcon
Copy link
Copy Markdown
Contributor

Proposed Commit Message

Ensure proper root permissions in integration tests

Tests previously assumed that when executing commands and transferring
files that user will have root permissions. This change updated
integration testing infrastructure so that is true.

Additional Context

n/a

Test Steps

pytest tests/integration_tests/

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

"""Use getent to interrogate the various expected outcomes"""
result = class_client.execute(["getent"] + getent_args)
assert re.search(regex, result.stdout) is not None
assert re.search(regex, result.stdout) is not None, (
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needed some additional debug output for this failure

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that re.* return None for a missing match means that there's no good way to improve this assertion output generically, unfortunately; there's no way to infer the regex or the text it's matching against in pytest_assertrepr_compare.

("md5sum </root/file_binary", "3801184b97bb8c6e63fa0e1eae2920d7"),
("sha256sum </root/file_binary", (
("md5sum /root/file_binary", "3801184b97bb8c6e63fa0e1eae2920d7"),
("sha256sum /root/file_binary", (
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipes and redirection are still gonna be weird after this change (as is always the case when using sudo).

Pycloudlib currently does a sh -c <command> to any commands passed to it via execute. We could perhaps add a sudo kwarg there, though I'm not sure if that's necessary or better than this change.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

See also, canonical/pycloudlib#54

@TheRealFalcon TheRealFalcon removed the wip Work in progress, do not land label Nov 17, 2020
Copy link
Copy Markdown
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks James! Overall, this looks good, and could probably land as-is (well, with s/mv/cp/ in the place I commented). However, I think we need to bump our pycloudlib dep to at least canonical/pycloudlib@3d02fef to be able to pass use_sudo. I also have some other, less important, thoughts.

Comment thread tests/integration_tests/instances.py Outdated
def pull_file(self, remote_path, local_path):
# First copy to a temporary directory because of permissions issues
tmp_path = _get_tmp_path()
self.instance.execute('mv {} {}'.format(remote_path, tmp_path))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.instance.execute('mv {} {}'.format(remote_path, tmp_path))
self.instance.execute('cp {} {}'.format(remote_path, tmp_path))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not opposed to the change, but curious why cp is better? Is there a reason to keep the temporary file around?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH...this is pull. Nevermind 😄

Comment thread tests/integration_tests/instances.py Outdated
def execute(self, command):
return self.instance.execute(command)
def execute(self, command) -> Result:
return self.instance.execute(command, use_sudo=self.use_sudo)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we need to be able to unpin/bump the pin of our pycloudlib dependency to land this, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but tests will fail if we bump the dependency before canonical/pycloudlib#52 lands.

Comment thread tests/integration_tests/instances.py Outdated
return contents
result = self.execute('/bin/cat {}'.format(remote_path))
if result.failed:
raise CalledProcessException(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For read_from_file, it feels a little strange to get a CalledProcessException; the calling of a process is an implementation detail. Ideally, I think we'd fix canonical/pycloudlib#39 and then this could use the common exception class that that (to my mind) implies.

Would we be better off raising a builtin exception of some description in the meantime, and adding a note to that issue (and perhaps a TODO comment) indicating that once we have a better pycloudlib exception we should switch to using that?

Comment thread tests/integration_tests/instances.py Outdated

def execute(self, command):
return self.instance.execute(command)
def execute(self, command) -> Result:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are likely to want to be able to run commands without privilege as part of testing: should we add use_sudo as a parameter here, defaulting to self.use_sudo?

"""Use getent to interrogate the various expected outcomes"""
result = class_client.execute(["getent"] + getent_args)
assert re.search(regex, result.stdout) is not None
assert re.search(regex, result.stdout) is not None, (
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that re.* return None for a missing match means that there's no good way to improve this assertion output generically, unfortunately; there's no way to infer the regex or the text it's matching against in pytest_assertrepr_compare.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

Moving back to WIP as we need canonical/pycloudlib#52 to land first

@TheRealFalcon TheRealFalcon added wip Work in progress, do not land and removed wip Work in progress, do not land labels Nov 17, 2020
@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

I've removed the WIP flag, addressed the comments, and bumped pycloudlib's version to the latest.

Comment thread tests/integration_tests/instances.py
@smoser
Copy link
Copy Markdown
Collaborator

smoser commented Nov 19, 2020

I'm moving a conversation that started in #676 over here.

[smoser]
imo the test harness 'exec' should be done as root.

[OddBloke]
In non-LXD cases (where we SSH into instances as non-root), that would mean that running commands as non-root would be going through sudo twice: once to do the exec as root, and once to get back to the default user.

It feels to me like you're thinking too much about the implementation, and a premature optimization.One of the two "transports" (i'm not sure what they're called) uses ssh as unpriviledged user, and then, yes does have to use sudo to become root. The other (lxd) starts as root. so that seems like a wash.

Root is a strict superset of functionality over unprivileged user. What things are you wanting to test that want to run as non-root? It seems like that would be a small subset, and really those would be more OS tests than cloud-init tests.

It would also mean that tests have to know what the default username is (e.g. ubuntu vs. ec2-user) for the currently running instance; currently that can be contained in the framework code which handles the SSH connection (or lxc exec call).

It seems like the harness has to know what the default username is in the instance, or it would not have gotten in via ssh. So the harness should just make that available to the test if the test needs to know it.

You could just do something like:

def exec(cmd, as_user=root):
    if as_user == DEFAULT_USER:
        as_user = getCorrectDefaultUser()

but I really do think that there are probably very few things you want to do that need to be done as unpriviledged user.

(Writing this out, I realise there's a gap here if root is the default SSH user: https://github.com/canonical/cloud-init/pull/664/files#r526453646)

Executing the commands over ssh has all sorts of shortcomings compared to some more direct option (like is easily available in LXD).

  • What if default user does not have sudo access?
  • You can't get information at all if ssh doesn't come up?
  • What if ssh access is disabled ?

I had spent some time at one point trying to implement a execution layer for cloud-init tests that did not rely on ssh. My idea was just to set up a service listening on a socket that would provide the necessary functionality (run commands, grab files ... ). The socket would need to be IP on EC2, but it could take advantage of any host<->guest socket (vsock(7)) or the lxd socket.

@OddBloke
Copy link
Copy Markdown
Collaborator

OddBloke commented Nov 19, 2020 via email

@smoser
Copy link
Copy Markdown
Collaborator

smoser commented Nov 19, 2020

On Thu, Nov 19, 2020 at 06:34:02AM -0800, Scott Moser wrote: I'm moving a conversation that started in #676 over here. > > [smoser] > > imo the test harness 'exec' should be done as root. > > [OddBloke] > In non-LXD cases (where we SSH into instances as non-root), that > would mean that running commands as non-root would be going through > sudo twice: once to do the exec as root, and once to get back to > the default user. It feels to me like you're thinking too much about the implementation, and a premature optimization.One of the two "transports" (i'm not sure what they're called) uses ssh as unpriviledged user, and then, yes does have to use sudo to become root. The other (lxd) starts as root. so that seems like a wash.
The key reason for doing this is to maintain consistency between LXD and other clouds: those will gain access to the machine unprivileged necessarily. Apologies for not making that clearer.

Consitency is good. Fix the non-LXD scenario to be consistent with LXD, as that
for this purpose, execution as root is a better default.

I'm voting for both

  • consistency: exec(command) on all platforms happens with uid=0
  • usability: no need to worry about whether or not a file is readable or
    executable for the test-case writer.

@OddBloke
Copy link
Copy Markdown
Collaborator

OddBloke commented Nov 19, 2020 via email

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

Good discussion, but are there any actual disagreements to what's proposed in this PR at this point? If not, can we land it?

@smoser
Copy link
Copy Markdown
Collaborator

smoser commented Nov 19, 2020

It feels like we're conflating two things here: how we execute commands on the platform at all, and the effective UID with which we execute commands.

I guess what was confusing me was why are we changing this here.
I had assumed that the underlying lib was doing the right thing, and that it was consistent.

But I think I am understanding now, that pycloudlib's BaseInstance.execute does not provide a consistent interface across clouds. Can we just fix that? Its behavior does not match its docstring pycloudlib/instance.py:BaseInstance.execute. It explicitly says 'the command to execute as root inside the image.'

It also says that stdin is bytes, not a string which commit 06fd4fc5d implies. I don't really understand why that was necessary though. that commit though, I don't see where bytes were converted to strings.


def execute(self, command):
return self.instance.execute(command)
def execute(self, command, *, use_sudo=None) -> Result:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is '*' for ? you dn't use it anywhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Designating keyword only arguments. https://www.python.org/dev/peps/pep-3102/
You can't pass it as a positional arg like this

instance.execute('cat stuff', True)

You have to do

instance.execute('cat stuff', use_sudo=True)

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

You're correct that pycloudlib isn't consistent and that the docstring you pointed to is wrong. Currently the supported clouds execute as "ubuntu" user while LXD executes as root. canonical/pycloudlib#56 will make everything execute as "ubuntu" user, which I think that's the right decision for pycloudlib generally. The default experience on most clouds is to ssh as non-root, and most don't allow root ssh access. The change in this current PR will allow us to default to running all cloud-init things with root privileges.

@smoser
Copy link
Copy Markdown
Collaborator

smoser commented Nov 19, 2020

You're correct that pycloudlib isn't consistent and that the docstring you pointed to is wrong. Currently the supported clouds execute as "ubuntu" user while LXD executes as root. canonical/pycloudlib#56 will make everything execute as "ubuntu" user

I generally disagree with that change.

I do agree that the right user to operate as is root. I misunderstood.

I also generally disagree with 'use_sudo' argument to a 'execute' function.
sudo is a possible implementation of becomming root, but has side effects other than changing the effective userid (it puts SUDO_UID and SUDO_GID in for example). If you think you need to provide a way for the user to execute code as root (and not do that by default), then use 'root=' or ('uid=0', 'gid=0') and have the code do the right thing. If the caller of 'execute' wants to "use_sudo", they can just use sudo. (This is the same mistake that python's subprocess makes with 'shell=True' argument. If the user wants to use a shell, then let them invoke a shell, or let the wrap the call with a one line function).

So I guess what I suggest should change is that you should not change the function signature. Just invoke pycloudlib's execute with the poorly named 'use_sudo'.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

If the caller of 'execute' wants to "use_sudo", they can just use sudo.

No they can't . The call is being wrapped. How else do you get root on a cloud instance over ssh than via sudo? What do you mean by having the code "do the right thing"?

The point of the change in the cloud-init function signature is to be able to override a default. 99% of the time, I don't expect anybody to specify use_sudo in the clout-init tests. I can remove it, but I don't see how having an extra kwarg to override a default is a problem.

@smoser
Copy link
Copy Markdown
Collaborator

smoser commented Nov 20, 2020 via email

Copy link
Copy Markdown
Collaborator

@smoser smoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry that I commented here without fully understanding what the changes were.

I won't block this, but it really seems like the recent set of changes to pycloudlib made using it more difficult, and we're just inheriting of that here. That is unfortunate. Looking at commit messages and logs there, its not perfectly clear what the problems were, so I can't speak to better ways of having fixed them.

I've left some comments inline that I think you should consider, and I'll add clarification or support if you're interested in such.

Comment thread tests/integration_tests/instances.py Outdated
with tmp_file as f:
contents = f.read()
return contents
result = self.execute('/bin/cat {}'.format(remote_path))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use full paths to things.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't tell you number of times I've run into some stupid PATH alteration screwing something up because I didn't specify the full path. Admittedly, that's not likely to happen with cat, but I've always thought specifying full paths to things was the best practice. Why do you recommend not doing it?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because PATH exists solely for the purpose of allowing you to not specify full PATHs. And if someone has taken the effort of updating PATH so that /mybin/program would be used before /bin/program, then they probably wanted that to happen. Here is an example https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=852569.

And specifically with 'cat', '/bin/cat' is now actually a legacy symlink to be pulled out from under you at some point in the future (https://wiki.debian.org/UsrMerge). On a fresh focal container:

root@f2:~# ls -l /bin
lrwxrwxrwx 1 root root 7 Nov 17 21:40 /bin -> usr/bin
root@f2:~# which cat
/usr/bin/cat

Its really a very similar argument to not "fixing" the broken-ness of 'push_file' or 'pull_file' here, but rather fixing pycloudlib. Stop fighting the system, and fix the system. If PATH is wrong, then figure out why and fix it.

# First copy to a temporary directory because of permissions issues
tmp_path = _get_tmp_path()
self.instance.execute('cp {} {}'.format(remote_path, tmp_path))
self.instance.pull_file(tmp_path, local_path)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not cleaning up the temp file on the other side.

And it seems like if 'pull_file' is broken, then you should fix it in pycloudlib rather than working around with a temp file. Now, instead of just using 'pull_file', you'll have:

  • a race condition or failure path based on your _get_tmp_path() colliding with existing files
  • an extra process 'cp'
  • an extra process for 'rm'

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed my tmp file to use a uuid instead so there's no possibility of collisions. I don't think I care about cleaning up the remote end. These are relatively short-lived tests, so I see no harm and it saves us an ssh call.

# First push to a temporary directory because of permissions issues
tmp_path = _get_tmp_path()
self.instance.push_file(local_path, tmp_path)
self.execute('mv {} {}'.format(tmp_path, remote_path))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps 'push_file' needs perms and 'owner' arguments ? using 'mv' like this will get you some permissions, but it seems arbitrary. What permissions and ownership should the user expect after calling push_file ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the conclusion Dan and I came to is, if somebody needs specific permissions they can implement it.

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

@smoser I really think you just have different expectations of what pycloudlib is supposed to be than we do. It's a light wrapper around some of the most common cloud operations. We're not going to add some implicit "execute as root" or "transfer as root" facilities there because that would involve altering the image more than we're (at least me) comfortable with. Especially since we have a number of tests that launch an instance, login via SSH, and then verify the expected ssh settings or /etc/passwd settings. The "right" way to do something as root is login as ubuntu and run sudo.

Tests previously assumed that when executing commands and transferring
files that user will have root permissions. This change updated
integration testing infrastructure so that is true.
Copy link
Copy Markdown
Collaborator

@OddBloke OddBloke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, James. I think we came to the conclusion that the pycloudlib API could probably take some improvement here, but landing this gives us internal consistency that allows us to proceed with writing integration tests.

@OddBloke OddBloke merged commit 6e86d2a into canonical:master Nov 23, 2020
@TheRealFalcon TheRealFalcon deleted the sudo-tests branch June 29, 2021 18:59
This was referenced May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants