Skip to content

WIP: Resize encrypted partitions#1032

Closed
TheRealFalcon wants to merge 3 commits into
canonical:mainfrom
TheRealFalcon:encrypted-growpart
Closed

WIP: Resize encrypted partitions#1032
TheRealFalcon wants to merge 3 commits into
canonical:mainfrom
TheRealFalcon:encrypted-growpart

Conversation

@TheRealFalcon
Copy link
Copy Markdown
Contributor

@TheRealFalcon TheRealFalcon commented Sep 22, 2021

Proposed Commit Message

TODO...

Additional Context

Test Steps

Can't test on a fully encrypted system yet, but approximated what we're trying to achieve on openstack with ephemeral drives.

Using this cloud-config:

#cloud-config
growpart:
  mode: auto
  devices: ["/", "/dev/mapper/disk2", "/dev/vdc1"]  # or /mnt instead of /dev/mapper/disk2

Launch an instance with:

openstack server create --flavor m1.small --image  827195be-5e47-40f1-8f40-32c30b4965b7 --key-name newt --nic net-id=falcojr_admin_net test-growpart --ephemeral size=1 --ephemeral size=1 --wait --user-data /home/james/my-user-data

add floating ip here
upload cloud-init deb and install it

Run:

echo '{"key":"XFmCwX2FHIQp0LBWaLEMiHIyfxt1SGm16VvUAVledlY=","slot":5}' > /cc_growpart_keydata  # write to /cc_growpart_keydata
umount /mnt
parted --script /dev/vdb mklabel gpt mkpart primary ext4 100 1000
parted --script /dev/vdc mklabel gpt mkpart primary ext4 100 1000

lsblk should show a /dev/vdb1 and /dev/vdc1 with 859M size of partition.
possibly reboot here because jbd2 won't let go of vdb and rerun parted

After reboot, run:

echo 'XFmCwX2FHIQp0LBWaLEMiHIyfxt1SGm16VvUAVledlY=' | base64 --decode | cryptsetup -y -v luksFormat /dev/vdb1 --key-file - --key-slot 5
echo 'XFmCwX2FHIQp0LBWaLEMiHIyfxt1SGm16VvUAVledlY=' | base64 --decode | cryptsetup luksOpen --key-file - /dev/vdb1 disk2
mkfs.ext4 /dev/mapper/disk2
mount /dev/mapper/disk2 /mnt
cloud-init clean --logs; cloud-init init --local; cloud-init init; cloud-init modules --mode=config; cloud-init modules --mode=final; cloud-init status --wait --long

lsblk should show a resized sizes for both encrypted and unencrypted device with no errors in /var/log/cloud-init.log

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@TheRealFalcon TheRealFalcon added the wip Work in progress, do not land label Sep 22, 2021
Comment thread cloudinit/config/cc_growpart.py Outdated


def resize_encrypted(blockdev):
subp.subp(['cryptsetup', '--key-file', '<keyfile>', 'resize', blockdev])
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't work as-is, but not sure where I'm supposed to get the key from.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would not need key-file at all, as it would be in kernel keyring already, because the drive is unlocked.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, there is backup recovery keyfile on disk too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I pushed a change to remove that argument. It should be testable now.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't actually work unless you supply a key. The cryptsetup documentation is a bit confusing here for the resize action.

By default, cryptsetup passes the volume key to the dm-crypt target via the kernel keyring by supplying the key description to the device mapper. But the key is added to a thread keyring which is deleted once cryptsetup exits. When you run "cryptsetup resize", there is nothing in the keyring that can help you here, and cryptsetup needs to supply the volume key to the device mapper again when reconfiguring the dm-crypt target.

We do add keys to the root user keyring during early boot, but the descriptions of those are in a format that's private to https://github.com/snapcore/secboot/blob/master/keyring.go. And snap-bootstrap probably needs rebasing on the latest snapd code for this to even be true.

Perhaps we need to agree a way to pass a key description from early boot to cloud-init so that it can retrieve the key we add to the kernel, which it would then need to supply via --key-file (which can read from stdin ok).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisccoulson , is there something we can do in the mean time, or is there currently no way for cloud-init to get this key?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is currently no way for cloud-init to get a key.

I have another idea - I could have the provisioning tool add another keyslot and drop a key inside the encrypted root filesystem. Cloud-init could use this, and then delete the file and the corresponding keyslot from the LUKS2 container once the resize is done. How does that sound?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work for me!

@TheRealFalcon TheRealFalcon changed the title Resize encrypted partitions WIP: Resize encrypted partitions Sep 22, 2021
Comment thread cloudinit/config/cc_growpart.py Outdated

def get_underlying_partition(blockdev):
command = 'dmsetup deps -o devname {}'.format(blockdev)
dep = subp.subp(command.split())[0]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally we should prefer the following for which does not use split or format and uses long format arguments (readability).

dep = subp.subp(["dmsetup", "deps", "--options=devname", blockdev])[0]

Also, you really should check the error code... if this returns non-zero, and you try to parse stdout, then you'll likely get an IndexError, and raise a non-helpful exception.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Was planning some additional error checking in other places too...though...subp will raise error in non-0 exits already. I just don't remember how ugly that is in the logs.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Oct 7, 2021

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Oct 7, 2021
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Oct 7, 2021
@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Oct 22, 2021
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Oct 22, 2021
Copy link
Copy Markdown

@chrisccoulson chrisccoulson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this, I've left a few comments.

Comment thread cloudinit/config/cc_growpart.py Outdated
blockdev should look something like '/dev/mapper/disk1' if True,
otherwise something like /dev/vdb1.
"""
is_mapped = blockdev.startswith('/dev/mapper/')
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't check for /dev/mapper here - I'd resolve the symlink to the underlying /dev/dm-* device node first and check that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisccoulson , how would I know which underlying /dev/dm-* device to use?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/dev/mapper/whatever is a symlink to the underlying block device. It depends a bit what you're really asking here but "os.path.realpath(blockdev).startswith('dm-')" is one possible implementation of this logic that's better than what you have here.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry it wasn't clear - the block device path you get could be any one of the user-space managed links in /dev/mapper or /dev/disk/ - the more correct thing to do is to resolve the symbolic link to obtain the actual block device path, and then check if the path you get begins with /dev/dm-

Comment thread cloudinit/config/cc_growpart.py Outdated
is_encrypted = True
if not is_encrypted:
with suppress(subp.ProcessExecutionError):
subp.subp(['cryptsetup', 'isLuks', blockdev])
Copy link
Copy Markdown

@chrisccoulson chrisccoulson Oct 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cryptsetup isLuks only works on the underlying block device.

In any case, the cryptsetup resize action can work on other types of targets created by cryptsetup (other than bitlocker and truecrypt containers). Perhaps parsing the output of cryptsetup status and making sure the "type" field is not "n/a", "TCRYPT" or "BITLK" might be an idea instead, as an alternative to using cryptsetup isLuks? If you did that, you might want to rename is_encrypted to something else.

It's fine to limit the resizing only to mapped devices that are backed by luks containers, but you'll need to resolve the device mapper path to the underlying block device path first before calling cryptsetup isLuks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisccoulson The goal here was to support both types of paths. If cryptsetup status passed, then we return True. If not, check to see if we've been passed the underlying device and try cryptsetup isLuks instead. Given how I'm passing things currently, that isLusk check is kind of pointless so I can just remove it.

Would it be valid to simply call cryptsetup status on the device, and return True if that returns 0?

Comment thread cloudinit/config/cc_growpart.py Outdated
except (TypeError, ValueError) as e:
info.append((devent, RESIZE.SKIPPED,
"device_part_info(%s) failed: %s" % (blockdev, e),))
if is_mapped_device(blockdev) and is_encrypted(blockdev):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to question whether the is_mapped_device() and is_encrypted() is the right logic here, because you could potentially have a mapped device where you want to grow the underlying partition without doing the corresponding cryptsetup resize step, which just updates the live dm-crypt mapping to reflect the new underlying partition size.

But thinking about it, it's probably ok as it is like this - it seems fair enough that we probably don't want to touch the underlying partition if we don't know what tool manages the device mapping and therefore we don't know how to update the live mapping.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems fair enough that we probably don't want to touch the underlying partition if we don't know what tool manages the device mapping and therefore we don't know how to update the live mapping.

I don't understand. The code here is meant to resize both the underlying partition via growpart, then the mapped device with the cryptsetup resize. Is that not what you're seeing?

@TheRealFalcon
Copy link
Copy Markdown
Contributor Author

PR updated based on comments. cloud-init now reads the key off a json file on disk as suggested.

Copy link
Copy Markdown
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far this looks good. I made one comment. I assume that testing/docs are planned prior to merge.

I don't see any integration tests for this module so far. I don't know if that's due to the complexity of LXC with filsystems or if we just never got around to it before. Assuming we want to add them, one idea is we could possibly try to mount a loop device (file backed drive) into an LXC container as a block device for testing. If LXC can't handle something about that, qemu would work, (but we should try LXC first I think to avoid adding a dependency to the integration tests). If help is wanted with integration tests for this, feel free to ping.

command = ["dmsetup", "deps", "--options=devname", blockdev]
dep = subp.subp(command)[0]
try:
# Returned result should look something like:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the source code[1] it looks like it is possible that the output could possibly look like:

1 dependencies : (vdb1)

or

2 dependencies : (vdb1) (vdb2)

or

N dependencies : (vdb1) (vdb2) ...(vdbN)

I don't have a sense for how likely this is, however it looks like this won't be handled currently. I think a warning when N>1 might be more appropriate (or perhaps even handling multiple devices).

[1] Relevant snippet:

2721         for (i = 0; i < deps->count; i++) {
2722                 major = (int) MAJOR(deps->device[i]);
2723                 minor = (int) MINOR(deps->device[i]);
2724 
2725                 if ((_dev_name_type == DN_BLK || _dev_name_type == DN_MAP) &&
2726                     dm_device_get_name(major, minor, _dev_name_type == DN_BLK,
2727                                        dev_name, PATH_MAX))
2728                         printf(" (%s)", dev_name);
2729                 else
2730                         printf(" (%d, %d)", major, minor);
2731         }

@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Nov 20, 2021
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Nov 22, 2021
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Dec 7, 2021

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Dec 7, 2021
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Dec 8, 2021
@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Dec 23, 2021
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Dec 23, 2021
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jan 7, 2022

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Jan 7, 2022
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Jan 7, 2022
@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging mitechie, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag mitechie to reopen it.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale-pr Pull request is stale; will be auto-closed soon wip Work in progress, do not land

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants