Resize encrypted partitions (SC-353)#1316
Conversation
|
|
fdb12b5 to
6229c94
Compare
39e5497 to
beae156
Compare
|
This will also require adding |
| LOG.debug( | ||
| "Determined that %s is %sencrypted", | ||
| blockdev, | ||
| "" if is_encrypted else "not", |
There was a problem hiding this comment.
| "" if is_encrypted else "not", | |
| "" if is_encrypted else "not ", |
|
|
||
| def get_underlying_partition(blockdev): | ||
| command = ["dmsetup", "deps", "--options=devname", blockdev] | ||
| dep: str = subp.subp(command)[0] # type: ignore |
There was a problem hiding this comment.
Looking over cryptsetup dependencies, dmsetup is a Depends and we do an early check/exit in is_encrypted. If cryptsetup is absent, I don't think we need to handle CalledProcessErrrors here because we expect dmsetup will exist in order to get to this path.
Question about this in general, do we always expect any images which support encrypted disk resize to add cryptsetup package to the image above and beyond ensuring cloud-init is installed in the image?
There was a problem hiding this comment.
Question about this in general, do we always expect any images which support encrypted disk resize to add cryptsetup package to the image above and beyond ensuring cloud-init is installed in the image?
I'm assuming so, but that'd be a question for CPC. Even if we were to fail here though, it'd get caught here with a slightly uglier error message: https://github.com/canonical/cloud-init/pull/1316/files#diff-326a9a2b453b20b3c3bb518217c1ffc2849e8ee82824ecfe359027b1e4237c77R466
| raise RuntimeError( | ||
| "Could not load encryption key. This is expected if " | ||
| "the volume has been previously resized." | ||
| ) from e |
There was a problem hiding this comment.
Is there a way we can make this idempotent by introspecting that we have already resized and encrypted partition? I'm thinking in terms of if someone runs cloud-init clean --logs --reboot will be generate a Runtime error in this case?
There was a problem hiding this comment.
The actual effect is just an extra log (see my examples up top):
2022-03-28 21:22:55,202 - cc_growpart.py[DEBUG]: '/' FAILED: Resizing encrypted device (/dev/mapper/cloudimg-rootfs-03de45d2-f355-457e-b8e8-87c48127d0a8) failed: Could not load encryption key. This is expected if the volume has been previously resized.
Do you have any suggestions? We can't count on our normal semaphores, so cloud-init would have to store something more permanent which seems could be problematic if disks ever get hot-swapped or anything along those lines.
There was a problem hiding this comment.
The only suggestion I wonder about is if we should really just early exit on absence of KEYDATA_PATH as mentioned above.
| httpretty>=0.7.1 | ||
| pytest | ||
| pytest-cov | ||
| pytest-mock |
There was a problem hiding this comment.
Let's add this to build-depends in packages/pkg-deps.json to make sure we are injecting that dep during package build/tests during local package builds from cloud-init tooling.
There was a problem hiding this comment.
As in add a new build-depends section to the debian section? Should we add other dependencies there too, like the reponses package we just added?
There was a problem hiding this comment.
We looked over packages/bddeb (which sources information via the command ./tools/read-dependencies --distro debian during the local package build ./packages/bddeb. What I neglected to recall is that bddeb grew the ability to also read build dependencies from test-requirements.txt. So, I don't think we need to add those deps in pkg-deps.json too since you already added it to test-requirements.txt. You can validate that with python3-pytest-mock is listed in ./tools/read-dependencies --requirements-file test-requirements.txt --system-pkg-names
| return False | ||
| is_encrypted = False | ||
| with suppress(subp.ProcessExecutionError): | ||
| subp.subp(["cryptsetup", "status", blockdev]) |
There was a problem hiding this comment.
Do we want to make this catch more specific and log a warning in case there are other failure paths we might hit here which would force an is_encypted False value?
There was a problem hiding this comment.
cryptsetup should only return 0 if it's actually encrypted. Any other rc will raise the ProcessExecutionError.
Are you saying we should log the return code, just in case there's a different problem with running cryptsetup?
There was a problem hiding this comment.
I guess what I'm thinking here two things:
- Should we really only surpress a specific non-zero exit code and log any unexpected error messages and codes here
- Also, running cryptsetup status on a thin volume group I created still returns 0 exit code, but it isn't an encrypted device, so a zero exit code may be a false positive in this case for any LVM setup?
csmith@downtown:~$ sudo cryptsetup status /dev/mapper/sru_lvmpool-LXDThinPool_tdata
/dev/mapper/sru_lvmpool-LXDThinPool_tdata is active and is in use.
type: n/a
csmith@downtown:~$ echo $?
0There was a problem hiding this comment.
I'm wondering if we can use dmsetup ls --target crypt here to list known crypt devices.
On my vm with luks enabled for the root FS I get
$ dmsetup ls --target crypt
dm_crypt-0 (253, 0)There was a problem hiding this comment.
The problem with dmsetup ls is that it's not easy to know how to tie values in that list to the specific device being resized.
In a previous iteration of this PR, I had a cryptsetup isLuks check. Based on a misunderstanding of a comment, I removed it. It looks like he was saying that we don't need to restrict resizing mapped devices to only Luks containers, but I want to specifically do that right now as we haven't had support for LVM sizing, and I'd rather not add it here without additional testing. So I added back the isLuks check.
There was a problem hiding this comment.
+1 this context makes sense. And crypsetup isLuks behaves as expected on my non-luks volume group. I'm +1 on restricting this behavior to just encrypted partitions at the moment given the artifacts which allow for resizing. We can iterate toward LVM resizing in separate efforts.
| try: | ||
| KEYDATA_PATH.unlink() | ||
| except Exception: | ||
| util.logexc( |
There was a problem hiding this comment.
It'd be nice to avoid calling logexc where possible as it leaves a large tracebacks in the logs which doesn't generally add a whole lot of valuable content to the failure. Maybe just append the str(e) there with a LOG.warn?
There was a problem hiding this comment.
I agree for the subp call, and I changed that one. For here though, if an unlink() call fails, something very strange is going on and I think a traceback would be extremely helpful.
| ) from e | ||
|
|
||
|
|
||
| def resize_encrypted(blockdev, partition): |
There was a problem hiding this comment.
Can this function return a tuple of (RESIZE.SKIPPED|NOCHANGE|CHANGED) and message/details that can be consumed by the call-site?
It may provide us with a mechanism to early exit in the event that no KEYDATA_PATH.exists which could indicate that resize already happened or that the crypt support just doesn't exist for this device.
| key is base64 encoded. Example: | ||
| {"key":"XFmCwX2FHIQp0LBWaLEMiHIyfxt1SGm16VvUAVledlY=","slot":5} | ||
| """ | ||
| try: |
There was a problem hiding this comment.
Should we pre-flight check existence of this KEYDATA_PATH and return SKIPPED in the event that it doesn't exist?
Then our try/except block below can only handle invalid data/format/decode errors if they exist and we avoid raising a FAILURE message in logs.
| try: | |
| if not KEYDATA_PATH.exists(): | |
| return (RESIZE.SKIPPED, f"No {KEYDATA_PATH} exists to decrypt device {blockdev}") | |
| try: |
| raise RuntimeError( | ||
| "Could not load encryption key. This is expected if " | ||
| "the volume has been previously resized." | ||
| ) from e |
There was a problem hiding this comment.
The only suggestion I wonder about is if we should really just early exit on absence of KEYDATA_PATH as mentioned above.
|
@blackboxsw , I believe I have addressed (or told you why I ignored 😜 ) all of your comments except the |
blackboxsw
left a comment
There was a problem hiding this comment.
Thanks @TheRealFalcon +1 from me on this work. Probably a good move to unlink the keyfile as you did in the event of unexpected errors resizing. The worst case scenario in unexpected failure case is that the encrypted disk isn't big enough.
Take 2 of #1032
Proposed commit message
Testing
I tried to structure the unit tests such that they act more end-to-end, rather than testing individual functions at a time. This involved some additional mocking setup, but I think it's overall worth it.
As of right now, an in-tree integration test is impractical.
Following a Canonical internal guide, I setup an encrypted VM and manually verified the behavior. You can see parts of the logs and commands here: