Conversation
@tyler there are a few things I'd like to add to the public API surface and documentation to go with this. It would probably be easiest if I did that on a branch and then PRed into this, is that alright with you?
```rust
AddrLocation::Globals | AddrLocation::SigStack => {
    tracing::error!("UFFD pagefault at unexpected location: {:?}", loc);
    uffd.wake(fault_page as *mut c_void, host_page_size())
        .map_err(|e| Error::InternalError(e.into()))?;
```
One of the things I've been seeing in the mmap-related PRs is a re-examination of errors like these. If there had been a UFFD pagefault at an unexpected location, should the lucet runtime return its generic "InternalError" or should it just panic?
See related PR: #486
This is a good call, it should just panic. I think @acfoltzer wanted to take care of that in a second PR. Though tbh I think we should just do it here.
```rust
    // zero the sigstack
    (slot.sigstack, limits.signal_stack_size),
]
.into_iter()
```
I am new to userfaultfd, and I'm wondering about a small difference between its implementation of `new_instance_with` and the mmap implementation.
lucet/lucet-runtime/lucet-runtime-internals/src/region/mmap.rs
Lines 121 to 122 in 83fd224
So yeah, that's intentional here. We're making the allocation of the stack lazy. If you look at the other lines changed in that commit, you'll see where I make page faults in the Stack area zero out the page on demand.
That said, it does point to a lack of documentation here that I should fix.
```rust
        .map_err(|e| Error::InternalError(e.into()))?,
);

// map the chunk of virtual memory for all of the slots
```
I'm guessing the answer to my question could be here. The memory is already marked as READ/WRITE for all the slots.
lucet/lucet-runtime/lucet-runtime-internals/src/region/uffd.rs
Lines 136 to 139 in e7fe5f1
Hm. If I'm reading that correctly, does that mean that the stack isn't zeroed out until it's actually needed?
It may help to refer to the ioctl_userfaultfd(2) man page on this, since "zeropage" does a lot more than what it sounds like: http://man7.org/linux/man-pages/man2/ioctl_userfaultfd.2.html (search for UFFDIO_ZEROPAGE on that page).
Basically, zeropage is one of several ways that the uffd worker thread can resolve a page fault in the memory it has registered. The stack starts off mprotect'd read/write but madvised with `MADV_DONTNEED`, so accesses will trigger a uffd fault. We then recognize that the fault address is in the stack, and call zeropage to both zero out the memory and resolve the fault, so that the instance's thread wakes up and can make progress once again.
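As a rough illustration of the first step of that resolution, rounding the faulting address down to its host page before handing it to zeropage can be sketched as below. This is a hedged sketch, not Lucet's actual code: `PAGE_SIZE` is hard-coded here, whereas the runtime queries `host_page_size()`.

```rust
// Hedged sketch: page-align a faulting address before resolving it.
// PAGE_SIZE is an illustrative assumption (4 KiB); the real runtime
// asks the host for its page size at startup.
const PAGE_SIZE: usize = 4096;

/// Round a faulting address down to the start of its host page.
fn fault_page(fault_addr: usize) -> usize {
    fault_addr & !(PAGE_SIZE - 1)
}

fn main() {
    // A fault partway into a page resolves to that page's base address.
    assert_eq!(fault_page(0x1234), 0x1000);
    // A fault exactly on a page boundary is already aligned.
    assert_eq!(fault_page(0x2000), 0x2000);
    println!("ok");
}
```

The page-aligned address is what the handler would pass to a zeropage-style ioctl, since those operations work on whole pages.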
Force-pushed from 6e60af6 to fc3049a
Now that CircleCI is running mmap and userfaultfd tests, will we be able to merge this?

CircleCI experiments are currently only happening on branches, but I'm hoping to get a proper PR together today.

So excited!
…rting with entrypoint_test.
… a fork-related bug.
- Instantiates the suite in `lucet_runtime_internals::alloc::tests` for `UffdRegion`
- Fixes early return issues in `UffdRegion` similar to #455
- Adds a test to show that the per-instance heap limit applies to runtime expansions, not just initial instantiation
- Refactors `validate_runtime_spec` to take the per-instance heap limit as an additional argument. This centralizes the logic for rejecting initially-oversized heap limits, and makes it clearer what's happening in each region's instantiation logic.
- Removes the `UffdRegion`'s assertion that signal stack size is a multiple of page size. Since the user can now control this as a parameter, we reject it gracefully when validating `Limits` rather than panicking.
Leaving the question of errors in the handler alone for this commit, since that'll be a more major change.
Notably, this should get us building and running uffd in Linux CI. It turns out to be a tremendous pain to enable a feature flag for just one crate within a workspace. The situation is [being addressed][1], but in the meantime I believe the best route forward is to just have uffd on by default for Linux. [1]: rust-lang/cargo#5364
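The graceful-rejection behavior described in the commit notes above can be sketched as a plain validation check. This is illustrative only: the function name and error shape here are made up for the sketch, not Lucet's actual `Limits` validation API.

```rust
// Hedged sketch: reject a signal stack size that isn't a non-zero
// multiple of the host page size, returning an error instead of
// panicking. Names and error strings are illustrative, not Lucet's.
fn validate_signal_stack(signal_stack_size: usize, host_page_size: usize) -> Result<(), String> {
    if signal_stack_size == 0 || signal_stack_size % host_page_size != 0 {
        return Err(format!(
            "signal stack size {} is not a non-zero multiple of page size {}",
            signal_stack_size, host_page_size
        ));
    }
    Ok(())
}

fn main() {
    // A page-multiple size validates cleanly.
    assert!(validate_signal_stack(8 * 4096, 4096).is_ok());
    // An unaligned size is rejected with an error, not a panic.
    assert!(validate_signal_stack(4096 + 1, 4096).is_err());
    println!("ok");
}
```

Returning `Err` keeps the failure recoverable by the embedder, which matters now that the value is user-controlled configuration rather than an internal invariant.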
Force-pushed from fc3049a to a96cb55
Co-authored-by: Adam C. Foltzer <acfoltzer@fastly.com>
…d of a host page.
…ization works as expected.
… should handle it gracefully, rather than imploding.
Ready for review :)
fst-crenshaw left a comment:
Approve approve approve!
The GitHub Actions tests are failing since they can't run the userfaultfd tests. I thought I'd try a little PR to have the offending tests get skipped by GitHub Actions, since they are being run by CircleCI. See: #528
This pull request integrates Fastly's userfaultfd-based memory management backend as an alternate `Region` implementation. Userfaultfd is a Linux-specific mechanism for handling page faults in userspace. (See https://www.kernel.org/doc/html/latest/admin-guide/mm/userfaultfd.html for a good technical explanation of it.) For Lucet, the use case is that when an `Instance` is started, none of the linear memory has to be copied initially.
We register the entire region with userfaultfd. When an instance is started, we set up the stacks, metadata, etc as normal, but we leave the pages of the linear memory "missing". When the instance starts and tries to access one of the linear memory pages, this triggers a page fault, as there is no physical memory backing the virtual memory. Since the region is registered with userfaultfd, this triggers a message to be sent to the userfaultfd handler thread. The handler thread determines which instance and module has faulted, and copies the necessary memory into place, before reawakening the instance thread.
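The "determines which instance has faulted" step described above is essentially address arithmetic over the region's contiguous run of fixed-size slots. A minimal sketch of that routing, with a made-up base address and slot size (not Lucet's actual layout or names):

```rust
// Hedged sketch: map a fault address to an instance slot index, assuming
// the region is one contiguous mapping divided into equally sized slots.
// REGION_BASE and SLOT_SIZE are illustrative values only.
const REGION_BASE: usize = 0x1000_0000;
const SLOT_SIZE: usize = 0x10_0000; // 1 MiB per slot in this sketch

/// Which slot (instance) does a faulting address fall in?
/// Returns None if the address is below the region base.
fn slot_index(fault_addr: usize) -> Option<usize> {
    fault_addr
        .checked_sub(REGION_BASE)
        .map(|offset| offset / SLOT_SIZE)
}

fn main() {
    // A fault at the region base belongs to slot 0.
    assert_eq!(slot_index(0x1000_0000), Some(0));
    // A fault partway into the second slot belongs to slot 1.
    assert_eq!(slot_index(0x1010_0042), Some(1));
    // An address below the region doesn't belong to any slot.
    assert_eq!(slot_index(0x0fff_ffff), None);
    println!("ok");
}
```

A real handler would also bounds-check against the number of slots and then classify the offset *within* the slot (heap, stack, globals, etc.) to decide how to resolve the fault.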
At its core, this provides a way of reducing startup time and increasing flexibility in how memory is handled, at the cost of increased runtime overhead (by way of context switches especially).
(All credit for this actually goes to @acfoltzer.)