This repository was archived by the owner on Mar 24, 2022. It is now read-only.

Refactor run_async to use custom Future implementation #626

Merged
cfallin merged 7 commits into bytecodealliance:main from benaubin:custom-run-async-future
Feb 19, 2021

Conversation

@benaubin
Contributor

@benaubin benaubin commented Jan 15, 2021

This PR replaces the async fn run_async_internal() with a custom implementation of Future.

Advantages:

  • Significantly less unsafe code than the original implementation of run_async (just one line, used for pinning the future)
  • A much sounder execution model: the future is executed from within the guest! When host execution is desired, tokio::spawn or similar works from within block_on
  • Support for defining async fn hostcalls without ever manually calling block_on
  • Near-zero overhead for polling an immediately-ready future (only the cost of an Arc.clone()). For example, a future like async { 5 } would immediately resolve to 5, without requiring a yield to the host.
  • Support for regular yielding from an async execution, even from within block_on
  • A defined type for the future returned by run_async, which makes it easier for embedders to manage the execution, such as by changing the execution bound between polls.

Semantically, guests behave exactly as normal async functions would, where block_on is equivalent to calling await.

It's now possible to declare an async hostcall like an async function:

#[lucet_hostcall]
#[no_mangle]
pub async fn hostcall_async(vmctx: &Vmctx) -> u32 {
    for _ in 0..6 {
        YieldingFuture { times: 2 }.await
    }

    return 0;
}

I also added a new try_block_on method, which allows an embedder to define fallback behavior for when an instance is not run from within an async execution context.

Additionally, I removed the inst_count_bound argument from run_async and run_async_start (added in #612; that API change was not yet released) and replaced it with a method and field on the RunAsync future. This is more flexible for users who need to configure a CPU bound (they can change the bound between any two polls) while being less verbose (and backward compatible with the old API) for users without that need.
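As a rough illustration of that flexibility, here is a toy model (all names hypothetical, using only std — no Lucet types) of a future that exposes its per-poll bound as a public field, so the embedder can retune it between any two polls:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Toy model of a RunAsync-style future: `bound` plays the role of the
// per-poll instruction bound and may be changed between any two polls.
struct BoundedRun {
    pub bound: u64,
    remaining: u64,
}

impl BoundedRun {
    fn new(total_work: u64, bound: u64) -> Self {
        BoundedRun { bound, remaining: total_work }
    }
}

impl Future for BoundedRun {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        // Run at most `bound` units of "work" this poll, then yield.
        let step = self.bound.min(self.remaining);
        self.remaining -= step;
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

// A no-op waker so the model can be polled by hand, without an executor.
unsafe fn vt_clone(_: *const ()) -> RawWaker { RAW }
unsafe fn vt_noop(_: *const ()) {}
const VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_noop, vt_noop, vt_noop);
const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);

fn noop_waker() -> Waker {
    unsafe { Waker::from_raw(RAW) }
}
```

A caller can poll once under a tight bound, observe Pending, raise `bound`, and poll again; nothing about the type needs to be rebuilt between polls.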

@benaubin
Contributor Author

benaubin commented Jan 15, 2021

In case it's useful, here's one way to implement a soft CPU time limit with this refactor. I'm not sure if it's worth adding an extra dependency to lucet, so I'm not including it in the PR (I'd be happy to, though!).

use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

use cpu_time::ThreadTime;
// RunAsync, UntypedRetVal, and Error come from the lucet-runtime crate.

pub struct RunAsyncWithCPULimit<'a> {
    run_async: RunAsync<'a>,
    pub cpu_time_limit: Duration,
    pub cpu_time_used: Duration,
}

struct CPULimitExceeded;

impl<'a> Future for RunAsyncWithCPULimit<'a> {
    type Output = (
        Result<Result<UntypedRetVal, Error>, CPULimitExceeded>,
        Duration,
    );

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let start = ThreadTime::now();

        let poll = Pin::new(&mut self.run_async).poll(cx);

        self.cpu_time_used += start.elapsed();

        match poll {
            Poll::Ready(res) => Poll::Ready((Ok(res), self.cpu_time_used)),
            Poll::Pending if self.cpu_time_used > self.cpu_time_limit => {
                Poll::Ready((Err(CPULimitExceeded), self.cpu_time_used))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

Feel free to use this example however you'd like (I'm hereby releasing this example under the CC-0 license)

Member

@cfallin cfallin left a comment


Hi @benaubin,

Thanks for this PR; however, I'm not sure I understand the reasoning behind some of the changes below. If I'm understanding correctly, the changes here create an explicit named type for the future, moving the existing async fn into a custom trait impl with an explicit state machine.

It might help me to understand the need for these if you could summarize why one can't implement the wall-clock bound example above by wrapping the future returned by the current API. For example, I think one could write an add_timeout<F: Future...>(inner: F) -> TimeoutFuture<F> and define TimeoutFuture to store an F, without requiring a named future-state struct on the Lucet side. This is largely like your example above, but with the struct type param to make it work without Lucet changes. Is there a reason this wouldn't work?
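Such a wrapper might look roughly like this (an editor's sketch using only std, not code from either implementation; `TimeoutFuture`, `add_timeout`, and `TimedOut` are hypothetical names, and the `Unpin` bound is a simplification — futures produced by async blocks are !Unpin, so a real version would need pin-projection):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};
use std::time::{Duration, Instant};

// Generic wall-clock timeout over any future: no named future-state
// struct on the Lucet side is required.
struct TimeoutFuture<F> {
    inner: F,
    deadline: Instant,
}

#[derive(Debug)]
struct TimedOut;

fn add_timeout<F: Future>(inner: F, limit: Duration) -> TimeoutFuture<F> {
    TimeoutFuture { inner, deadline: Instant::now() + limit }
}

// `F: Unpin` keeps the sketch simple; async-block futures are !Unpin,
// so a real version needs pin-projection (e.g. the pin-project crate).
impl<F: Future + Unpin> Future for TimeoutFuture<F> {
    type Output = Result<F::Output, TimedOut>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if Instant::now() >= self.deadline {
            return Poll::Ready(Err(TimedOut));
        }
        match Pin::new(&mut self.inner).poll(cx) {
            Poll::Ready(v) => Poll::Ready(Ok(v)),
            Poll::Pending => Poll::Pending,
        }
    }
}

// A no-op waker so the sketch can be polled by hand, without an executor.
unsafe fn vt_clone(_: *const ()) -> RawWaker { RAW }
unsafe fn vt_noop(_: *const ()) {}
const VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_noop, vt_noop, vt_noop);
const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);

fn noop_waker() -> Waker {
    unsafe { Waker::from_raw(RAW) }
}
```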

@benaubin
Contributor Author

benaubin commented Jan 19, 2021

It might help me to understand the need for these if you could summarize why one can't implement the wall-clock bound example above by wrapping the future returned by the current API. For example, I think one could write an add_timeout<F: Future...>(inner: F) -> TimeoutFuture and define TimeoutFuture to store an F, without requiring a named future-state struct on the Lucet side. This is largely like your example above, but with the struct type param to make it work without Lucet changes. Is there a reason this wouldn't work?

To be honest, I completely forgot that you could use generics for this. However, futures created by async/await are not Unpin - so it would be more difficult to write (although theoretically possible).

The primary thing that is possible to do with this API but not an async fn is adjusting the instruction bound between polls.

Additionally, the custom Future implementation reduces major sharp edges by making it much clearer how Lucet and the future executor interact.

  • If an instance polls an immediately-ready future, the old API would reset the instruction bound and continue running without yielding. While possible to fix within the async block, having a loop makes this the default behavior, whereas the custom implementation guarantees an instance is never called twice for a single poll.
  • If you run_async on an instance where bound_expired(), the old implementation would resume the instance rather than fail when attempting to run the instance from the given entry point. Solving this requires some sort of explicit state machine for the run loop, rather than depending on the instance's state.

In the future, this API makes other features possible:

  • It would be possible to add an API that allows yielding from a run_async context.
  • A fully synchronous API that can drive the RunAsync future without requiring a std::task::Waker, except when blocking. This would likely be a much better way to implement timeouts (no need to yield to the executor if the instruction bound expired but the CPU-time bound did not, improving performance/cache utilization for non-adversarial guests)

@benaubin benaubin requested a review from cfallin February 9, 2021 20:35
@benaubin
Contributor Author

benaubin commented Feb 9, 2021

@cfallin Hey Chris! I just pushed a new commit that takes advantage of the custom Future implementation to do the first poll of a future without yielding. For futures that are immediately ready, this implementation requires no context switch and no heap allocations - similar to using .await.

@benaubin benaubin force-pushed the custom-run-async-future branch 2 times, most recently from f8fcc38 to c6e83a2 on February 9, 2021 21:38
@benaubin benaubin force-pushed the custom-run-async-future branch from cba158f to 8579d6b on February 10, 2021 20:25
@cfallin
Member

cfallin commented Feb 10, 2021

@benaubin Sorry, I haven't gotten back to this as I've been somewhat short on time. To be honest, I still don't quite understand some of the details and reasoning here. For example, could you explain what you mean by: "takes advantage of the custom Future implementation to do the first poll of a future without yielding." -- first poll of which future? AFAICT in the current code, if the called Wasm function returns soon enough (before first bound expiration), we don't yield and require another poll, we just return a completed value. Or do you mean in some other situation?

From above, the only desired behavior I see that is not yet possible is to set the bound to a different amount on every poll. Everything else is possible with the current API. Is that correct?

Overall, I am hesitant to accept a full rewrite of the core async code, especially one that replaces an async fn with a custom future implementation and especially one that uses unsafe code, without a really really compelling reason. I'd be happy to look at a much smaller-scoped patch that allows for adjusting the bound on each poll, though, if you'd like. Sorry and thanks for the patience here!

@benaubin
Contributor Author

benaubin commented Feb 11, 2021

No problem!

The change I made yesterday makes it so there is no yield if the future passed to block_on is immediately ready. For example, a call like vmctx.block_on(async { 5 }), would return 5 without needing a yield to the host.
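The mechanism can be sketched generically (an editor's sketch, not the PR's code; `poll_once` is a hypothetical helper): poll the future once on the spot, and only fall back to yielding if it reports Pending.

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Poll a future exactly once. Some(v) means it was immediately ready,
// so no yield back to the host executor is needed.
fn poll_once<F: Future + Unpin>(fut: &mut F, cx: &mut Context<'_>) -> Option<F::Output> {
    match Pin::new(fut).poll(cx) {
        Poll::Ready(v) => Some(v),
        Poll::Pending => None,
    }
}

// A no-op waker so the sketch can be polled by hand, without an executor.
unsafe fn vt_clone(_: *const ()) -> RawWaker { RAW }
unsafe fn vt_noop(_: *const ()) {}
const VTABLE: RawWakerVTable = RawWakerVTable::new(vt_clone, vt_noop, vt_noop, vt_noop);
const RAW: RawWaker = RawWaker::new(std::ptr::null(), &VTABLE);

fn noop_waker() -> Waker {
    unsafe { Waker::from_raw(RAW) }
}
```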

I'm currently working on reducing the amount of unsafe code necessary to make this work.

@benaubin
Contributor Author

benaubin commented Feb 11, 2021

@cfallin I just pushed a commit that removes nearly all of the unsafe code from run_async, including the transmute that the trampoline-style execution required. There's one unsafe block left related to run_async, which is solely used for pinning the future (which is required in order to poll).

@benaubin
Contributor Author

benaubin commented Feb 11, 2021

@cfallin Just pushed a new commit that adds support for defining an async hostcall just like an async function:

#[lucet_hostcall]
#[no_mangle]
pub async fn hostcall_async(vmctx: &Vmctx, times: u32, times_double: u32) -> u32 {
    for _ in 0..times {
        YieldingFuture { times: 2 }.await
    }

    return times * 2;
}

Like Rust's .await, there is no overhead for a future that is immediately ready, and all execution of the future takes place from within the guest. To make this work, if the future passed to block_on returns Poll::Pending, block_on yields to RunAsync, which itself returns Poll::Pending.

Later, the async executor polls the RunAsync future, which resumes the guest, which polls the future from within the guest execution context.

@benaubin benaubin force-pushed the custom-run-async-future branch 5 times, most recently from 9e4c495 to a7a775b on February 12, 2021 14:52
@benaubin
Contributor Author

benaubin commented Feb 12, 2021

Rebased onto main, consolidated commits (happy to squash into a single commit if requested, but I'm separating out a few of them in case some of the changes are rejected).

@cfallin the changes should be ready for review. I know that my original changes weren't massive improvements, but these more recent ones bring much larger advantages, while nearly eliminating the unsafe code in run_async:

  • significantly less unsafe code than the original implementation of run_async (just one line, used for pinning the future)
  • a much sounder execution model: the future is executed from within the guest! When host execution is desired tokio::spawn or similar works from within block_on
  • support for defining async fn hostcalls without ever manually calling block_on
  • near-zero overhead for polling an immediately-ready future (only the cost of an Arc.clone())
  • support for yielding from an async execution, even from within block_on

Semantically, guests behave exactly as normal async functions would, where block_on is equivalent to calling await.

Member

@cfallin cfallin left a comment


@benaubin, thanks for the updates and sorry for the delay on the further review!

I think this is definitely going in the right direction, and given the reductions in unsafe code and after reading and understanding further the async-hostcall implementation, I'm inclined toward making sure this gets in.

I do have one pending uncertainty, though, below, regarding a possible race (it's possible you've thought through this already and if so I'd appreciate a clarifying comment :-) ).

I think that w.r.t. the async hostcall implementation we should also get any input that @pchickey might have. It looks reasonable to me but I may be missing some requirement here.

@benaubin
Contributor Author

Awesome! Thanks, @cfallin. I'll make the changes tomorrow.

@pchickey
Contributor

Thanks for working on this! Many of the futures bits of this are over my head, but the change to the macro to accept async fn is welcome. I had planned to do something just like that for compatibility with wasmtime :)

@benaubin
Contributor Author

benaubin commented Feb 18, 2021

The changes have been made and the test for yielding from async has been added!

@benaubin benaubin force-pushed the custom-run-async-future branch from 07b2c7f to 2b80350 on February 18, 2021 18:50
@benaubin benaubin requested a review from cfallin February 18, 2021 18:50
cfallin
cfallin previously approved these changes Feb 19, 2021
Member

@cfallin cfallin left a comment


Thanks @benaubin -- this LGTM. I think it might be good to have one more sanity-check on the core future/waker logic -- @alexcrichton, would you mind deploying your async expertise here in a final check?

We'll also need to trigger the CircleCI results; usually this occurs when we push to a branch in this repo, but in this case you won't have permissions to do that. @pchickey / @iximeow / @acfoltzer, is there a way around this or will one of us need to push a branch and create a new PR in place of @benaubin?

@pchickey
Contributor

I am not aware of a workaround to the CircleCI problem, unfortunately - I can push a branch, and once it goes green I'll link to the status here and we can merge this PR.

@pchickey
Contributor

You'll need to merge the latest main to fix the cargo audit error.

@benaubin
Contributor Author

@pchickey Done.

@benaubin benaubin requested a review from cfallin February 19, 2021 01:03
By changing the expected yield value from a Box<PhantomData<T>> to a
TypeId, it becomes possible to resume instance execution when the size
of a boxed type is unknown and cannot be passed as an argument.

This makes it possible to yield and resume from an async context by
passing the resumption value through RunAsyncState.

Finally, we have to check that the context didn't change between
resumption within try_block_on, in order to make sure the right Waker
is scheduled and the future will be resumed when it is ready.
@benaubin benaubin force-pushed the custom-run-async-future branch from 0090233 to 4773acb on February 19, 2021 01:05
cfallin
cfallin previously approved these changes Feb 19, 2021
Member

@cfallin cfallin left a comment


re-r+'ing for my part after updates.

Member

@alexcrichton alexcrichton left a comment


As a baseline I'm pretty unfamiliar with the internals of Lucet, and I'm unfortunately finding it difficult to disentangle "just the futures bits" from the rest of the Lucet embedding/runtime. In that sense I can say that after reading this nothing looks obviously wrong but I can't say with conviction that this all looks good to me.

I'm basically trying to find parallels to the async implementation in wasmtime because I believe at a high-level the two async strategies are the same but I was finding it pretty difficult to draw the parallels because I don't understand enough of Lucet.

That being said it looks like this has gotten reviews from other folks, so that's probably good enough? If needed, though, I can set aside more time to learning more of Lucet first and then digging more into the changes here.

@benaubin
Contributor Author

benaubin commented Feb 19, 2021

@alexcrichton It's indeed extremely similar to your wasmtime PR. Lucet runs every instance within a context switch, so it's very similar to wasmtime's fibers for async instances.

The primary difference in logic is that, in this PR, I avoid having to transmute the std::task::Context by cloning the Waker and creating a new context from inside the instance. This is partly because an instance could yield and last for longer than the lifetime of the Context (although it should be possible to use a stricter lifetime bound to soundly avoid the clone of context, it's probably not worth it).

@alexcrichton
Member

Ah ok, that makes sense. While I believe that works well today, the intention of Context was to abstract over possible extensions in the future, so in the limit of time that may not be an equivalent operation. For example, std may one day expand the abilities of Context with new functionality, and certain features of tokio may take advantage of that; but that means hostcalls using tokio futures may unexpectedly not have access to these features, since we're recreating Context.

I don't think that there's any plans at this time to actually do this so it's a pretty meta-level concern though.

@benaubin
Contributor Author

benaubin commented Feb 19, 2021

@alexcrichton That's what I figured. My original implementation passed the context to the instance with a transmute, but also required a lot of unsafe code blocks propagated throughout lucet.

It wouldn't be too hard to change this if necessary, although to do so soundly, RunAsync would need to prevent "yielding" (Lucet instances can yield without blocking on a future and later be resumed from the host). Prior to this PR, that was the case - instances were not allowed to yield when in an async context. After switching the implementation to clone a waker rather than transmute the lifetime of Context, I also relaxed the restriction on yielding from an async context.

For future compatibility's sake, it might be worth reinstating the restriction, or somehow marking yielding as "unstable".

@alexcrichton
Member

I'm not really sure what it means to prevent yielding or be in or not be an async context myself, but I think it's a reasonable state of affairs to just create a new Context for now to prevent a lot of unsafe code. I suspect that any change to Context is years out at the earliest, if ever, so we'd have plenty of runway to update things if needed.

Member

@cfallin cfallin left a comment


r+ on latest commit.

pchickey's branch in #633 shows green CircleCI results for 4773acb, one commit back from this branch's HEAD; the last commit is just an early-return out-denting refactor and passes all other CI.

Given that and given the discussion above with alexcrichton, I think we should be good to merge this. I'll go ahead and merge given the manually-verified CI above!

@cfallin cfallin merged commit bf24022 into bytecodealliance:main Feb 19, 2021
@benaubin benaubin deleted the custom-run-async-future branch February 19, 2021 23:51
pchickey added a commit that referenced this pull request Feb 26, 2021
pchickey pushed a commit that referenced this pull request Feb 26, 2021