Refactor run_async to use custom Future implementation #626
cfallin merged 7 commits into bytecodealliance:main from …
Conversation
In case it's useful, here's one way to implement a soft CPU-time limit with this refactor. I'm not sure it's worth adding an extra dependency to Lucet, so I'm not including it in the PR (I'd be happy to, though!).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Duration;

pub struct RunAsyncWithCPULimit<'a> {
    run_async: RunAsync<'a>,
    pub cpu_time_limit: Duration,
    pub cpu_time_used: Duration,
}

struct CPULimitExceeded;

impl<'a> Future for RunAsyncWithCPULimit<'a> {
    type Output = (
        Result<Result<UntypedRetVal, Error>, CPULimitExceeded>,
        Duration,
    );

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Measure only the CPU time this thread spends polling the guest.
        let start = cpu_time::ThreadTime::now();
        let poll = Pin::new(&mut self.run_async).poll(cx);
        self.cpu_time_used += start.elapsed();
        match poll {
            Poll::Ready(res) => Poll::Ready((Ok(res), self.cpu_time_used)),
            Poll::Pending if self.cpu_time_used > self.cpu_time_limit => {
                Poll::Ready((Err(CPULimitExceeded), self.cpu_time_used))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}
```

Feel free to use this example however you'd like (I'm hereby releasing it under the CC0 license).
cfallin
left a comment
Hi @benaubin,
Thanks for this PR; however, I'm not sure I understand the reasoning behind some of the changes below. If I'm understanding correctly, the changes here create an explicit named type for the future, moving the existing async fn into a custom trait impl with an explicit state machine.
It might help me to understand the need for these if you could summarize why one can't implement the CPU-time bound example above by wrapping the future returned by the current API. For example, I think one could write an `add_timeout<F: Future...>(inner: F) -> TimeoutFuture<F>` and define `TimeoutFuture` to store an `F`, without requiring a named future-state struct on the Lucet side. This is largely like your example above, but with the struct type parameter making it work without Lucet changes. Is there a reason this wouldn't work?
To be honest, I completely forgot that you could use generics for this. However, futures created by async/await are … The primary thing that is possible to do with this API but not an … Additionally, the custom Future implementation reduces major sharp edges by making it much clearer how Lucet and the future executor interact.
In the future, this API makes other features possible:
@cfallin Hey Chris! I just pushed a new commit that takes advantage of the custom Future implementation to do the first poll of a future without yielding. For futures that are immediately ready, this implementation requires no context switch and no heap allocations, similar to using …
Force-pushed f8fcc38 to c6e83a2
Force-pushed cba158f to 8579d6b
@benaubin Sorry, I haven't gotten back to this as I've been somewhat short on time. To be honest, I still don't quite understand some of the details and reasoning here. For example, could you explain what you mean by "takes advantage of the custom Future implementation to do the first poll of a future without yielding"? First poll of which future? AFAICT in the current code, if the called Wasm function returns soon enough (before the first bound expiration), we don't yield and require another poll; we just return a completed value. Or do you mean some other situation?

From above, the only desired behavior I see that is not yet possible is to set the bound to a different amount on every poll. Everything else is possible with the current API. Is that correct? Overall, I am hesitant to accept a full rewrite of the core async code, especially one that replaces an …
No problem! The change I made yesterday makes it so there is no yield if the future passed to … I'm currently working on reducing the amount of unsafe code necessary to make this work.
@cfallin I just pushed a commit that removes nearly all of the unsafe code from …
@cfallin Just pushed a new commit that adds support for defining an async hostcall just like an async function. Like Rust's … Later, the async executor polls the …
Force-pushed 9e4c495 to a7a775b
Rebased onto main and consolidated commits (happy to squash into a single commit if requested, but I'm separating a few of them out in case some of the changes are rejected). @cfallin, the changes should be ready for review. I know that my original changes weren't massive improvements, but these more recent ones have much larger advantages, while nearly eliminating the unsafe code in …
Semantically, guests behave exactly as normal async functions would, where block_on is equivalent to calling await.
cfallin
left a comment
@benaubin, thanks for the updates and sorry for the delay on the further review!
I think this is definitely going in the right direction, and given the reductions in unsafe code and after reading and understanding further the async-hostcall implementation, I'm inclined toward making sure this gets in.
I do have one pending uncertainty, though, below, regarding a possible race (it's possible you've thought through this already and if so I'd appreciate a clarifying comment :-) ).
I think that w.r.t. the async hostcall implementation we should also get any input that @pchickey might have. It looks reasonable to me but I may be missing some requirement here.
Awesome! Thanks, @cfallin. I'll make the changes tomorrow.
Thanks for working on this! Many of the futures bits of this are over my head, but the change to the macro to accept …
The changes have been made and the test for yielding from async has been added!
Force-pushed 07b2c7f to 2b80350
cfallin
left a comment
Thanks @benaubin -- this LGTM. I think it might be good to have one more sanity-check on the core future/waker logic -- @alexcrichton, would you mind deploying your async expertise here in a final check?
We'll also need to trigger the CircleCI results; usually this occurs when we push to a branch in this repo, but in this case you won't have permissions to do that. @pchickey / @iximeow / @acfoltzer, is there a way around this or will one of us need to push a branch and create a new PR in place of @benaubin?
I am not aware of a workaround to the CircleCI problem, unfortunately. I can push a branch, and once it goes green I'll link to the status here and we can merge this PR.
you'll need to merge the latest …
Force-pushed 2b80350 to 0090233
@pchickey Done.
By replacing the expected yield value from a Box&lt;PhantomData&lt;T&gt;&gt; with a TypeId, it becomes possible to resume instance execution when the size of a boxed type is unknown and cannot be passed as an argument. This makes it possible to yield and resume from an async context by passing the resumption value through RunAsyncState. Finally, we have to check that the context didn't change between resumptions within try_block_on, in order to make sure the right Waker is scheduled and the future will be resumed when it is ready.
Force-pushed 0090233 to 4773acb
cfallin
left a comment
re-r+'ing for my part after updates.
alexcrichton
left a comment
As a baseline I'm pretty unfamiliar with the internals of Lucet, and I'm unfortunately finding it difficult to disentangle "just the futures bits" from the rest of the Lucet embedding/runtime. In that sense I can say that after reading this nothing looks obviously wrong but I can't say with conviction that this all looks good to me.
I'm basically trying to find parallels to the async implementation in wasmtime, because I believe the two async strategies are the same at a high level, but I found it pretty difficult to draw the parallels since I don't understand enough of Lucet.
That being said, it looks like this has gotten reviews from other folks, so that's probably good enough? If needed, though, I can set aside more time to learn more of Lucet first and then dig further into the changes here.
@alexcrichton It's indeed extremely similar to your wasmtime PR. Lucet runs every instance within a context switch, so it's very similar to wasmtime's fibers for async instances. The primary difference in logic is that, in this PR, I avoid having to transmute the …
Ah ok, that makes sense. While I believe that works well today, the intention of … I don't think that there are any plans at this time to actually do this, so it's a pretty meta-level concern though.
@alexcrichton That's what I figured. My original implementation passed the context to the instance with a transmute, but it also required a lot of unsafe code blocks propagated throughout Lucet. It wouldn't be too hard to change this if necessary, although to do so soundly, … For future compatibility's sake, it might be worth reinstating the restriction, or somehow marking yielding as "unstable".
I'm not really sure what it means to prevent yielding, or to be or not be in an async context, myself, but I think it's a reasonable state of affairs to just create a new …
cfallin
left a comment
r+ on latest commit.
pchickey's branch in #633 shows green CircleCI results for 4773acb, one commit back from this branch's HEAD; the last commit is just an early-return out-denting refactor and passes all other CI.
Given that and given the discussion above with alexcrichton, I think we should be good to merge this. I'll go ahead and merge given the manually-verified CI above!
This PR replaces the async fn run_async_internal() with a custom implementation of Future.

Advantages:

- A much sounder execution model: the future is executed from within the guest! When host execution is desired, tokio::spawn or similar works from within block_on.
- async { 5 } would immediately resolve to 5, without requiring a yield to the host.
- A named RunAsync future returned by run_async, which makes it easier for embedders to manage the execution, such as by changing the execution bound between polls.

Semantically, guests behave exactly as normal async functions would, where block_on is equivalent to calling await.

It's now possible to declare an async hostcall like an async function:

I also added a new try_block_on method, which allows an embedder to define fallback behavior for when an instance is not run from within an async execution context.

Additionally, I removed the inst_count_bound argument in run_async and run_async_start (added in #612; that API change is not released) and replaced it with a method and field on the RunAsync future. This is more flexible for users that need the ability to configure a CPU bound (it lets a user change the CPU bound in between any poll) while being less verbose (and backward compatible with the old API) for users without that need.