[Merged by Bors] - Fix asset_debug_server hang. There should be at most one ThreadExecut…#7825
shuoli84 wants to merge 13 commits into bevyengine:main from
…or's ticker for one thread.
hymm left a comment
Are you sure both of these changes are necessary? The MainThreadExecutor doesn't get cloned into the debug asset server app.
Is this only a problem with …? Is it possible to write a test in bevy_tasks that shows this deadlocks before this PR?
@NiklasEi can you check if this fixes your use of the debug asset server? I want to make sure this fixes that key issue before giving this a more thorough review.
There are mainly two changes: …

```rust
use bevy_app::App;
use bevy_ecs::prelude::*;

// System that drives the nested app from inside the outer app's schedule.
fn run_sub_app(mut sub_app: NonSendMut<DebugApp>) {
    sub_app.app.update();
}

// A whole App stored as a non-send resource of the outer App.
struct DebugApp {
    app: App,
}

fn main() {
    let mut app = App::new();
    let sub_app = App::new();
    app.insert_non_send_resource(DebugApp { app: sub_app });
    app.add_system(run_sub_app);
    app.update();
}
```
Yes, it is possible to write a test to repro the deadlock; I'll give it a try. This bug is hard to reason about. I spent 20+ hours on it :'). The only fact I am sure of is: "if a Ticker gets leaked, the async_executor enters a 'troubled' state in which it can't be notified." But this doesn't guarantee a deadlock: if the thread can be unparked by some other means, it is still able to proceed. I tried creating a separate thread that just unparks the main thread, and the main thread was then able to run.
In theory …
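The observation that an external unpark lets a stuck thread make progress can be illustrated with plain `std::thread` parking. This is a minimal sketch under stated assumptions, not bevy or async_executor code: the `AtomicBool` flag stands in for "work is ready", the helper thread plays the role of the separate unparking thread described above, and `park_until` / `proceeds_with_external_unpark` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Parks the calling thread until `ready` becomes true.
/// `thread::park` may return spuriously, so the flag is re-checked in a loop.
fn park_until(ready: &AtomicBool) {
    while !ready.load(Ordering::Acquire) {
        thread::park();
    }
}

/// Parks the calling thread and relies on a separate helper thread to set the
/// flag and unpark it. Returns true once the calling thread has proceeded.
fn proceeds_with_external_unpark() -> bool {
    let ready = Arc::new(AtomicBool::new(false));
    let waiter = thread::current();

    let helper = {
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            ready.store(true, Ordering::Release);
            // Without this unpark, the waiter would stay parked
            // (barring spurious wakeups) even though the flag is set.
            waiter.unpark();
        })
    };

    park_until(&ready);
    helper.join().expect("helper thread panicked");
    ready.load(Ordering::Acquire)
}

fn main() {
    println!("proceeded: {}", proceeds_with_external_unpark());
}
```

The point of the sketch is only that progress depends on someone delivering the wakeup; it does not model why a leaked Ticker stops async_executor from delivering it.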
…onflict check, it would block.
Just added an example: without the fix, it would block. You can try disabling the check by returning …
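I think I figured out the details. The deadlock is caused by the following steps: …

```rust
// Pseudocode: `ticker_1` and `ticker_2` are tickers on the same executor,
// and `or` is futures_lite's `FutureExt::or`.
let forever = async {
    loop {
        ticker_1.tick().or(ticker_2.tick()).await;
    }
};
// Race the real work against the ticking loop; `forever` never completes,
// so this returns the output of `work_future`.
future::block_on(work_future.or(forever));
```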
Back to the code fix: if we replace the …
@hymm @james7132 ping
This explanation makes sense to me. Thanks for figuring it out. So my PR #7564 fixes things by not reusing the executor, so the inner executor doesn't get into the weird state, while this PR fixes things by never having the second ticker in an `or`. My test code in the other PR's comments never deadlocked because I needed to add a second executor that the outer schedule is using. I'm pretty sure I prefer the change in this PR: we don't need to keep recreating the scope executor, and we're no longer doing the double ticking, which always felt a little weird. In the longer term, this seems to be a bug in async-executor, and we should consider upstreaming a fix.
NiklasEi left a comment
> @NiklasEi can you check if this fixes your use of the debug asset server? I want to make sure this fixes that key issue before giving this a more thorough review.
I just checked it and yes, this PR also fixes my stuck integration tests 👍
Just wanted to quickly chime in and thank @shuoli84 for digging into this rather complex bug. I'll leave a full review soon. Definitely want this fix before 0.10 goes live.
Code looks good to me now. The logic for which tickers need to be ticked in scope is getting a little complicated, so it'd be nice to have some unit tests for it, but I'm not going to block on that. The multiple-tickers code should be removed when we remove !Send resources from the world.
james7132 left a comment
Sans a few code quality nits, this looks good to me. Great work!
Just opened PR #7865, which basically runs the …
```rust
let scope_ticker = scope_executor.ticker().unwrap();
if let Some(external_ticker) = external_executor.ticker() {
    if tick_task_pool_executor {
        let external_ticker = if !external_executor.is_same(scope_executor) {
```
Nice change. This is definitely easier to follow.
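The rule behind the `is_same` check (tickers from the same executor conflict, so only one of them is ticked) can be modelled in isolation. This is a hypothetical sketch, not the bevy_tasks implementation: `Executor` is a plain struct, identity is compared with `Arc::ptr_eq`, and `executors_to_tick` is an illustrative helper that ignores the `tick_task_pool_executor` branch.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an executor; only its identity matters here.
struct Executor {
    name: &'static str,
}

/// Models an `is_same` check: two handles refer to the same executor
/// when they point at the same allocation.
fn is_same(a: &Arc<Executor>, b: &Arc<Executor>) -> bool {
    Arc::ptr_eq(a, b)
}

/// Returns the names of the executors whose tickers should be polled.
/// The scope executor is always ticked; the external executor is added only
/// when it is a different executor, so two tickers for one executor are
/// never polled together in a single `or`.
fn executors_to_tick(scope: &Arc<Executor>, external: Option<&Arc<Executor>>) -> Vec<&'static str> {
    let mut names = vec![scope.name];
    if let Some(external) = external {
        if !is_same(external, scope) {
            names.push(external.name);
        }
    }
    names
}

fn main() {
    let main_thread = Arc::new(Executor { name: "main_thread" });
    let other = Arc::new(Executor { name: "other" });

    // Same executor reached through two handles: tick it only once.
    println!("{:?}", executors_to_tick(&main_thread, Some(&Arc::clone(&main_thread))));
    // Distinct executors: tick both.
    println!("{:?}", executors_to_tick(&main_thread, Some(&other)));
    // No external executor: tick the scope executor alone.
    println!("{:?}", executors_to_tick(&main_thread, None));
}
```

Keeping the decision in a small pure function like this is also one way to get the unit tests mentioned in the review, since it can be checked without spinning up any executors.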
bors r+
#7825) …or's ticker for one thread.

# Objective

- Fix debug_asset_server hang.

## Solution

- Reuse the thread_local executor for the MainThreadExecutor resource, so there will be only one ThreadExecutor for the main thread.
- If ThreadTickers are from the same executor, they conflict with each other, so only tick one.
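The first Solution point, reusing one thread-local executor so the main thread only ever has a single ThreadExecutor, can be sketched with std's `thread_local!`. This is a hypothetical model: `ThreadExecutor` here is a plain struct and `thread_executor` is an illustrative accessor, not bevy_tasks' actual types or API.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a per-thread executor.
struct ThreadExecutor {
    thread_id: thread::ThreadId,
}

thread_local! {
    // One executor per thread, created lazily on first access from that thread.
    static THREAD_EXECUTOR: Arc<ThreadExecutor> = Arc::new(ThreadExecutor {
        thread_id: thread::current().id(),
    });
}

/// Returns a handle to this thread's executor. Every call on the same thread
/// clones the same `Arc` instead of constructing a new executor.
fn thread_executor() -> Arc<ThreadExecutor> {
    THREAD_EXECUTOR.with(|executor| Arc::clone(executor))
}

fn main() {
    // Two handles taken on one thread (e.g. one stored in a resource and one
    // used by a nested app) point at the same executor.
    let first = thread_executor();
    let second = thread_executor();
    println!("same executor on one thread: {}", Arc::ptr_eq(&first, &second));

    // Another thread gets its own executor.
    let other = thread::spawn(thread_executor).join().unwrap();
    println!("same executor across threads: {}", Arc::ptr_eq(&first, &other));
    println!("different owning threads: {}", first.thread_id != other.thread_id);
}
```

Because every handle on a thread shares one executor, the "same executor" case in the second Solution point becomes detectable by identity comparison.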