create a new scope executor for every scope #7564
hymm wants to merge 1 commit into bevyengine:main from
Conversation
|
This test works on main. But if my mental model of what is causing the deadlock was right, this should deadlock.

```rust
fn can_run_nested_multithreaded_schedules() {
    let mut world = World::default();
    world.init_resource::<MainThreadExecutor>();
    world.init_resource::<SystemOrder>();

    let mut inner_schedule = Schedule::default();
    inner_schedule.set_executor_kind(ExecutorKind::MultiThreaded);
    inner_schedule.add_system(make_function_system(0));

    let mut outer_schedule = Schedule::default();
    outer_schedule.set_executor_kind(ExecutorKind::MultiThreaded);
    outer_schedule.add_system(move |world: &mut World| {
        inner_schedule.run(world);
    });

    outer_schedule.run(&mut world);

    assert_eq!(world.resource::<SystemOrder>().0, vec![0]);
}
```
|
|
There are definitely some timing issues associated with the bug. I eventually added enough |
|
Are there any downsides to this PR? The debug asset server is very important for rendering development. The ability to iterate on shaders live saves a ton of time, given that bevy_render, bevy_pbr, etc. take a long time to compile. |
|
This looks like it removes reuse of a per-thread executor and instead creates a new thread executor every time a scope is used. That sounds like it has the potential for a performance regression, since a new thread executor has to be created for the duration of every scope. |
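For illustration only, here is a minimal sketch of the two strategies being compared; the names and the helper function are hypothetical, not the actual bevy_tasks internals:

```rust
use std::sync::Arc;
use async_executor::Executor;

thread_local! {
    // Reuse strategy: one executor per thread, shared by every scope on that thread.
    static REUSED_EXECUTOR: Arc<Executor<'static>> = Arc::new(Executor::new());
}

// Per-scope strategy (what this PR does conceptually): build a fresh executor
// each time a scope starts, and drop it when the scope ends.
fn executor_for_scope(reuse_thread_local: bool) -> Arc<Executor<'static>> {
    if reuse_thread_local {
        REUSED_EXECUTOR.with(Arc::clone)
    } else {
        Arc::new(Executor::new())
    }
}
```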
|
I won't have time to look into this issue more for at least a week. I do plan on investigating more when I do have time. But in case no one else figures anything out, I consider this change to be low risk. The change to reuse a thread-local executor instead of creating a new one was made during this release cycle in #7087, so this is basically just reverting that change. |
|
The following code can reproduce the deadlock.

```rust
use bevy_app::App;
use bevy_ecs::prelude::*;

fn run_sub_app(mut sub_app: NonSendMut<DebugApp>) {
    sub_app.app.update();
}

struct DebugApp {
    app: App,
}

fn main() {
    let mut app = bevy_app::App::new();
    let sub_app = bevy_app::App::new();

    app.insert_non_send_resource(DebugApp { app: sub_app });
    app.add_system(run_sub_app);
    app.update();
}
```
|
|
I think the reason is, |
|
I wondered how the LocalExecutor on the main thread gets ticked, until I saw the code.
There are two executors here with one running inside the other one. The exclusivity is per executor, and so running 2 systems that want exclusive access in different executors is allowed.
They're also ticked inside the bevy_tasks::scope which the multithreaded executor runs inside. |
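As a rough, self-contained illustration of that arrangement (the wiring here is simplified and is not the actual bevy_tasks/executor code): one of the futures running in a task-pool scope drives a separate executor, which is how tasks spawned on that executor make progress at all.

```rust
use async_executor::Executor;
use bevy_tasks::TaskPool;

fn main() {
    // A second executor, analogous to the main-thread executor described above.
    let thread_executor = Executor::new();
    let task = thread_executor.spawn(async { 7 });

    let pool = TaskPool::new();
    let results = pool.scope(|scope| {
        // The scope polls this future, and the future in turn ticks the second
        // executor until `task` completes.
        scope.spawn(thread_executor.run(task));
    });
    assert_eq!(results, vec![7]);
}
```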
|
The test code above is a little out of date. The inner schedule should have some apply_system_buffers in it. My investigations showed that the deadlock was happening during the startup schedule of the debug app. Typically it happened during the 2nd or 3rd apply_system_buffers in that schedule. The schedule doesn't otherwise have any systems in it. But even after trying to add some apply_system_buffers to the test, it doesn't deadlock (see the sketch below). |
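For reference, a hedged sketch of that variant, reusing make_function_system from the test above and assuming the 0.10-era apply_system_buffers system; per the comment, this still does not deadlock:

```rust
// Fragment meant to slot into the earlier test in place of the inner schedule.
let mut inner_schedule = Schedule::default();
inner_schedule.set_executor_kind(ExecutorKind::MultiThreaded);
inner_schedule.add_system(make_function_system(0));
// apply_system_buffers is an exclusive system (it needs &mut World), which is
// the kind of system the observed deadlock involves.
inner_schedule.add_system(apply_system_buffers);
```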
|
I have run into issues with integration tests of |
james7132 left a comment
Comparing this to the code prior to pipelined rendering's merge, this is indeed just a revert. If this fixes the usage of the debug asset server, we shouldn't ship 0.10 without this.
|
I think I found something interesting. The problem is actually that the async-executor entered a state where it cannot be triggered just by a spawn. Normally, when spawn_exclusive_system_task runs, it triggers/notifies the main thread:

```text
spawn thread: ThreadId(4) -> Executor { id: 9, active: 1, global_tasks: 0, local_runners: [], sleepers: 2 } 0 f: bevy_ecs::schedule::executor::multi_threaded::MultiThreadedExecutor::spawn_exclusive_system_task::{{closure}}
executor[9] notify wake 2 Waker { data: 0x600001da4850, vtable: 0x107eb9698 } now: "count:2 free_ids:[] wakers:1 <wakers: 1 Waker { data: 0x600001dd2fb0, vtable: 0x107eb9698 }>"
```

But when the problem happens, the spawn can't trigger the executor:

```text
spawn thread: ThreadId(4) -> Executor { id: 9, active: 1, global_tasks: 0, local_runners: [], sleepers: 2 } 0 f: bevy_ecs::schedule::executor::multi_threaded::MultiThreadedExecutor::spawn_exclusive_system_task::{{closure}}
executor[9] notify no effort
```

Why? That's something I am still figuring out. It appears the executor's

Actually, when the ticker/sleeper has just been notified and is running, this is not a problem. In our case, the main thread is already parked, and then this is a problem: the executor believes the ticker is running or has already been notified, so it won't notify it again. Then the main thread stays parked forever. |
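To make that failure mode concrete, here is a minimal, self-contained sketch of this kind of lost wakeup, using a plain parked thread and a "notified" flag; this is an assumed simplification, not async-executor's actual bookkeeping:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    // "The ticker has already been notified" flag. The real executor keeps
    // similar state so it can skip redundant wakeups.
    let notified = Arc::new(AtomicBool::new(true)); // stale: set, but nobody consumed it
    let notified_for_ticker = Arc::clone(&notified);

    let ticker = thread::spawn(move || {
        // The ticker parks, expecting an unpark when new work is spawned.
        // The timeout is only here so the demo terminates; without it,
        // this thread would stay parked forever.
        thread::park_timeout(Duration::from_secs(1));
        notified_for_ticker.load(Ordering::SeqCst)
    });

    // A spawn arrives. Because the flag already reads `true`, the wakeup is
    // skipped -- the executor believes the ticker is running or already
    // notified, so the parked thread is never unparked.
    if !notified.swap(true, Ordering::SeqCst) {
        ticker.thread().unpark(); // never reached in this scenario
    }

    assert!(ticker.join().unwrap());
}
```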
|
These are the other executors' states. Only Executor [9]'s count is |
|
I think I found the root cause: there are two ThreadExecutors on the main thread.

```rust
// the thread-local one
thread_local! {
    static LOCAL_EXECUTOR: async_executor::LocalExecutor<'static> = async_executor::LocalExecutor::new("local executor");
    static THREAD_EXECUTOR: Arc<ThreadExecutor<'static>> = Arc::new(ThreadExecutor::new());
}

// and also the MainThreadExecutor resource
#[derive(Resource, Default, Clone)]
pub struct MainThreadExecutor(pub Arc<ThreadExecutor<'static>>);
```

And the troublesome task is created by this code:

```rust
external_ticker.tick().or(scope_ticker.tick()).await;
```
|
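As a rough illustration of that pattern (an assumed simplification using async_executor and futures_lite directly, not the actual bevy_tasks tickers): a single future keeps two executors alive by driving whichever one has work ready, so if the wakeup for either executor is lost while that future is asleep, both of them stall.

```rust
use async_executor::Executor;
use futures_lite::future;

fn main() {
    // Stand-ins for the external (main-thread) executor and the scope's executor.
    let external = Executor::new();
    let scoped = Executor::new();

    let task = scoped.spawn(async { 1 + 1 });

    // `run` ticks an executor while polling the given future, so nesting the
    // two calls drives both executors until `task` completes.
    let result = future::block_on(external.run(scoped.run(task)));
    assert_eq!(result, 2);
}
```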
|
Just opened #7825, check it out? |
|
Closing in favor of #7825. |
Objective

Solution

Create a new thread executor for every `scope` and remove the reused thread-local one. `cargo run --example load_gltf --features debug_asset_server` deadlocks without this PR, but works with it. But I'm unsure of the root cause of the deadlock, so this is not a guaranteed fix.

Changelog