Skip to content

Stageless: avoid deadlock in TaskPool::scope by polling all tasks#7492

Closed
maniwani wants to merge 12 commits intobevyengine:mainfrom
maniwani:avoid-deadlocking
Closed

Stageless: avoid deadlock in TaskPool::scope by polling all tasks#7492
maniwani wants to merge 12 commits intobevyengine:mainfrom
maniwani:avoid-deadlocking

Conversation

@maniwani
Copy link
Contributor

@maniwani maniwani commented Feb 3, 2023

Objective

Replace #7448 with the "fair polling" alternative it mentioned, so that we can continue to mirror std::thread::scope (i.e. keep "nested scope spawns").

It might be possible to poll all the tasks in get_results in the scope. If we could do that, then the scope could panic if one of the tasks panics. It would require a data structure that we could both poll and push futures through a shared ref to it. I tried FuturesUnordered, but it requires an exclusive ref to poll it.

Solution

  • Revert those changes.
  • Eagerly take spawned tasks from the queue and maintain a list of the pending ones.
  • Poll pending tasks for completion.
  • If any task fails, cancel all taken-but-unfinished tasks and then panic.

@maniwani maniwani changed the title Stageless: avoid executor deadlock Stageless: avoid deadlock in TaskPool::scope by polling all tasks Feb 3, 2023
@maniwani maniwani requested review from cart, hymm and james7132 February 3, 2023 21:27
@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events A-Tasks Tools for parallel and async work labels Feb 3, 2023
@alice-i-cecile alice-i-cecile added this to the 0.10 milestone Feb 3, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2023

Example alien_cake_addict failed to run, please try running it locally and check the result.

@hymm
Copy link
Contributor

hymm commented Feb 4, 2023

I did some preliminary checking with cargo bench systems. Both the linked list and the current pr version are faster than main. With the linked list version being a little faster on busy systems and contrived systems. I think I like the linked list version better than the double buffer, but I wonder if there's a cleaner approach than that.

In general I'm in favor of something like this. It's pretty close to what I was thinking of for the "fair polling" alternative I mentioned in the other pr.

@hymm
Copy link
Contributor

hymm commented Feb 5, 2023

edit: reran on main and still seeing a regression on many_foxes with this merged into stageless

image

@hymm
Copy link
Contributor

hymm commented Feb 6, 2023

Thinking about this some more and I'm not sure this is the right approach. This doesn't properly cooperate with the tasks. The version of get_results on main will wait for the task that it is awaiting to complete before waking. This version doesn't do that. It will constantly poll the tasks.

We probably want to write a custom future that can properly wait for a wake signal from all the tasks before rechecking things.

@cart
Copy link
Member

cart commented Feb 7, 2023

Yeah I'm also not a fan of the constant polling. Feels too much like we're circumventing the executor.

@cart cart removed this from the 0.10 milestone Feb 7, 2023
@alice-i-cecile
Copy link
Member

@maniwani shall we close this out for now?

@maniwani
Copy link
Contributor Author

Sure.

@maniwani maniwani closed this Feb 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ECS Entities, components, systems, and events A-Tasks Tools for parallel and async work

Projects

No open projects
Archived in project

Development

Successfully merging this pull request may close these issues.

4 participants