refactor: Extract parallel queue abstraction#7348
refactor: Extract parallel queue abstraction#7348alice-i-cecile merged 13 commits intobevyengine:mainfrom
Conversation
|
#6817 similarly abstracts the deferred mutation pattern, but seems to do so in a fully compatible way. |
The main difference here is that this is a standalone type. It's used in |
This could be avoided by replacing the |
This makes it difficult to work with several of these together, with a triple nested scope in that case. We can have both, but only having the scope is probably not 100% viable. |
|
Fair enough, didn't consider such a case. |
joseph-gio
left a comment
There was a problem hiding this comment.
Seems like a very useful abstraction, it's a nice way of making thread locals usable. I like the name too -- it feels like a very natural way to describe a set of values stored in parallel across threads.
I am a bit wary of ParRef<>, though. Requiring the drop impl to apply changes to a smart pointer feels strange and overengineered, and it is potentially footgunny as Nile mentioned. Perhaps we should replace .get() with .scope(), and just take the L on nested scopes.
hymm
left a comment
There was a problem hiding this comment.
generally in favor of this. I was considering a pr to do this exact thing. But I'd like to see some benches before approving.
joseph-gio
left a comment
There was a problem hiding this comment.
This is much nicer now that the API is based on scope. I'm on board now, but there's a few API details I'm not sure about.
|
Moving to 0.12 as there are still open conversations. |
hymm
left a comment
There was a problem hiding this comment.
main thing for me to approve is just to see a bench that check_visibility still performs the same.
# Objective `extract_meshes` can easily be one of the most expensive operations in the blocking extract schedule for 3D apps. It also has no fundamentally serialized parts and can easily be run across multiple threads. Let's speed it up by parallelizing it! ## Solution Use the `ThreadLocal<Cell<Vec<T>>>` approach utilized by #7348 in conjunction with `Query::par_iter` to build a set of thread-local queues, and collect them after going wide. ## Performance Using `cargo run --profile stress-test --features trace_tracy --example many_cubes`. Yellow is this PR. Red is main. `extract_meshes`:  An average reduction from 1.2ms to 770us is seen, a 41.6% improvement. Note: this is still not including #9950's changes, so this may actually result in even faster speedups once that's merged in.
# Objective `extract_meshes` can easily be one of the most expensive operations in the blocking extract schedule for 3D apps. It also has no fundamentally serialized parts and can easily be run across multiple threads. Let's speed it up by parallelizing it! ## Solution Use the `ThreadLocal<Cell<Vec<T>>>` approach utilized by bevyengine#7348 in conjunction with `Query::par_iter` to build a set of thread-local queues, and collect them after going wide. ## Performance Using `cargo run --profile stress-test --features trace_tracy --example many_cubes`. Yellow is this PR. Red is main. `extract_meshes`:  An average reduction from 1.2ms to 770us is seen, a 41.6% improvement. Note: this is still not including bevyengine#9950's changes, so this may actually result in even faster speedups once that's merged in.
# Objective `extract_meshes` can easily be one of the most expensive operations in the blocking extract schedule for 3D apps. It also has no fundamentally serialized parts and can easily be run across multiple threads. Let's speed it up by parallelizing it! ## Solution Use the `ThreadLocal<Cell<Vec<T>>>` approach utilized by bevyengine#7348 in conjunction with `Query::par_iter` to build a set of thread-local queues, and collect them after going wide. ## Performance Using `cargo run --profile stress-test --features trace_tracy --example many_cubes`. Yellow is this PR. Red is main. `extract_meshes`:  An average reduction from 1.2ms to 770us is seen, a 41.6% improvement. Note: this is still not including bevyengine#9950's changes, so this may actually result in even faster speedups once that's merged in.
Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
Comparing |
| let cell = self.locals.get_or_default(); | ||
| let mut value = cell.take(); | ||
| let ret = f(&mut value); | ||
| cell.set(value); |
There was a problem hiding this comment.
If a panic occurs in the scope and gets caught, any commands added during the scope will be lost. This is probably desirable but I think it's worth mentioning in a comment.
alice-i-cecile
left a comment
There was a problem hiding this comment.
Nice little bit of abstraction :)
# Objective #7348 added `bevy_utils::Parallel` and replaced the usage of the `ThreadLocal<Cell<Vec<...>>>` in `check_visibility`, but we were also using it in `extract_meshes`. ## Solution Refactor the system to use `Parallel` instead.
# Objective There's a repeating pattern of `ThreadLocal<Cell<Vec<T>>>` which is very useful for low overhead, low contention multithreaded queues that have cropped up in a few places in the engine. This pattern is surprisingly useful when building deferred mutation across multiple threads, as noted by it's use in `ParallelCommands`. However, `ThreadLocal<Cell<Vec<T>>>` is not only a mouthful, it's also hard to ensure the thread-local queue is replaced after it's been temporarily removed from the `Cell`. ## Solution Wrap the pattern into `bevy_utils::Parallel<T>` which codifies the entire pattern and ensures the user follows the contract. Instead of fetching indivdual cells, removing the value, mutating it, and replacing it, `Parallel::get` returns a `ParRef<'a, T>` which contains the temporarily removed value and a reference back to the cell, and will write the mutated value back to the cell upon being dropped. I would like to use this to simplify the remaining part of bevyengine#4899 that has not been adopted/merged. --- ## Changelog TODO --------- Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
# Objective bevyengine#7348 added `bevy_utils::Parallel` and replaced the usage of the `ThreadLocal<Cell<Vec<...>>>` in `check_visibility`, but we were also using it in `extract_meshes`. ## Solution Refactor the system to use `Parallel` instead.
# Objective There's a repeating pattern of `ThreadLocal<Cell<Vec<T>>>` which is very useful for low overhead, low contention multithreaded queues that have cropped up in a few places in the engine. This pattern is surprisingly useful when building deferred mutation across multiple threads, as noted by it's use in `ParallelCommands`. However, `ThreadLocal<Cell<Vec<T>>>` is not only a mouthful, it's also hard to ensure the thread-local queue is replaced after it's been temporarily removed from the `Cell`. ## Solution Wrap the pattern into `bevy_utils::Parallel<T>` which codifies the entire pattern and ensures the user follows the contract. Instead of fetching indivdual cells, removing the value, mutating it, and replacing it, `Parallel::get` returns a `ParRef<'a, T>` which contains the temporarily removed value and a reference back to the cell, and will write the mutated value back to the cell upon being dropped. I would like to use this to simplify the remaining part of bevyengine#4899 that has not been adopted/merged. --- ## Changelog TODO --------- Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
# Objective bevyengine#7348 added `bevy_utils::Parallel` and replaced the usage of the `ThreadLocal<Cell<Vec<...>>>` in `check_visibility`, but we were also using it in `extract_meshes`. ## Solution Refactor the system to use `Parallel` instead.


Objective
There's a repeating pattern of
ThreadLocal<Cell<Vec<T>>>which is very useful for low overhead, low contention multithreaded queues that have cropped up in a few places in the engine. This pattern is surprisingly useful when building deferred mutation across multiple threads, as noted by it's use inParallelCommands.However,
ThreadLocal<Cell<Vec<T>>>is not only a mouthful, it's also hard to ensure the thread-local queue is replaced after it's been temporarily removed from theCell.Solution
Wrap the pattern into
bevy_utils::Parallel<T>which codifies the entire pattern and ensures the user follows the contract. Instead of fetching indivdual cells, removing the value, mutating it, and replacing it,Parallel::getreturns aParRef<'a, T>which contains the temporarily removed value and a reference back to the cell, and will write the mutated value back to the cell upon being dropped.I would like to use this to simplify the remaining part of #4899 that has not been adopted/merged.
Changelog
TODO