From 2c02048d7b5d954009f3bcf490d7b80673056ba3 Mon Sep 17 00:00:00 2001
From: james7132
Date: Mon, 21 Jun 2021 19:17:13 -0700
Subject: [PATCH 1/5] CPU Particles RFC

---
 rfcs/cpu-particles.md | 398 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 398 insertions(+)
 create mode 100644 rfcs/cpu-particles.md

diff --git a/rfcs/cpu-particles.md b/rfcs/cpu-particles.md
new file mode 100644
index 00000000..c93371e1
--- /dev/null
+++ b/rfcs/cpu-particles.md
@@ -0,0 +1,398 @@
+# Feature Name: `cpu-particles`
+
+Prototype Implementation: https://github.com/james7132/bevy_prototype_particles
+
+## Summary
+Provide a modular set of components and systems for authoring and controlling
+CPU-simulated particle systems.
+
+## Motivation
+Games frequently require visual effects that are more fluid in nature, like fire,
+lightning, or smoke. These effects are difficult or impossible to simulate with
+static or skinned meshes.
+
+This RFC aims to detail an efficient CPU-bound implementation for modularly
+creating, simulating, and rendering these kinds of effects.
+
+## User-facing explanation
+Each particle system is a collection of simple billboarded quads with the following
+additional properties: position, 2D rotation, velocity, angular velocity, color,
+texture UVs, and lifetime. When spawned, a particle is assigned a lifetime, which
+ticks down frame by frame until it reaches 0, at which point the particle is destroyed.
+
+Particles themselves are not entities, but are instead contained within a
+`Particles` component, which stores the live state of all particles in a given
+particle system as well as the general configuration for particle initialization.
+Particles do not have a stable ID and attempting to store one is meaningless.
+
+`ParticleMaterial` is a material component for rendering each particle. It is the
+main hook into the rest of the render graph. Adding this to an entity with `Particles`
+will render the inner state of `Particles`.
In this case, `Particles` does the
+same job that a `Handle` component does in the normal rendering flows.
+
+`ParticleEmitter` is a component for controlling the emission behavior of
+particles and must be used in tandem with `Particles`. It controls the timing and
+shape of how particles are spawned. One or more `EmitterModifier` trait
+object(s) can be added to an emitter to change initialization parameters for each
+particle.
+
+Additional optional components, collectively known as simulation modifiers, will
+alter the post-initialization behavior of particles:
+
+ - `VelocityOverLifetime` - Scales the speed of the particle over the course of
+   its lifetime.
+ - `RotationOverLifetime` - Changes the rotation of the particle over the
+   course of its lifetime.
+ - `SizeOverLifetime` - Scales the size of the particle over the course of its
+   lifetime.
+ - `ColorOverLifetime` - Changes the color of the particle over the course
+   of its lifetime.
+ - `RotationBySpeed` - Changes the rotation of the particle based on its
+   speed.
+ - `SizeBySpeed` - Scales the size of the particle based on its speed.
+ - `ColorBySpeed` - Changes the color of the particle based on its speed.
+ - `ParticleNoise` - Adds noise to how particles move/rotate. Useful for creating
+   turbulent systems.
+ - `ForceOverLifetime` - Applies a force to each particle based on how
+   long it has lived.
+
+Critical to the "fluid" nature of the simulation, all of these components are
+parameterizable using either ranges or animation curves to sample values from.
+
+For utility, a `ParticleBundle` will be provided with a `Particles`,
+`ParticleEmitter`, `ParticleMaterial`, and other required rendering components.
+
+An example creating a round slow-burning fireball:
+
+```rust
+fn create_fireball(
+    mut textures: Res>,
+    mut commands: Commands,
+) {
+    commands
+        .spawn()
+        .insert_bundle(ParticleBundle {
+            particles: Particles::new(ParticleConfig {
+                max_particles: 2500,
+                ..Default::default()
+            }),
+            emitter: ParticleEmitter::sphere()
+                // Particles start at rest
+                .with_initial_velocity(0.0)
+                .with_initial_size(0.25..0.5)
+                .with_lifetime(0.75..2.0)
+                .with_initial_color(Color::WHITE)
+                // Fifty particles per frame
+                .add_burst(50, Duration::from_millis(0))
+                // Repeat every frame
+                .repeat(),
+            particle_material: ParticleMaterial {
+                texture: textures.load("fire.png"),
+            },
+            ..Default::default()
+        })
+        // Add color over lifetime to have it fade away
+        .insert(ColorOverLifetime {
+            // Fade out like a fire.
+            color: CurveVariable::(
+                vec![
+                    Color::WHITE,
+                    Color::YELLOW,
+                    Color::ORANGE,
+                    Color::RED,
+                    Color::MAROON,
+                ]),
+        });
+}
+```
+
+Just by changing the parameters a bit, it's possible to create a single burst
+explosion:
+
+```rust
+fn create_explosion(
+    mut textures: Res>,
+    mut commands: Commands,
+) {
+    commands
+        .spawn()
+        .insert_bundle(ParticleBundle {
+            particles: Particles::new(ParticleConfig {
+                // Only support 300 large particles
+                max_particles: 300,
+                ..Default::default()
+            }),
+            emitter: ParticleEmitter::sphere()
+                // Fast outward movement
+                .with_initial_velocity(8.0..10.0)
+                .with_initial_size(3.0)
+                .with_lifetime(0.25..0.75)
+                .with_initial_color(Color::WHITE)
+                // Add one large burst at the start. Do not repeat.
+                .add_burst(300, Duration::from_millis(10)),
+            particle_material: ParticleMaterial {
+                texture: textures.load("fire.png"),
+            },
+            ..Default::default()
+        })
+        // Add color over lifetime modifier to have particles grow colder as they
+        // age.
+ .insert(ColorOverLifetime { + // Fade out like + color: CurveVariable::( + vec![ + Color::WHITE, + Color::YELLOW, + Color::ORANGE, + Color::RED, + Color::MAROON, + ] + }); +} +``` + +## Implementation strategy + +### Particles + +The heart and core of this system is `Particles`, a struct-of-arrays component +that contains all of the particle data: + +```rust +#[derive(Clone, RenderResources)] +pub struct Particles { + positions: Vec, + rotations: Vec, + sizes: Vec, + colors: Vec, + #[render_resources(ignore)] + velociites: Vec, + #[render_resources(ignore)] + angular_velociites: Vec, + #[render_resources(ignore)] + remaining_lifetimes: Vec, + #[render_resources(ignore)] + starting_lifetimes: Vec, +} +``` + +This means that particles are not entities onto themselves, and thus cannot be +queried as entities. The tight locality of all of the fields should allow for +faster single-threaded iteration due to higher cache coherency and easier +auto-vectorization. + +Particles are created by appending the associated data to the end of each field +of each Vec, and can be destroyed by calling `swap_remove` on the given +particle's index. This also can be extended to allow spawning/destroying batches +of particles efficiently. + +When rendering, no local to world Mat4 needs to be computed CPU side, instead +opting to calculate the matrix on the GPU via TRS in the vertex shader. This +normally is considered wasteful when rendering normal meshes, but given the +number of particles with unique transforms, and the number of vertices per +particle (4), this approach may end up being faster. (TODO(james7132): +Benchmark). Separation into field buffers also makes it trivial to include each +buffer in GPU instanced draw calls. + +For utility access `Particle<'a>`, `ParticleMut<'a>`, and `ParticleParams` are +added for read-only, mutable, and owned structs that hold particle fields. 
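
The append-to-spawn and `swap_remove`-to-destroy scheme described in this section can be sketched roughly as follows. This is a minimal illustration and not the prototype's code: `ParticleSoa`, its three fields, and the `spawn`/`kill`/`tick` methods are hypothetical names, and plain arrays stand in for the math types so the snippet has no dependencies.

```rust
/// Hypothetical, simplified stand-in for the RFC's `Particles` component.
/// Each field is a parallel array; index `i` in every Vec is one particle.
#[derive(Default)]
pub struct ParticleSoa {
    positions: Vec<[f32; 3]>,
    velocities: Vec<[f32; 3]>,
    remaining_lifetimes: Vec<f32>,
}

impl ParticleSoa {
    /// Number of live particles.
    pub fn len(&self) -> usize {
        self.positions.len()
    }

    /// Spawn by pushing one value onto the end of every field.
    pub fn spawn(&mut self, position: [f32; 3], velocity: [f32; 3], lifetime: f32) {
        self.positions.push(position);
        self.velocities.push(velocity);
        self.remaining_lifetimes.push(lifetime);
    }

    /// Destroy in O(1): the last particle is moved into the freed slot,
    /// which is why particle indices are not stable identifiers.
    pub fn kill(&mut self, index: usize) {
        self.positions.swap_remove(index);
        self.velocities.swap_remove(index);
        self.remaining_lifetimes.swap_remove(index);
    }

    /// Tick lifetimes down and remove every particle that has expired.
    pub fn tick(&mut self, dt: f32) {
        let mut i = 0;
        while i < self.len() {
            self.remaining_lifetimes[i] -= dt;
            if self.remaining_lifetimes[i] <= 0.0 {
                // Do not advance: slot `i` now holds a different particle.
                self.kill(i);
            } else {
                i += 1;
            }
        }
    }
}
```

Because `kill` moves the last particle into the vacated slot, the loop in `tick` re-examines the same index after a removal; only live particles are ever visited.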
+
+### Particle Emitters
+```rust
+#[derive(Clone)]
+pub struct Particles {
+    next_burst: Timer,
+    bursts: Vec,
+    shape: EmitterShape,
+    repeat: bool,
+    modifiers: Vec>,
+}
+
+pub struct EmitterBurst {
+    count: Range,
+    wait: Duration,
+}
+
+pub enum EmitterShape {
+    Sphere { .. },
+    Hemisphere { .. },
+    Line { .. },
+    ...
+}
+
+pub trait EmitterModifier: Send + Sync + 'static {
+    fn modify(&mut self, params: &mut ParticleParams);
+}
+```
+
+ParticleEmitters are used to track when the next burst of particles should be
+generated.
+
+The `EmitterBurst` vec details when and how many particles to spawn. When
+`next_burst` evaluates to finished, the emitter will randomly select how many
+particles to spawn, use `EmitterShape` to sample random points inside the shape,
+create one or more `ParticleParams`, modify the initialization parameters with
+all provided `EmitterModifiers`, and then use the `ParticleParams` to add new
+particles to `Particles`.
+
+### Simulation Modifiers
+Simulation modifiers are simple POD components that apply some modifier to
+particle behavior via Bevy's systems, typically querying only for itself and a
+`Particles` on the same entity.
+
+These modifiers are optional and absent from most particle systems, so they
+should use SparseSet storage.
+
+### Randomness
+Randomness is at the core of making visually appealing particle systems, and
+almost all parameters in `ParticleEmitter`, `EmitterModifier` and simulation
+modifiers should take some form of `Range` as an input for sampling random
+values from.
+
+If reproducible/deterministic particle systems are desirable, a main RNG
+for the entire particle system must be exposed. Likewise, each particle should
+contain its own random seed to avoid blocking on the global RNG.
+
+### Parameterization
+To allow easier parameterization of particle behavior, a set of `Curve`s
+are needed.
More commonly used in animation, a simple curve allows sampling a +smoothed value based on an input time. This allows developers and designers to +define potentially arbitrary easings for every attribute of particles. For +example, the `SizeOverLifetime` modifier will normalize the lifetime of every +particle via `(start_lifetime - remaining_lifetime) / start_lifetime`, and +evaluate a curve to get the size of the particle at a given frame. + +To allow for randomized parameterization, `Curve` should generally be replacable +with a `MinMaxCurve` which defines two separated curves as the minimum and +maximum values at a given point in time. This can be used to generate a range +from which a RNG can sample from or can be used as the start and end for linear +interpolation for values that require smooth flows from one to another. + +If `Curve` is generic, it can be used to represent a gradient for changing colors +as well. + +There already is a open PR for adding these types to Bevy: +https://github.com/bevyengine/bevy/pulls?q=1837. + +## Drawbacks + +## Rationale and alternatives +The main alternative comes in the form of GPU-based particle simulations, where +both the simualtion and rendering occur on the GPU instead. Compared to the +proposed CPU based design the main benefit is that much higher particle counts +can be used. Whereas the design proposed here may support several thousand or +several tens of thousands particles, GPU particle systems which use compute +shaders for simulation can support millions of particles per system. This enables +many more artistic workflows that are not available with just CPU particles. + +The main drawbacks of a GPU particle system: + + - Compute shader support is needed, both in Bevy and the target platform. Most + notably, WebGL does not have access to compute shaders, and WebGPU, it's + eventual replacement, is not landing anytime soon. 
+ - Particles are difficult/impossible to access from the CPU, depending on the
+   implementation.
+
+When pipelined rendering lands, it will likely include compute shader passes,
+upon which a GPU particle system can be built. It may be possible to reuse
+multiple components from this design to drive compute shaders instead of running
+them on the CPU. In these cases, an alternative storage for particle data will
+likely be needed, as well as systems for triggering particle simulation, but all
+of the emission and simulation modifiers can be reused.
+
+However, even when GPU particles are available, it makes sense to keep a CPU
+implementation as a fallback for platforms that don't support compute shaders, as
+well as provide a choice to the end-developer as to which fits their use case.
+
+Another alternative is to not support particle systems as a first party engine
+feature. This could very well be just another ecosystem crate; however, like
+`bevy_pbr`, it's often better to have a well-established common convention
+supported at a first-party level. Supporting it as a first party crate should
+ensure that both end-developers and third-party plugins target the same common
+interface for working with these systems, as well as establish a common authoring
+flow for artists when a fully-featured editor is ready.
+
+### Particle Representation
+Particles could be represented easily as Bevy entities and components. This could
+be helpful as access to particle data would be as easy as normal components, and
+provides more trivial multithreading via `par_iter(_mut)` and
+`par_for_each(_mut)`. However, there are some notable pain points in this
+approach:
+
+ - Bevy does not currently support GPU-instancing in the general case for
+   entities. The SoA construction makes it notably easier to write
+   single draw-call renderers for each particle system.
+ - Destruction requires use of Commands, and must be done at a sync point.
This
+   may result in more work being done than necessary in frames where large
+   counts of particles die. This may also be heavier than running `swap_remove`
+   on a SoA.
+ - Each component in an entity incurs the CPU and memory overhead of a change
+   detection tick. With these systems, it's common for particles to constantly be
+   spawned, destroyed, and altered. Querying for a `Changed`
+   will likely return the entire set of particles.
+
+### Particle Death
+Other particle system implementations often keep an "alive list" of particles that
+are alive vs dead instead of using something like `swap_remove` to
+destroy/recycle particles. This makes a particle destroy a simple bitflip instead
+of a complete particle copy, but adds an O(n) search time when spawning new
+particles, and a potential O(n) compaction step before rendering or a large number of
+discards in the fragment shader. Note: in this case n is the maximum number of
+supported particles, not the size of the alive set.
+
+This can most easily be implemented with a `Vec`, but can be further
+optimized by using a bitmask where each 1 maps to a live particle. This can be
+further optimized using SIMD, where scanning 256 particles can be done with one
+or two instructions (like SwissTable). Passing the "alive list" in this format as
+a uniform to a shader will likely limit it to only 8192 particles per draw call.
+
+## Prior Art
+At time of writing, there are no public implementations of particle systems
+within the Bevy ecosystem that are meant for generic use.
+
+Particle systems have some kind of implementation in almost every notable game
+engine:
+
+ * [Unity Particle Systems](https://docs.unity3d.com/Manual/ParticleSystems.html)
+ * [Unity Visual Effect Graph](https://docs.unity3d.com/Packages/com.unity.visualeffectgraph@8.2/manual/index.html)
+ * [Unreal Engine 4](https://docs.unrealengine.com/4.26/en-US/Resources/ContentExamples/EffectsGallery/1_A/)
+ * [Godot](https://docs.godotengine.org/en/stable/classes/class_particles.html)
+   ([2D](https://docs.godotengine.org/en/stable/tutorials/2d/particle_systems_2d.html))
+ * [Source](https://developer.valvesoftware.com/wiki/Particle_System_Overview)
+
+This RFC uses a CPU-particle simulation, and is heavily inspired by the public
+interface from Unity's Shuriken particle system. Both Unity and Unreal offer
+separate CPU and GPU particle implementations; the differences are discussed
+above.
+
+In the literature, one of the original papers on the topic is from Lucasfilm in
+1983: http://graphics.cs.cmu.edu/courses/15-869-F07/Papers/Reeves-1983-PSA.pdf,
+and this implementation replicates the main parts of its particle definition.
+
+## Unresolved questions
+
+ - Benchmark the difference between SoA component vs Bevy entities.
+ - The `swap_remove` approach is likely to break transparent rendering without
+   some form of Z-sorting. Can we do this on the GPU?
+
+## Future possibilities
+Further extensions like additional emission and simulation modifiers can be added
+for more complex particle behavior or visuals. These can either be added as first
+party implementations or as a third-party ecosystem crate.
+
+In particular, several modifiers that Unity's Shuriken particle system has that
+are not supported by the RFC include particle trails (adding triangle strips that
+follow particles during their lifetime), collision with physical objects (perhaps
+with Rapier), and particle force fields that apply a constant force to particles
+within a bounded region of world space.
+
+One optimization might be to provide slimmed down meshes for particles instead of
+always using quads. This should drastically reduce the fill rate as there will be
+a large amount of overlapping transparent geometry in any practical particle
+system.
+
+The easily instanced data model could be used for other high-repetition,
+low-impact rendering tasks like grass on terrain.
+
+Following that same line of thought, providing the utility to specify what mesh
+each particle uses might be useful in some use cases.

From 3086133ba70983253394e566531b8e35c0f40323 Mon Sep 17 00:00:00 2001
From: james7132
Date: Mon, 21 Jun 2021 19:21:46 -0700
Subject: [PATCH 2/5] Fix particle emitter code snippet

---
 rfcs/cpu-particles.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/cpu-particles.md b/rfcs/cpu-particles.md
index c93371e1..a2ca8abe 100644
--- a/rfcs/cpu-particles.md
+++ b/rfcs/cpu-particles.md
@@ -200,7 +200,7 @@ added for read-only, mutable, and owned structs that hold particle fields.
 ### Particle Emitters
 ```rust
 #[derive(Clone)]
-pub struct Particles {
+pub struct ParticleEmitter {
     next_burst: Timer,
     bursts: Vec,
     shape: EmitterShape,

From b464367206c409718e2d0a28477ab1d163eff6a1 Mon Sep 17 00:00:00 2001
From: james7132
Date: Mon, 21 Jun 2021 19:26:03 -0700
Subject: [PATCH 3/5] Fixes

---
 rfcs/cpu-particles.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/rfcs/cpu-particles.md b/rfcs/cpu-particles.md
index a2ca8abe..363a1392 100644
--- a/rfcs/cpu-particles.md
+++ b/rfcs/cpu-particles.md
@@ -266,7 +266,7 @@ To allow for randomized parameterization, `Curve` should generally be replacable
 with a `MinMaxCurve` which defines two separated curves as the minimum and
 maximum values at a given point in time. This can be used to generate a range
 from which a RNG can sample from or can be used as the start and end for linear
-interpolation for values that require smooth flows from one to another.
+interpolation for values that require smooth flows from one frame to another. If `Curve` is generic, it can be used to represent a gradient for changing colors as well. @@ -274,8 +274,6 @@ as well. There already is a open PR for adding these types to Bevy: https://github.com/bevyengine/bevy/pulls?q=1837. -## Drawbacks - ## Rationale and alternatives The main alternative comes in the form of GPU-based particle simulations, where both the simualtion and rendering occur on the GPU instead. Compared to the From b97891e67ef814857f5905e4230e92c8a84a5707 Mon Sep 17 00:00:00 2001 From: james7132 Date: Sat, 26 Jun 2021 06:24:55 -0700 Subject: [PATCH 4/5] Add note about storage format for rotation --- rfcs/cpu-particles.md | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/rfcs/cpu-particles.md b/rfcs/cpu-particles.md index 363a1392..ae93f3cc 100644 --- a/rfcs/cpu-particles.md +++ b/rfcs/cpu-particles.md @@ -161,17 +161,11 @@ that contains all of the particle data: ```rust #[derive(Clone, RenderResources)] pub struct Particles { - positions: Vec, - rotations: Vec, + positions: Vec, sizes: Vec, colors: Vec, - #[render_resources(ignore)] - velociites: Vec, - #[render_resources(ignore)] - angular_velociites: Vec, - #[render_resources(ignore)] - remaining_lifetimes: Vec, - #[render_resources(ignore)] + velociites: Vec, + lifetimes: Vec, starting_lifetimes: Vec, } ``` @@ -183,9 +177,20 @@ auto-vectorization. Particles are created by appending the associated data to the end of each field of each Vec, and can be destroyed by calling `swap_remove` on the given -particle's index. This also can be extended to allow spawning/destroying batches +particle's index. This creation/destruction scheme ensures that CPU time is +only spent iterating over live particles, at the cost of particle identity +stability. This also can be extended to allow spawning/destroying batches of particles efficiently. 
+Without modifiers, a particle update requires a position update. Velocities are +multiplied by delta time and added to existing poses and lifetimes are increased +by the provided delta time. If the liftime of a particle is up, it is destroyed. + +Position, rotation, angular velocity, and velocity are represented as +`glam::Vec4` with the rotation/angular velocity stored in the w component. This +allows trivial vectorization of updating particle pose. May also be sped up +further using fused multiply adds with delta time. + When rendering, no local to world Mat4 needs to be computed CPU side, instead opting to calculate the matrix on the GPU via TRS in the vertex shader. This normally is considered wasteful when rendering normal meshes, but given the @@ -197,6 +202,10 @@ buffer in GPU instanced draw calls. For utility access `Particle<'a>`, `ParticleMut<'a>`, and `ParticleParams` are added for read-only, mutable, and owned structs that hold particle fields. +If a maximum number of particles is made mandatory, one other option is to +use `BlobVec` as an unsafe alternative to `Vec`. This has notable challenges +since `BlobVec` is neither Send nor Sync. 
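
The unmodified update step described in this hunk (velocity times delta time added to the pose, with rotation and angular velocity riding in the `w` lane) can be sketched as below. This is only an illustration under stated assumptions: `integrate` is a hypothetical name, and `[f32; 4]` stands in for `glam::Vec4` so the snippet needs no external crate.

```rust
/// Advance every pose by `velocity * dt`.
/// Each `[f32; 4]` is a hypothetical stand-in for `glam::Vec4`:
/// xyz holds position (or linear velocity), w holds rotation (or angular velocity).
pub fn integrate(poses: &mut [[f32; 4]], velocities: &[[f32; 4]], dt: f32) {
    for (pose, velocity) in poses.iter_mut().zip(velocities) {
        for lane in 0..4 {
            // `mul_add` computes `velocity * dt + pose` with a single rounding,
            // and may compile to a fused multiply-add where the target supports it.
            pose[lane] = velocity[lane].mul_add(dt, pose[lane]);
        }
    }
}
```

Packing rotation into the fourth lane means position and rotation advance with the same four-wide operation, which is the property the text relies on for vectorization.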
+ ### Particle Emitters ```rust #[derive(Clone)] From f664f274e06ea35fb850e9bdff4f8dea09c5e8d5 Mon Sep 17 00:00:00 2001 From: james7132 Date: Sat, 3 Jul 2021 19:51:14 -0700 Subject: [PATCH 5/5] Improve lifetime management --- rfcs/cpu-particles.md | 36 +++++++++++++++++++++--------------- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/rfcs/cpu-particles.md b/rfcs/cpu-particles.md index ae93f3cc..f7162423 100644 --- a/rfcs/cpu-particles.md +++ b/rfcs/cpu-particles.md @@ -161,12 +161,13 @@ that contains all of the particle data: ```rust #[derive(Clone, RenderResources)] pub struct Particles { + lifetime: f32, positions: Vec, sizes: Vec, colors: Vec, velociites: Vec, - lifetimes: Vec, - starting_lifetimes: Vec, + starts: Vec, + expirations: Vec, } ``` @@ -179,32 +180,37 @@ Particles are created by appending the associated data to the end of each field of each Vec, and can be destroyed by calling `swap_remove` on the given particle's index. This creation/destruction scheme ensures that CPU time is only spent iterating over live particles, at the cost of particle identity -stability. This also can be extended to allow spawning/destroying batches -of particles efficiently. +stability. A potential optimization here would be to defer altering the lengths +of the Vecs until after all of the dead particles are cleared. -Without modifiers, a particle update requires a position update. Velocities are -multiplied by delta time and added to existing poses and lifetimes are increased -by the provided delta time. If the liftime of a particle is up, it is destroyed. +Without modifiers, a particle update only requires a position update and a +lifetime check. Velocities are multiplied by the delta time and added to +existing poses. Position, rotation, angular velocity, and velocity are represented as `glam::Vec4` with the rotation/angular velocity stored in the w component. This allows trivial vectorization of updating particle pose. 
May also be sped up
 further using fused multiply adds with delta time.
 
+Lifetimes are represented as per-particle spawn and expiration timestamps, as
+well as a global lifetime timer. When the global lifetime timer exceeds a
+particle's expiration, it's considered dead. As the spawn and expiration
+timestamps are static throughout a particle's lifetime, they incur no cost while
+normally updating the system. As a tradeoff, this adds a bit of additional
+overhead when spawning particles and when computing lifetime based modifiers
+(e.g. `ColorOverLifetime`).
+
 When rendering, no local to world Mat4 needs to be computed CPU side, instead
 opting to calculate the matrix on the GPU via TRS in the vertex shader. This
 normally is considered wasteful when rendering normal meshes, but given the
 number of particles with unique transforms, and the number of vertices per
 particle (4), this approach may end up being faster. (TODO(james7132):
-Benchmark). Separation into field buffers also makes it trivial to include each
-buffer in GPU instanced draw calls.
+Benchmark). Separation into SoA field buffers also makes it trivial to include
+each buffer in GPU instanced draw calls.
 
 For utility access `Particle<'a>`, `ParticleMut<'a>`, and `ParticleParams` are
-added for read-only, mutable, and owned structs that hold particle fields.
-
-If a maximum number of particles is made mandatory, one other option is to
-use `BlobVec` as an unsafe alternative to `Vec`. This has notable challenges
-since `BlobVec` is neither Send nor Sync.
+added for read-only, mutable, and owned structs that hold particle fields, and
+the appropriate iterator types will be included through `iter` and `iter_mut`.
 
 ### Particle Emitters
 ```rust
@@ -300,7 +306,7 @@ The main drawbacks of a GPU particle system:
  - Particles are difficult/impossible to access from the CPU, depending on the
    implementation.
-When pipelined rendering lands, it will likely include compute shader passes, +When pipelined rendering lands, it will include compute shader passes, upon which a GPU particle system can be built. It may be possible to reuse multiple components from this design to drive compute shaders instead of running them on the CPU. In these cases, an alternative storage for particle data will