Cluster light probes using conservative spherical bounds.#13746
Cluster light probes using conservative spherical bounds.#13746alice-i-cecile merged 12 commits intobevyengine:mainfrom
Conversation
This commit allows the Bevy renderer to use the clustering infrastructure for light probes (reflection probes and irradiance volumes) on platforms where at least 3 storage buffers are available. On such platforms (the vast majority), we stop performing brute-force searches of light probes for each fragment and instead only search the light probes with bounding spheres that intersect the current cluster. This should dramatically improve scalability of irradiance volumes and reflection probes. The primary platform that doesn't support 3 storage buffers is WebGL 2, and we continue using a brute-force search of light probes on that platform, as the UBO that stores per-cluster indices is too small to fit the light probe counts. Note, however, that that platform also doesn't support bindless textures (indeed, it would be very odd for a platform to support bindless textures but not SSBOs), so we only support one of each type of light probe per drawcall in the first place. So this isn't a performance problem, as the search will only have one light probe to consider. (In fact, clustering would probably end up being a performance loss.) Known potential improvements include: 1. We currently cull based on a conservative bounding sphere test and not based on the oriented bounding box (OBB) of the light probe. This is improvable, but in the interests of simplicity, I opted to keep the bounding sphere test for now. The OBB improvement can be a follow-up. 2. This patch doesn't change the fact that each fragment only takes a single light probe into account. Typical light probe implementations detect the case in which multiple light probes cover the current fragment and perform some sort of weighted blend between them. As the light probe fetch function presently returns only a single light probe, implementing that feature would require more code restructuring, so I left it out for now. It can be added as a follow-up. 3. Light probe implementations typically have a falloff range. Although this is a wanted feature in Bevy, this particular commit also doesn't implement that feature, as it's out of scope. 4. This commit doesn't raise the maximum number of light probes past its current value of 8 for each type. This should be addressed later, but would possibly require more bindings on platforms with storage buffers, which would increase this patch's complexity. Even without raising the limit, this patch should constitute a significant performance improvement for scenes that get anywhere close to this limit. In the interest of keeping this patch small, I opted to leave raising the limit to a follow-up.
jgayfer
left a comment
There was a problem hiding this comment.
Overall I think this is making sense. The match statement give us a chance to only perform work on the appropriate types, which should reduce redundant work (and I presume make it easier to add more types in the future).
Is there a specific example or benchmark I can run to verify performance delta? I haven't done much in the way of benchmarking before yet with Bevy.
| /// that order. The remaining fields are filled with zeroes. | ||
| #[size(runtime)] | ||
| data: Vec<UVec4>, | ||
| data: Vec<[UVec4; 2]>, |
There was a problem hiding this comment.
To check my understanding, is this basically a way to let us push more than 4 values into each element in a format that WGSL supports?
Is passing in a vector of a struct that can support more than 4 fields an option? Or is this mostly a way to not need to explicitly pad?
Using a struct could lend itself to better readability throughout this patch (e.g. no need to keep track of what x, y, z, and w map to). But I imagine it's done this way for performance reasons.
Elabajaba
left a comment
There was a problem hiding this comment.
Code looks good, comments are great as always.
Somewhat surprisingly (to me at least) is that this raises our vector register (VGPR) usage in the pbr fragment shader by quite a bit in both the 3d_scene and irradiance_volumes examples. In irradiance_volumes we went from 77 to 94 VGPR, which dropped our wavefront occupancy from 6/16 to 5/16 (16 is the theoretical max on my 6800xt).
In 3d_scene it went from 72 to 77 VGPR, which drops occupancy from 7/16 to 6/16.
Not really sure how to contextualize these numbers, but basically forward PBR shaders are expected to have not great register pressure, but I think in more optimized engines they're in the 8-10/16 occupancy range?
I don't think the increased register use should block this PR, but it's just something to note.
This commit allows the Bevy renderer to use the clustering infrastructure for light probes (reflection probes and irradiance volumes) on platforms where at least 3 storage buffers are available. On such platforms (the vast majority), we stop performing brute-force searches of light probes for each fragment and instead only search the light probes with bounding spheres that intersect the current cluster. This should dramatically improve scalability of irradiance volumes and reflection probes. The primary platform that doesn't support 3 storage buffers is WebGL 2, and we continue using a brute-force search of light probes on that platform, as the UBO that stores per-cluster indices is too small to fit the light probe counts. Note, however, that that platform also doesn't support bindless textures (indeed, it would be very odd for a platform to support bindless textures but not SSBOs), so we only support one of each type of light probe per drawcall in the first place. So this isn't a performance problem, as the search will only have one light probe to consider. (In fact, clustering would probably end up being a performance loss.) Known potential improvements include: 1. We currently cull based on a conservative bounding sphere test and not based on the oriented bounding box (OBB) of the light probe. This is improvable, but in the interests of simplicity, I opted to keep the bounding sphere test for now. The OBB improvement can be a follow-up. 2. This patch doesn't change the fact that each fragment only takes a single light probe into account. Typical light probe implementations detect the case in which multiple light probes cover the current fragment and perform some sort of weighted blend between them. As the light probe fetch function presently returns only a single light probe, implementing that feature would require more code restructuring, so I left it out for now. It can be added as a follow-up. 3. Light probe implementations typically have a falloff range. Although this is a wanted feature in Bevy, this particular commit also doesn't implement that feature, as it's out of scope. 4. This commit doesn't raise the maximum number of light probes past its current value of 8 for each type. This should be addressed later, but would possibly require more bindings on platforms with storage buffers, which would increase this patch's complexity. Even without raising the limit, this patch should constitute a significant performance improvement for scenes that get anywhere close to this limit. In the interest of keeping this patch small, I opted to leave raising the limit to a follow-up.
…ster-light-probes
tychedelia
left a comment
There was a problem hiding this comment.
LGTM! Tested irradiance_volumes and reflection_probes examples.
Cluster light probes
|
Looks like the example failure is real. Also getting |
…ster-light-probes
This patch was doing non-shadows before shadows (and ditto for volumetrics), while the reverse was what `prepare_lights` expected.
# Objective Volumetric fog was broken by #13746. Looks like this particular shader just got missed. I don't see any other instances of `unpack_offset_and_counts` in the codebase. ``` 2024-12-06T03:18:42.297494Z ERROR bevy_render::render_resource::pipeline_cache: failed to process shader: error: no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts' ┌─ crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl:312:29 │ 312 │ let offset_and_counts = bevy_pbr::clustered_forward::unpack_offset_and_counts(cluster_index); │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unknown identifier │ = no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts' ``` ## Solution Use `unpack_clusterable_object_index_ranges` to get the indices for point/spot lights. ## Testing `cargo run --example volumetric_fog` `cargo run --example fog_volumes` `cargo run --example scrolling_fog`
…#13746) This commit allows the Bevy renderer to use the clustering infrastructure for light probes (reflection probes and irradiance volumes) on platforms where at least 3 storage buffers are available. On such platforms (the vast majority), we stop performing brute-force searches of light probes for each fragment and instead only search the light probes with bounding spheres that intersect the current cluster. This should dramatically improve scalability of irradiance volumes and reflection probes. The primary platform that doesn't support 3 storage buffers is WebGL 2, and we continue using a brute-force search of light probes on that platform, as the UBO that stores per-cluster indices is too small to fit the light probe counts. Note, however, that that platform also doesn't support bindless textures (indeed, it would be very odd for a platform to support bindless textures but not SSBOs), so we only support one of each type of light probe per drawcall there in the first place. Consequently, this isn't a performance problem, as the search will only have one light probe to consider. (In fact, clustering would probably end up being a performance loss.) Known potential improvements include: 1. We currently cull based on a conservative bounding sphere test and not based on the oriented bounding box (OBB) of the light probe. This is improvable, but in the interests of simplicity, I opted to keep the bounding sphere test for now. The OBB improvement can be a follow-up. 2. This patch doesn't change the fact that each fragment only takes a single light probe into account. Typical light probe implementations detect the case in which multiple light probes cover the current fragment and perform some sort of weighted blend between them. As the light probe fetch function presently returns only a single light probe, implementing that feature would require more code restructuring, so I left it out for now. It can be added as a follow-up. 3. Light probe implementations typically have a falloff range. Although this is a wanted feature in Bevy, this particular commit also doesn't implement that feature, as it's out of scope. 4. This commit doesn't raise the maximum number of light probes past its current value of 8 for each type. This should be addressed later, but would possibly require more bindings on platforms with storage buffers, which would increase this patch's complexity. Even without raising the limit, this patch should constitute a significant performance improvement for scenes that get anywhere close to this limit. In the interest of keeping this patch small, I opted to leave raising the limit to a follow-up. ## Changelog ### Changed * Light probes (reflection probes and irradiance volumes) are now clustered on most platforms, improving performance when many light probes are present. --------- Co-authored-by: Benjamin Brienen <Benjamin.Brienen@outlook.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective Volumetric fog was broken by bevyengine#13746. Looks like this particular shader just got missed. I don't see any other instances of `unpack_offset_and_counts` in the codebase. ``` 2024-12-06T03:18:42.297494Z ERROR bevy_render::render_resource::pipeline_cache: failed to process shader: error: no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts' ┌─ crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl:312:29 │ 312 │ let offset_and_counts = bevy_pbr::clustered_forward::unpack_offset_and_counts(cluster_index); │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unknown identifier │ = no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts' ``` ## Solution Use `unpack_clusterable_object_index_ranges` to get the indices for point/spot lights. ## Testing `cargo run --example volumetric_fog` `cargo run --example fog_volumes` `cargo run --example scrolling_fog`
This commit allows the Bevy renderer to use the clustering infrastructure for light probes (reflection probes and irradiance volumes) on platforms where at least 3 storage buffers are available. On such platforms (the vast majority), we stop performing brute-force searches of light probes for each fragment and instead only search the light probes with bounding spheres that intersect the current cluster. This should dramatically improve scalability of irradiance volumes and reflection probes.
The primary platform that doesn't support 3 storage buffers is WebGL 2, and we continue using a brute-force search of light probes on that platform, as the UBO that stores per-cluster indices is too small to fit the light probe counts. Note, however, that that platform also doesn't support bindless textures (indeed, it would be very odd for a platform to support bindless textures but not SSBOs), so we only support one of each type of light probe per drawcall there in the first place. Consequently, this isn't a performance problem, as the search will only have one light probe to consider. (In fact, clustering would probably end up being a performance loss.)
Known potential improvements include:
We currently cull based on a conservative bounding sphere test and not based on the oriented bounding box (OBB) of the light probe. This is improvable, but in the interests of simplicity, I opted to keep the bounding sphere test for now. The OBB improvement can be a follow-up.
This patch doesn't change the fact that each fragment only takes a single light probe into account. Typical light probe implementations detect the case in which multiple light probes cover the current fragment and perform some sort of weighted blend between them. As the light probe fetch function presently returns only a single light probe, implementing that feature would require more code restructuring, so I left it out for now. It can be added as a follow-up.
Light probe implementations typically have a falloff range. Although this is a wanted feature in Bevy, this particular commit also doesn't implement that feature, as it's out of scope.
This commit doesn't raise the maximum number of light probes past its current value of 8 for each type. This should be addressed later, but would possibly require more bindings on platforms with storage buffers, which would increase this patch's complexity. Even without raising the limit, this patch should constitute a significant performance improvement for scenes that get anywhere close to this limit. In the interest of keeping this patch small, I opted to leave raising the limit to a follow-up.
Changelog
Changed