Skip to content

Cluster light probes using conservative spherical bounds.#13746

Merged
alice-i-cecile merged 12 commits intobevyengine:mainfrom
pcwalton:cluster-light-probes
Dec 5, 2024
Merged

Cluster light probes using conservative spherical bounds.#13746
alice-i-cecile merged 12 commits intobevyengine:mainfrom
pcwalton:cluster-light-probes

Conversation

@pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Jun 8, 2024

This commit allows the Bevy renderer to use the clustering infrastructure for light probes (reflection probes and irradiance volumes) on platforms where at least 3 storage buffers are available. On such platforms (the vast majority), we stop performing brute-force searches of light probes for each fragment and instead only search the light probes with bounding spheres that intersect the current cluster. This should dramatically improve scalability of irradiance volumes and reflection probes.

The primary platform that doesn't support 3 storage buffers is WebGL 2, and we continue using a brute-force search of light probes on that platform, as the UBO that stores per-cluster indices is too small to fit the light probe counts. Note, however, that that platform also doesn't support bindless textures (indeed, it would be very odd for a platform to support bindless textures but not SSBOs), so we only support one of each type of light probe per drawcall there in the first place. Consequently, this isn't a performance problem, as the search will only have one light probe to consider. (In fact, clustering would probably end up being a performance loss.)

Known potential improvements include:

  1. We currently cull based on a conservative bounding sphere test and not based on the oriented bounding box (OBB) of the light probe. This is improvable, but in the interests of simplicity, I opted to keep the bounding sphere test for now. The OBB improvement can be a follow-up.

  2. This patch doesn't change the fact that each fragment only takes a single light probe into account. Typical light probe implementations detect the case in which multiple light probes cover the current fragment and perform some sort of weighted blend between them. As the light probe fetch function presently returns only a single light probe, implementing that feature would require more code restructuring, so I left it out for now. It can be added as a follow-up.

  3. Light probe implementations typically have a falloff range. Although this is a wanted feature in Bevy, this particular commit also doesn't implement that feature, as it's out of scope.

  4. This commit doesn't raise the maximum number of light probes past its current value of 8 for each type. This should be addressed later, but would possibly require more bindings on platforms with storage buffers, which would increase this patch's complexity. Even without raising the limit, this patch should constitute a significant performance improvement for scenes that get anywhere close to this limit. In the interest of keeping this patch small, I opted to leave raising the limit to a follow-up.

Changelog

Changed

  • Light probes (reflection probes and irradiance volumes) are now clustered on most platforms, improving performance when many light probes are present.

This commit allows the Bevy renderer to use the clustering
infrastructure for light probes (reflection probes and irradiance
volumes) on platforms where at least 3 storage buffers are available. On
such platforms (the vast majority), we stop performing brute-force
searches of light probes for each fragment and instead only search the
light probes with bounding spheres that intersect the current cluster.
This should dramatically improve scalability of irradiance volumes and
reflection probes.

The primary platform that doesn't support 3 storage buffers is WebGL 2,
and we continue using a brute-force search of light probes on that
platform, as the UBO that stores per-cluster indices is too small to fit
the light probe counts. Note, however, that that platform also doesn't
support bindless textures (indeed, it would be very odd for a platform
to support bindless textures but not SSBOs), so we only support one of
each type of light probe per drawcall in the first place. So this isn't
a performance problem, as the search will only have one light probe to
consider. (In fact, clustering would probably end up being a performance
loss.)

Known potential improvements include:

1. We currently cull based on a conservative bounding sphere test and
   not based on the oriented bounding box (OBB) of the light probe. This
   is improvable, but in the interests of simplicity, I opted to keep
   the bounding sphere test for now. The OBB improvement can be a
   follow-up.

2. This patch doesn't change the fact that each fragment only takes a
   single light probe into account. Typical light probe implementations
   detect the case in which multiple light probes cover the current
   fragment and perform some sort of weighted blend between them. As the
   light probe fetch function presently returns only a single light
   probe, implementing that feature would require more code
   restructuring, so I left it out for now. It can be added as a
   follow-up.

3. Light probe implementations typically have a falloff range. Although
   this is a wanted feature in Bevy, this particular commit also doesn't
   implement that feature, as it's out of scope.

4. This commit doesn't raise the maximum number of light probes past its
   current value of 8 for each type. This should be addressed later, but
   would possibly require more bindings on platforms with storage
   buffers, which would increase this patch's complexity. Even without
   raising the limit, this patch should constitute a significant
   performance improvement for scenes that get anywhere close to this
   limit. In the interest of keeping this patch small, I opted to leave
   raising the limit to a follow-up.
@pcwalton pcwalton added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jun 8, 2024
@pcwalton pcwalton added this to the 0.15 milestone Jun 8, 2024
Copy link
Contributor

@jgayfer jgayfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is making sense. The match statement give us a chance to only perform work on the appropriate types, which should reduce redundant work (and I presume make it easier to add more types in the future).

Is there a specific example or benchmark I can run to verify performance delta? I haven't done much in the way of benchmarking before yet with Bevy.

/// that order. The remaining fields are filled with zeroes.
#[size(runtime)]
data: Vec<UVec4>,
data: Vec<[UVec4; 2]>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check my understanding, is this basically a way to let us push more than 4 values into each element in a format that WGSL supports?

Is passing in a vector of a struct that can support more than 4 fields an option? Or is this mostly a way to not need to explicitly pad?

Using a struct could lend itself to better readability throughout this patch (e.g. no need to keep track of what x, y, z, and w map to). But I imagine it's done this way for performance reasons.

Copy link
Contributor

@Elabajaba Elabajaba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, comments are great as always.

Somewhat surprisingly (to me at least) is that this raises our vector register (VGPR) usage in the pbr fragment shader by quite a bit in both the 3d_scene and irradiance_volumes examples. In irradiance_volumes we went from 77 to 94 VGPR, which dropped our wavefront occupancy from 6/16 to 5/16 (16 is the theoretical max on my 6800xt).

In 3d_scene it went from 72 to 77 VGPR, which drops occupancy from 7/16 to 6/16.

Not really sure how to contextualize these numbers, but basically forward PBR shaders are expected to have not great register pressure, but I think in more optimized engines they're in the 8-10/16 occupancy range?

I don't think the increased register use should block this PR, but it's just something to note.

@bas-ie bas-ie added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Sep 30, 2024
@BenjaminBrienen BenjaminBrienen added the D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes label Oct 14, 2024
pcwalton and others added 5 commits October 14, 2024 20:53
This commit allows the Bevy renderer to use the clustering
infrastructure for light probes (reflection probes and irradiance
volumes) on platforms where at least 3 storage buffers are available. On
such platforms (the vast majority), we stop performing brute-force
searches of light probes for each fragment and instead only search the
light probes with bounding spheres that intersect the current cluster.
This should dramatically improve scalability of irradiance volumes and
reflection probes.

The primary platform that doesn't support 3 storage buffers is WebGL 2,
and we continue using a brute-force search of light probes on that
platform, as the UBO that stores per-cluster indices is too small to fit
the light probe counts. Note, however, that that platform also doesn't
support bindless textures (indeed, it would be very odd for a platform
to support bindless textures but not SSBOs), so we only support one of
each type of light probe per drawcall in the first place. So this isn't
a performance problem, as the search will only have one light probe to
consider. (In fact, clustering would probably end up being a performance
loss.)

Known potential improvements include:

1. We currently cull based on a conservative bounding sphere test and
   not based on the oriented bounding box (OBB) of the light probe. This
   is improvable, but in the interests of simplicity, I opted to keep
   the bounding sphere test for now. The OBB improvement can be a
   follow-up.

2. This patch doesn't change the fact that each fragment only takes a
   single light probe into account. Typical light probe implementations
   detect the case in which multiple light probes cover the current
   fragment and perform some sort of weighted blend between them. As the
   light probe fetch function presently returns only a single light
   probe, implementing that feature would require more code
   restructuring, so I left it out for now. It can be added as a
   follow-up.

3. Light probe implementations typically have a falloff range. Although
   this is a wanted feature in Bevy, this particular commit also doesn't
   implement that feature, as it's out of scope.

4. This commit doesn't raise the maximum number of light probes past its
   current value of 8 for each type. This should be addressed later, but
   would possibly require more bindings on platforms with storage
   buffers, which would increase this patch's complexity. Even without
   raising the limit, this patch should constitute a significant
   performance improvement for scenes that get anywhere close to this
   limit. In the interest of keeping this patch small, I opted to leave
   raising the limit to a follow-up.
Copy link
Member

@tychedelia tychedelia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Tested irradiance_volumes and reflection_probes examples.

@tychedelia
Copy link
Member

Looks like the example failure is real. Also getting The preprocessing index buffer wasn't present when trying to run alien_cake_addict locally.

@alice-i-cecile alice-i-cecile added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Oct 20, 2024
@alice-i-cecile alice-i-cecile added S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged and removed S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it labels Oct 20, 2024
@alice-i-cecile alice-i-cecile modified the milestones: 0.15, 0.16 Oct 20, 2024
@pcwalton pcwalton added S-Needs-Review Needs reviewer attention (from anyone!) to move forward and removed S-Waiting-on-Author The author needs to make changes or address concerns before this can be merged labels Dec 5, 2024
Copy link
Contributor

@jgayfer jgayfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@pcwalton pcwalton added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Dec 5, 2024
@alice-i-cecile alice-i-cecile added this pull request to the merge queue Dec 5, 2024
Merged via the queue into bevyengine:main with commit b7bcd31 Dec 5, 2024
github-merge-queue bot pushed a commit that referenced this pull request Dec 6, 2024
# Objective

Volumetric fog was broken by #13746.

Looks like this particular shader just got missed. I don't see any other
instances of `unpack_offset_and_counts` in the codebase.

```
2024-12-06T03:18:42.297494Z ERROR bevy_render::render_resource::pipeline_cache: failed to process shader:
error: no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts'
    ┌─ crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl:312:29
    │
312 │     let offset_and_counts = bevy_pbr::clustered_forward::unpack_offset_and_counts(cluster_index);
    │                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unknown identifier
    │
    = no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts'
```

## Solution

Use `unpack_clusterable_object_index_ranges` to get the indices for
point/spot lights.

## Testing

`cargo run --example volumetric_fog`
`cargo run --example fog_volumes`
`cargo run --example scrolling_fog`
ecoskey pushed a commit to ecoskey/bevy that referenced this pull request Jan 6, 2025
…#13746)

This commit allows the Bevy renderer to use the clustering
infrastructure for light probes (reflection probes and irradiance
volumes) on platforms where at least 3 storage buffers are available. On
such platforms (the vast majority), we stop performing brute-force
searches of light probes for each fragment and instead only search the
light probes with bounding spheres that intersect the current cluster.
This should dramatically improve scalability of irradiance volumes and
reflection probes.

The primary platform that doesn't support 3 storage buffers is WebGL 2,
and we continue using a brute-force search of light probes on that
platform, as the UBO that stores per-cluster indices is too small to fit
the light probe counts. Note, however, that that platform also doesn't
support bindless textures (indeed, it would be very odd for a platform
to support bindless textures but not SSBOs), so we only support one of
each type of light probe per drawcall there in the first place.
Consequently, this isn't a performance problem, as the search will only
have one light probe to consider. (In fact, clustering would probably
end up being a performance loss.)

Known potential improvements include:

1. We currently cull based on a conservative bounding sphere test and
not based on the oriented bounding box (OBB) of the light probe. This is
improvable, but in the interests of simplicity, I opted to keep the
bounding sphere test for now. The OBB improvement can be a follow-up.

2. This patch doesn't change the fact that each fragment only takes a
single light probe into account. Typical light probe implementations
detect the case in which multiple light probes cover the current
fragment and perform some sort of weighted blend between them. As the
light probe fetch function presently returns only a single light probe,
implementing that feature would require more code restructuring, so I
left it out for now. It can be added as a follow-up.

3. Light probe implementations typically have a falloff range. Although
this is a wanted feature in Bevy, this particular commit also doesn't
implement that feature, as it's out of scope.

4. This commit doesn't raise the maximum number of light probes past its
current value of 8 for each type. This should be addressed later, but
would possibly require more bindings on platforms with storage buffers,
which would increase this patch's complexity. Even without raising the
limit, this patch should constitute a significant performance
improvement for scenes that get anywhere close to this limit. In the
interest of keeping this patch small, I opted to leave raising the limit
to a follow-up.

## Changelog

### Changed

* Light probes (reflection probes and irradiance volumes) are now
clustered on most platforms, improving performance when many light
probes are present.

---------

Co-authored-by: Benjamin Brienen <Benjamin.Brienen@outlook.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
ecoskey pushed a commit to ecoskey/bevy that referenced this pull request Jan 6, 2025
# Objective

Volumetric fog was broken by bevyengine#13746.

Looks like this particular shader just got missed. I don't see any other
instances of `unpack_offset_and_counts` in the codebase.

```
2024-12-06T03:18:42.297494Z ERROR bevy_render::render_resource::pipeline_cache: failed to process shader:
error: no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts'
    ┌─ crates/bevy_pbr/src/volumetric_fog/volumetric_fog.wgsl:312:29
    │
312 │     let offset_and_counts = bevy_pbr::clustered_forward::unpack_offset_and_counts(cluster_index);
    │                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unknown identifier
    │
    = no definition in scope for identifier: 'bevy_pbr::clustered_forward::unpack_offset_and_counts'
```

## Solution

Use `unpack_clusterable_object_index_ranges` to get the indices for
point/spot lights.

## Testing

`cargo run --example volumetric_fog`
`cargo run --example fog_volumes`
`cargo run --example scrolling_fog`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

7 participants