Meshlet LOD-compatible two-pass occlusion culling by JMS55 · Pull Request #12898 · bevyengine/bevy

JMS55 · 2024-04-06T21:07:42Z

Keeping track of explicit visibility per cluster between frames does not work with LODs, and leads to worse culling (using the final depth buffer from the previous frame is more accurate).

Instead, we need to generate a second depth pyramid after the second raster pass, and then use that in the first culling pass in the next frame to test if a cluster would have been visible last frame or not.
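The scheme above can be sketched on the CPU roughly as follows. This is a minimal illustration, not Bevy's implementation:

- `Mip`, `build_depth_pyramid` and `sphere_visible` are hypothetical names.
- The sketch assumes a reverse-Z depth buffer and an infinite perspective projection where view-space z maps to depth = near / -z.
- Under those assumptions, each pyramid texel keeps the farthest (minimum) depth of the texels it covers.
- Next frame's first culling pass would compare a cluster's nearest depth against such a texel.

```rust
/// One level of a hypothetical CPU-side depth pyramid (row-major texels).
struct Mip {
    width: usize,
    height: usize,
    depth: Vec<f32>,
}

/// Builds a depth pyramid by min-reducing 2x2 texel blocks down to 1x1.
/// Assumes reverse-Z (1.0 at the near plane, approaching 0.0 far away), so the
/// minimum of a block is its farthest depth: the most conservative occluder.
fn build_depth_pyramid(width: usize, height: usize, depth: Vec<f32>) -> Vec<Mip> {
    assert_eq!(depth.len(), width * height);
    let mut mips = vec![Mip { width, height, depth }];
    loop {
        let prev = mips.last().unwrap();
        if prev.width == 1 && prev.height == 1 {
            break;
        }
        // Round up so odd-sized levels keep their last row/column.
        let (w, h) = ((prev.width + 1) / 2, (prev.height + 1) / 2);
        let mut out = vec![f32::INFINITY; w * h];
        for y in 0..h {
            for x in 0..w {
                for (dx, dy) in [(0, 0), (1, 0), (0, 1), (1, 1)] {
                    // Clamp reads at the edge of odd-sized levels.
                    let sx = (2 * x + dx).min(prev.width - 1);
                    let sy = (2 * y + dy).min(prev.height - 1);
                    out[y * w + x] = out[y * w + x].min(prev.depth[sy * prev.width + sx]);
                }
            }
        }
        mips.push(Mip { width: w, height: h, depth: out });
    }
    mips
}

/// Conservative occlusion test for a bounding sphere, in the spirit of the
/// `-view.projection[3][2] / (center.z + radius)` line from the shader.
/// Assumes projection[3][2] == near and view space looks down -Z.
fn sphere_visible(center_view_z: f32, radius: f32, near: f32, occluder_depth: f32) -> bool {
    let nearest_z = center_view_z + radius;
    // Guard added for this sketch: a sphere touching the near plane is never culled.
    if nearest_z >= -near {
        return true;
    }
    let sphere_depth = -near / nearest_z;
    // Reverse-Z: larger depth is closer, so the sphere may be visible if its
    // nearest point is at least as close as the farthest occluder depth.
    sphere_depth >= occluder_depth
}
```

In a real renderer, the mip is chosen so that a handful of texels covers the sphere's screen-space bounds. The min reduction then lets one or a few samples stand in for the whole covered region.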

As part of these changes, the write_index_buffer pass has been folded into the culling pass for a large performance gain, and to avoid tracking a lot of extra state that would be needed between passes.

The prepass's previous-frame model/view data was adapted to work with meshlets as well.

Also fixed a bug with materials, and made other misc improvements.

JMS55 · 2024-04-24T01:12:50Z

Something is up with the bounding spheres being wonky. Occlusion culling isn't working correctly.

JMS55 · 2024-04-24T06:51:14Z

Fixed bugs, ready to go again.

atlv24

I still think there are many literals in the shaders that should be extracted into constants or conversion functions (e.g. the << 6u / >> 6u cluster ID conversions could be functions, and 32u could be BITS_PER_ELEM or similar), but this can be addressed later once those numbers actually settle.
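A sketch of what those helpers might look like follows, written in Rust rather than WGSL. All constant and function names are hypothetical.

The bit layout is an assumption based on the "Store triangle ID in 6 bits instead of 8" commit:

- the low 6 bits hold the triangle ID, allowing at most 64 triangles per meshlet;
- the remaining 26 bits hold the cluster ID.

```rust
/// Hypothetical names for the literals mentioned in the review.
/// Assumed layout: cluster ID in the high 26 bits, triangle ID in the low 6 bits.
const TRIANGLE_ID_BITS: u32 = 6;
const TRIANGLE_ID_MASK: u32 = (1 << TRIANGLE_ID_BITS) - 1;
/// Bits per element of a u32 bitmask buffer (the `32u` literal).
const BITS_PER_ELEM: u32 = u32::BITS;

fn pack_cluster_triangle(cluster_id: u32, triangle_id: u32) -> u32 {
    debug_assert!(triangle_id <= TRIANGLE_ID_MASK);
    debug_assert!(cluster_id < (1 << (u32::BITS - TRIANGLE_ID_BITS)));
    (cluster_id << TRIANGLE_ID_BITS) | triangle_id
}

fn unpack_cluster_id(packed: u32) -> u32 {
    packed >> TRIANGLE_ID_BITS
}

fn unpack_triangle_id(packed: u32) -> u32 {
    packed & TRIANGLE_ID_MASK
}

/// Returns the word index and bit mask for a cluster in a per-cluster bitmask.
fn bitmask_location(cluster_id: u32) -> (usize, u32) {
    (
        (cluster_id / BITS_PER_ELEM) as usize,
        1 << (cluster_id % BITS_PER_ELEM),
    )
}
```

Centralizing the shift width in one constant means a later change (such as 8 bits down to 6) touches a single line instead of every shader that repeats the literal.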

atlv24 · 2024-04-26T03:49:47Z

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

-            let sphere_depth = -view.projection[3][2] / (culling_bounding_sphere_center_view_space.z + culling_bounding_sphere_radius);
-            meshlet_visible &= sphere_depth >= occluder_depth;
+        let meshlet_triangle_count = meshlets[meshlet_id].triangle_count;
+        let buffer_start = atomicAdd(&draw_indirect_args.vertex_count, meshlet_triangle_count * 3u) / 3u;


Is it not possible to skip the muls and divs by 3 here and just multiply afterwards, at the last possible moment?

We'd need an extra pass to just multiply by 3, I doubt it's worth it just to save like 2 instructions.
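The reply's reasoning can be illustrated with a small CPU analogue. It uses a hypothetical `reserve_triangles`, with `std::sync::atomic::AtomicU32` standing in for the WGSL atomic.

Because the shared counter is stored in vertices, it already holds the value the indirect draw needs. A triangle counter would instead require the extra multiply-by-3 pass mentioned above.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Reserves room for a cluster's triangles in a shared index buffer, in the
/// spirit of `atomicAdd(&draw_indirect_args.vertex_count, triangle_count * 3u) / 3u`.
/// The counter is kept in vertices so it can serve directly as the indirect
/// draw's vertex count; dividing the returned offset by 3 gives the first
/// triangle slot this cluster owns.
fn reserve_triangles(vertex_count: &AtomicU32, triangle_count: u32) -> u32 {
    // Like WGSL's atomicAdd, fetch_add returns the value before the addition.
    vertex_count.fetch_add(triangle_count * 3, Ordering::Relaxed) / 3
}
```

Each culling thread gets a disjoint range of triangle slots. This is what allows the write_index_buffer work to be folded into the culling pass without a separate prefix-sum or compaction step.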

crates/bevy_pbr/src/meshlet/gpu_scene.rs

Co-authored-by: vero <email@atlasdostal.com>

pcwalton

This looks fine. Since you mentioned performance, I left a few microoptimization suggestions. Feel free to take them or leave them :)

pcwalton · 2024-04-27T22:40:45Z

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

+    // Project the culling bounding sphere to view-space for occlusion culling
+#ifdef MESHLET_FIRST_CULLING_PASS
+    let previous_model = affine3_to_square(instance_uniform.previous_model);
+    let previous_model_scale = max(length(previous_model[0]), max(length(previous_model[1]), length(previous_model[2])));


Suggested change

let previous_model_scale = max(length(previous_model[0]), max(length(previous_model[1]), length(previous_model[2])));

let previous_model_scale = sqrt(max(dot(previous_model[0], previous_model[0]), max(dot(previous_model[1], previous_model[1]), dot(previous_model[2], previous_model[2]))));

(feel free to split up into multiple lines if you want)

This saves 2 sqrt instructions.

It's hard to tell. These microoptimizations don't seem to show up in profiles, as they're not the bottleneck. The bottleneck for culling is mostly thread divergence, and to a small degree a lack of registers.
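The suggestion relies on sqrt being monotonic on non-negative inputs, so max(sqrt(a), sqrt(b)) equals sqrt(max(a, b)). The check below compares both forms using hypothetical helper names.

It uses 3-component columns. The shader's columns come from affine3_to_square; their w component is assumed to be 0 here, so it does not change the dot products.

```rust
fn dot(a: [f32; 3], b: [f32; 3]) -> f32 {
    a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
}

fn length(v: [f32; 3]) -> f32 {
    dot(v, v).sqrt()
}

/// Original form: three square roots (one per basis column).
fn max_scale_three_sqrts(m: [[f32; 3]; 3]) -> f32 {
    length(m[0]).max(length(m[1]).max(length(m[2])))
}

/// Suggested form: compare squared lengths, then take a single square root.
/// Equivalent because sqrt is monotonically non-decreasing on [0, inf).
fn max_scale_one_sqrt(m: [[f32; 3]; 3]) -> f32 {
    dot(m[0], m[0]).max(dot(m[1], m[1]).max(dot(m[2], m[2]))).sqrt()
}
```

Since IEEE 754 sqrt is correctly rounded and monotonic, the two forms return bit-identical results. The saving is purely in instruction count.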

pcwalton · 2024-04-27T22:42:24Z

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

+    let depth_pyramid_size_mip_0 = vec2<f32>(textureDimensions(depth_pyramid, 0)) * 0.5;
+    let width = (aabb.z - aabb.x) * depth_pyramid_size_mip_0.x;
+    let height = (aabb.w - aabb.y) * depth_pyramid_size_mip_0.y;
+    let depth_level = max(0, i32(ceil(log2(max(width, height))))); // TODO: Naga doesn't like this being a u32


Moving the * 0.5 after the max would save 1 multiply.
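Both orderings of the mip-level selection are sketched below in Rust (hypothetical function names). The sketch assumes aabb is [min_x, min_y, max_x, max_y] in normalized [0, 1] coordinates.

Scaling by 0.5 is exact in binary floating point (absent underflow), and max commutes with a positive scale. So moving the multiply after the max yields the same level while doing one scalar multiply instead of a vec2 multiply.

```rust
/// Current shader ordering: halve the pyramid size (vec2 multiply) first.
fn depth_level_original(aabb: [f32; 4], mip0_size: [f32; 2]) -> i32 {
    let half = [mip0_size[0] * 0.5, mip0_size[1] * 0.5];
    let width = (aabb[2] - aabb[0]) * half[0];
    let height = (aabb[3] - aabb[1]) * half[1];
    (width.max(height).log2().ceil() as i32).max(0)
}

/// Suggested ordering: a single scalar multiply after the max.
fn depth_level_suggested(aabb: [f32; 4], mip0_size: [f32; 2]) -> i32 {
    let width = (aabb[2] - aabb[0]) * mip0_size[0];
    let height = (aabb[3] - aabb[1]) * mip0_size[1];
    // A zero-sized box gives log2(0) = -inf; the cast saturates and max(0) clamps it.
    ((width.max(height) * 0.5).log2().ceil() as i32).max(0)
}
```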

pcwalton · 2024-04-27T22:49:22Z

crates/bevy_pbr/src/meshlet/meshlet_bindings.wgsl

-fn get_meshlet_occlusion(cluster_id: u32) -> bool {
-    let packed_occlusion = meshlet_occlusion[cluster_id / 32u];
+fn meshlet_is_second_pass_candidate(cluster_id: u32) -> bool {
+    // TODO: Does this read need to be an atomicLoad?


No, not if all the writing happens in a previous compute dispatch. wgpu should have automatically inserted the proper barriers.

Oh whoops, this is left over from the previous version of this PR. I can remove it.
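The barrier argument can be pictured with a CPU analogue. The old code read a per-cluster bitmask (meshlet_occlusion[cluster_id / 32u]), and the sketch assumes the second-pass-candidate check reads a similar bitmask.

In the sketch:

- Clusters rejected by the first pass are flagged with an atomic OR, since many clusters share one u32 word.
- Joining the scoped threads stands in for the dispatch boundary at which wgpu inserts barriers.
- After that, plain reads are enough.

`flag_candidates` and `is_second_pass_candidate` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

const CLUSTERS_PER_WORKER: usize = 16;

/// "First pass": flag clusters occluded by last frame's depth pyramid.
fn flag_candidates(occluded: &[bool]) -> Vec<u32> {
    let words: Vec<AtomicU32> = (0..(occluded.len() + 31) / 32)
        .map(|_| AtomicU32::new(0))
        .collect();
    thread::scope(|s| {
        for start in (0..occluded.len()).step_by(CLUSTERS_PER_WORKER) {
            let words = &words;
            s.spawn(move || {
                let end = (start + CLUSTERS_PER_WORKER).min(occluded.len());
                for cluster_id in start..end {
                    if occluded[cluster_id] {
                        // Two workers share each word, so the bit must be set atomically.
                        words[cluster_id / 32].fetch_or(1 << (cluster_id % 32), Ordering::Relaxed);
                    }
                }
            });
        }
    });
    // The scope has joined (our "dispatch boundary"), so all writes are visible.
    words.into_iter().map(AtomicU32::into_inner).collect()
}

/// "Second pass": a plain, non-atomic read of the finished bitmask.
fn is_second_pass_candidate(words: &[u32], cluster_id: usize) -> bool {
    words[cluster_id / 32] & (1 << (cluster_id % 32)) != 0
}
```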

pcwalton · 2024-04-27T22:51:36Z

crates/bevy_pbr/src/meshlet/visibility_buffer_raster_node.rs

+            );
            cull_pass(
-                "meshlet_culling_first_pass",
+                "culling_first",


Why change this name? We'll have non-meshlet GPU culling too, and so it'd be nice to see the meshlet code clearly separated out in RenderDoc.

It is separated out: it's already nested under a larger meshlet debug span. Repeating "meshlet" in every individual pass name made the Nsight profiles harder to read.

crates/bevy_pbr/src/meshlet/visibility_buffer_raster_node.rs

pcwalton · 2024-04-27T23:03:46Z

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

    let instance_uniform = meshlet_instance_uniforms[instance_id];
+    let meshlet_id = meshlet_thread_meshlet_ids[cluster_id];
    let model = affine3_to_square(instance_uniform.model);
    let model_scale = max(length(model[0]), max(length(model[1]), length(model[2])));


Suggested change

let model_scale = max(length(model[0]), max(length(model[1]), length(model[2])));

let model_scale = sqrt(max(dot(model[0], model[0]), max(dot(model[1], model[1]), dot(model[2], model[2]))));

Saves 2 sqrt instructions.

Co-authored-by: Robert Swain <robert.swain@gmail.com>

JMS55 and others added 30 commits January 14, 2024 21:19

Fix rendering regular meshes when the MeshletPlugin is added

4ddd312

Fix deferred compilation on wasm

18412d7

Add frustums to shadow views for point and spot lights

a03a37e

Merge commit 'aeab690fdb893b6eeb6dbadc646111ea76a5a782' into meshlet

5f632f1

Update examples/3d/meshlet.rs

f75dc7a

Co-authored-by: François <mockersf@gmail.com>

Merge commit 'ee9a1503edb6ff72cc69514c6336d9f624f0d600' into meshlet

932b34f

Add view instance visibility buffers to gpu scene

91fd217

Merge commit 'c9e1fcdb355b049fa3c3df8cb1cd1f4343f1b9d1' into meshlet

ec11c7a

Fix rebase lighting

9db1a96

Use a single 3d dispatch for write_index_buffer

ea38623

Remove redundant code

07bac3c

Add TODO

d2903ac

Try to fix occlusion culling (does not seem to be working still)

2a6e022

Fix occlusion culling and add meshlet bounding sphere debug viewer

62d8063

Account for scale

3d96a40

Merge pull request #22 from rodolphito/meshlet-debug

54ac8bb

Fix occlusion culling (partially) and add meshlet bounding sphere debug viewer

Fix occlusion culling fully

3d6a7d3

Cleanup example

a0f5c6b

Improve docs (note limitations)

a5cafe3

Misc wording

7979367

Clippy

2a21ed9

Remove rayon

e6faae6

Add floor to example

e0b7bf2

Fix doc

726370a

Merge commit '056b006d4eade8f1bf75e735ed5eda33d9505c9e' into meshlet

d0789ae

Fix occlusion culling in orthographic views

4dcc424

Merge pull request #23 from rodolphito/meshlet-orthographic

2a5c800

Fix occlusion culling in orthographic views

Misc cleanup

ddfbe8a

Merge remote-tracking branch 'jasmine/meshlet' into meshlet

eb4967b

Misc format

fd26948

superdump reviewed Apr 23, 2024

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl (outdated)

superdump reviewed Apr 23, 2024

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

Update crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

918a1be

Co-authored-by: Robert Swain <robert.swain@gmail.com>

superdump reviewed Apr 23, 2024

crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

superdump reviewed Apr 23, 2024

crates/bevy_pbr/src/meshlet/gpu_scene.rs

superdump reviewed Apr 23, 2024

crates/bevy_pbr/src/prepass/mod.rs

JMS55 marked this pull request as draft April 24, 2024 01:12

JMS55 added 2 commits April 23, 2024 18:36

Fix culling_bounding_sphere_center_view_space

1c25940

Fix broken occlusion culling due to not binding all mips

7a6363a

JMS55 marked this pull request as ready for review April 24, 2024 06:38

JMS55 added 3 commits April 24, 2024 00:07

Merge commit 'c8d214d505bc0bf1470777e17dcdb6fa03510058' into meshlet-last-frame-depth-pyramid

4e2aa84

Use eprintln!

a92a8ba

Merge branch 'main' into meshlet-last-frame-depth-pyramid

dd9752e

JMS55 requested a review from pcwalton April 25, 2024 03:20

JMS55 added 2 commits April 25, 2024 10:55

Store triangle ID in 6 bits instead of 8

407d1ad

Merge commit '36a3e53e10fb10c0d7ff0e224cb3643177abe934' into meshlet-last-frame-depth-pyramid

7b5fbed

atlv24 approved these changes Apr 26, 2024

Update crates/bevy_pbr/src/meshlet/gpu_scene.rs

5a8c27a

Co-authored-by: vero <email@atlasdostal.com>

pcwalton approved these changes Apr 27, 2024

JMS55 and others added 2 commits April 27, 2024 18:43

Remove old comment

3b4d0f9

Update crates/bevy_pbr/src/meshlet/cull_meshlets.wgsl

0606d46

Co-authored-by: Robert Swain <robert.swain@gmail.com>

JMS55 requested review from IceSentry and robtfm April 28, 2024 02:51

superdump approved these changes Apr 28, 2024

superdump added this pull request to the merge queue Apr 28, 2024

Merged via the queue into bevyengine:main with commit e1a0da0 Apr 28, 2024

alice-i-cecile mentioned this pull request Apr 1, 2025

Write release notes for occlusion culling bevyengine/bevy-website#2056

Closed

5 participants