Move adding DynamicUniformIndex to Extract #5037
james7132 wants to merge 1 commit into bevyengine:main
superdump
left a comment
I like this. One question - instead of having a default value of 0, wouldn’t it be better to make it an Option and skip drawing the thing if its index was never initialised?
This adds a branch in the middle of the render stage, which I'm hesitant to bloat even more given how heavy it already is, and the index is assured to be written to during prepare too. It also makes the component bigger, which deflates the performance gains we see here. Perhaps under a
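To make the size concern concrete, here is a minimal sketch using hypothetical stand-in structs (not Bevy's actual `DynamicUniformIndex`). It assumes the index is a `u32`; the niche-optimized variant with `NonZeroU32` is an illustration only and was not proposed in this thread.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

// Hypothetical stand-ins for the index component; these are not Bevy types.
#[allow(dead_code)]
struct PlainIndex {
    index: u32,
}

#[allow(dead_code)]
struct OptionalIndex {
    // `Option<u32>` needs a separate discriminant, so the struct grows.
    index: Option<u32>,
}

#[allow(dead_code)]
struct NicheIndex {
    // `Option<NonZeroU32>` stores `None` as the all-zero bit pattern, so no extra space is needed.
    index: Option<NonZeroU32>,
}

/// Returns the size in bytes of each layout: (plain, optional, niche).
fn index_sizes() -> (usize, usize, usize) {
    (
        size_of::<PlainIndex>(),
        size_of::<OptionalIndex>(),
        size_of::<NicheIndex>(),
    )
}

/// The draw-time check an optional index would require:
/// `None` means "never initialised, skip drawing".
fn should_draw(index: Option<u32>) -> bool {
    index.is_some()
}
```

With current Rust compilers, `index_sizes()` reports that the `Option<u32>` variant is twice the size of the plain one, while `should_draw` is the extra branch that would run once per drawn entity.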
impl<C: Component> Clone for DynamicUniformIndex<C> {
    fn clone(&self) -> Self {
        Self {
            index: self.index,
            marker: PhantomData,
        }
    }
}
Is the manual Clone necessary here?
It doesn't relax the C: Component bound and Copy is still derived.
PhantomData<T> itself is Clone and Default for any T, but #[derive(Clone)] and #[derive(Default)] add T: Clone and T: Default bounds to the generated impls. The manual impls provide Clone and Default regardless of what T is.
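A minimal sketch of the difference, using hypothetical `Derived` and `Manual` structs rather than the PR's type (the `C: Component` bound is dropped here so the snippet compiles without Bevy):

```rust
use std::marker::PhantomData;

// The derives expand to `impl<C: Clone> Clone` and `impl<C: Default> Default`,
// even though `PhantomData<C>` needs neither bound.
#[derive(Clone, Default)]
struct Derived<C> {
    index: u32,
    marker: PhantomData<C>,
}

// Same layout, but with hand-written impls that place no bound on `C`.
struct Manual<C> {
    index: u32,
    marker: PhantomData<C>,
}

impl<C> Clone for Manual<C> {
    fn clone(&self) -> Self {
        Self {
            index: self.index,
            marker: PhantomData,
        }
    }
}

impl<C> Default for Manual<C> {
    fn default() -> Self {
        Self {
            index: 0,
            marker: PhantomData,
        }
    }
}

// A marker type that is neither `Clone` nor `Default`.
struct NotClone;

/// Cloning works for `Manual<NotClone>` because the manual impl has no bound on `C`.
fn clone_manual_index() -> u32 {
    let original = Manual::<NotClone> {
        index: 7,
        marker: PhantomData,
    };
    original.clone().index
}

/// `Default` is likewise available for `Manual<NotClone>`.
fn default_manual_index() -> u32 {
    Manual::<NotClone>::default().index
}

/// The derived impl only applies when `C: Clone`; `u8` satisfies that.
/// Calling `.clone()` on a `Derived<NotClone>` would fail to compile.
fn clone_derived_index() -> u32 {
    let original = Derived::<u8> {
        index: 3,
        marker: PhantomData,
    };
    original.clone().index
}
```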
impl<C: Component> Default for DynamicUniformIndex<C> {
    fn default() -> Self {
        Self {
            index: 0,
            marker: PhantomData,
        }
    }
}
This default impl also could be replaced by a derive.
Wanted to just note that if we merge #4902, we can avoid the secondary copy inside the extraction commands by adding a
EDIT: Tried this, the
I also tested to see if we could defer the
Could you test and profile it to see if it does make a practical performance difference? It would be a win for correctness in case something doesn’t actually ever get set.
This again becomes a question of whether the model matrix is used/desired to be used elsewhere in the render schedule such that not calculating it upfront incurs calculation of it multiple times. If it just moves time from one place to another with no other overall performance benefits then the only pro is that it makes the extract stage shorter. That would be enough but only if we think no one ever needs the model matrix. I wonder if TAA would need it for motion vectors or how that works…
Tried this with a
Good to know that panic incurs that performance hit. I was thinking that a missing index would somehow cause that entity not to be drawn by propagating up an error or something. But if we don’t already have error returns from draw functions then maybe it’s not worth it. I’m just kind of expecting it to be easy enough to write code where some entities never have their dynamic index updated, and then they will be drawn using whatever the transform is for index 0. I suppose another way to handle it would be to make that model matrix produce vertices containing NaNs in the clip position so that they are dropped, but that feels like a hack where it would be better to just not draw the thing.
Having seen the perf hit, I tried the opposite and changed
Sounds reasonable to do it in a separate PR. If you don't intend to do that straight away, could you add a TODO comment?
  render_queue: Res<RenderQueue>,
  mut component_uniforms: ResMut<ComponentUniforms<C>>,
- components: Query<(Entity, &C)>,
+ mut components: Query<(&C, &mut DynamicUniformIndex<C>)>,
Doesn't this break the UniformComponentPlugin in the general case?
This now assumes that DynamicUniformIndex is added in the extract step, but that isn't the case for something using, say, ExtractComponentPlugin.
We aren't currently using this anywhere else, but given that this is intended to be a generalized (and user facing) abstraction, I think we should discuss ways to make this "fool proof".
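The failure mode can be sketched without Bevy. The model below is a simplification under stated assumptions: `Extracted`, `prepare`, and `demo` are hypothetical names, and `Option<u32>` stands in for "the index component is present or absent" on an entity. A query over `(&C, &mut DynamicUniformIndex<C>)` only matches entities that have both components, which is what the `Some` arm models.

```rust
// Hypothetical model of an extracted entity: `index` is `None` when the
// extract step never inserted an index component for it.
struct Extracted {
    value: f32,
    index: Option<u32>,
}

/// Mimics a prepare step that only visits entities that already carry an index.
/// Returns how many entities were silently skipped.
fn prepare(entities: &mut [Extracted], buffer: &mut Vec<f32>) -> usize {
    let mut skipped = 0;
    for entity in entities.iter_mut() {
        match entity.index.as_mut() {
            // Entity matches the "query": write its data and record the slot.
            Some(slot) => {
                *slot = buffer.len() as u32;
                buffer.push(entity.value);
            }
            // Entity lacks the index component: it never reaches the buffer.
            None => skipped += 1,
        }
    }
    skipped
}

/// Runs `prepare` on three entities, one of which was extracted without an index.
/// Returns (uniforms written, entities skipped, resulting indices).
fn demo() -> (usize, usize, Vec<Option<u32>>) {
    let mut entities = vec![
        Extracted { value: 1.0, index: Some(0) },
        Extracted { value: 2.0, index: None },
        Extracted { value: 3.0, index: Some(0) },
    ];
    let mut buffer = Vec::new();
    let skipped = prepare(&mut entities, &mut buffer);
    let indices = entities.iter().map(|e| e.index).collect();
    (buffer.len(), skipped, indices)
}
```

In this model the second entity gets no uniform at all, and nothing reports the omission, which is the "fool proofing" gap raised above.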
I've dropped the ball on this PR, but thinking on this a bit more, I think it makes a lot of sense to take an approach where we keep indices as components while directly writing extracted components to their target staging buffers.
Indices are small, typically 4-8 bytes. Compare this with the equivalent MeshUniform, which is currently 132 bytes. If we are going to heavily leverage commands for rendering, we should be minimizing the number of large copies that are being performed. I'd much rather we copy heavy components once and then just shuffle the indices around.
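A rough sketch of that idea, assuming a simple CPU-side staging buffer. `HeavyUniform`, `UniformIndex`, and `StagingBuffer` are hypothetical stand-ins, not Bevy types; the stand-in uniform is laid out as two 4x4 `f32` matrices plus a `u32` so that it matches the 132-byte figure, and the index here is an element index rather than a byte offset.

```rust
use std::mem::size_of;

// Hypothetical 132-byte uniform: 64 + 64 + 4 bytes with `repr(C)`.
#[repr(C)]
#[derive(Clone, Copy)]
struct HeavyUniform {
    transform: [f32; 16],
    inverse_transpose: [f32; 16],
    flags: u32,
}

// The small handle that would live on the entity as a component.
#[derive(Clone, Copy, Debug, PartialEq)]
struct UniformIndex(u32);

// Hypothetical staging buffer: heavy data is written once at extraction time.
struct StagingBuffer {
    data: Vec<HeavyUniform>,
}

impl StagingBuffer {
    /// Copies the heavy value into the buffer once and returns its small index.
    fn push(&mut self, value: HeavyUniform) -> UniformIndex {
        let index = UniformIndex(self.data.len() as u32);
        self.data.push(value);
        index
    }

    /// Later stages can refer back to the buffer instead of holding a copy.
    fn get(&self, index: UniformIndex) -> &HeavyUniform {
        &self.data[index.0 as usize]
    }
}

/// Helper for building a test value that is identifiable by its flags.
fn uniform_with_flags(flags: u32) -> HeavyUniform {
    HeavyUniform {
        transform: [0.0; 16],
        inverse_transpose: [0.0; 16],
        flags,
    }
}
```

Only the 4-byte `UniformIndex` would then be moved around by commands, while the 132-byte payload stays where it was first written.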
If we still need the intermediate data during Prepare or Queue, we can always refer back to the buffer in memory. It's less ergonomic, but alleviates the heaviest parts of running the Render World right now.
Closing this as the renderer is already moving in a non-direct ECS storage direction, and the introduction of the instancing and batching changes makes this difficult to merge.
Objective
prepare_uniform_components's commands must be run with exclusive access to the render world and can take quite a bit of time for components on lots of entities, particularly with archetypes with many big components. This is doing redundant work that is already being done in Extract.
Solution
- Implement Default on DynamicUniformIndex.
- Insert DynamicUniformIndex into Extract instead of Prepare.
- Change prepare_uniform_components to query for &mut DynamicUniformIndex instead of using commands.
Performance
This was tested against the many_cubes stress test. Here are the respective timing changes:
Changelog
TODO
Migration Guide
TODO