avoid realloc memory across frame in GpuArrayBuffer #11290
re0312 wants to merge 3 commits into bevyengine:main
Conversation
JMS55 left a comment:
Good spot! Approved, but you need to remove the unused mem import to get CI to pass.
Will that memory still be deallocated when it's no longer needed?
We'll want to implement a scheme that shrinks the buffer capacity by a certain percentage if a certain percentage is unused when flip-flopping. I have similar logic in my meshlet PR.
No, memory usage will remain at the peak level required to hold the maximum number of meshes rendered on screen, but the actual memory cost is relatively small. For example, 160k meshes (a pretty large number) only costs 36 × 4 B × 160,000 ≈ 22 MiB.
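A quick sanity check of that estimate (the per-mesh figure of 36 four-byte values is taken from the comment above; the exact layout is an assumption of this sketch):

```rust
fn main() {
    // Figures from the discussion: 36 four-byte fields per mesh
    // instance, 160_000 mesh instances on screen at once.
    let bytes_per_mesh: u64 = 36 * 4; // 144 B
    let total_bytes = bytes_per_mesh * 160_000;
    let mib = total_bytes as f64 / (1024.0 * 1024.0);
    println!("{total_bytes} bytes ≈ {mib:.1} MiB"); // 23040000 bytes ≈ 22.0 MiB
}
```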
Could you add a TODO comment in the code for that?
@JMS55 any reason why we keep two vectors in `GpuArrayBuffer::StorageBuffer`? IMO the second vector seems unnecessary (or I might be overlooking something important).
james7132 left a comment:
Thanks for making this PR! We can always add a `shrink_to_fit` function for reclaiming the unused memory.
Can't we just call
This would likely cause more reallocations than we actually need, negating a significant portion of the performance gains.
What I've been doing: when resetting the vec, if there's more than 30% spare capacity, shrink down to 30%.
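A minimal sketch of that reset-with-shrink policy. The function name, the generic `Vec<T>` scratch buffer, and the exact thresholds are hypothetical stand-ins, not the actual meshlet PR code:

```rust
/// Hypothetical sketch: clear the per-frame scratch Vec while keeping its
/// allocation, but if a large fraction of the capacity went unused this
/// frame, shrink it back toward the used size plus ~30% headroom so the
/// buffer doesn't stay pinned at its all-time peak forever.
fn reset_with_shrink<T>(scratch: &mut Vec<T>) {
    let used = scratch.len();
    let capacity = scratch.capacity();
    scratch.clear(); // keeps capacity, so the next frame avoids reallocs
    if capacity > 0 && used * 10 < capacity * 7 {
        // Less than 70% of capacity was used (i.e. >30% spare):
        // shrink to the used size plus 30% headroom.
        scratch.shrink_to(used + used * 3 / 10);
    }
}

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(1_000);
    v.extend(std::iter::repeat(7).take(100));
    reset_with_shrink(&mut v);
    assert!(v.is_empty());
    assert!(v.capacity() < 1_000); // shrunk toward 100 + 30% headroom
}
```

`Vec::shrink_to` (stable since Rust 1.56) is a good fit here because it shrinks with a lower bound, letting you keep deliberate headroom instead of dropping all spare capacity like `shrink_to_fit` would.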
I don't know. I don't see why we need the second vec instead of just using `buffer.get_mut()`. Feel free to remove it.
# Objective
- Remove the redundant `Vec` as described in #11290 (comment)

## Solution
- Rely on `StorageBuffer`'s backing `Vec` instead

---

## Changelog
- `GpuArrayBuffer` no longer has a redundant backing `Vec`
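The idea can be sketched with minimal stand-in types. These are NOT bevy's real `StorageBuffer`/`GpuArrayBuffer` definitions, just an illustration of why the second `Vec` was redundant:

```rust
// Stand-in for a storage buffer that owns a CPU-side backing value
// which later gets uploaded to the GPU.
struct StorageBuffer<T> {
    value: T,
}

impl<T> StorageBuffer<T> {
    fn get_mut(&mut self) -> &mut T {
        &mut self.value
    }
}

struct GpuArrayBuffer<T> {
    // Before this change, a second `values: Vec<T>` sat alongside the
    // buffer and was copied into it before upload. After: push straight
    // into the buffer's backing Vec via `get_mut()`, so the extra Vec
    // and the per-frame copy both disappear.
    buffer: StorageBuffer<Vec<T>>,
}

impl<T> GpuArrayBuffer<T> {
    fn push(&mut self, value: T) -> u32 {
        let vec = self.buffer.get_mut();
        let index = vec.len() as u32;
        vec.push(value);
        index
    }
}

fn main() {
    let mut gab = GpuArrayBuffer { buffer: StorageBuffer { value: Vec::new() } };
    assert_eq!(gab.push(1u32), 0);
    assert_eq!(gab.push(2), 1);
}
```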
I don't think this is still applicable after #11368; it should probably be closed? We still probably need a compaction strategy for the buffers now.
Yeah, I will close it. Maybe we need a dedicated system to free (shrink) all the memory that is no longer needed by the render world.
Closed in favor of #11368.
# Objective
- Since #9685, bevy introduced automatic batching of draw commands.
- `batch_and_prepare_render_phase` takes responsibility for batching each `PhaseItem`.
- The `GetBatchData` trait identifies how each phase item should be batched. It defines an associated type `Data` used as a query to fetch data from the world.
- However, the impls of `GetBatchData` in bevy always set `type Data = Entity`, so we end up with code like `let entity: Entity = query.get(item.entity())`, which causes unnecessary overhead.

## Solution
- Remove the associated types `Data` and `Filter` from `GetBatchData`.
- Change the type of the `query_item` parameter in `get_batch_data` from `Self::Data` to `Entity`.
- `batch_and_prepare_render_phase` no longer takes a query using `F::Data, F::Filter`.
- `get_batch_data` now returns `Option<(Self::BufferData, Option<Self::CompareData>)>`.

---

## Performance
Based on main merged with #11290. Windows 11, Intel 13400KF, NV 4070 Ti.
- Frame time from 3.34 ms to 3 ms, ~10% improvement.
- `batch_and_prepare_render_phase` from ~800 µs to ~400 µs.

## Migration Guide
- The trait `GetBatchData` no longer has the associated types `Data` and `Filter`.
- `get_batch_data`'s `query_item` parameter changed from `Self::Data` to `Entity`, and it now returns `Option<(Self::BufferData, Option<Self::CompareData>)>`.
- `batch_and_prepare_render_phase` should no longer take a query.
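The trait change described above can be sketched as a before/after comparison. These are simplified stand-ins, not the exact bevy definitions (`Entity` here is a plain newtype, and the query machinery is omitted):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Entity(u64);

// Before: an associated `Data` type that every impl in practice set to
// `Entity`, forcing a redundant `query.get(item.entity())` lookup per item.
#[allow(dead_code)]
trait GetBatchDataOld {
    type Data; // always `Entity` in practice
    type BufferData;
    type CompareData;
    fn get_batch_data(
        &self,
        query_item: Self::Data,
    ) -> Option<(Self::BufferData, Option<Self::CompareData>)>;
}

// After: the `Data`/`Filter` associated types are gone and the item's
// `Entity` is passed in directly.
trait GetBatchData {
    type BufferData;
    type CompareData;
    fn get_batch_data(
        &self,
        entity: Entity,
    ) -> Option<(Self::BufferData, Option<Self::CompareData>)>;
}

// Toy impl on the unit type just to show the shape of the new API.
impl GetBatchData for () {
    type BufferData = u64;
    type CompareData = ();
    fn get_batch_data(&self, entity: Entity) -> Option<(u64, Option<()>)> {
        Some((entity.0, None))
    }
}

fn main() {
    assert_eq!(().get_batch_data(Entity(7)), Some((7, None)));
}
```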
# Objective

## Solution

## Performance
Windows 11, Intel 13400KF, NV 4070 Ti.
(Only platforms that support `StorageBuffer` will benefit from this PR.)
many cubes

yellow is main, red is this PR
Frame mean time from 3.85 ms to 3.23 ms, ~16% reduction.
Hot-spot function: `batch_and_prepare_render_phase`

Mean time from 1.01 ms to 0.567 ms, almost a 2× speedup.
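The headline optimization ("avoid realloc memory across frame") boils down to reusing a buffer's allocation between frames instead of rebuilding it. A minimal sketch of the idea, with a hypothetical per-frame scratch struct rather than bevy's actual types:

```rust
// Sketch of the core idea of this PR (simplified): instead of building a
// brand-new Vec every frame (paying allocation and growth reallocs each
// time), keep the Vec alive across frames and just `clear()` it, so the
// capacity reached in earlier frames is reused.
struct FrameScratch {
    instance_data: Vec<[f32; 4]>,
}

impl FrameScratch {
    fn begin_frame(&mut self) {
        // `clear` drops the elements but keeps the allocation, so a
        // steady-state frame with the same workload does zero reallocs.
        self.instance_data.clear();
    }
}

fn main() {
    let mut scratch = FrameScratch { instance_data: Vec::new() };
    // Frame 1: the Vec grows to fit the workload.
    for _ in 0..1024 {
        scratch.instance_data.push([0.0; 4]);
    }
    let cap_after_frame_1 = scratch.instance_data.capacity();
    // Frame 2: same workload, reusing the retained capacity.
    scratch.begin_frame();
    for _ in 0..1024 {
        scratch.instance_data.push([0.0; 4]);
    }
    assert_eq!(scratch.instance_data.capacity(), cap_after_frame_1);
}
```

The follow-up discussion above (shrink-on-reset, compaction) exists precisely because this pattern pins memory at the peak workload until something explicitly shrinks it.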