Workaround for memory types and linear images on Nvidia #28
Closed
K0bin wants to merge 10 commits into
18dd377 to 765ae23
Author (Contributor):
Given that Mantle apparently also doesn't have uniform buffers, I'm not sure if this is worth it at all. Even if everything worked correctly, it would be way slower than DXVK on Nvidia hardware. We'd have to patch shaders to turn buffers into uniform buffers when binding small, correctly aligned ranges of memory (no idea if that would even work), and I don't think anyone wants to implement that.
Fixes a validation error.
Probably not correct but silences validation errors.
BF4 uses those. We need to be able to calculate texture sizes for the linear image workaround.
This reverts commit d222c5f. We need to create the buffer on demand when using the memory allocator on Nvidia hardware. Otherwise we might end up doubling the VRAM usage.
when the memory allocator is not used. Also remove VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, as Mantle doesn't have vertex buffers.
This is necessary because:
- Nvidia exposes more than 8 memory types.
- The regular host visible memory type does not support optimal images.

So a workaround for the lack of linear image support will need us to be in control of memory allocations.
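Being in control of allocations mostly means picking a memory type per object instead of per heap. A minimal sketch of that selection, assuming the usual Vulkan pattern of matching an object's allowed type mask against the wanted property flags. The names `PROP_DEVICE_LOCAL`, `PROP_HOST_VISIBLE` and `find_memory_type` are hypothetical stand-ins, not identifiers from this PR, and plain integers replace the Vulkan structs so the snippet compiles on its own:

```c
#include <stdint.h>

/* Hypothetical stand-ins for Vulkan memory property bits. */
enum {
    PROP_DEVICE_LOCAL = 0x1,
    PROP_HOST_VISIBLE = 0x2
};

/* Return the index of the first memory type that is allowed by typeBits
 * (one bit per type, as reported in an object's memory requirements) and
 * that has every property in `wanted`. Returns -1 if no type matches.
 * typeCount is expected to be at most 32. */
int find_memory_type(const uint32_t* typeProps, uint32_t typeCount,
                     uint32_t typeBits, uint32_t wanted)
{
    for (uint32_t i = 0; i < typeCount; i++) {
        int allowed = (typeBits & (1u << i)) != 0;
        int suitable = (typeProps[i] & wanted) == wanted;
        if (allowed && suitable) {
            return (int)i;
        }
    }
    return -1;
}
```

With a lookup like this, a host visible request for an optimal image can fail (no allowed type has the property), which is the case the linear image workaround has to handle.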
…ERAL Makes the barrier required to copy from and to the buffer easier.
Any chance to get this revisited? I see your point regarding performance and raise you mine: preservation 📚
Author (Contributor):
It's a very invasive and ugly change and I don't think @libcg is terribly excited about it (which I can't blame him for).
WSI Changes
General fixes
Nvidia workarounds
Running BF4 with GRVK ran into the following limitations:
So in order to make it all work, the PR does the following:
We expose the following heaps:
The host visible heaps basically work the same as before. grAllocMemory will allocate a block of memory (using VMA in this case) that will get bound to an object later.
Device local heaps, on the other hand, will not allocate in grAllocMemory. Instead, we allocate in grBindObjectMemory, when we know what kind of object it is, in order to use the right memory type. This also brings back the old behavior of lazily creating Vulkan buffers for GrGpuMemory objects, because otherwise memory usage would effectively double. A buffer is still created in grAllocMemory for host visible heaps and when running on AMD GPUs.
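The split between eager and deferred allocation can be sketched roughly as follows. This is only an illustration of the idea, not the PR's code: the `GpuMemory` struct and the functions `alloc_memory`, `bind_object` and `free_memory` are hypothetical, and `malloc` stands in for the real allocator (VMA in the PR).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GPU memory object. */
typedef struct {
    bool hostVisible; /* kind of heap the application asked for */
    size_t size;      /* size requested at allocation time */
    void* backing;    /* NULL until memory has really been allocated */
} GpuMemory;

/* Allocation step: host visible heaps allocate right away,
 * device local heaps only record the request. */
GpuMemory alloc_memory(size_t size, bool hostVisible)
{
    GpuMemory mem = { hostVisible, size, NULL };
    if (hostVisible) {
        mem.backing = malloc(size);
    }
    return mem;
}

/* Bind step: by now the object kind is known, so a deferred
 * allocation can be made here. Returns false on failure. */
bool bind_object(GpuMemory* mem)
{
    if (mem->backing == NULL) {
        mem->backing = malloc(mem->size);
    }
    return mem->backing != NULL;
}

/* Release the backing memory, if any. */
void free_memory(GpuMemory* mem)
{
    free(mem->backing);
    mem->backing = NULL;
}
```

In the real implementation the bind step would also choose the memory type from the object's requirements; the sketch leaves that out to keep the control flow visible.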
When binding an image with the workaround activated (read: it has both an image and a buffer), we do one of the following things depending on the requested heap:
- If the heap is host visible, we bind the buffer to the memory, allocate a new chunk of memory specifically for the image (it can even be device local) and bind that.
- If the heap is not host visible, we just allocate memory for the image and destroy the buffer that was created for that image (effectively turning it into a regular device local optimal image).
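The two cases above boil down to a small decision, sketched here under the assumption that "workaround activated" means the object carries both an image and a buffer, as described. The enum `BindAction` and the function `choose_bind_action` are hypothetical names used only for illustration:

```c
#include <stdbool.h>

/* Hypothetical labels for what the bind step does with an image. */
typedef enum {
    BIND_PLAIN,                 /* workaround not active: regular bind */
    BIND_BUFFER_PLUS_IMAGE_MEM, /* buffer bound to the heap memory, image gets its own allocation */
    BIND_IMAGE_ONLY             /* buffer destroyed, memory allocated for the image alone */
} BindAction;

/* Pick the bind path for an image from the object's state and the
 * kind of heap the application requested. */
BindAction choose_bind_action(bool hasImage, bool hasBuffer, bool heapHostVisible)
{
    bool workaroundActive = hasImage && hasBuffer;
    if (!workaroundActive) {
        return BIND_PLAIN;
    }
    return heapHostVisible ? BIND_BUFFER_PLUS_IMAGE_MEM : BIND_IMAGE_ONLY;
}
```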