Workaround for memory types and linear images on Nvidia by K0bin · Pull Request #28 · libcg/grvk

K0bin · 2021-10-22T23:53:16Z

WSI Changes

Handle presentable images that are created after GRVK has already presented once
Record the copy command buffer on the fly instead of prerecording it
Respect Surface Capabilities min & max extent and use presentable extent size if possible. This fixes an issue where GetClientRect would return a 0x0 rectangle.
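
The extent handling in the last item could look roughly like the sketch below. It is only an illustration, not the code from this PR: `Extent2D` stands in for `VkExtent2D`, `chooseSwapchainExtent` is a hypothetical helper, and falling back to the window size when the presentable image has no usable extent is an assumption.

```c
#include <stdint.h>

/* Stand-in for VkExtent2D so the sketch compiles without Vulkan headers. */
typedef struct { uint32_t width, height; } Extent2D;

static uint32_t clampU32(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical helper: prefer the presentable image extent, fall back to the
 * window size (assumption), then clamp to the surface's min/max extent. */
static Extent2D chooseSwapchainExtent(Extent2D presentable, Extent2D window,
                                      Extent2D minExtent, Extent2D maxExtent)
{
    Extent2D e = (presentable.width != 0 && presentable.height != 0)
        ? presentable : window;

    e.width  = clampU32(e.width,  minExtent.width,  maxExtent.width);
    e.height = clampU32(e.height, minExtent.height, maxExtent.height);
    return e;
}
```

With a minimum extent of 1x1, a 0x0 client rectangle can no longer produce a 0x0 swapchain in this sketch, because the clamp raises it to the minimum.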

General fixes

Do not use STORE_OP_STORE when the pipeline has a 0 write mask. Using it in that case breaks rendering of the BF4 options menu on Nvidia.
Ensure that the buffer barrier range in grPrepareMemoryRegion is within the bounds of the buffer. This fixes a validation error in BF4.
Added more formats to GetTexelSize and GetTileSize.
Ensure a minimum size of 1x1 for textures and the viewport. I don't know if that's correct but it at least fixes a validation error in BF4.
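
As a rough illustration of the two clamping fixes above (the barrier range and the minimum size), here is a minimal sketch. `Range`, `clampRangeToBuffer` and `atLeastOne` are hypothetical names, not taken from the PR.

```c
#include <stdint.h>

/* Hypothetical offset/size pair for a buffer barrier. */
typedef struct { uint64_t offset, size; } Range;

/* Clamp a requested barrier range so that it never reaches past the end of
 * the buffer. A resulting size of 0 means there is nothing left to barrier. */
static Range clampRangeToBuffer(uint64_t offset, uint64_t size, uint64_t bufferSize)
{
    Range r;
    r.offset = offset < bufferSize ? offset : bufferSize;

    uint64_t remaining = bufferSize - r.offset;
    r.size = size < remaining ? size : remaining;
    return r;
}

/* Enforce a minimum dimension of 1 for textures and viewports. */
static uint32_t atLeastOne(uint32_t v)
{
    return v == 0 ? 1 : v;
}
```

Vulkan does not accept a buffer barrier with an explicit size of 0, so a caller would presumably skip the barrier when the clamped size comes back as 0.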

Nvidia workarounds

Running BF4 with GRVK ran into the following limitations:

Nvidia has 11 memory types while Mantle only supports 8. Most of those memory types are non-mappable system memory which supports a specific type of (optimal tiled) image.
Nvidia only supports linear 2D images with 1 array layer and 1 mip level.
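
The second limitation can be expressed as a small predicate. This is a sketch under assumptions: `ImageDesc` and both function names are hypothetical, and the rule is hardcoded from the description above. Real code would more likely ask the driver, for example through `vkGetPhysicalDeviceImageFormatProperties`, than bake the rule in.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { IMAGE_1D, IMAGE_2D, IMAGE_3D } ImageType;

/* Hypothetical, simplified image description. */
typedef struct {
    ImageType type;
    uint32_t  arrayLayers;
    uint32_t  mipLevels;
    bool      linearTiling;
} ImageDesc;

/* Linear tiling as described for Nvidia: 2D, 1 array layer, 1 mip level. */
static bool linearImageSupported(const ImageDesc* d)
{
    return d->type == IMAGE_2D && d->arrayLayers == 1 && d->mipLevels == 1;
}

/* The workaround is only needed for linear images outside that envelope. */
static bool needsLinearWorkaround(const ImageDesc* d)
{
    return d->linearTiling && !linearImageSupported(d);
}
```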

So in order to make it all work, the PR does the following:

If the game tries to create a linear image that is not supported, we use optimal tiling for the image and also create an accompanying buffer.
Track all usages of images with the workaround applied. Copy from the buffer to the image at submission time, and from the image to the buffer at the end of a command buffer.
Because the optimal image and a host visible buffer cannot go into the same memory type / Mantle heap, we need to take control of memory allocations. To do this, the PR pulls in VMA and uses it for all memory code when enabled.
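
The tracking step might be sketched as a small per-command-buffer set. Everything here is hypothetical (`WorkaroundImage`, `CmdBufferTracking`, `trackImage`, the fixed capacity). It only shows the bookkeeping, not the actual Vulkan copies.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 16 /* arbitrary capacity for the sketch */

/* Hypothetical image that carries both an optimal image and a buffer. */
typedef struct { int id; } WorkaroundImage;

/* Hypothetical per-command-buffer set of workaround images in use. */
typedef struct {
    const WorkaroundImage* images[MAX_TRACKED];
    size_t count;
} CmdBufferTracking;

/* Remember that this command buffer touches the image. Duplicates are
 * ignored. Returns false only if the table is full. */
static bool trackImage(CmdBufferTracking* t, const WorkaroundImage* img)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->images[i] == img) {
            return true;
        }
    }
    if (t->count == MAX_TRACKED) {
        return false;
    }
    t->images[t->count++] = img;
    return true;
}

/* At submission time, each tracked image would get a buffer-to-image copy.
 * At the end of the command buffer, each would get an image-to-buffer copy. */
```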

We expose the following heaps:

Device local
Host visible
Host visible + cached
Host visible + device local (when supported by the Vulkan driver)
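
One way to derive this heap list from the driver's memory types is sketched below. The `MEM_*` constants use the same numeric values as the `VK_MEMORY_PROPERTY_*` bits so the sketch compiles without Vulkan headers. `HeapKind`, `hasTypeWith` and `buildHeapList` are hypothetical, and always exposing the first three heaps is an assumption based on the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same numeric values as the corresponding VK_MEMORY_PROPERTY_* bits. */
enum {
    MEM_DEVICE_LOCAL  = 0x1,
    MEM_HOST_VISIBLE  = 0x2,
    MEM_HOST_COHERENT = 0x4,
    MEM_HOST_CACHED   = 0x8
};

typedef enum {
    HEAP_DEVICE_LOCAL,
    HEAP_HOST_VISIBLE,
    HEAP_HOST_VISIBLE_CACHED,
    HEAP_HOST_VISIBLE_DEVICE_LOCAL
} HeapKind;

/* True if any memory type has all of the required property bits. */
static bool hasTypeWith(const uint32_t* typeFlags, uint32_t typeCount, uint32_t required)
{
    for (uint32_t i = 0; i < typeCount; i++) {
        if ((typeFlags[i] & required) == required) {
            return true;
        }
    }
    return false;
}

/* Fill `out` with the heaps to expose and return how many were written. */
static uint32_t buildHeapList(const uint32_t* typeFlags, uint32_t typeCount, HeapKind out[4])
{
    uint32_t n = 0;
    out[n++] = HEAP_DEVICE_LOCAL;
    out[n++] = HEAP_HOST_VISIBLE;
    out[n++] = HEAP_HOST_VISIBLE_CACHED;
    /* Only exposed when the driver offers such a memory type. */
    if (hasTypeWith(typeFlags, typeCount, MEM_DEVICE_LOCAL | MEM_HOST_VISIBLE)) {
        out[n++] = HEAP_HOST_VISIBLE_DEVICE_LOCAL;
    }
    return n;
}
```

This keeps the number of Mantle heaps at four or fewer no matter how many Vulkan memory types the driver reports, which is what gets around the limit of 8.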

The host visible heaps basically work the same as before. grAllocMemory will allocate a block of memory (using VMA in this case) that will get bound to an object later.

Device local heaps, on the other hand, do not allocate in grAllocMemory. Instead, we allocate in grBindObjectMemory, once we know what kind of object is being bound, so that we can use the right memory type. This also brings back the old behavior of lazily creating Vulkan buffers for GrGpuMemory objects, because memory usage would otherwise effectively double. A buffer is still created in grAllocMemory for host visible heaps and when running on AMD GPUs.
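
The timing rule for allocation and buffer creation can be condensed into one predicate. This is a sketch, not the PR's code: `AllocContext` and `isEagerInGrAllocMemory` are hypothetical names, and treating "running on AMD GPUs" as "the allocator is not enabled" is an assumption based on the commit messages.

```c
#include <stdbool.h>

/* Hypothetical inputs to the decision. */
typedef struct {
    bool allocatorEnabled; /* assumed false on AMD, where the workaround is off */
    bool heapHostVisible;
} AllocContext;

/* True if memory (and its Vulkan buffer) is created right away in
 * grAllocMemory. False means the work is deferred to grBindObjectMemory. */
static bool isEagerInGrAllocMemory(const AllocContext* c)
{
    return !c->allocatorEnabled || c->heapHostVisible;
}
```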

When binding an image with the workaround activated (read: it has both an image and a buffer), we do one of the following things depending on the requested heap:

If the heap is host visible, we bind the buffer to the memory, allocate a new chunk of memory specifically for the image (it can even be device local) and bind that.
If the heap is not host visible, we just allocate memory for the image and destroy the buffer that was created for that image (effectively turning it into a regular device local optimal image).
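
The two cases can be summarized as a small decision function. `BindPlan` and `planWorkaroundImageBind` are hypothetical names used only for illustration. The actual PR performs these steps inline with Vulkan and VMA calls.

```c
#include <stdbool.h>

/* Hypothetical description of what binding has to do. */
typedef struct {
    bool bindBufferToHeapMemory;       /* buffer uses the game's allocation */
    bool allocateSeparateImageMemory;  /* image gets its own memory block */
    bool destroyBuffer;                /* image becomes a plain optimal image */
} BindPlan;

/* Decide how to bind an image that has the linear workaround applied. */
static BindPlan planWorkaroundImageBind(bool heapHostVisible)
{
    /* The image always needs its own allocation in both cases. */
    BindPlan plan = { false, true, false };

    if (heapHostVisible) {
        plan.bindBufferToHeapMemory = true;
    } else {
        plan.destroyBuffer = true;
    }
    return plan;
}
```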

K0bin · 2021-10-29T19:05:03Z

Given that Mantle apparently also doesn't have uniform buffers, I'm not sure if this is worth it at all. Even if everything worked correctly, it would be way slower than DXVK on Nvidia hardware.

We'd have to patch shaders to turn buffers into uniform buffers when binding small, correctly aligned ranges of memory (no idea if that would even work) and I don't think anyone wants to implement that.
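
For what it's worth, the "small, correctly aligned" test could look something like the sketch below. It is purely illustrative: `couldBindAsUniformBuffer` is a hypothetical helper, and the two limits are assumed to come from `minUniformBufferOffsetAlignment` and `maxUniformBufferRange` in `VkPhysicalDeviceLimits`. It does not touch the shader patching side at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: could this memory range be bound as a uniform buffer?
 * minOffsetAlignment and maxRange are assumed to be taken from the device's
 * minUniformBufferOffsetAlignment and maxUniformBufferRange limits. */
static bool couldBindAsUniformBuffer(uint64_t offset, uint64_t range,
                                     uint64_t minOffsetAlignment, uint64_t maxRange)
{
    if (range == 0 || range > maxRange) {
        return false; /* empty or too large */
    }
    if (minOffsetAlignment == 0) {
        return false; /* invalid limit, avoid dividing by zero */
    }
    return offset % minOffsetAlignment == 0;
}
```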

This is necessary because:
- Nvidia exposes more than 8 memory types.
- The regular host visible memory type does not support optimal images.

So a workaround for the lack of linear image support will need us to be in control of memory allocations.

sehraf · 2022-03-19T23:34:21Z

Any chance to get this revisited?
My only (half recent) AMD card doesn't support the newer required Vulkan extensions.

I see your point regarding performance and raise you mine: preservation 📚

K0bin · 2022-03-19T23:37:38Z

It's a very invasive and ugly change and I don't think @libcg is terribly excited about it. (which I can't blame him for.)

libcg · 2022-03-20T00:46:12Z

@sehraf You might want to look into NimeZ drivers if you're on Windows. Or use RADV on Linux, it's the only driver currently recommended for GRVK.

@K0bin It's a whole lot of code that I can't test or maintain. It would be good to revisit when GRVK is mature. I appreciate the effort btw.

K0bin force-pushed the novideo branch 2 times, most recently from 18dd377 to 765ae23 on October 26, 2021 19:18

K0bin added 10 commits December 6, 2021 15:56

mantle: Ensure buffer barrier region is within buffer bounds

b7e36c2

Fixes a validation error.

mantle: Enforce minimum size of images and viewport

c8565e6

Probably not correct but silences validation errors.

mantle: Add texel and tile size of additional formats

1d4a7b0

BF4 uses those. We need to be able to calculate texture sizes for the linear image workaround.

mantle: Refactor format check in grCreateImage

a02a371

Revert "mantle: don't defer buffer creation"

2a8900b

This reverts commit d222c5f. We need to create the buffer on demand when using the memory allocator on Nvidia hardware. Otherwise we might end up doubling the VRAM usage.

mantle: don't defer buffer creation

5d13cde

when the memory allocator is not used. and remove VK_BUFFER_USAGE_VERTEX_BUFFER_BIT as Mantle doesn't have vertex buffers.

mantle: Track linear images that need to be synced

b6f838f

mantle: Keep linear images that have associated buffers in LAYOUT_GEN…

519aa33

…ERAL Makes the barrier required to copy from and to the buffer easier.

mantle: Sync linear image workaround buffer

bfe5ffb

K0bin force-pushed the novideo branch from 765ae23 to bfe5ffb on December 6, 2021 17:06

K0bin closed this Mar 19, 2022

K0bin wants to merge 10 commits into libcg:master from K0bin:novideo
