GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure #1316
slomp wants to merge 57 commits into wolfpld:master from
Conversation
…t makeshift timestamps for dropped timestamps
#undef TracyD3D12Break
#undef TracyD3D12Debug
#undef TRACY_D3D12_PERSISTENT_TIMESTAMP_BUFFER
#undef TRACY_D3D12_DEBUG
TracyD3D12Debug( ZoneValue(m_contextId) );
TracyD3D12Debug( ZoneValue(queryTicket) );
TracyD3D12Debug( ZoneValue(RingIndex(queryTicket)) );
auto Now = GetTickCount64;
Why introduce this `Now`?
Also, `elapsed` will be deduced as `int` instead of `uint64_t`.
{
    TracyD3D12Panic("Failed to create payload fence.", return);
}
UINT64* timestampBuffer = MapTimestampBuffer();
Should this handle MapTimestampBuffer failure?
if (Distance(earliestTicket, endTicket) <= 0)
    return;

UINT64* timestampBuffer = MapTimestampBuffer();
Should this handle MapTimestampBuffer failure?
// collect all pending queries up to the latest known query
uint64_t endTicket = m_queryCounter;
uint64_t lastIssuedTicket = endTicket - 2;
Drain(lastIssuedTicket, 200);
// to the query that has just been generated. The value of the "late" query
// may end up being collected as if it belonged to the new query.
const uint64_t ticket = m_queryCounter.fetch_add(2, std::memory_order_relaxed);
if (Distance(m_previousCheckpoint, ticket) >= RingCapacity())
I'd like the m_previousCheckpoint load to have explicit memory ordering here.
TracyD3D12Assert( lock.owns_lock() );
TracyD3D12Debug( ZoneValue(m_contextId) );

uint64_t earliestTicket = m_previousCheckpoint;
If the only writes done to m_previousCheckpoint are done behind the collect lock, you can make this a relaxed load
auto ini = Now();
auto elapsed = 0;
// TODO: could use condition variable to avoid spurious lock + collect iterations
while ((Distance(m_previousCheckpoint, queryTicket) >= 0) && (elapsed < timeout_ms))
Explicit memory ordering for the load, please.
    Collect(lock, queryTicket, false);
    elapsed = Now() - ini;
}
return Distance(m_previousCheckpoint, queryTicket) < 0;
Do you want the latest value here, or the latest one you saw from the while loop?
Otherwise explicit memory ordering for the load please.
TracyD3D12Assert( m_previousCheckpoint == ticket );
ticket += 2;
m_previousCheckpoint.store(ticket);
return ticket;
If the return value is not needed, might as well not return it.
This PR removes the need for calling NewFrame() periodically. (It also addresses and fixes #1294.)
Some applications run "headless" or use the GPU for compute-only purposes, and have no concept of a frame boundary. Timestamp queries are now produced and collected in a circular fashion.
In addition, the code is now resilient to situations where timestamp queries are reserved/dispensed but the enclosing command list is never submitted to the GPU for execution. Such cases will flag warning messages and will "stall" progress, forcing "unresolvable" timestamp queries to be dropped. This also covers tricky corner cases where timestamp query ids are dispensed in one order but the command lists are submitted to the GPU queue in the opposite order, or where an operation takes unusually long to complete relative to everything else.
As such, queue signals are no longer required, nor useful: even with Signal/Wait, there is no guarantee that once the queue sets the signal value, all queries up to that value have indeed completed, since queries could be sent/executed out of the order in which they were generated, or may not have been submitted to the queue yet (or ever).
The only scenario where ambiguity can still happen is as follows:
While technically possible, it is quite a contrived case. There does not seem to be a proper solution for it without some amount of costly query tracking plus some CPU-GPU synchronization on top of it, and the current implementation is prone to the same "bad-timing" problem anyway.