GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure #1316

Open: slomp wants to merge 57 commits into wolfpld:master from slomp:slomp/d3d12-ring

@slomp (Collaborator) commented Mar 23, 2026

This PR removes the need for calling NewFrame() periodically.
(it also addresses and fixes #1294)

Some applications run "headless" or use the GPU for compute only, and have no concept of a frame boundary. Timestamp queries are now produced and collected in a circular fashion.
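
To make the circular scheme concrete, here is a minimal sketch of how a monotonically increasing ticket could map onto a fixed-size ring. The names `RingIndex` and `Distance` appear in the review snippets of this PR, but their bodies below, as well as the constant `kRingCapacity`, are assumptions for illustration and not the PR's actual code.

```cpp
#include <cstdint>

// Hypothetical ring capacity (number of timestamp slots). A power of two keeps
// the modulo mapping consistent even if the 64-bit ticket counter wraps around.
constexpr uint64_t kRingCapacity = 8;

// Map a monotonically increasing ticket to a slot in the timestamp ring.
constexpr uint64_t RingIndex(uint64_t ticket)
{
    return ticket % kRingCapacity;
}

// Signed distance from ticket `from` to ticket `to`: positive when `to` is newer.
// The unsigned subtraction wraps, so the result stays meaningful across overflow.
constexpr int64_t Distance(uint64_t from, uint64_t to)
{
    return static_cast<int64_t>(to - from);
}
```

Under these assumptions, tickets that differ by exactly `kRingCapacity` share a slot, which is why the ring has to be drained (or stale entries dropped) before the producer laps the collector.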

In addition, the code is now resilient to situations where timestamp queries are reserved/dispensed, but the enclosing command list is never submitted to the GPU for execution. Such cases will flag warning messages and will "stall" progress, forcing "unresolvable" timestamp queries to be dropped. This also covers tricky corner cases where timestamp query ids are dispensed in one order but the command lists are sent to the GPU queue in the opposite order, or where an operation takes unusually long to complete relative to everything else.

As such, queue signals are no longer required, nor useful: even with Signal/Wait, there is no guarantee that once the queue sets the signal value, all queries up to that value have indeed completed, since queries could be sent/executed out of the order in which they were generated, or may not have been submitted to the queue yet (or ever).


The only scenario where ambiguity can still happen is as follows:

  1. a command buffer (CB1) is recorded, but not yet executed
  2. time goes by, and several command buffers are subsequently recorded and executed
  3. Collect() ends up giving up on CB1's "reserved" queries
  4. CB1 is finally sent to the queue and executed
  5. but around the same instant, another command buffer is recorded, claiming query ids that clash with CB1
  6. the GPU will resolve the CB1 timestamps, and write to the timestamp heap
  7. Collect() is not going to be able to disambiguate the timestamp

While technically possible, it is quite a contrived case. There does not seem to be a proper solution for the case above without some amount of costly query tracking along with some CPU-GPU synchronization on top of it. The current implementation is also prone to the same "bad-timing" problem anyway.
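
The ambiguity described in the scenario above can be reproduced with a toy model. This is only a sketch: `FakeTimestampHeap`, `kSlots`, and the ticket values are hypothetical stand-ins, and no D3D12 call is involved.

```cpp
#include <array>
#include <cstdint>

constexpr uint64_t kSlots = 4; // hypothetical, deliberately tiny ring

struct FakeTimestampHeap
{
    std::array<uint64_t, kSlots> slots{};

    // Stand-in for the GPU resolving a timestamp query into the heap.
    void GpuWrite(uint64_t ticket, uint64_t gpuTime) { slots[ticket % kSlots] = gpuTime; }

    // Stand-in for Collect() reading the slot that belongs to a ticket.
    uint64_t Read(uint64_t ticket) const { return slots[ticket % kSlots]; }
};

// Returns the value a collector would read for the new ticket after CB1's late write.
inline uint64_t SimulateLateWrite()
{
    FakeTimestampHeap heap;
    const uint64_t cb1Ticket = 2;                  // reserved by CB1, later given up on
    const uint64_t newTicket = cb1Ticket + kSlots; // claims the same slot after wrap-around
    heap.GpuWrite(cb1Ticket, 1000);                // CB1 finally executes (steps 4 and 6)
    return heap.Read(newTicket);                   // step 7: CB1's value is read for newTicket
}
```

In this model the collector has no way to tell that the value in the slot was written on behalf of the abandoned ticket rather than the new one, which matches the "bad-timing" problem described above.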

@slomp slomp marked this pull request as draft March 23, 2026 23:59
@slomp slomp changed the title eliminate NewFrame, and account for "abandoned" and "out-of-order" ti… D3D12: remove NewFrame() API; account for abandoned/out-of-order timestamp queries Mar 24, 2026
@slomp slomp force-pushed the slomp/d3d12-ring branch 3 times, most recently from 4ce6662 to bf3694e Compare March 29, 2026 21:08
@slomp slomp changed the title D3D12: remove NewFrame() API; account for abandoned/out-of-order timestamp queries D3D12: remove NewFrame() API, and introduce a timeout to Collect() Mar 29, 2026
@slomp slomp changed the title D3D12: remove NewFrame() API, and introduce a timeout to Collect() GPU: D3D12: remove NewFrame() API, and introduce a timeout to Collect() Mar 29, 2026
@slomp slomp force-pushed the slomp/d3d12-ring branch 5 times, most recently from 2a1400b to ee3ba91 Compare April 6, 2026 14:23
@slomp slomp changed the title GPU: D3D12: remove NewFrame() API, and introduce a timeout to Collect() GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure Apr 8, 2026
@slomp slomp force-pushed the slomp/d3d12-ring branch from 1d49121 to 4b15bfa Compare April 8, 2026 21:50
@slomp slomp marked this pull request as ready for review April 9, 2026 16:29
@slomp slomp requested a review from Lectem April 9, 2026 16:29
@slomp slomp force-pushed the slomp/d3d12-ring branch from b5096f9 to c97c895 Compare April 15, 2026 21:00
#undef TracyD3D12Break
#undef TracyD3D12Debug
#undef TRACY_D3D12_PERSISTENT_TIMESTAMP_BUFFER
#undef TRACY_D3D12_DEBUG

Collaborator commented:

TRACY_D3D12_DEBUG_LEVEL ?

TracyD3D12Debug( ZoneValue(m_contextId) );
TracyD3D12Debug( ZoneValue(queryTicket) );
TracyD3D12Debug( ZoneValue(RingIndex(queryTicket)) );
auto Now = GetTickCount64;

Collaborator commented:

why introduce this Now?
Also, elapsed will be int instead of uint64_t
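
The type concern raised here can be illustrated with a short sketch. `FakeTickCount64` is a hypothetical stand-in for `GetTickCount64`, which returns a 64-bit millisecond count, so that the example does not depend on Windows headers.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for GetTickCount64(): advances 5 "ms" on every call.
inline uint64_t FakeTickCount64()
{
    static uint64_t ticks = 0;
    return ticks += 5;
}

inline uint64_t ElapsedSinceStart()
{
    auto deduced = 0; // the literal 0 makes `auto` deduce int, not a 64-bit type
    static_assert(std::is_same<decltype(deduced), int>::value, "auto x = 0 deduces int");
    (void)deduced;

    const uint64_t ini = FakeTickCount64();
    uint64_t elapsed = 0;              // spelled out, so the subtraction is not narrowed
    elapsed = FakeTickCount64() - ini; // unsigned 64-bit difference in milliseconds
    return elapsed;
}
```

Declaring `elapsed` as `uint64_t` (or initializing it with `uint64_t{0}`) avoids the implicit narrowing that occurs when a 64-bit difference is assigned to an `int`.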

{
TracyD3D12Panic("Failed to create payload fence.", return);
}
UINT64* timestampBuffer = MapTimestampBuffer();

Collaborator commented:

should handle MapTimestampBuffer failure?

if (Distance(earliestTicket, endTicket) <= 0)
return;

UINT64* timestampBuffer = MapTimestampBuffer();

Collaborator commented:

should handle MapTimestampBuffer failure?

// collect all pending queries up to the latest known query
uint64_t endTicket = m_queryCounter;
uint64_t lastIssuedTicket = endTicket - 2;
Drain(lastIssuedTicket, 200);

Collaborator commented:

Why 200ms ?

// to the query that has just been generated. The value of the "late" query may
// end up being collected as if it belonged to the new query.
const uint64_t ticket = m_queryCounter.fetch_add(2, std::memory_order_relaxed);
if (Distance(m_previousCheckpoint, ticket) >= RingCapacity())

Collaborator commented:

I'd like the m_previousCheckpoint to have explicit memory ordering here.

TracyD3D12Assert( lock.owns_lock() );
TracyD3D12Debug( ZoneValue(m_contextId) );

uint64_t earliestTicket = m_previousCheckpoint;

Collaborator commented:

If the only writes done to m_previousCheckpoint are done behind the collect lock, you can make this a relaxed load
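
A minimal sketch of what this suggestion could look like, assuming the checkpoint is only ever written while the collect lock is held. `CheckpointSketch` and its members are hypothetical names, and the release/acquire pairing for lock-free readers is one plausible choice, not necessarily what the PR should adopt.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

struct CheckpointSketch
{
    std::mutex collectMutex;
    std::atomic<uint64_t> previousCheckpoint{0};

    // Writer path: the caller passes the lock it holds on collectMutex.
    uint64_t Advance(const std::unique_lock<std::mutex>& lock)
    {
        (void)lock; // only documents the precondition in this sketch
        // Relaxed is enough here: the mutex already orders this load
        // against every other writer of previousCheckpoint.
        const uint64_t ticket = previousCheckpoint.load(std::memory_order_relaxed);
        // Release publishes the new checkpoint to readers that do not take the lock.
        previousCheckpoint.store(ticket + 2, std::memory_order_release);
        return ticket + 2;
    }

    // Reader path without the lock: pairs with the release store above.
    uint64_t Peek() const
    {
        return previousCheckpoint.load(std::memory_order_acquire);
    }
};

// Usage: advance once under the lock, then observe the published value.
inline uint64_t DemoAdvance()
{
    CheckpointSketch sketch;
    std::unique_lock<std::mutex> lock(sketch.collectMutex);
    sketch.Advance(lock);
    return sketch.Peek();
}
```

Spelling out the ordering at each call site documents which accesses rely on the mutex and which rely on the atomic itself, instead of defaulting everything to `memory_order_seq_cst`.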

auto ini = Now();
auto elapsed = 0;
// TODO: could use condition variable to avoid spurious lock + collect iterations
while ((Distance(m_previousCheckpoint, queryTicket) >= 0) && (elapsed < timeout_ms))

Collaborator commented:

explicit memory ordering for the load please

Collect(lock, queryTicket, false);
elapsed = Now() - ini;
}
return Distance(m_previousCheckpoint, queryTicket) < 0;

Collaborator commented:

Do you want the latest value here, or the latest one you saw from the while loop?
Otherwise explicit memory ordering for the load please.
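
One way to address both points is to cache the loaded checkpoint in a local, so that the return value is based on the same observation the loop condition used. The sketch below is an assumption-laden illustration: `DrainSketch`, `SignedDistance`, and the `collectOnce` callback (standing in for `Collect()`) are hypothetical, and `std::chrono::steady_clock` is swapped in for the tick counter.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Signed distance between tickets: positive when `to` is newer than `from`.
inline int64_t SignedDistance(uint64_t from, uint64_t to)
{
    return static_cast<int64_t>(to - from);
}

// Returns true if the checkpoint moved past queryTicket before the timeout expired.
template <typename CollectFn>
bool DrainSketch(std::atomic<uint64_t>& checkpoint, uint64_t queryTicket,
                 std::chrono::milliseconds timeout, CollectFn collectOnce)
{
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();

    // Explicit ordering, and a single named observation per iteration.
    uint64_t seen = checkpoint.load(std::memory_order_acquire);
    while (SignedDistance(seen, queryTicket) >= 0 && (Clock::now() - start) < timeout)
    {
        collectOnce();
        seen = checkpoint.load(std::memory_order_acquire);
    }
    // Decide on the value the loop last observed, not on a fresh implicit load.
    return SignedDistance(seen, queryTicket) < 0;
}

// Usage: each fake collect pass retires one begin/end pair (two tickets).
inline bool DemoDrain()
{
    std::atomic<uint64_t> checkpoint{0};
    return DrainSketch(checkpoint, 4, std::chrono::milliseconds(200),
                       [&] { checkpoint.fetch_add(2, std::memory_order_release); });
}
```

If the freshest value is wanted instead, the final line can reload the atomic, but then the ordering of that extra load should be stated explicitly as well.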

TracyD3D12Assert( m_previousCheckpoint == ticket );
ticket += 2;
m_previousCheckpoint.store(ticket);
return ticket;

Collaborator commented:

If not needed, might as well not return it.

Development

Successfully merging this pull request may close these issues.

[DirectX 12] Tracy GPU profiling deadlocks if the device is removed

2 participants