GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure by slomp · Pull Request #1316 · wolfpld/tracy

slomp · 2026-03-23T23:59:04Z

This PR removes the need to call NewFrame() periodically.
(It also addresses and fixes #1294.)

Some applications may run "headless" or may use the GPU for compute-only purposes, and do not have the concept of a frame boundary. Timestamp queries are now produced and collected in circular fashion.
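To make the circular scheme concrete, here is a minimal sketch of how a monotonically increasing ticket could map onto a fixed-size query heap. The names `RingIndex` and `Distance` appear in the diff, but their bodies below are assumptions, as are `kRingCapacity`, `QueryPair`, `SlotsFor`, and the begin/end pairing convention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical ring capacity (number of timestamp slots in the query heap).
constexpr uint64_t kRingCapacity = 8;

// Maps a monotonically increasing ticket to a slot in the query heap (assumed body).
constexpr uint32_t RingIndex(uint64_t ticket)
{
    return static_cast<uint32_t>(ticket % kRingCapacity);
}

// Signed distance from `from` to `to`; negative when `to` is behind `from` (assumed body).
constexpr int64_t Distance(uint64_t from, uint64_t to)
{
    return static_cast<int64_t>(to - from);
}

// Assumption: tickets are dispensed in pairs, `ticket` for the zone begin
// and `ticket + 1` for the zone end.
struct QueryPair { uint32_t beginSlot; uint32_t endSlot; };

inline QueryPair SlotsFor(uint64_t ticket)
{
    assert(ticket % 2 == 0);  // pairs always start on an even ticket
    return { RingIndex(ticket), RingIndex(ticket + 1) };
}
```

Because slots are derived from the ticket alone, no frame boundary is needed to recycle them: a slot becomes reusable once the ticket that owned it has been collected or dropped.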

In addition, the code is now resilient to situations where timestamp queries are reserved/dispensed, but the enclosing command list is never submitted to the GPU for execution. Such cases will flag warning messages and will "stall" progress, forcing "unresolvable" timestamp queries to be dropped. This also covers tricky corner cases where timestamp query ids are dispensed in one order but the command lists are sent to the GPU queue in the opposite order, or where an operation takes unusually long to complete relative to everything else.
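The dropping behavior can be sketched roughly as follows. This is a single-threaded simplification, not the PR's code: `TicketRing`, `Reserve`, `kCapacity`, and the `dropped` counter are hypothetical, and the wait/timeout and warning steps are skipped so that only the "give up on the oldest pair when the ring is full" idea remains.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint64_t kCapacity = 8;  // hypothetical number of timestamp slots

struct TicketRing
{
    std::atomic<uint64_t> nextTicket{0};  // next ticket to dispense
    std::atomic<uint64_t> checkpoint{0};  // oldest ticket not yet collected
    uint64_t dropped = 0;                 // tickets abandoned under pressure

    // Reserves a begin/end pair. If the ring is full, the oldest unresolved
    // pairs are dropped so the producer never blocks indefinitely.
    uint64_t Reserve()
    {
        const uint64_t ticket = nextTicket.fetch_add(2, std::memory_order_relaxed);
        assert(ticket % 2 == 0);
        uint64_t oldest = checkpoint.load(std::memory_order_acquire);
        while (static_cast<int64_t>(ticket - oldest) >= static_cast<int64_t>(kCapacity))
        {
            oldest += 2;   // give up on the oldest pair
            dropped += 2;  // remember how many queries were abandoned
        }
        checkpoint.store(oldest, std::memory_order_release);
        return ticket;
    }
};
```

With a capacity of 8 slots (4 pairs), the fifth reservation finds the ring full and forces the first pair out, which is the situation a never-submitted command list would eventually create.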

As such, queue signals are no longer required, nor useful: even with Signal/Wait, there is no guarantee that once the queue sets the signal value, all queries up to that value have indeed completed, since queries could be sent/executed out of the order in which they were generated, or may not have been submitted to the queue yet (or ever).

The only scenario where ambiguity can still happen is as follows:

1. a command buffer (CB1) is recorded, but not yet executed
2. time goes by, and several command buffers are subsequently recorded and executed
3. Collect() ends up giving up on CB1's "reserved" queries
4. CB1 is finally sent to the queue and executed
5. but around the same instant, another command buffer is recorded, claiming query ids that clash with CB1
6. the GPU will resolve the CB1 timestamps, and write to the timestamp heap
7. Collect() is not going to be able to disambiguate the timestamp

While technically possible, it is quite a contrived case. There does not seem to be a proper solution for the case above without some amount of costly query tracking along with some CPU-GPU synchronization on top of it. The current implementation is also prone to the same "bad-timing" problem anyway.
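The ambiguity scenario can be reproduced in a toy model. Everything below is hypothetical (`kSlots`, `g_timestampHeap`, `Slot`, `GpuWrite`, `CollectRead`); it only illustrates why a slot value alone cannot tell Collect() which command buffer wrote it.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr uint64_t kSlots = 4;                     // hypothetical heap size
std::array<uint64_t, kSlots> g_timestampHeap{};    // stands in for the timestamp heap

inline uint32_t Slot(uint64_t ticket) { return static_cast<uint32_t>(ticket % kSlots); }

// Simulates the GPU resolving the timestamp for `ticket` into the heap.
inline void GpuWrite(uint64_t ticket, uint64_t gpuTime) { g_timestampHeap[Slot(ticket)] = gpuTime; }

// What a collector would read for `ticket`: it sees only the slot, not the writer.
inline uint64_t CollectRead(uint64_t ticket) { return g_timestampHeap[Slot(ticket)]; }

inline bool ReproduceAmbiguity()
{
    const uint64_t cb1Ticket = 0;                   // reserved by CB1, recorded but not submitted
    // ...other command buffers run, and the collector gives up on ticket 0...
    const uint64_t newTicket = cb1Ticket + kSlots;  // a later reservation wraps onto the same slot
    assert(Slot(newTicket) == Slot(cb1Ticket));
    GpuWrite(cb1Ticket, 1000);                      // CB1 finally executes, writing a stale value
    // The new command buffer has not written yet, but the slot already holds data.
    return CollectRead(newTicket) == 1000;          // CB1's value is misattributed to newTicket
}
```

Telling the two writers apart would require extra per-slot bookkeeping plus CPU-GPU synchronization, which is the cost the description refers to.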

Lectem · 2026-04-20T14:22:18Z

+#undef TracyD3D12Break
+#undef TracyD3D12Debug
+#undef TRACY_D3D12_PERSISTENT_TIMESTAMP_BUFFER
+#undef TRACY_D3D12_DEBUG


TRACY_D3D12_DEBUG_LEVEL ?

Lectem · 2026-04-20T14:23:36Z

+            TracyD3D12Debug( ZoneValue(m_contextId) );
+            TracyD3D12Debug( ZoneValue(queryTicket) );
+            TracyD3D12Debug( ZoneValue(RingIndex(queryTicket)) );
+            auto Now = GetTickCount64;


why introduce this Now?
Also, elapsed will be int instead of uint64_t
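For illustration, a sketch of the timing loop with an explicit 64-bit elapsed counter. It uses `std::chrono::steady_clock` as a portable stand-in for `GetTickCount64`; `NowMs` and `WaitUntil` are hypothetical helpers, not part of the PR.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <type_traits>

// `auto elapsed = 0;` deduces int, because a bare 0 literal is an int.
static_assert(std::is_same<decltype(0), int>::value, "a bare 0 literal is an int");

// Milliseconds from a monotonic clock, as an unsigned 64-bit value.
inline uint64_t NowMs()
{
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

// Polls `done` until it returns true or `timeoutMs` elapses; returns whether it completed.
template <typename Pred>
bool WaitUntil(Pred done, uint64_t timeoutMs)
{
    const uint64_t start = NowMs();
    uint64_t elapsed = 0;  // explicit 64-bit type instead of `auto elapsed = 0`
    while (!done() && elapsed < timeoutMs)
        elapsed = NowMs() - start;
    return done();
}
```

Calling the clock function directly also removes the need for a local alias such as `Now`.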

Lectem · 2026-04-20T14:24:15Z

-            {
-                TracyD3D12Panic("Failed to create payload fence.", return);
-            }
+            UINT64* timestampBuffer = MapTimestampBuffer();


should handle MapTimestampBuffer failure?

Lectem · 2026-04-20T14:24:23Z

+            if (Distance(earliestTicket, endTicket) <= 0)
                return;
+
+            UINT64* timestampBuffer = MapTimestampBuffer();


should handle MapTimestampBuffer failure?

Lectem · 2026-04-20T14:26:54Z

+            // collect all pending queries up to the latest known query
+            uint64_t endTicket = m_queryCounter;
+            uint64_t lastIssuedTicket = endTicket - 2;
+            Drain(lastIssuedTicket, 200);


Why 200ms ?

Lectem · 2026-04-20T14:55:40Z

+            // to the query that has just been generated. The value of the "late" query may
+            // may end up being collected as if it belonged to the the new query.
+            const uint64_t ticket = m_queryCounter.fetch_add(2, std::memory_order_relaxed);
+            if (Distance(m_previousCheckpoint, ticket) >= RingCapacity())


I'd like the m_previousCheckpoint to have explicit memory ordering here.

Lectem · 2026-04-20T14:56:40Z

+            TracyD3D12Assert( lock.owns_lock() );
+            TracyD3D12Debug( ZoneValue(m_contextId) );
+
+            uint64_t earliestTicket = m_previousCheckpoint;


If the only writes done to m_previousCheckpoint are done behind the collect lock, you can make this a relaxed load
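A minimal sketch of what explicit orderings could look like, assuming the checkpoint is only written while the collect lock is held. The `Checkpoint` type and its members are hypothetical, and whether lock-free readers really need acquire/release depends on what data the checkpoint is meant to publish.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

struct Checkpoint
{
    std::mutex collectLock;
    std::atomic<uint64_t> previous{0};

    // Writer side: only ever called with collectLock held.
    uint64_t Advance(std::unique_lock<std::mutex>& lock)
    {
        assert(lock.owns_lock());
        // The mutex already orders this thread against other writers,
        // so a relaxed load is enough to read the last stored value.
        const uint64_t ticket = previous.load(std::memory_order_relaxed);
        // Release, so that readers outside the lock that acquire-load the
        // checkpoint also observe the writes made before it advanced.
        previous.store(ticket + 2, std::memory_order_release);
        return ticket + 2;
    }

    // Reader side: may run without the lock (e.g. when dispensing a ticket).
    bool HasRoom(uint64_t ticket, uint64_t capacity) const
    {
        const uint64_t oldest = previous.load(std::memory_order_acquire);
        return static_cast<int64_t>(ticket - oldest) < static_cast<int64_t>(capacity);
    }
};
```

Spelling out the order at each call site documents which accesses rely on the lock and which rely on the atomic itself.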

Lectem · 2026-04-20T14:57:17Z

+            auto ini = Now();
+            auto elapsed = 0;
+            // TODO: could use condition variable to avoid spurious lock + collect iterations
+            while ((Distance(m_previousCheckpoint, queryTicket) >= 0) && (elapsed < timeout_ms))


explicit memory ordering for the load please

Lectem · 2026-04-20T14:57:20Z

+                Collect(lock, queryTicket, false);
+                elapsed = Now() - ini;
            }
+            return Distance(m_previousCheckpoint, queryTicket) < 0;


Do you want the latest value here, or the latest one you saw from the while loop?
Otherwise explicit memory ordering for the load please.
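One way to make the intent unambiguous is to decide from the last value observed inside the loop rather than issuing a fresh load. The sketch below assumes `Distance(a, b)` means `b - a`, and replaces the timeout with a spin count to stay deterministic; this `Drain` signature is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Returns true if the checkpoint moved past `queryTicket` within `maxSpins` iterations.
template <typename CollectFn>
bool Drain(std::atomic<uint64_t>& checkpoint, uint64_t queryTicket, int maxSpins, CollectFn collect)
{
    assert(maxSpins >= 0);
    // Take one explicit snapshot per iteration and reuse it for every decision.
    uint64_t seen = checkpoint.load(std::memory_order_acquire);
    for (int spin = 0; spin < maxSpins && static_cast<int64_t>(queryTicket - seen) >= 0; ++spin)
    {
        collect();  // expected to advance the checkpoint when queries resolve
        seen = checkpoint.load(std::memory_order_acquire);
    }
    // The result is based on the snapshot, not on a second, possibly newer, load.
    return static_cast<int64_t>(queryTicket - seen) < 0;
}
```

If the freshest value is wanted instead, the final line would perform its own load, again with an explicit memory order.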

Lectem · 2026-04-20T15:03:19Z

+            TracyD3D12Assert( m_previousCheckpoint == ticket );
+            ticket += 2;
+            m_previousCheckpoint.store(ticket);
+            return ticket;


If not needed, might as well not return it.

slomp marked this pull request as draft March 23, 2026 23:59

slomp changed the title ~~eliminate NewFrame, and account for "abandoned" and "out-of-order" ti…~~ D3D12: remove NewFrame() API; account for abandoned/out-of-order timestamp queries Mar 24, 2026

slomp force-pushed the slomp/d3d12-ring branch 3 times, most recently from 4ce6662 to bf3694e Compare March 29, 2026 21:08

slomp changed the title ~~D3D12: remove NewFrame() API; account for abandoned/out-of-order timestamp queries~~ D3D12: remove NewFrame() API, and introduce a timeout to Collect() Mar 29, 2026

slomp changed the title ~~D3D12: remove NewFrame() API, and introduce a timeout to Collect()~~ GPU: D3D12: remove NewFrame() API, and introduce a timeout to Collect() Mar 29, 2026

slomp force-pushed the slomp/d3d12-ring branch 5 times, most recently from 2a1400b to ee3ba91 Compare April 6, 2026 14:23

slomp mentioned this pull request Apr 7, 2026

[DirectX 12] Tracy GPU profiling deadlocks if the device is removed #1294

Open

slomp changed the title ~~GPU: D3D12: remove NewFrame() API, and introduce a timeout to Collect()~~ GPU: D3D12: remove NewFrame() API + drop unresolved queries when under pressure Apr 8, 2026

slomp force-pushed the slomp/d3d12-ring branch from 1d49121 to 4b15bfa Compare April 8, 2026 21:50

slomp marked this pull request as ready for review April 9, 2026 16:29

slomp requested a review from Lectem April 9, 2026 16:29

slomp added 13 commits April 15, 2026 14:00

eliminate NewFrame, and account for "abandoned" and "out-of-order" ti…

826a69e

…mestamp queries

comments and eminders

c74667c

collect timestamps in pairs

017f753

keep track of the last known emitted gpu timestamp, and use it to emi…

801ec2a

…t makeshift timestamps for dropped timestamps

misc

e53e54b

debugging the UI freaking out

7b59913

timestamp aging and race debugging

a8c1e70

disabling debug dump

f300cbc

reveting overflow suppresion

62d5d83

re-enable range assert

163bdf9

persistent map, debug toggle, etc

2ec4169

explaining the race condition

b4aa49d

re-enabbling post-collect calibration; rephrasing comments about race…

2054a79

… condition

slomp added 28 commits April 15, 2026 14:00

debugging

17a5d3c

adopting new producer-consumer "panic" strategy

56705e7

cleanup

f80b8aa

refactoring

60af6e5

cleanup

d1ff93f

refactoring Collect

a3e8233

track latest known GPU timestamp (to avoid "time-travel" later on in …

3c74436

…the server)

comments about ResolveQueryData

27ee391

debugging

a0c185b

cleanup includes

1ffeaf4

refactoring debug macros

23d964b

implementing Drain/Wait

5c54464

minor refactoring

48e488a

debugger check

2118910

nomenclature

2c8aa25

simplify debug ZoneValue

7fdcf99

handling special wait cases, and misc comments

921aa33

simplified private API

e1e8027

refactoring

f7e9cbb

debug cleanup

3de00b3

debug cleanup

200dd6d

re-enabling assert

86cf9d6

refactoring Drain/Collect

74af2c6

claim shared ownership of the queue to keep the queue (and device) alive

66295e5

comments

f29ec01

removing debug pragma

6693bd1

removing debug code

90bd031

debug cleanup

c97c895

slomp force-pushed the slomp/d3d12-ring branch from b5096f9 to c97c895 Compare April 15, 2026 21:00

Lectem requested changes Apr 20, 2026

slomp wants to merge 57 commits into wolfpld:master from slomp:slomp/d3d12-ring

2 participants
