TracyRocprof: fix on-demand profiling crash and missing context name by bmilanich · Pull Request #1336 · wolfpld/tracy

bmilanich · 2026-04-16T14:31:16Z

Problem

The rocprofiler GPU backend crashes when a profiler connects to an application built with TRACY_ON_DEMAND:

Assertion `ctx' failed in ProcessGpuZoneBeginImplCommon

Even if the crash is worked around, the GPU context appears unnamed and kernel names are missing.

Root cause

gpu_context_allocate() writes GpuNewContext and GpuContextName queue items but never calls DeferItem() for either. Under on-demand mode, a late-connecting client never receives these messages, so it has no GPU context when GPU zone events start arriving.
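To make the failure mode concrete, here is a minimal, self-contained C++ model of deferral under on-demand mode. It is a sketch only: `ToyProfiler`, `Send`, `Defer`, `Connect`, and `RunScenario` are hypothetical names invented for illustration, not Tracy's API, and the real queue carries binary items rather than strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a profiler running in on-demand mode.
// None of these names are Tracy's real API; they only model the behaviour.
struct ToyProfiler {
    bool connected = false;
    std::vector<std::string> deferred;  // replayed to a newly connected client
    std::vector<std::string> received;  // what the connected client has seen

    // Emit an item. It only reaches the client if one is connected.
    void Send(const std::string& item) {
        if (connected) received.push_back(item);
    }

    // Remember an item so that a late-connecting client still gets it.
    void Defer(const std::string& item) { deferred.push_back(item); }

    // A client connects: it first receives every deferred item.
    void Connect() {
        assert(!connected);  // this toy models a single connection only
        connected = true;
        received = deferred;
    }
};

// Models GPU context allocation followed by a late client connection.
// `defer` toggles whether the setup items are also stored for replay.
std::vector<std::string> RunScenario(bool defer) {
    ToyProfiler p;
    for (const char* item : {"GpuNewContext", "GpuContextName"}) {
        p.Send(item);  // dropped: nobody is connected yet
        if (defer) p.Defer(item);
    }
    p.Connect();
    p.Send("GpuZoneBegin");  // emitted after the client has connected
    return p.received;
}
```

With `defer == false` the late client sees only `GpuZoneBegin`, which corresponds to the missing-context situation behind the assertion. With `defer == true` it receives both setup items before the first zone event.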

Separately, tool_callback_tracing_callback() gates all callbacks on data->init, which is only set after the calibration thread allocates the GPU context. Kernel symbol registrations (CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) happen at HIP init time, well before data->init is set, so they are silently dropped. This regression was introduced in 86de397 ("Add calibration thread"). The earlier delay_init() approach in 98047ff placed the guard after the code_object block, so symbols were always recorded.
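The effect of the guard's position can be illustrated with a small C++ model. This is a sketch under stated assumptions, not the backend's code: `Kind`, `Event`, `ToolData`, `HandleCallback`, and `Replay` are hypothetical, and the real rocprofiler callback receives richer records than a kind and a name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event kinds standing in for the real callback records.
enum class Kind { KernelSymbolRegister, KernelDispatch };

struct Event {
    Kind kind;
    std::string name;
};

struct ToolData {
    bool init = false;                 // set once the GPU context exists
    std::vector<std::string> symbols;  // kernel names recorded so far
    std::vector<std::string> zones;    // dispatches turned into GPU zones
};

// guardFirst == true models the regressed ordering (guard at the top);
// guardFirst == false models the ordering with the guard after the
// symbol-registration block.
void HandleCallback(ToolData& data, const Event& ev, bool guardFirst) {
    if (guardFirst && !data.init) return;  // drops everything before init

    if (ev.kind == Kind::KernelSymbolRegister) {
        assert(!ev.name.empty());
        data.symbols.push_back(ev.name);   // recorded regardless of init
        return;
    }

    if (!data.init) return;                // gate only dispatch-style events
    data.zones.push_back(ev.name);
}

// Replays the sequence described above: the symbol is registered first,
// the context is allocated later, and then a kernel is dispatched.
ToolData Replay(bool guardFirst) {
    ToolData data;
    HandleCallback(data, {Kind::KernelSymbolRegister, "vectorAdd"}, guardFirst);
    data.init = true;
    HandleCallback(data, {Kind::KernelDispatch, "vectorAdd"}, guardFirst);
    return data;
}
```

In this model `Replay(true)` ends with a zone but no recorded symbol, which matches the missing kernel names, while `Replay(false)` keeps the symbol and still gates the dispatch on initialization.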

Fix

1. Add DeferItem() calls for both GpuNewContext and GpuContextName under #ifdef TRACY_ON_DEMAND, replicating the pattern already used by the CUDA backend (TracyCUDA.hpp SubmitQueueItem).
2. Move the data->init guard after the code object registration block, restoring the pre-86de3970 behavior so kernel symbols are always recorded.

Repro case

examples/RocprofOnDemandRepro/ contains a minimal HIP program and a check_gpu_ctx_name tool. See the README in that directory for details.

Test results

Tested on AMD MI300X (gfx950), ROCm 7.1.1, both release and debug builds:

| Build | Unpatched | Patched |
|---|---|---|
| Release (`-O2`) | `tracy-capture` segfaults | Capture succeeds, ~50 GPU zones |
| Debug (`-g -O0`) | `Assertion 'ctx' failed` in `ProcessGpuZoneBeginImplCommon` | No assertions, ~50 GPU zones |
| Context name | N/A (crash) | `rocprofv3` |
| Kernel names | N/A (crash) | Resolved (`vectorAdd`) |

Two issues prevented the rocprofiler GPU backend from working with TRACY_ON_DEMAND:

1. GpuNewContext not deferred: When a Tracy client connects late (on-demand mode), it never receives the GPU context creation message because the GpuNewContext queue item was not buffered via DeferItem. This caused an assertion failure (ctx == nullptr) in the capture/profiler when processing GPU zone events. Add the same DeferItem pattern used by the CUDA backend.
2. Kernel symbols dropped before init: The data->init guard at the top of tool_callback_tracing_callback() blocked kernel symbol registrations (CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) which happen at HIP init time, before any Tracy client connects. Move the init guard after the code_object block so symbols are always recorded, while dispatch and memory-copy events are still gated on initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Minimal HIP program that demonstrates the assertion failure in tracy-capture when connecting to a TRACY_ON_DEMAND + TRACY_ROCPROF application. See examples/RocprofOnDemandRepro/README.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Without this, a late-connecting client receives the deferred GpuNewContext but not the GpuContextName, so the GPU context appears unnamed in the profiler. Add check_gpu_ctx_name tool to verify context names in captured traces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

bmilanich and others added 3 commits April 15, 2026 15:56

bmilanich changed the title ~~Fix rocprofiler on-demand profiling support~~ TracyRocprof: fix on-demand profiling crash and missing context name Apr 16, 2026

bmilanich marked this pull request as ready for review April 16, 2026 15:27

bmilanich wants to merge 3 commits into wolfpld:master from bmilanich:rocm-on-demand-fix
