
TracyRocprof: fix on-demand profiling crash and missing context name#1336

Open
bmilanich wants to merge 3 commits into wolfpld:master from bmilanich:rocm-on-demand-fix

@bmilanich bmilanich commented Apr 16, 2026

Problem

The rocprofiler GPU backend crashes when a profiler connects to an application built with TRACY_ON_DEMAND:

Assertion `ctx' failed in ProcessGpuZoneBeginImplCommon

Even if the crash is worked around, the GPU context appears unnamed and kernel names are missing.

Root cause

gpu_context_allocate() writes GpuNewContext and GpuContextName queue items but never calls DeferItem() for either. Under on-demand mode, a late-connecting client never receives these messages, so it has no GPU context when GPU zone events start arriving.
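The effect of the missing deferral can be illustrated with a small, self-contained toy model. This is only a sketch of the general "replay setup items on connect" idea, not Tracy's implementation: `ToyProfiler`, `Item`, and `AnnounceContext` are hypothetical names invented for this example, and the real queue and `DeferItem()` machinery is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a profiler queue item (hypothetical; not Tracy's QueueItem).
struct Item {
    std::string type;      // e.g. "GpuNewContext", "GpuContextName"
    std::uint8_t context;  // GPU context id the item refers to
};

// Toy model of an on-demand profiler client (hypothetical; not Tracy's Profiler).
class ToyProfiler {
public:
    // Items sent while no server is connected are dropped, as in on-demand mode.
    void Send(const Item& item) {
        if (connected_) received_.push_back(item);
    }

    // Remember a one-time setup item so it can be replayed on connection.
    void Defer(const Item& item) { deferred_.push_back(item); }

    // A server connects: replay deferred setup items before any live traffic.
    void Connect() {
        connected_ = true;
        for (const Item& item : deferred_) received_.push_back(item);
    }

    // True if the server has seen a context-creation item for `context`.
    bool ServerKnowsContext(std::uint8_t context) const {
        for (const Item& item : received_)
            if (item.type == "GpuNewContext" && item.context == context) return true;
        return false;
    }

private:
    bool connected_ = false;
    std::vector<Item> deferred_;
    std::vector<Item> received_;
};

// Announce a GPU context; `defer` toggles the deferral that was missing.
inline void AnnounceContext(ToyProfiler& p, std::uint8_t ctx, bool defer) {
    const Item created{"GpuNewContext", ctx};
    const Item named{"GpuContextName", ctx};
    if (defer) {
        p.Defer(created);
        p.Defer(named);
    }
    p.Send(created);
    p.Send(named);
}
```

In this model, calling `AnnounceContext(p, 0, false)` before `p.Connect()` leaves the server without context 0, which corresponds to the failing assertion; with `defer` set to `true`, the context is replayed when the server connects.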

Separately, tool_callback_tracing_callback() gates all callbacks on data->init, which is only set after the calibration thread allocates the GPU context. Kernel symbol registrations (CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) happen at HIP init time, well before data->init is set, so they are silently dropped. This is a regression introduced in 86de397 ("Add calibration thread"). The earlier delay_init() approach in 98047ff placed the guard after the code_object block, so symbols were always recorded.
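The guard-ordering problem can be sketched in isolation. The following is a simplified illustration under stated assumptions, not the code from TracyRocprof: `Kind`, `Event`, `ToolData`, and both callback functions are hypothetical stand-ins for the rocprofiler callback record and the tool's state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical event kinds; the real callback receives rocprofiler records.
enum class Kind { KernelSymbolRegister, KernelDispatch };

struct Event {
    Kind kind;
    unsigned long kernel_id;
    std::string kernel_name;  // only meaningful for KernelSymbolRegister
};

// Hypothetical tool state.
struct ToolData {
    bool init = false;                             // set once the GPU context exists
    std::map<unsigned long, std::string> symbols;  // kernel id -> kernel name
    int zones_emitted = 0;                         // dispatches turned into GPU zones
};

// Guard placed first: symbol registrations that arrive before init are lost.
inline void CallbackGuardFirst(ToolData& d, const Event& e) {
    if (!d.init) return;
    if (e.kind == Kind::KernelSymbolRegister) {
        d.symbols[e.kernel_id] = e.kernel_name;
        return;
    }
    ++d.zones_emitted;
}

// Guard placed after the symbol block: symbols are always recorded,
// while dispatch events are still gated on initialization.
inline void CallbackGuardAfterSymbols(ToolData& d, const Event& e) {
    if (e.kind == Kind::KernelSymbolRegister) {
        d.symbols[e.kernel_id] = e.kernel_name;
        return;
    }
    if (!d.init) return;
    ++d.zones_emitted;
}
```

If a registration event is delivered before `init` is set and a dispatch event afterwards, `CallbackGuardFirst` emits the zone but has no name for the kernel, whereas `CallbackGuardAfterSymbols` can resolve it.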

Fix

  • Add DeferItem() calls for both GpuNewContext and GpuContextName under #ifdef TRACY_ON_DEMAND, replicating the pattern already used by the CUDA backend (SubmitQueueItem in TracyCUDA.hpp).
  • Move the data->init guard after the code object registration block, restoring the behavior from before 86de397 so kernel symbols are always recorded.

Repro case

examples/RocprofOnDemandRepro/ contains a minimal HIP program and a check_gpu_ctx_name tool. See the README in that directory for details.

Test results

Tested on AMD MI300X (gfx950), ROCm 7.1.1, both release and debug builds:

| Build | Unpatched | Patched |
| --- | --- | --- |
| Release (-O2) | tracy-capture segfaults | Capture succeeds, ~50 GPU zones |
| Debug (-g -O0) | Assertion 'ctx' failed in ProcessGpuZoneBeginImplCommon | No assertions, ~50 GPU zones |
| Context name | N/A (crash) | rocprofv3 |
| Kernel names | N/A (crash) | Resolved (vectorAdd) |

bmilanich and others added 3 commits April 15, 2026 15:56
Two issues prevented the rocprofiler GPU backend from working with
TRACY_ON_DEMAND:

1. GpuNewContext not deferred: When a Tracy client connects late (on-demand
   mode), it never receives the GPU context creation message because the
   GpuNewContext queue item was not buffered via DeferItem. This caused an
   assertion failure (ctx == nullptr) in the capture/profiler when
   processing GPU zone events. Add the same DeferItem pattern used by the
   CUDA backend.

2. Kernel symbols dropped before init: The data->init guard at the top of
   tool_callback_tracing_callback() blocked kernel symbol registrations
   (CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) which happen at HIP init
   time, before any Tracy client connects. Move the init guard after the
   code_object block so symbols are always recorded, while dispatch and
   memory-copy events are still gated on initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Minimal HIP program that demonstrates the assertion failure in
tracy-capture when connecting to a TRACY_ON_DEMAND + TRACY_ROCPROF
application. See examples/RocprofOnDemandRepro/README.md for details.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Without this, a late-connecting client receives the deferred
GpuNewContext but not the GpuContextName, so the GPU context appears
unnamed in the profiler.

Add check_gpu_ctx_name tool to verify context names in captured traces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@bmilanich bmilanich changed the title from "Fix rocprofiler on-demand profiling support" to "TracyRocprof: fix on-demand profiling crash and missing context name" Apr 16, 2026
@bmilanich bmilanich marked this pull request as ready for review April 16, 2026 15:27