TracyRocprof: fix on-demand profiling crash and missing context name #1336
Open
bmilanich wants to merge 3 commits into wolfpld:master from
Two issues prevented the rocprofiler GPU backend from working with TRACY_ON_DEMAND:
1. GpuNewContext not deferred: When a Tracy client connects late (on-demand mode), it never receives the GPU context creation message because the GpuNewContext queue item was not buffered via DeferItem. This caused an assertion failure (ctx == nullptr) in the capture/profiler when processing GPU zone events. Add the same DeferItem pattern used by the CUDA backend.
2. Kernel symbols dropped before init: The data->init guard at the top of tool_callback_tracing_callback() blocked kernel symbol registrations (CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER), which happen at HIP init time, before any Tracy client connects. Move the init guard after the code_object block so symbols are always recorded, while dispatch and memory-copy events are still gated on initialization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Minimal HIP program that demonstrates the assertion failure in tracy-capture when connecting to a TRACY_ON_DEMAND + TRACY_ROCPROF application. See examples/RocprofOnDemandRepro/README.md for details.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Without this, a late-connecting client receives the deferred GpuNewContext but not the GpuContextName, so the GPU context appears unnamed in the profiler. Add check_gpu_ctx_name tool to verify context names in captured traces.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
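The deferral behavior these commit messages describe can be illustrated with a small toy model. This is only a sketch of the idea, not Tracy's implementation: ToyProfiler, QueueItem, Emit, Connect, and the payload strings are all hypothetical names invented for the example. It assumes that deferred items are replayed to a client at connect time, which is what the commit messages imply.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are NOT Tracy's real API.
enum class ItemType { GpuNewContext, GpuContextName, GpuZoneBegin };

struct QueueItem {
    ItemType type;
    std::string payload;  // placeholder payload, purely illustrative
};

class ToyProfiler {
public:
    explicit ToyProfiler(bool onDemand) : onDemand_(onDemand) {}

    // Emit an item. It reaches the client only if one is connected.
    // With `defer` set in on-demand mode, a copy is kept for later replay.
    void Emit(const QueueItem& item, bool defer) {
        if (onDemand_ && defer) deferred_.push_back(item);
        if (connected_) received_.push_back(item);
    }

    // A late-connecting client first receives every deferred item, in order.
    void Connect() {
        connected_ = true;
        for (const auto& item : deferred_) received_.push_back(item);
    }

    const std::vector<QueueItem>& Received() const { return received_; }

private:
    bool onDemand_;
    bool connected_ = false;
    std::vector<QueueItem> deferred_;
    std::vector<QueueItem> received_;
};

// True if the client saw a context item before its first GPU zone item.
bool ClientHasContext(const ToyProfiler& p) {
    for (const auto& item : p.Received()) {
        if (item.type == ItemType::GpuNewContext) return true;
        if (item.type == ItemType::GpuZoneBegin) return false;
    }
    return false;
}

// Scenario from the description: the context is created before any client
// connects, and a GPU zone arrives after the client has connected.
ToyProfiler RunScenario(bool deferContextItems) {
    ToyProfiler p(/*onDemand=*/true);
    p.Emit(QueueItem{ItemType::GpuNewContext, "ctx0"}, deferContextItems);
    p.Emit(QueueItem{ItemType::GpuContextName, "example-name"}, deferContextItems);
    p.Connect();
    p.Emit(QueueItem{ItemType::GpuZoneBegin, "zone"}, /*defer=*/false);
    return p;
}
```

In this model, RunScenario(false) leaves the client with a zone event and no context, which corresponds to the ctx == nullptr condition described above, while RunScenario(true) delivers the context and its name before the zone.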
Problem
The rocprofiler GPU backend crashes when a profiler connects to an application built with TRACY_ON_DEMAND. Even if the crash is worked around, the GPU context appears unnamed and kernel names are missing.
Root cause
gpu_context_allocate() writes GpuNewContext and GpuContextName queue items but never calls DeferItem() for either. Under on-demand mode, a late-connecting client never receives these messages, so it has no GPU context when GPU zone events start arriving.
Separately, tool_callback_tracing_callback() gates all callbacks on data->init, which is only set after the calibration thread allocates the GPU context. Kernel symbol registrations (CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER) happen at HIP init time, well before data->init is set, so they are silently dropped. This was a regression introduced in 86de397 ("Add calibration thread"); the earlier delay_init() approach in 98047ff had the guard placed after the code_object block, so symbols were always recorded.
Fix
- Add DeferItem() calls for both GpuNewContext and GpuContextName under #ifdef TRACY_ON_DEMAND, replicating the pattern already used by the CUDA backend (TracyCUDA.hpp SubmitQueueItem).
- Move the data->init guard after the code object registration block, restoring the pre-86de3970 behavior so kernel symbols are always recorded.
Repro case
examples/RocprofOnDemandRepro/ contains a minimal HIP program and a check_gpu_ctx_name tool. See the README in that directory for details.
Test results
Tested on AMD MI300X (gfx950), ROCm 7.1.1, both release and debug builds:
- Release (-O2): tracy-capture segfaults
- Debug (-g -O0): Assertion 'ctx' failed in ProcessGpuZoneBeginImplCommon
- rocprofv3 [...] vectorAdd)
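The guard reordering described under Fix can be sketched with a simplified callback model. This is an illustration under stated assumptions, not the rocprofiler-sdk or Tracy code: Record, ToolData, RecordKind, and both callback functions are hypothetical names, and the real callback handles more event kinds than the two shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Toy model only: hypothetical names, NOT the rocprofiler-sdk or Tracy API.
enum class RecordKind { KernelSymbolRegister, KernelDispatch };

struct Record {
    RecordKind kind;
    std::uint64_t kernelId;
    std::string kernelName;  // meaningful only for symbol registration
};

struct ToolData {
    bool init = false;  // set once the GPU context has been allocated
    std::unordered_map<std::uint64_t, std::string> kernelNames;
    int dispatchesSeen = 0;
};

// Guard first: a symbol registration that arrives before init is dropped.
void CallbackGuardFirst(ToolData& data, const Record& r) {
    if (!data.init) return;
    if (r.kind == RecordKind::KernelSymbolRegister) {
        data.kernelNames[r.kernelId] = r.kernelName;
        return;
    }
    ++data.dispatchesSeen;
}

// Guard after the symbol block: symbols are always recorded, while
// dispatch events stay gated on initialization.
void CallbackGuardAfterSymbols(ToolData& data, const Record& r) {
    if (r.kind == RecordKind::KernelSymbolRegister) {
        data.kernelNames[r.kernelId] = r.kernelName;
        return;
    }
    if (!data.init) return;
    ++data.dispatchesSeen;
}

// Replays the ordering from the description: symbol registration at startup,
// then initialization, then a kernel dispatch. Returns the final state.
template <typename Callback>
ToolData Replay(Callback cb) {
    ToolData data;
    cb(data, Record{RecordKind::KernelSymbolRegister, 1, "vectorAdd"});
    data.init = true;
    cb(data, Record{RecordKind::KernelDispatch, 1, ""});
    return data;
}
```

With the guard first, Replay leaves kernelNames empty, so the later dispatch has no name to resolve. With the guard after the symbol block, the name is recorded and the dispatch is still counted only once init is set.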