[CI] Fix Vulkan debugPrintf flake with session-scoped warmup#571
[CI] Fix Vulkan debugPrintf flake with session-scoped warmup#571hughperkins merged 5 commits intomainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 349d291e6e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| _warmup() | ||
| qd.sync() | ||
| sys.stdout.flush() | ||
| _vulkan_debug_warmed_up = True |
There was a problem hiding this comment.
Move warmup dispatch out of per-test capture window
This warmup runs during wanted_arch setup and intentionally triggers an overflow print, but the race described in the comment means that print callback can arrive after qd.sync(). When that delayed output lands during the first test’s call phase, stdout assertions become flaky (for example, tests that assert no overflow text or strict print ordering). Because _vulkan_debug_warmed_up is set immediately, the fixture will not retry or drain that delayed output, so the flake can be reintroduced in a new form.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
LGTM — small, well-scoped CI flake fix in test conftest only.
Extended reasoning...
Overview
This PR adds a session-scoped warmup to tests/python/conftest.py that runs one throwaway kernel when Vulkan debug mode is requested, to prime the validation layer's debugPrintf callback pipeline and avoid a race in vkQueueWaitIdle() on the first dispatch. The change is gated by req_arch == qd.vulkan and req_options.get('debug', False), and uses a module-level flag to ensure it runs at most once per worker process.
Security risks
None. The change is confined to test infrastructure and only executes during pytest runs against Vulkan with debug enabled. No production code paths, auth, or data handling are affected.
Level of scrutiny
Low. This is a test-only fix in a single file, the implementation is straightforward (a guard flag plus a trivial warmup kernel), and the author validated the fix empirically with 15 consecutive rounds (975 tests) on the T4 CI runner. The bug hunting system reported no issues.
Other factors
The warmup kernel is intentionally simple (i8(64) + i8(64) overflow) and the comment clearly explains the purpose. The use of a module-level global flag instead of a session-scoped fixture is a minor stylistic choice but works correctly given xdist's per-worker process model.
Coverage Report (
|
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 22% |
| Overall project coverage | 73% |
Total: 9 lines, 7 missing, 22% covered
🔴 tests/python/conftest.py (22%)
🟢 56 _vulkan_debug_warmed_up = False
57
58
61 global _vulkan_debug_warmed_up
🟢 80 if req_arch == qd.vulkan and req_options.get("debug", False) and not _vulkan_debug_warmed_up:
81 # Prime the Vulkan debugPrintf callback pipeline. The validation
82 # layer has a race where vkQueueWaitIdle() returns before the
83 # callback fires on the first dispatch. One throwaway kernel
84 # ensures subsequent dispatches deliver callbacks reliably.
🔴 85 @qd.kernel
🔴 86 def _warmup() -> qd.i8:
🔴 87 return qd.i8(64) + qd.i8(64)
88
🔴 89 _warmup()
🔴 90 qd.sync()
🔴 91 sys.stdout.flush()
🔴 92 _vulkan_debug_warmed_up = True
349d291 to
2a3f431
Compare
|
@claude review |
The validation layer's debugPrintf callback has a race condition on the first kernel dispatch in a fresh process: vkQueueWaitIdle() can return before the callback fires, causing empty captured stdout. This only manifests under pytest-xdist with multiple workers contending for the GPU. Add a session-scoped fixture that runs one throwaway overflow kernel at worker startup to prime the callback pipeline before real tests run. Validated with 15 consecutive rounds (975 tests) passing on T4 CI runner, where previously ~60% of rounds failed.
2a3f431 to
3215b46
Compare
|
@claude review |
There was a problem hiding this comment.
Addressed both of my prior concerns — arch check now uses test_utils.expected_archs() and try/except/finally wraps the full warmup with qd.reset() in finally. Test-only change, looks good.
Extended reasoning...
Overview
PR adds a single session-scoped autouse fixture _vulkan_debug_warmup to tests/python/conftest.py that primes the Vulkan debugPrintf callback pipeline once per worker process. This is purely test infrastructure to address a CI flake under pytest-xdist. The change has been iterated based on review feedback from both the codex bot and my own prior comments.
Security risks
None. The change only modifies pytest test setup; no production code paths, auth, crypto, network, or data handling are touched.
Level of scrutiny
Low. This is CI/test-only code with no production impact. The author has empirically validated the fix (10 consecutive rounds passing on T4 CI runner where ~60% previously failed). Failure modes are fail-soft — exceptions during warmup are swallowed and qd.reset() always runs in finally.
Other factors
Both of my prior review comments were addressed in this revision:
- The substring-based arch check was replaced with
qd.vulkan not in test_utils.expected_archs(), which correctly handles the^exclusion prefix fromrun_tests.py --exclusive. - The
try/exceptnow wraps the full warmup (kernel decoration, dispatch, sync) andqd.reset()is in afinallyblock, preventing a session-scoped fixture exception from erroring every test on the worker.
The codex bot's P1 concern about the warmup running within a per-test capture window was also addressed by making the fixture scope="session" rather than running inside wanted_arch. Coverage is low (22%) but expected — these lines only execute on Vulkan-capable runners with debug=True.
Made-with: Cursor
There was a problem hiding this comment.
Both issues from my prior review (broken QD_WANTED_ARCHS parsing and unguarded post-init dispatch) have been addressed cleanly.
Extended reasoning...
Overview
PR #571 adds a session-scoped autouse pytest fixture in tests/python/conftest.py that primes the Vulkan debugPrintf callback pipeline once per worker process. Scope is limited to test infrastructure — no production code paths touched.
Security risks
None. The change runs only in pytest sessions, gated to Vulkan-enabled test runs, and performs a constant-arithmetic kernel dispatch before reset.
Level of scrutiny
Low — this is test-only infrastructure addressing a documented CI flake. The fixture self-isolates via qd.reset() in a guarded finally block, and expected_archs() is the same canonical resolver used elsewhere in the test suite.
Other factors
Both concerns from my prior review have been addressed in commit 13c8f78: (1) the arch detection now correctly delegates to test_utils.expected_archs(), properly handling the ^ exclusion prefix; (2) the try/except now covers the full init/dispatch/sync block with qd.reset() in a guarded finally, preserving fail-soft intent. The remaining P1 from the codex bot (delayed callback bleeding into the first test) is a theoretical concern but the author has empirically validated 10 consecutive rounds passing on the T4 CI runner where ~60% previously failed, which is strong evidence the warmup fully drains before tests start.
Coverage Report (
|
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 26% |
| Overall project coverage | 73% |
Total: 19 lines, 14 missing, 26% covered
🔴 tests/python/conftest.py (26%)
🟢 56 @pytest.fixture(scope="session", autouse=True)
🟢 57 def _vulkan_debug_warmup():
58 """Prime the Vulkan debugPrintf callback pipeline once per worker process.
59
60 The validation layer's debugPrintf callback delivery has a race condition on the first kernel dispatch in a new
61 process: vkQueueWaitIdle() can return before the callback fires. A full init/dispatch/sync/reset cycle here warms
62 the driver-level debug infrastructure so subsequent inits work reliably.
63 """
🟢 64 from tests import test_utils
65
🟢 66 if qd.vulkan not in test_utils.expected_archs():
🟢 67 return
68
🔴 69 try:
🔴 70 qd.init(arch=qd.vulkan, debug=True, enable_fallback=False, print_full_traceback=True)
71
🔴 72 @qd.kernel
🔴 73 def _warmup() -> qd.i8:
🔴 74 return qd.i8(64) + qd.i8(64)
75
🔴 76 _warmup()
🔴 77 qd.sync()
🔴 78 sys.stdout.flush()
🔴 79 except Exception:
🔴 80 pass
81 finally:
🔴 82 try:
🔴 83 qd.reset()
🔴 84 except Exception:
🔴 85 pass
86
87
The debugPrintf warmup fixture hangs on Mac CI runners because MoltenVK does not support the validation-layer debugPrintf pipeline. The warmup is only needed on Linux Vulkan drivers. Made-with: Cursor
There was a problem hiding this comment.
Prior feedback addressed (proper expected_archs() usage, full fail-soft try/finally with reset, macOS skip).
Extended reasoning...
Overview
Single-file change to tests/python/conftest.py adding a session-scoped autouse pytest fixture that runs one throwaway Vulkan kernel at worker startup to prime the validation layer's debugPrintf callback pipeline. The PR targets a CI flake observed under pytest-xdist and is purely test-infrastructure code — no production paths touched.
Security risks
None. The fixture only initializes the Vulkan backend with debug=True, dispatches a constant-arithmetic kernel, syncs, and resets. No external input, no I/O beyond stdout flush, no privilege boundaries.
Level of scrutiny
Low. This is a CI-only test fixture. The fixture is opt-in by environment (only runs when Vulkan is in expected_archs() and not on macOS) and fully fail-soft (entire body wrapped in try/except with qd.reset() in finally). Worst case if anything goes wrong is the warmup is skipped and the original flake recurs — same state as before this PR.
Other factors
My two prior review comments on this PR are both resolved:
- The naive substring check on QD_WANTED_ARCHS was replaced with the canonical
qd.vulkan not in test_utils.expected_archs(), which correctly handles the^exclusion-prefix grammar. - The try/except scope was extended to cover the kernel decoration, dispatch, and sync, with
qd.reset()placed in afinallyblock — preserving the author's fail-soft intent across the full warmup path.
The codex-bot's concern about delayed callbacks landing in per-test capture windows is mitigated by the session-scoped placement (runs once before any test) plus the explicit qd.sync() + sys.stdout.flush(). The author validated 10 consecutive rounds passing on the T4 runner. The only remaining risk surface is whether the fixture sufficiently primes the pipeline on all Vulkan stacks, which is empirically verifiable and self-healing if it doesn't (qd.reset() is called in finally).
Coverage Report (
|
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 29% |
| Overall project coverage | 73% |
Total: 21 lines, 15 missing, 29% covered
🔴 tests/python/conftest.py (29%)
🟢 56 @pytest.fixture(scope="session", autouse=True)
🟢 57 def _vulkan_debug_warmup():
58 """Prime the Vulkan debugPrintf callback pipeline once per worker process.
59
60 The validation layer's debugPrintf callback delivery has a race condition on the first kernel dispatch in a new
61 process: vkQueueWaitIdle() can return before the callback fires. A full init/dispatch/sync/reset cycle here warms
62 the driver-level debug infrastructure so subsequent inits work reliably.
63 """
🟢 64 if sys.platform == "darwin":
🔴 65 return
66
🟢 67 from tests import test_utils
68
🟢 69 if qd.vulkan not in test_utils.expected_archs():
🟢 70 return
71
🔴 72 try:
🔴 73 qd.init(arch=qd.vulkan, debug=True, enable_fallback=False, print_full_traceback=True)
74
🔴 75 @qd.kernel
🔴 76 def _warmup() -> qd.i8:
🔴 77 return qd.i8(64) + qd.i8(64)
78
🔴 79 _warmup()
🔴 80 qd.sync()
🔴 81 sys.stdout.flush()
🔴 82 except Exception:
🔴 83 pass
84 finally:
🔴 85 try:
🔴 86 qd.reset()
🔴 87 except Exception:
🔴 88 pass
89
90
There was a problem hiding this comment.
Both my prior concerns (arch-grammar parsing and exception scope) are addressed; small test-only conftest change, LGTM.
Extended reasoning...
Overview
The PR adds a single session-scoped autouse pytest fixture in tests/python/conftest.py that primes the Vulkan validation layer's debugPrintf callback once per worker before real tests run, working around a first-dispatch race in vkQueueWaitIdle(). Latest revisions also skip the warmup on macOS (where Vulkan tests aren't run) and use test_utils.expected_archs() to decide when to run.
Security risks
None. The change is confined to test infrastructure (CI), runs only on the test workers, and does not touch auth, crypto, IO boundaries, or any production code paths.
Level of scrutiny
Low. This is a CI-only fix in test bootstrap code, the modification is localized to one file, and the author has provided empirical validation (10 consecutive rounds passing on T4 vs ~60% prior failure rate). Risk of regression in the actual product is essentially zero.
Other factors
My two earlier review comments (^-prefix exclusion grammar in QD_WANTED_ARCHS and the unguarded post-init dispatch) have both been resolved in the current revision: the fixture now delegates to expected_archs() for the inclusion check and wraps the entire init+dispatch+sync block in try/except with qd.reset() in a finally clause. The remaining codex-bot concern about delayed callbacks bleeding into the first real test is mitigated by the session-scoped placement (it runs before any test, not in the per-test capture window) and the explicit sys.stdout.flush(). The author has been responsive to feedback across multiple revisions.
There was a problem hiding this comment.
Both concerns from my prior review have been addressed (proper QD_WANTED_ARCHS parsing via expected_archs(), and try/except now wrapping the full init+dispatch with qd.reset() in a finally). Test-only change, fail-soft, scope is small.
Extended reasoning...
Overview
The PR adds a session-scoped autouse pytest fixture in tests/python/conftest.py that primes the Vulkan debugPrintf callback pipeline once per worker process to fix a CI flake under pytest-xdist. Single file, ~30 lines, test infrastructure only.
Security risks
None. This is test infrastructure that runs only during pytest setup. No production code paths, no user input handling, no auth/crypto/permissions surface.
Level of scrutiny
Low scrutiny is appropriate. This is pytest conftest plumbing for CI flake mitigation, gated by an arch check and platform check, and it fail-softs on every error path. The blast radius is limited to test session setup.
Other factors
My prior review (claude[bot] at 2026-04-26T20:16:12Z) raised two issues which the author addressed in the latest revisions:
- The naive substring check on QD_WANTED_ARCHS has been replaced with qd.vulkan not in test_utils.expected_archs(), which is the canonical parser and correctly handles the ^ exclusion prefix.
- The try/except previously only wrapped qd.init(); it now wraps the full init+kernel+_warmup+sync block, with qd.reset() in a finally clause guarded by its own try/except. This matches the fail-soft intent.
A subsequent commit (32cb6b1) also added an early return when sys.platform == 'darwin' to fix a separate CI hang on macOS.
The codex bot raised a P1 concern about delayed callback output bleeding into the first test's capture window, but the warmup includes qd.sync() followed by qd.reset() (which tears down the Vulkan context entirely) before any test runs, so any pending callback should be drained or discarded. The premise of the PR is precisely that this warmup pass forces the callback pipeline to settle, after which subsequent dispatches are reliable.
Coverage Report (
|
| Metric | Value |
|---|---|
| Diff coverage (changed lines only) | 0% |
| Overall project coverage | 73% |
Total: 0 lines, 0 missing, 0% covered
The validation layer's debugPrintf callback has a race condition on the first kernel dispatch in a fresh process: vkQueueWaitIdle() can return before the callback fires, causing empty captured stdout. This only manifests under pytest-xdist with multiple workers contending for the GPU.
Run one throwaway overflow kernel, for each worker that runs vulkan tests with debug=True, to prime the callback pipeline before real tests run. Validated with 10 consecutive rounds passing on T4 CI runner, where previously ~60% of rounds failed.
Issue: #
Brief Summary
copilot:summary
Walkthrough
copilot:walkthrough