Skip to content

[CI] Fix Vulkan debugPrintf flake with session-scoped warmup#571

Merged
hughperkins merged 5 commits intomainfrom
turtle/vulkan-debug-warmup
Apr 27, 2026
Merged

[CI] Fix Vulkan debugPrintf flake with session-scoped warmup#571
hughperkins merged 5 commits intomainfrom
turtle/vulkan-debug-warmup

Conversation

@hughperkins
Copy link
Copy Markdown
Collaborator

@hughperkins hughperkins commented Apr 26, 2026

The validation layer's debugPrintf callback has a race condition on the first kernel dispatch in a fresh process: vkQueueWaitIdle() can return before the callback fires, causing empty captured stdout. This only manifests under pytest-xdist with multiple workers contending for the GPU.

Run one throwaway overflow kernel, for each worker that runs vulkan tests with debug=True, to prime the callback pipeline before real tests run. Validated with 10 consecutive rounds passing on T4 CI runner, where previously ~60% of rounds failed.

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 349d291e6e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread tests/python/conftest.py Outdated
Comment on lines +89 to +92
_warmup()
qd.sync()
sys.stdout.flush()
_vulkan_debug_warmed_up = True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Move warmup dispatch out of per-test capture window

This warmup runs during wanted_arch setup and intentionally triggers an overflow print, but the race described in the comment means that print callback can arrive after qd.sync(). When that delayed output lands during the first test’s call phase, stdout assertions become flaky (for example, tests that assert no overflow text or strict print ordering). Because _vulkan_debug_warmed_up is set immediately, the fixture will not retry or drain that delayed output, so the flake can be reintroduced in a new form.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, well-scoped CI flake fix in test conftest only.

Extended reasoning...

Overview

This PR adds a session-scoped warmup to tests/python/conftest.py that runs one throwaway kernel when Vulkan debug mode is requested, to prime the validation layer's debugPrintf callback pipeline and avoid a race in vkQueueWaitIdle() on the first dispatch. The change is gated by req_arch == qd.vulkan and req_options.get('debug', False), and uses a module-level flag to ensure it runs at most once per worker process.

Security risks

None. The change is confined to test infrastructure and only executes during pytest runs against Vulkan with debug enabled. No production code paths, auth, or data handling are affected.

Level of scrutiny

Low. This is a test-only fix in a single file, the implementation is straightforward (a guard flag plus a trivial warmup kernel), and the author validated the fix empirically with 15 consecutive rounds (975 tests) on the T4 CI runner. The bug hunting system reported no issues.

Other factors

The warmup kernel is intentionally simple (i8(64) + i8(64) overflow) and the comment clearly explains the purpose. The use of a module-level global flag instead of a session-scoped fixture is a minor stylistic choice but works correctly given xdist's per-worker process model.

@github-actions
Copy link
Copy Markdown

Coverage Report (13c8f78af)

Metric Value
Diff coverage (changed lines only) 22%
Overall project coverage 73%

Total: 9 lines, 7 missing, 22% covered

🔴 tests/python/conftest.py (22%)
🟢   56  _vulkan_debug_warmed_up = False
     57  
     58  
     61      global _vulkan_debug_warmed_up
🟢   80          if req_arch == qd.vulkan and req_options.get("debug", False) and not _vulkan_debug_warmed_up:
     81              # Prime the Vulkan debugPrintf callback pipeline. The validation
     82              # layer has a race where vkQueueWaitIdle() returns before the
     83              # callback fires on the first dispatch. One throwaway kernel
     84              # ensures subsequent dispatches deliver callbacks reliably.
🔴   85              @qd.kernel
🔴   86              def _warmup() -> qd.i8:
🔴   87                  return qd.i8(64) + qd.i8(64)
     88  
🔴   89              _warmup()
🔴   90              qd.sync()
🔴   91              sys.stdout.flush()
🔴   92              _vulkan_debug_warmed_up = True

@hughperkins hughperkins force-pushed the turtle/vulkan-debug-warmup branch from 349d291 to 2a3f431 Compare April 26, 2026 20:03
@hughperkins
Copy link
Copy Markdown
Collaborator Author

@claude review

Comment thread tests/python/conftest.py Outdated
Comment thread tests/python/conftest.py Outdated
The validation layer's debugPrintf callback has a race condition on the
first kernel dispatch in a fresh process: vkQueueWaitIdle() can return
before the callback fires, causing empty captured stdout. This only
manifests under pytest-xdist with multiple workers contending for the GPU.

Add a session-scoped fixture that runs one throwaway overflow kernel at
worker startup to prime the callback pipeline before real tests run.
Validated with 15 consecutive rounds (975 tests) passing on T4 CI runner,
where previously ~60% of rounds failed.
@hughperkins hughperkins force-pushed the turtle/vulkan-debug-warmup branch from 2a3f431 to 3215b46 Compare April 26, 2026 20:39
@hughperkins
Copy link
Copy Markdown
Collaborator Author

@claude review

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed both of my prior concerns — arch check now uses test_utils.expected_archs() and try/except/finally wraps the full warmup with qd.reset() in finally. Test-only change, looks good.

Extended reasoning...

Overview

PR adds a single session-scoped autouse fixture _vulkan_debug_warmup to tests/python/conftest.py that primes the Vulkan debugPrintf callback pipeline once per worker process. This is purely test infrastructure to address a CI flake under pytest-xdist. The change has been iterated based on review feedback from both the codex bot and my own prior comments.

Security risks

None. The change only modifies pytest test setup; no production code paths, auth, crypto, network, or data handling are touched.

Level of scrutiny

Low. This is CI/test-only code with no production impact. The author has empirically validated the fix (10 consecutive rounds passing on T4 CI runner where ~60% previously failed). Failure modes are fail-soft — exceptions during warmup are swallowed and qd.reset() always runs in finally.

Other factors

Both of my prior review comments were addressed in this revision:

  • The substring-based arch check was replaced with qd.vulkan not in test_utils.expected_archs(), which correctly handles the ^ exclusion prefix from run_tests.py --exclusive.
  • The try/except now wraps the full warmup (kernel decoration, dispatch, sync) and qd.reset() is in a finally block, preventing a session-scoped fixture exception from erroring every test on the worker.

The codex bot's P1 concern about the warmup running within a per-test capture window was also addressed by making the fixture scope="session" rather than running inside wanted_arch. Coverage is low (22%) but expected — these lines only execute on Vulkan-capable runners with debug=True.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both issues from my prior review (broken QD_WANTED_ARCHS parsing and unguarded post-init dispatch) have been addressed cleanly.

Extended reasoning...

Overview

PR #571 adds a session-scoped autouse pytest fixture in tests/python/conftest.py that primes the Vulkan debugPrintf callback pipeline once per worker process. Scope is limited to test infrastructure — no production code paths touched.

Security risks

None. The change runs only in pytest sessions, gated to Vulkan-enabled test runs, and performs a constant-arithmetic kernel dispatch before reset.

Level of scrutiny

Low — this is test-only infrastructure addressing a documented CI flake. The fixture self-isolates via qd.reset() in a guarded finally block, and expected_archs() is the same canonical resolver used elsewhere in the test suite.

Other factors

Both concerns from my prior review have been addressed in commit 13c8f78: (1) the arch detection now correctly delegates to test_utils.expected_archs(), properly handling the ^ exclusion prefix; (2) the try/except now covers the full init/dispatch/sync block with qd.reset() in a guarded finally, preserving fail-soft intent. The remaining P1 from the codex bot (delayed callback bleeding into the first test) is a theoretical concern but the author has empirically validated 10 consecutive rounds passing on the T4 CI runner where ~60% previously failed, which is strong evidence the warmup fully drains before tests start.

@github-actions
Copy link
Copy Markdown

Coverage Report (d27666434)

Metric Value
Diff coverage (changed lines only) 26%
Overall project coverage 73%

Total: 19 lines, 14 missing, 26% covered

🔴 tests/python/conftest.py (26%)
🟢   56  @pytest.fixture(scope="session", autouse=True)
🟢   57  def _vulkan_debug_warmup():
     58      """Prime the Vulkan debugPrintf callback pipeline once per worker process.
     59  
     60      The validation layer's debugPrintf callback delivery has a race condition on the first kernel dispatch in a new
     61      process: vkQueueWaitIdle() can return before the callback fires. A full init/dispatch/sync/reset cycle here warms
     62      the driver-level debug infrastructure so subsequent inits work reliably.
     63      """
🟢   64      from tests import test_utils
     65  
🟢   66      if qd.vulkan not in test_utils.expected_archs():
🟢   67          return
     68  
🔴   69      try:
🔴   70          qd.init(arch=qd.vulkan, debug=True, enable_fallback=False, print_full_traceback=True)
     71  
🔴   72          @qd.kernel
🔴   73          def _warmup() -> qd.i8:
🔴   74              return qd.i8(64) + qd.i8(64)
     75  
🔴   76          _warmup()
🔴   77          qd.sync()
🔴   78          sys.stdout.flush()
🔴   79      except Exception:
🔴   80          pass
     81      finally:
🔴   82          try:
🔴   83              qd.reset()
🔴   84          except Exception:
🔴   85              pass
     86  
     87  

The debugPrintf warmup fixture hangs on Mac CI runners because
MoltenVK does not support the validation-layer debugPrintf pipeline.
The warmup is only needed on Linux Vulkan drivers.

Made-with: Cursor
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prior feedback addressed (proper expected_archs() usage, full fail-soft try/finally with reset, macOS skip).

Extended reasoning...

Overview

Single-file change to tests/python/conftest.py adding a session-scoped autouse pytest fixture that runs one throwaway Vulkan kernel at worker startup to prime the validation layer's debugPrintf callback pipeline. The PR targets a CI flake observed under pytest-xdist and is purely test-infrastructure code — no production paths touched.

Security risks

None. The fixture only initializes the Vulkan backend with debug=True, dispatches a constant-arithmetic kernel, syncs, and resets. No external input, no I/O beyond stdout flush, no privilege boundaries.

Level of scrutiny

Low. This is a CI-only test fixture. The fixture is opt-in by environment (only runs when Vulkan is in expected_archs() and not on macOS) and fully fail-soft (entire body wrapped in try/except with qd.reset() in finally). Worst case if anything goes wrong is the warmup is skipped and the original flake recurs — same state as before this PR.

Other factors

My two prior review comments on this PR are both resolved:

  • The naive substring check on QD_WANTED_ARCHS was replaced with the canonical qd.vulkan not in test_utils.expected_archs(), which correctly handles the ^ exclusion-prefix grammar.
  • The try/except scope was extended to cover the kernel decoration, dispatch, and sync, with qd.reset() placed in a finally block — preserving the author's fail-soft intent across the full warmup path.

The codex-bot's concern about delayed callbacks landing in per-test capture windows is mitigated by the session-scoped placement (runs once before any test) plus the explicit qd.sync() + sys.stdout.flush(). The author validated 10 consecutive rounds passing on the T4 runner. The only remaining risk surface is whether the fixture sufficiently primes the pipeline on all Vulkan stacks, which is empirically verifiable and self-healing if it doesn't (qd.reset() is called in finally).

@github-actions
Copy link
Copy Markdown

Coverage Report (d676d5cef)

Metric Value
Diff coverage (changed lines only) 29%
Overall project coverage 73%

Total: 21 lines, 15 missing, 29% covered

🔴 tests/python/conftest.py (29%)
🟢   56  @pytest.fixture(scope="session", autouse=True)
🟢   57  def _vulkan_debug_warmup():
     58      """Prime the Vulkan debugPrintf callback pipeline once per worker process.
     59  
     60      The validation layer's debugPrintf callback delivery has a race condition on the first kernel dispatch in a new
     61      process: vkQueueWaitIdle() can return before the callback fires. A full init/dispatch/sync/reset cycle here warms
     62      the driver-level debug infrastructure so subsequent inits work reliably.
     63      """
🟢   64      if sys.platform == "darwin":
🔴   65          return
     66  
🟢   67      from tests import test_utils
     68  
🟢   69      if qd.vulkan not in test_utils.expected_archs():
🟢   70          return
     71  
🔴   72      try:
🔴   73          qd.init(arch=qd.vulkan, debug=True, enable_fallback=False, print_full_traceback=True)
     74  
🔴   75          @qd.kernel
🔴   76          def _warmup() -> qd.i8:
🔴   77              return qd.i8(64) + qd.i8(64)
     78  
🔴   79          _warmup()
🔴   80          qd.sync()
🔴   81          sys.stdout.flush()
🔴   82      except Exception:
🔴   83          pass
     84      finally:
🔴   85          try:
🔴   86              qd.reset()
🔴   87          except Exception:
🔴   88              pass
     89  
     90  

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both my prior concerns (arch-grammar parsing and exception scope) are addressed; small test-only conftest change, LGTM.

Extended reasoning...

Overview

The PR adds a single session-scoped autouse pytest fixture in tests/python/conftest.py that primes the Vulkan validation layer's debugPrintf callback once per worker before real tests run, working around a first-dispatch race in vkQueueWaitIdle(). Latest revisions also skip the warmup on macOS (where Vulkan tests aren't run) and use test_utils.expected_archs() to decide when to run.

Security risks

None. The change is confined to test infrastructure (CI), runs only on the test workers, and does not touch auth, crypto, IO boundaries, or any production code paths.

Level of scrutiny

Low. This is a CI-only fix in test bootstrap code, the modification is localized to one file, and the author has provided empirical validation (10 consecutive rounds passing on T4 vs ~60% prior failure rate). Risk of regression in the actual product is essentially zero.

Other factors

My two earlier review comments (^-prefix exclusion grammar in QD_WANTED_ARCHS and the unguarded post-init dispatch) have both been resolved in the current revision: the fixture now delegates to expected_archs() for the inclusion check and wraps the entire init+dispatch+sync block in try/except with qd.reset() in a finally clause. The remaining codex-bot concern about delayed callbacks bleeding into the first real test is mitigated by the session-scoped placement (it runs before any test, not in the per-test capture window) and the explicit sys.stdout.flush(). The author has been responsive to feedback across multiple revisions.

@hughperkins hughperkins merged commit d8ca127 into main Apr 27, 2026
11 checks passed
@hughperkins hughperkins deleted the turtle/vulkan-debug-warmup branch April 27, 2026 07:06
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both concerns from my prior review have been addressed (proper QD_WANTED_ARCHS parsing via expected_archs(), and try/except now wrapping the full init+dispatch with qd.reset() in a finally). Test-only change, fail-soft, scope is small.

Extended reasoning...

Overview

The PR adds a session-scoped autouse pytest fixture in tests/python/conftest.py that primes the Vulkan debugPrintf callback pipeline once per worker process to fix a CI flake under pytest-xdist. Single file, ~30 lines, test infrastructure only.

Security risks

None. This is test infrastructure that runs only during pytest setup. No production code paths, no user input handling, no auth/crypto/permissions surface.

Level of scrutiny

Low scrutiny is appropriate. This is pytest conftest plumbing for CI flake mitigation, gated by an arch check and platform check, and it fail-softs on every error path. The blast radius is limited to test session setup.

Other factors

My prior review (claude[bot] at 2026-04-26T20:16:12Z) raised two issues which the author addressed in the latest revisions:

  • The naive substring check on QD_WANTED_ARCHS has been replaced with qd.vulkan not in test_utils.expected_archs(), which is the canonical parser and correctly handles the ^ exclusion prefix.
  • The try/except previously only wrapped qd.init(); it now wraps the full init+kernel+_warmup+sync block, with qd.reset() in a finally clause guarded by its own try/except. This matches the fail-soft intent.

A subsequent commit (32cb6b1) also added an early return when sys.platform == 'darwin' to fix a separate CI hang on macOS.

The codex bot raised a P1 concern about delayed callback output bleeding into the first test's capture window, but the warmup includes qd.sync() followed by qd.reset() (which tears down the Vulkan context entirely) before any test runs, so any pending callback should be drained or discarded. The premise of the PR is precisely that this warmup pass forces the callback pipeline to settle, after which subsequent dispatches are reliable.

@github-actions
Copy link
Copy Markdown

Coverage Report (58bfd049a)

Metric Value
Diff coverage (changed lines only) 0%
Overall project coverage 73%

Total: 0 lines, 0 missing, 0% covered

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant