Commit 7d429f1

diag(c2): β-coop-vs-legacy diagnostic harness (env-gated, halt-on-divergence)

Adds CUTE_C2_DIAG=1 probe that compares β-coop's outputs against the legacy post-attn-LN outputs in dual-fire under PIECEWISE+graphs. New module vllm/v1/attention/backends/cute_paged/_c2_diag.py with 17 unit tests; call site env-gated in vllm/nvllm/models/qwen3_5.py; serve-cute.sh plumbs env vars + /tmp/c2_diag mount across the EngineCore subprocess boundary.

Architectural limit found and documented: under PIECEWISE+graphs the op's Python body executes only at capture (where it skips to avoid cudaErrorStreamCaptureInvalidated), never during decode replay, so the diag cannot observe steady-state β-coop. See docs/research/uber_kernel_migration/2026-04-27-c2-diagnostic-results.md for the full verdict and the decision to proceed with CUTE_DUMP_TENSORS-based forensics instead.

Plumbing wins kept (reusable for future fused-kernel diagnostics):
- vLLM EngineCore env stripping workaround (/tmp/c2_diag/ENV file)
- direct_register_custom_op pattern for fullgraph compatibility
- prefill-skip + capture-skip runtime guards in op impl
- os.getenv(name) or default, to avoid the set-but-empty trap

Production behavior unchanged when CUTE_C2_DIAG is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
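
The env-gating plumbing described in the commit message (the /tmp/c2_diag/ENV fallback for variables stripped at the EngineCore boundary, and the `or default` idiom) can be sketched roughly as follows. This is a hypothetical illustration, not the code in _c2_diag.py: the KEY=VALUE format of the ENV file and the helper names `parse_env_text`, `load_env_file`, `get_setting`, and `diag_enabled` are assumptions made for this sketch.

```python
import os
from pathlib import Path

# Fallback file named in the commit message. Its KEY=VALUE format is an
# assumption for this sketch; the commit does not specify it.
ENV_FILE = Path("/tmp/c2_diag/ENV")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path: Path) -> dict[str, str]:
    """Read the fallback file if it exists (e.g. a mounted host directory)."""
    return parse_env_text(path.read_text()) if path.is_file() else {}


def get_setting(name: str, default: str, env_file: Path = ENV_FILE) -> str:
    """Resolve a setting from the environment, then the fallback file.

    os.getenv(name, default) returns "" when the variable is set but empty,
    so the default is silently skipped. Chaining with `or` treats an empty
    string like an unset variable and falls through to the next source.
    """
    return os.getenv(name) or load_env_file(env_file).get(name) or default


def diag_enabled(env_file: Path = ENV_FILE) -> bool:
    """Off unless CUTE_C2_DIAG resolves to "1", so production is unaffected."""
    return get_setting("CUTE_C2_DIAG", "0", env_file) == "1"
```

The ordering (process environment first, then the mounted file, then the default) is one plausible choice: it lets an explicit env var win when it does survive into the subprocess, while still recovering the setting when it has been stripped.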

1 parent 788697b, commit 7d429f1

6 files changed

docs/research/uber_kernel_migration
- 2026-04-26-c2-diagnostic-plan.md
- 2026-04-27-c2-diagnostic-results.md
scripts
- serve-cute.sh
tests/v1/cute_paged
- test_c2_diag.py
vllm
- nvllm/models
  - qwen3_5.py
- v1/attention/backends/cute_paged
  - _c2_diag.py
