Commit 788697b
docs(uber-kernel): C2 diagnostic spec — β-coop vs legacy under PIECEWISE+graphs
Spec for a one-shot diagnostic harness that disambiguates the C2 migration's
PIECEWISE+graphs failure: is β-coop's kernel graph-replay-broken, or is the
consume-gate op pattern at fault? Implementation deferred to next session on
fresh branch diag/c2-beta-coop-vs-legacy.
Decisions captured (Option 1, shape (a), strategy (i), probe (P2)):
- Backend-only opaque ops modeled on cute_phase_e_dispatch (no framework op
signature change, no kernel work in the probe itself).
- Diagnose first, design second — the B-fix (514b88c, reverted) was already
shape (a) and broke under graphs; we need to know whether the kernel or the
op-pattern is the culprit before committing to a redesign.
- Probe = comparison + dump on divergence at qwen3_5.py:466-476 in dual-fire
under PIECEWISE+graphs; stashed companion eager-replay harness available
via CUTE_C2_DIAG_EAGER in case the primary probe is inconclusive.
- Sanity rung 0 (paged-only + NVFP4 + PIECEWISE+graphs) added to rule out
NVFP4+graphs as a confound before the main run.
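A minimal sketch of the "comparison + dump on divergence" idea, assuming the two
paths' outputs have already been copied to host-side float sequences. Every name
here (first_divergence, compare_and_dump, eager_fallback_enabled, the tolerances,
the JSON layout) is hypothetical and not taken from the spec; only the
CUTE_C2_DIAG_EAGER variable name comes from this commit, and treating "1" as
"enabled" is an assumption.

```python
import json
import math
import os


def first_divergence(legacy, coop, rtol=1e-3, atol=1e-5):
    """Return the index of the first mismatching element, or None if all match.

    NaN never compares close, so a NaN on either side counts as divergence.
    """
    if len(legacy) != len(coop):
        # A length mismatch diverges at the end of the shorter sequence.
        return min(len(legacy), len(coop))
    for i, (a, b) in enumerate(zip(legacy, coop)):
        if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol):
            return i
    return None


def compare_and_dump(legacy, coop, dump_dir, tag):
    """Compare both outputs; on divergence write them to JSON and return the path."""
    idx = first_divergence(legacy, coop)
    if idx is None:
        return None
    path = os.path.join(dump_dir, f"{tag}_divergence.json")
    with open(path, "w") as fh:
        json.dump(
            {"tag": tag, "first_bad_index": idx,
             "legacy": list(legacy), "coop": list(coop)},
            fh,
        )
    return path


def eager_fallback_enabled():
    """Assumed convention: the eager-replay companion runs only when the var is '1'."""
    return os.environ.get("CUTE_C2_DIAG_EAGER", "0") == "1"
```

A real probe would presumably compare tensors rather than Python lists; the
stdlib version only illustrates the control flow (compare, locate the first bad
index, dump both sides once).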
Includes:
- 5 design sections (architecture, components, data flow, error handling,
testing) plus host-safety bounds (no flashinfer-autotune, no .item()
syncs, no infinite loops, no OOM, no driver-wedge paths).
- Container baseline snapshot at design time (PIECEWISE+graphs+dual-fire is
healthy in production right now → diagnostic premise is testable).
- Upstream-issue check (vllm-project/vllm#35659, #38208, #37060) confirms
none match our kernel stack — symptom class differs (crash vs gibberish),
GEMM/attention backends differ.
- Open questions surfaced for probe-wiring time (CUTE_FUSION_DISABLE env
name verification, _fusion_bound semantics).
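One way the "no infinite loops" host-safety bound listed above could be
expressed is a loop capped by both an iteration count and a wall-clock deadline.
This is an illustrative sketch only: run_bounded, max_iters, and deadline_s are
hypothetical names and defaults, not part of the spec.

```python
import time


def run_bounded(step, max_iters=8, deadline_s=60.0):
    """Call step() until it returns True, stopping at max_iters or deadline_s.

    Returns (status, iterations_run) where status is one of
    "done", "timeout", or "exhausted".
    """
    start = time.monotonic()
    for i in range(max_iters):
        # Check the deadline before each step so a slow step cannot chain forever.
        if time.monotonic() - start > deadline_s:
            return ("timeout", i)
        if step():
            return ("done", i + 1)
    return ("exhausted", max_iters)
```

The deadline is checked between steps, so it bounds the loop but not a single
hung step; guarding against that would need a separate mechanism.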
Refs:
docs/research/uber_kernel_migration/2026-04-26-consume-gate-dce-and-graph-capture.md
commit 514b88c (B-fix attempt, reverted in 3ffcf87)
memory: project_uber_kernel_migration, project_beta_coop_residual_solo_bug,
feedback_mutates_args_not_dce_safe, feedback_item_breaks_cuda_graphs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 90b06d5 commit 788697b
1 file changed
Lines changed: 429 additions & 0 deletions