Commit a65bcef
fix(cute): C1 — β-coop and β-lite read residual_buf, not residual_output
β-coop's Phase 1C residual_in pointed at self.residual_output, which
paged_attention_forward had already filled with (h+r) + wo_out =
residual_post_attn. β-coop then re-added wo_out inside its own Phase 1C,
producing h + r + 2·wo_out. That corrupted residual cascaded through the 16
fused full-attn layers and surfaced as " 2 " gibberish output.
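The double-add can be reproduced on toy values. This is a minimal sketch, assuming scalars stand in for the real tensors and that residual_buf holds h + r; phase_1c is a hypothetical helper, not a function from the backend.

```python
# Toy reproduction of the aliasing bug, with scalars standing in for tensors.
# phase_1c is a hypothetical stand-in for beta-coop's Phase 1C residual add.

def phase_1c(residual_in: float, wo_out: float) -> float:
    """Add the attention output projection onto the incoming residual."""
    return residual_in + wo_out


h, r, wo_out = 1.0, 2.0, 0.5

residual_buf = h + r                      # assumed: post-input-LN residual
residual_output = residual_buf + wo_out   # already filled by the earlier pass

buggy = phase_1c(residual_output, wo_out)  # wo_out is added a second time
fixed = phase_1c(residual_buf, wo_out)     # wo_out is added exactly once

assert buggy == h + r + 2 * wo_out
assert fixed == h + r + wo_out
```

The only difference between the two calls is which buffer is passed as residual_in, which matches the one-line change at each call site.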
The same alias existed in β-lite's residual_post_ln source (audit Finding 6).
β-lite never re-ran Phase C, so the corruption only manifested when β-coop
fired, but β-lite was structurally on the same buggy path.
Fixed both call sites:
- vllm/v1/attention/backends/cute_paged/_backend.py:1175 (β-coop)
- vllm/v1/attention/backends/cute_paged/_backend.py:1268 (β-lite)
Both now read self.residual_buf — the post-input-LN residual mirrored
from qwen3_5.py:460 — matching the math the kernels expect.
L2 buffer-contracts test added at tests/v1/cute_paged/test_uber_kernel_buffer_contracts.py.
It is pure source-text inspection via inspect.getsource on CutePagedAttentionImpl.forward,
so it catches this class of bug structurally without requiring a GPU run.
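A source-text contract check of this kind might look like the sketch below. It is not the test added in this commit: the snippet strings, the regex, and the helper names are all assumptions made for illustration.

```python
import inspect
import re

# Hypothetical snippets standing in for the real forward() source text.
BUGGY_SRC = "launch(residual_in=self.residual_output)"
FIXED_SRC = "launch(residual_in=self.residual_buf)"


def residual_in_args(source: str) -> list:
    """Return every expression passed as the `residual_in=` keyword."""
    return re.findall(r"residual_in\s*=\s*([\w.]+)", source)


def violations(source: str) -> list:
    """List residual_in arguments that alias the post-attention buffer."""
    return [a for a in residual_in_args(source) if a.endswith("residual_output")]


def check_forward(cls) -> list:
    """Apply the same check to a class's forward() via inspect.getsource."""
    return violations(inspect.getsource(cls.forward))


assert violations(BUGGY_SRC) == ["self.residual_output"]
assert violations(FIXED_SRC) == []
```

In a pytest file the assertion would be along the lines of `check_forward(CutePagedAttentionImpl) == []`, assuming the class can be imported on a machine without a GPU.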
Validation:
- Pre-fix pytest: 2 FAILED (test caught the bug)
- Post-fix pytest: 2 PASSED
- Live serve probe with CUTE_PHASE_E_FUSION=1 produced coherent reasoning
output (not pre-fix " 2 ..." gibberish).
gsm8k_eval_50 ≥90% gate DEFERRED to C2. At this commit's state β-coop and
paged_attention_forward both fire Phase A+B+C, costing roughly +15 ms per
fused-full-attn layer across 16 layers; observed throughput is ≈ 0.7 tok/s
(predicted by memory:project_phase_e_phantom_speedup). The 180 s per-question
timeout in scripts/gsm8k_eval_50.py cannot accommodate that throughput. C2
retires paged_attention_forward from the decode path and recovers throughput;
the gsm8k gate runs there.
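The numbers above can be put side by side with a little arithmetic. This sketch assumes the ~15 ms cost is paid once per fused layer per generated token and that throughput stays near the observed 0.7 tok/s for a whole question; the constant names are illustrative only.

```python
# Back-of-the-envelope budget using the figures quoted in this message.
LAYERS = 16                 # fused full-attn layers
EXTRA_MS_PER_LAYER = 15     # approximate duplicated Phase A+B+C cost
OBSERVED_TOK_PER_S = 0.7    # observed decode throughput
TIMEOUT_S = 180             # per-question timeout in the eval script

# Extra latency added to each generated token by the duplicated work.
extra_ms_per_token = LAYERS * EXTRA_MS_PER_LAYER

# Tokens that can be generated before the per-question timeout expires.
token_budget = round(OBSERVED_TOK_PER_S * TIMEOUT_S)

print(f"extra latency per token: {extra_ms_per_token} ms")
print(f"tokens available per question: {token_budget}")
```

Under these assumptions each question gets a budget of about 126 tokens, which is likely too short for a full chain-of-thought answer, so running the gate at this commit would mostly measure timeouts.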
Refs: docs/superpowers/specs/2026-04-25-uber-kernel-migration-design.md
docs/research/uber_kernel_migration/spec_audit_2026-04-25.md (Finding 6)
memory:project_phase_e_beta_math_bug
memory:project_phase_e_phantom_speedup
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2b21f34 commit a65bcef
2 files changed
Lines changed: 50 additions & 2 deletions
File tree
- tests/v1/cute_paged
- vllm/v1/attention/backends/cute_paged
Diff (line numbers only; changed-line content not preserved):
- tests/v1/cute_paged: new file, lines 1-48 added
- vllm/v1/attention/backends/cute_paged: line 1175 replaced (context 1172-1178), line 1268 replaced (context 1265-1271)