Commit 54da780
refactor(cute): C1.5 — delete Phase 4 + F.1 layer-LN bake plumbing
Per audit Finding 1 and the Q4 self-review, the F.1 layer-LN bake
machinery could not survive Qwen3.5's stride-4 layer pattern: Phase 4
added mlp_out into residual_output in place, but the next layer
(linear-attn, every 4th layer) does not honor the F.1 skip-op, so its
input_layernorm re-applied LN over the pre-baked output, corrupting
the residual stream.
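For illustration only, a minimal sketch of the double-normalization,
using a reference RMSNorm and hypothetical shapes; none of these names
are the real kernel API:

```python
import torch

def rms_norm(x, gamma, eps=1e-6):
    # Reference RMSNorm; the real op lives in the CUTE kernels.
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * gamma

residual = torch.randn(2, 8)
mlp_out = torch.randn(2, 8)
gamma_next = torch.randn(8).abs() + 0.5  # non-uniform LN weights

# Phase 4 epilogue (now deleted): bake layer N+1's LN into layer N's output.
hidden_baked = rms_norm(residual + mlp_out, gamma_next)

# A linear-attn layer (every 4th in Qwen3.5) ignores the F.1 skip-op and
# runs its input_layernorm unconditionally, normalizing a second time:
hidden_double = rms_norm(hidden_baked, gamma_next)
assert not torch.allclose(hidden_double, hidden_baked)  # stream corrupted
```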
Resolution: per-layer input_layernorm at every decoder layer entry,
matching the unfused flow and every surveyed hybrid model
(Jamba, Zamba2, Qwen3-Next, Megatron hybrid). β-coop's output is now
(mlp_output, residual_output=residual_post_attn); layer N+1's
input_layernorm in Python does the residual+mlp accumulation.
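A minimal sketch of the restored contract, modeled on the standard
fused-add RMSNorm pattern (module shape illustrative, not vLLM's exact
class):

```python
import torch
from torch import nn

class FusedAddRMSNorm(nn.Module):
    # Illustrative stand-in for an RMSNorm with a fused residual add.
    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, hidden_states, residual=None):
        if residual is not None:
            # Layer N+1 entry: fold layer N's mlp_output into the residual.
            hidden_states = hidden_states + residual
        residual = hidden_states  # carried forward as the new residual
        inv_rms = torch.rsqrt(
            hidden_states.pow(2).mean(-1, keepdim=True) + self.eps)
        return hidden_states * inv_rms * self.weight, residual

# β-coop returns (mlp_output, residual_post_attn); layer N+1 then runs:
#   hidden_states, residual = self.input_layernorm(mlp_output, residual_post_attn)
```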
Deletions:
- cute_phase_e_skip_input_layernorm op (_mlp_op.py)
- attach_input_layernorm + attach_next_input_layernorm methods
commented out (kept commented per feedback_comment_not_delete; C4
fully removes)
- _phase_e_skip_next_ln, _input_layernorm_module field inits
- Phase 4 ε epilogue from run_beta_coop_full body and from
_kernel_phase_0_to_4 JIT (~150 lines removed)
- run_beta_coop_full's next_input_layernorm_gamma, next_hidden_output,
emit_next_layernorm parameters
- attach loops in Qwen3_5Model.__init__
- skip-op call site in Qwen3_5DecoderLayer.forward, replaced with an
  unconditional self.input_layernorm(hidden_states, residual) (see the
  sketch after this list)
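The replacement call site, condensed (the surrounding forward() body is
not reproduced here):

```python
def decoder_layer_entry(self, hidden_states, residual):
    # Qwen3_5DecoderLayer.forward entry after C1.5 (condensed): the F.1
    # skip-op branch is gone; every layer normalizes unconditionally,
    # with the residual+mlp add fused inside input_layernorm.
    hidden_states, residual = self.input_layernorm(hidden_states, residual)
    return hidden_states, residual
```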
Cascade fixes (authorized in implementer dispatch):
- next_hidden_scratch allocation moved from attach_next_input_layernorm
  to __init__, since β-lite (kept through C3) still references it
- _phase_e_attached gate at _backend.py:1147 rewired from
  hasattr(_next_input_layernorm_module) to
  (_phase_e_coop_kernel is not None or _mlp_fusion_bound); see the
  condensed sketch after this list
- cute_phase_e_dispatch consume branch reads impl.mlp_output[:nat]
(was impl.next_hidden_scratch[:nat])
- _next_input_layernorm_module + _emit_next_layernorm field inits
KEPT as defensive defaults (β-lite reads via getattr-with-default)
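A condensed sketch of the rewired gate (attribute names from the commit;
the surrounding code is paraphrased, not verbatim):

```python
def phase_e_attached(backend) -> bool:
    # Rewired gate at _backend.py:1147 (paraphrased): fusion counts as
    # attached iff the coop kernel exists or the MLP fusion was bound.
    # The old gate keyed on the now-deleted attach plumbing:
    #   hasattr(backend, "_next_input_layernorm_module")
    return (
        backend._phase_e_coop_kernel is not None
        or backend._mlp_fusion_bound
    )
```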
Out of scope (kept untouched):
- β-lite launch site at _backend.py:1278+ (deleted in C3 with the
  rest of β-lite)
- Standalone Phase 4 launcher (run_phase_4_only,
  _jit_launch_phase_4_only, _kernel_phase_4_only) at
  phase_e_kernel.py:2412-2683; test-only, β-lite-style infra
- paged_attention_forward in kernel.py (C2 retires it from decode)
L3 multi-layer test added at tests/v1/cute_paged/test_uber_kernel_multi_layer.py
with 5 source-text assertions covering the deletions and the
unconditional input_layernorm regime. Pytest: 7/7 PASS (2 C1 + 5 C1.5).
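"Source-text assertions" means the test reads module source and asserts
the deleted names are gone; a hypothetical reduction of the pattern
(import path assumed from the repo layout, not copied from the test file):

```python
import inspect

# Path assumed from the file tree; adjust to the actual package layout.
from vllm.v1.attention.backends.cute_paged import _mlp_op

def test_skip_op_deleted():
    src = inspect.getsource(_mlp_op)
    # C1.5 deleted the F.1 skip-op; its name must not survive in source.
    assert "cute_phase_e_skip_input_layernorm" not in src
```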
Validation:
- Live serve probe with CUTE_PHASE_E_FUSION=1: coherent reasoning
output; "The capital of France is" → " Paris, and Paris is located
in France, so Paris is" — math fix holds.
- gsm8k_eval_50 ≥90% gate DEFERRED to C2: throughput remains collapsed
  at ~0.7 tok/s because paged_attention_forward and β-coop double-fire
  Phases A+B+C. C2 retires paged_attention_forward from decode and
  recovers throughput; the gsm8k gate runs there.
Diff: 4 modified + 1 new file, -217 net lines.
Refs: docs/superpowers/specs/2026-04-25-uber-kernel-migration-design.md
docs/research/uber_kernel_migration/spec_audit_2026-04-25.md (Finding 1)
docs/research/uber_kernel_migration/q4_brainstorm_layer_LN_2026-04-25.md
memory:feedback_layer_output_contract
memory:feedback_comment_not_delete
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>