Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
(EngineCore pid=144) INFO 04-17 17:23:02 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.3.self_attn.attn nat=1 phaseB ref: absmax=220.6929 mean=5.4770e-02 kernel: absmax=220.6929 mean=5.4770e-02 diff: max=0.0000 mean=2.0900e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:02 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.3.self_attn.attn nat=1 phaseC hidden_ref_absmax=0.0000 hidden_kernel_absmax=0.0000 h_max_diff=0.0000 res_ref_absmax=170141183460469231731687303715884105728.0000 res_kernel_absmax=170141183460469231731687303715884105728.0000 r_max_diff=224.0000 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:02 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.7.self_attn.attn nat=1 phaseB ref: absmax=50.6131 mean=1.1355e-02 kernel: absmax=50.6131 mean=1.1355e-02 diff: max=0.0000 mean=1.4925e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:02 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.7.self_attn.attn nat=1 phaseC hidden_ref_absmax=65.5958 hidden_kernel_absmax=65.5000 h_max_diff=0.0958 res_ref_absmax=50.6131 res_kernel_absmax=50.5000 r_max_diff=0.1131 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:02 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.11.self_attn.attn nat=1 phaseB ref: absmax=39.3754 mean=4.9943e-04 kernel: absmax=39.3754 mean=4.9943e-04 diff: max=0.0000 mean=2.1335e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:02 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.11.self_attn.attn nat=1 phaseC hidden_ref_absmax=53.2364 hidden_kernel_absmax=53.2500 h_max_diff=0.0136 res_ref_absmax=39.3754 res_kernel_absmax=39.5000 r_max_diff=0.1246 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.15.self_attn.attn nat=1 phaseB ref: absmax=14.0381 mean=2.9821e-03 kernel: absmax=14.0381 mean=2.9821e-03 diff: max=0.0000 mean=1.3829e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.15.self_attn.attn nat=1 phaseC hidden_ref_absmax=36.7743 hidden_kernel_absmax=36.7500 h_max_diff=0.0243 res_ref_absmax=14.0381 res_kernel_absmax=14.0625 r_max_diff=0.0244 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.19.self_attn.attn nat=1 phaseB ref: absmax=2.4686 mean=6.3049e-04 kernel: absmax=2.4686 mean=6.3050e-04 diff: max=0.0000 mean=1.5019e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.19.self_attn.attn nat=1 phaseC hidden_ref_absmax=6.6036 hidden_kernel_absmax=6.5938 h_max_diff=0.0098 res_ref_absmax=2.4686 res_kernel_absmax=2.4688 r_max_diff=0.0039 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.23.self_attn.attn nat=1 phaseB ref: absmax=8.1704 mean=1.4569e-03 kernel: absmax=8.1704 mean=1.4569e-03 diff: max=0.0000 mean=1.8064e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.23.self_attn.attn nat=1 phaseC hidden_ref_absmax=16.0783 hidden_kernel_absmax=16.1250 h_max_diff=0.0467 res_ref_absmax=8.1704 res_kernel_absmax=8.1875 r_max_diff=0.0171 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.27.self_attn.attn nat=1 phaseB ref: absmax=3.0244 mean=2.7613e-04 kernel: absmax=3.0244 mean=2.7613e-04 diff: max=0.0000 mean=2.4245e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.27.self_attn.attn nat=1 phaseC hidden_ref_absmax=3.2705 hidden_kernel_absmax=3.2656 h_max_diff=0.0049 res_ref_absmax=3.0244 res_kernel_absmax=3.0312 r_max_diff=0.0068 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.31.self_attn.attn nat=1 phaseB ref: absmax=8.9649 mean=-4.0835e-03 kernel: absmax=8.9649 mean=-4.0835e-03 diff: max=0.0000 mean=2.2713e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.31.self_attn.attn nat=1 phaseC hidden_ref_absmax=12.0613 hidden_kernel_absmax=12.0625 h_max_diff=0.0037 res_ref_absmax=8.9649 res_kernel_absmax=8.9375 r_max_diff=0.0274 close_h=True close_r=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:578] [CUTE_DEBUG_FUSION] layer=model.layers.35.self_attn.attn nat=1 phaseB ref: absmax=4.5057 mean=7.0675e-04 kernel: absmax=4.5057 mean=7.0675e-04 diff: max=0.0000 mean=2.2241e-07 close=True
(EngineCore pid=144) INFO 04-17 17:23:03 [_backend.py:607] [CUTE_DEBUG_FUSION] layer=model.layers.35.self_attn.attn nat=1 phaseC hidden_ref_absmax=5.6348 hidden_kernel_absmax=5.6250 h_max_diff=0.0098 res_ref_absmax=4.5057 res_kernel_absmax=4.5000 r_max_diff=0.0057 close_h=True close_r=True
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
# Own-the-stack Phase B Tier-3 evidence — 2026-04-17

**Refactor branch:** `feat/own-the-stack-phase-b`
**Model:** `natfii/Qwen3.5-27B-NVFP4-Opus-GB10`
**Image:** `nvllm:gb10-ots`
**Graph mode:** PIECEWISE CUDA graphs
**Baseline commit (fusion-ship):** `37cceaa6c`

## GSM8K result

**8/8 (100%) — matches baseline.**

Two runs, both 8/8:
- Initial run without `CUTE_DEBUG_FUSION` — 8/8
- Re-run with `CUTE_DEBUG_FUSION=1` for evidence capture — 8/8

## Fusion engagement

`decode_log.txt` — first 3 full-attention layers × 2 decode steps × phase B/C:

- **Phase B (W_O GEMV):** kernel `absmax` / `mean` vs Python-dequant reference → `close=True` on every line.
- **Phase C (residual + RMSNorm):** kernel `hidden` and `residual` outputs vs reference → `close_h=True close_r=True` on every line.

Across the entire GSM8K run, 1920 decode steps engaged fusion; zero lines with `close=False` or `close_h=False` or `close_r=False`.

Startup logged the new API firing for all 16 full-attention layers:

```
INFO [_backend.py:302] CuTe fusion attached: layer=model.layers.3 max_num_seqs=4 hidden_dim=5120 q_size=6144 attn_output_gate=True
(... 16 such lines, layer indices 3 7 11 ... 63 ...)
INFO [_backend.py:355] CuTe fusion resolved: layer=model.layers.3 wo_weight=[...] rmsnorm_gamma=[...]
(... 16 such lines ...)
```

Confirms `CutePagedAttentionImpl.attach_fusion(parent_layer)` + `_resolve_fusion_weights()` replaced the old `_fusion_bind_callback` / `bind_fusion_weights` pair with no behavioral change.

## Tier-1 jupyter tests (host-side)

All 5 pass: `notebooks/nvllm/fusion_bind_tests.ipynb`.

1. NVFP4 happy-path
2. BF16 skip-path (CLAUDE.md debug protocol step 2 regression gate)
3. Double-resolve rebinds to fresh tensor identity (C1 + C2)
4. Buffer pointer stability across attach calls (H3)
5. Per-forward gate boundary `num_actual_tokens > max_num_seqs` (A3)

## How to reproduce

```bash
cd /home/natfii/docker/nvllm
git checkout feat/own-the-stack-phase-b
docker build -f docker/Dockerfile.gb10 -t nvllm:gb10-ots .
NVLLM_IMAGE=nvllm:gb10-ots CUTE_DEBUG_FUSION=1 ./scripts/serve-cute.sh
.venv/bin/python scripts/gsm8k_sanity.py
```

## Rollback

Single-commit refactor. `git revert HEAD` returns to `37cceaa6c`.
Docker image snapshot `nvllm:gb10-preshim-20260417` tags the pre-refactor build.
Binary file not shown.
Loading
Loading