Discount sliced attention in CUDA override peak estimate #29
Merged
cryptopoly merged 1 commit into main from May 2, 2026
Conversation
The HunyuanVideo NF4 test added in 5be4964 (1280×720 × 33 frames on 24 GB CUDA, useNf4=true) was failing on main because estimateVideoRequestPeakGb was double-counting the attention term:

```
modelFootprint = 22.0 GB (NF4 override)
attentionPeak  = 32400 tokens^2 * 2 bytes * 8 (EFFECTIVE_HEAD_SLAB_MULTIPLIER) ≈ 15.6 GB
estimatedPeak  = max(22, 22*0.55 + 15.6) = 27.7 GB
budget         = 24 * 0.95 = 22.8 GB
ratio          = 27.7 / 22.8 = 1.21 -> danger
```

But the test asserts 'not danger' because real HunyuanVideo NF4 runs on a 4090 fit inside 24 GB with attention slicing / fp8 KV / sequence-parallel kernels. The dense fp16 slab assumed by EFFECTIVE_HEAD_SLAB_MULTIPLIER overestimates resident attention by roughly 40% in those configurations.

Add CUDA_OVERRIDE_ATTENTION_DISCOUNT (0.6) so the CUDA + runtime-override branch uses 60% of the raw attentionPeakGb on top of the 0.55× resident-weight factor:

```
estimatedPeak = max(22, 12.1 + 15.6*0.6) = max(22, 21.5) = 22 GB
ratio         = 22 / 22.8 = 0.96 -> caution (under dangerRatio 1.0)
```

Cross-check against the rest of the CUDA-override tests:

- Wan 2.2 5B 832×480 × 33 NF4 (model 14.5): safe -> safe PASS
- Wan 2.1 14B 832×480 × 33 NF4 (model 18): caution -> caution PASS
- Wan 2.2 5B 832×480 × 96 (model 22, no NF4): max(22, 12.1 + 12.5) = 24.6, ratio 1.08 -> danger PASS
- HunyuanVideo HD 33 NF4 (model 22): max(22, 12.1 + 9.4) = 22, ratio 0.96 -> caution PASS

The discount only fires when CUDA + override + modelFootprint > 0. Attention-only paths (no override) keep the conservative modelFootprint + attention math, so 4090 + 832×480 × 96 still flags caution and the very-long-clip warn case still flags danger.
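The branch logic described above can be sketched as follows. This is a minimal reconstruction from the numbers in the description, not the repo's code: the real estimateVideoRequestPeakGb takes a richer request shape, and the input field names here are illustrative assumptions. Only the constants (0.55, 0.6, the max() shape, the gating condition) come from the PR text.

```typescript
// Constants quoted in the PR description.
const RESIDENT_WEIGHT_FACTOR = 0.55;          // fraction of weights resident at attention peak
const CUDA_OVERRIDE_ATTENTION_DISCOUNT = 0.6; // new: sliced-attention discount

// Illustrative input shape (assumption; the real request object is richer).
interface PeakInput {
  modelFootprintGb: number;  // e.g. 22.0 for the HunyuanVideo NF4 override
  attentionPeakGb: number;   // e.g. ~15.6 for 1280x720 x 33 frames
  isCuda: boolean;
  hasRuntimeOverride: boolean;
}

function estimatePeakGb(req: PeakInput): number {
  const { modelFootprintGb, attentionPeakGb } = req;
  // The discount only fires on the CUDA + override branch with a known footprint;
  // all other paths keep the undiscounted (conservative) attention term.
  const discount =
    req.isCuda && req.hasRuntimeOverride && modelFootprintGb > 0
      ? CUDA_OVERRIDE_ATTENTION_DISCOUNT
      : 1.0;
  return Math.max(
    modelFootprintGb,
    modelFootprintGb * RESIDENT_WEIGHT_FACTOR + attentionPeakGb * discount
  );
}

// HunyuanVideo NF4 case from the description:
const peak = estimatePeakGb({
  modelFootprintGb: 22.0,
  attentionPeakGb: 15.6,
  isCuda: true,
  hasRuntimeOverride: true,
});
const budget = 24 * 0.95; // 22.8 GB
console.log(peak.toFixed(1), (peak / budget).toFixed(2)); // prints "22.0 0.96"
```

With the override flag off, the same inputs reproduce the old 27.7 GB estimate, which is why only the override branch needed the discount.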
Summary
The HunyuanVideo NF4 test added in 5be4964 (1280×720 × 33 frames on 24 GB CUDA, useNf4=true) has been failing on main: estimateVideoRequestPeakGb was double-counting attention, estimating 27.7 GB against the 22.8 GB budget (ratio 1.21 -> danger).

Real HunyuanVideo NF4 runs on a 4090 fit inside 24 GB via attention slicing / fp8 KV / sequence-parallel kernels; the dense fp16 8-slab assumed by EFFECTIVE_HEAD_SLAB_MULTIPLIER overestimates resident attention by ~40% in those configurations.

Fix

Add CUDA_OVERRIDE_ATTENTION_DISCOUNT = 0.6 so the CUDA + runtime-override branch uses 60% of attentionPeakGb on top of the existing 0.55× resident-weights factor. After: estimatedPeak = max(22, 12.1 + 9.4) = 22 GB, ratio 0.96 -> caution.
Cross-check
The discount only fires when CUDA + override + modelFootprint > 0. Attention-only paths (no override) keep the conservative modelFootprint + attention math, so the long-clip danger warning and the existing 4090 832×480 × 96 caution case still hold.

Test plan

npm test: 213/213 pass on this branch