Skip to content

perf(engine): gated hwaccel auto for SDR opaque extraction#435

Closed
jrusso1020 wants to merge 1 commit intoperf/producer-webp-extraction-formatfrom
perf/producer-gated-hwaccel-decode
Closed

perf(engine): gated hwaccel auto for SDR opaque extraction#435
jrusso1020 wants to merge 1 commit intoperf/producer-webp-extraction-formatfrom
perf/producer-gated-hwaccel-decode

Conversation

@jrusso1020
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 commented Apr 23, 2026

What

Enables -hwaccel auto on the Phase 3 ffmpeg extract, gated so it only kicks in for inputs that can safely take it: SDR, opaque pixel format, segment duration above a floor. Default on, with a config opt-out.

To support the gating, VideoMetadata now exposes pixelFormat: string and hasAlpha: boolean, derived from ffprobe's pix_fmt field via a new pixelFormatHasAlpha(pixFmt) helper.

Why

Final PR of the 5-PR stack from hyperframes-notes/producer-render-architecture-review-2026-04-21.md.

The doc explicitly calls for gated hwaccel:

Hardware-accelerated SDR decode, gated on !hasAlpha. [...] Leave the alpha path on software decode — hwaccel decoders generally collapse alpha to opaque. Validate with actual traces: hwaccel often helps less than expected on short segments because of init cost.

And warns against a blanket default:

Don't pursue hwaccel as a blanket default. The alpha path must stay on software decode (or explicitly opt-in to a decoder that preserves alpha, of which there are few). Gating on !hasAlpha is the minimum; gating additionally on segment length (skip hwaccel init for sub-second segments) is probably right too.

This PR lands both gates exactly as described. Impact is measurable via RenderPerfSummary.videoExtractBreakdown.extractMs (PR 1) on SDR-heavy compositions.

How

pixelFormatHasAlpha (ffprobe.ts)

Substring/exact-match check against the canonical alpha-bearing pixel formats:

  • yuva* — planar YUV + alpha (including 10/12-bit variants like yuva444p10le)
  • rgba / bgra / argb / abgr — 8-bit packed RGBA (exact match so bgr doesn't false-positive)
  • rgba64* / bgra64* / argb64* / abgr64* — 16-bit packed RGBA

Case-insensitive. Everything else (including yuv420p, gbrp, nv12, rgb48le) returns false.

VideoMetadata additions

pixelFormat: string populated from videoStream.pix_fmt; hasAlpha: boolean derived via the helper. Still-image fallback path sets both to empty/false since image alpha detection lives elsewhere (the HDR/PNG path in the orchestrator).

shouldEnableHwaccelSdr({ isHdr, hasAlpha, durationSeconds, config })

Extracted as a pure function in videoFrameExtractor.ts so the gating contract is unit-testable without spawning ffmpeg. Returns true iff:

  • config.hwaccelSdrDecode === true (master switch)
  • !isHdr (HDR takes the existing macOS VideoToolbox path; generic hwaccel would conflict)
  • !hasAlpha (hardware decoders generally strip alpha)
  • durationSeconds >= config.hwaccelMinDurationSeconds (default 2.0)

The extractor calls this once per input. When true, -hwaccel auto is added to the args — ffmpeg picks the best available accelerator (VideoToolbox on macOS, VAAPI on Linux, NVDEC on CUDA hosts). When nothing is available, ffmpeg silently falls back to software decode.

Config

Two new EngineConfig fields, both with env var fallbacks:

Field Env Default
hwaccelSdrDecode: boolean PRODUCER_HWACCEL_SDR_DECODE true
hwaccelMinDurationSeconds: number PRODUCER_HWACCEL_MIN_DURATION_SECONDS 2.0

Opt-out: set PRODUCER_HWACCEL_SDR_DECODE=false. Floor is tunable per platform based on PR 1's measurements.

Mutual exclusivity with the HDR path

-hwaccel videotoolbox on the existing HDR+macOS branch and -hwaccel auto on the new SDR branch are in an if/else — never both at once.

Test plan

  • 25 new tests in ffprobe.test.ts covering pixelFormatHasAlpha against 11 alpha-bearing pix_fmts, 10 opaque ones, an empty string, and case-insensitivity.
  • 7 new tests in videoFrameExtractor.test.ts covering the shouldEnableHwaccelSdr gating fence posts:
    • enables for long opaque SDR
    • disables for alpha source
    • disables for HDR source
    • disables for short segment
    • respects master switch
    • respects a lowered duration floor
    • enables at the duration floor (inclusive boundary)
  • Existing VFR / HDR / cache / WebP integration tests still pass — the gating doesn't regress any other path.
  • Full engine suite: 373 tests pass (was 344).
  • Typecheck clean in engine + producer.
  • Lint + format clean.

Stack

Final PR of the 5-PR stack, built on #434#433#432#430.

Every improvement in the stack is measurable via the perf summary added in PR 1 — validation plan in the architecture review doc is now runnable end-to-end against real compositions.

Follow-ups (out of scope for the current stack)

Rough ordering if further work is desired, roughly following the architecture review's tier B/C:

  • Cache eviction (size-capped LRU) — PR 3 left this as a known limitation.
  • HDR/VFR-preflighted inputs bypass the cache today; a preflight-aware key would let them benefit too.
  • URL-keyed HTTP download cache — today each download resolves to a fresh tmp path and always misses.
  • Tier B: pipe raw frames from ffmpeg to the renderer (skip disk round-trip).
  • Tier C: in-browser WebCodecs decode path (obsoletes both preflights).

Copy link
Copy Markdown
Collaborator Author

jrusso1020 commented Apr 23, 2026

@jrusso1020
Copy link
Copy Markdown
Collaborator Author

Closed: inert on validation host (no hardware decoder). Deferring to standalone PR once we can measure benefit on a NVDEC/VideoToolbox host. See hyperframes-notes/perf-stack-rebuild-plan-2026-04-23.md

@jrusso1020 jrusso1020 closed this Apr 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant