Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -90,3 +90,6 @@ captures/
cursor-tests/
basecamp-video/
launch-video*/

# Local alpha smoke test scaffolding
/test-alpha/
67 changes: 67 additions & 0 deletions IMPLEMENTATION_PLAN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
# HyperFrames Alpha Video Support — Implementation Plan

## Problem

HyperFrames renders HTML-based compositions to video via Puppeteer + FFmpeg.
When a source `<video>` has an alpha channel (transparent background, e.g.
WebM VP9 `yuva420p` or PNG-in-AVI `rgba`), the alpha is **dropped** during
frame extraction because `videoFrameExtractor.ts` uses JPEG output by default
and, even when PNG is forced (via `--format webm` / `--format mov`), FFmpeg
does not explicitly request `-pix_fmt rgba`, so yuv-flavored inputs lose
transparency and RGB PNGs are produced.

The downstream `<img>` injection pipeline (videoFrameInjector ->
screenshotService) is already alpha-capable — it feeds a data URI into an
`<img>` tag which Chrome blends correctly against whatever sits behind it.
**Fixing the extraction stage therefore unlocks alpha for ALL output formats
(mp4 / webm / mov)**: the DOM behind a transparent lion overlay will show
through even when the final container is opaque MP4, because the alpha
compositing happens inside Chrome before the screenshot is taken.

## Chosen Approach: A — FFmpeg pre-extract with RGBA PNG

HF already runs plan A internally. Minimum-touch fix:

1. Add `hasAlpha` detection at two levels:
- **Explicit opt-in**: `<video data-alpha="true" ...>`.
- **Auto-detection**: probe the source with ffprobe; if `pix_fmt` is one of
`rgba`, `rgba64be/le`, `yuva420p`, `yuva444p`, `yuva422p`, `argb`, `bgra`,
treat it as alpha-bearing.
2. When `hasAlpha` is true, force `format = "png"` in the extraction options
and add `-pix_fmt rgba` plus `-c:v png` to the FFmpeg invocation for **that
video only**. Other SDR / opaque videos continue extracting as JPEG at full
speed — zero regression for existing compositions.
3. Plumb the bit through `VideoElement` so downstream consumers can be taught
about alpha later if needed (e.g. to short-circuit the opaque MP4 encoder
flags).

## Why this is the right scope

- The `<img>` injector already decodes PNG data URIs with alpha.
- `screenshotService.getCdpSession` already sets a transparent default
background (line 113-118 in frameCapture.ts — when `options.format === "png"`).
But that only matters when the OUTPUT is webm/mov. For opaque MP4 output
our lion alpha should just blend correctly against the `<img>` of the
background video already on the page.
- No changes needed in the injector / compositor / encoder paths — alpha is
purely lost at extract time.

## Touch points (final)

| File | Change |
| ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| `packages/engine/src/utils/ffprobe.ts` | Add `pixelFormat` to `VideoMetadata`; parse `pix_fmt` from stream probe. |
| `packages/engine/src/services/videoFrameExtractor.ts` | Parse `data-alpha` attr into `VideoElement.hasAlpha`; auto-detect from probe; force PNG+rgba extraction; add alpha-preserving FFmpeg flags. |
| `packages/producer/src/services/videoFrameExtractor.ts` | Mirror above (producer re-exports engine). |

## Testing

1. Unit: feed an AVI(rgba) through `extractVideoFramesRange`; assert a frame's PNG bytes include a non-255 alpha pixel.
2. Integration: render a tiny HTML with `<div style="background:red"><video data-alpha="true" src="lion_rgba.avi"></video></div>` at mp4 → the frame where lion overlays red must show red in the transparent regions.
3. VSL: rewire `vsl-hf-reproduction/index.html` lion sources to the RGBA AVIs, re-render, open output.mp4 — lion should no longer have a black box.

## Non-goals

- No new CLI flag — attribute-based API.
- No engine-wide `pix_fmt` override — rgba only applied to the videos that need it.
- No change to the `--format webm` / `--format mov` behaviour (it will still produce transparent output videos — independent axis).
95 changes: 95 additions & 0 deletions PR_DRAFT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Alpha-channel preservation for `<video>` extraction

## Problem

When a HyperFrames composition contains a `<video>` with an alpha channel —
WebM VP8/VP9 `yuva420p`, HEVC `yuva444p`, PNG-in-AVI `rgba`, QuickTime ProRes
4444, etc. — the alpha is silently dropped before the frame ever reaches
Chrome. The offending step is the ffmpeg pre-extract:

```
ffmpeg -i alpha.webm -vf fps=30 -q:v ... frame_%05d.jpg
```

1. Default output is **JPEG**, which has no alpha.
2. Even when output is forced to PNG (via `--format webm` / `--format mov`),
the command omits `-pix_fmt rgba`, so ffmpeg may downconvert yuva → yuv and
write an opaque RGB PNG.

Downstream the compositor works correctly — `<img>` data URIs preserve alpha,
Chrome's compositor handles blending — so a one-line extract fix unlocks
transparency for **every** output container, including opaque MP4.

## Approach

Added opt-in + auto-detect alpha preservation at the extraction stage. No
new CLI flag, no changes to the renderer / encoder.

1. **New `data-alpha` attribute** on `<video>` elements:
- `data-alpha="true"` — force RGBA PNG extraction.
- `data-alpha="false"` — force opaque (overrides auto-detect).
- Absent / `"auto"` — auto-detect from ffprobe's `pix_fmt`.
2. **Auto-detection** via new `pixelFormatHasAlpha()` util. Flags the
yuva-family, rgba/argb/bgra/abgr, rgba64, and ya8/ya16 gray-alpha formats.
3. **FFmpeg invocation** is switched to `format=rgba` + `-pix_fmt rgba` for
alpha-bearing videos, with PNG forced. Other videos stay on JPEG at full
speed — zero regression for existing compositions.
4. **Readiness wait bypass**: `frameCapture.initializeSession` no longer
blocks on `video.readyState >= 1` for `data-alpha="true"` elements, since
Chrome often can't decode the source codec (e.g. PNG-in-AVI) and the
renderer swaps the `<video>` for an `<img>` before capture anyway.

## Files changed

| File | Summary |
| ---------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `packages/engine/src/utils/ffprobe.ts` | Added `pixelFormat` + `hasAlpha` to `VideoMetadata`; new `pixelFormatHasAlpha()` util. |
| `packages/engine/src/services/videoFrameExtractor.ts` | Parse `data-alpha` into `VideoElement.hasAlpha`; force PNG+rgba extraction when alpha is in play; thread `hasAlpha` through `extractAllVideoFrames`. |
| `packages/engine/src/services/videoFrameExtractor.test.ts` | New tests for `data-alpha` parsing. |
| `packages/engine/src/services/frameCapture.ts` | Skip the video-readyState gate for `data-alpha="true"` elements in both screenshot and BeginFrame paths. |
| `packages/engine/src/index.ts` | Re-export `pixelFormatHasAlpha`. |
| `packages/producer/src/services/htmlCompiler.ts` | Thread `hasAlpha` through browser-discovered media metadata. |
| `packages/producer/src/services/renderOrchestrator.ts` | Carry `hasAlpha` into new `VideoElement` entries discovered via DOM probe. |

## Usage

```html
<!-- Alpha-preserving video with explicit opt-in -->
<video
id="lion"
class="clip"
data-start="0"
data-duration="5.92"
data-track-index="0"
data-alpha="true"
muted
playsinline
src="assets/lion_rgba.avi"
></video>
```

For well-tagged sources (e.g. WebM with `yuva420p` correctly advertised) the
`data-alpha="true"` attribute is optional — ffprobe auto-detection will set
the flag at extraction time.

## Test results

- **Unit**: `bun run test` for `@hyperframes/engine` — all 4 `parseVideoElements`
tests pass, including 2 new ones for `data-alpha="true"` / `"false"`.
- **Smoke test**: 800×600 composition with a single PNG-in-AVI lion
(`data-alpha="true"`) over a red div → `output.mp4` shows the red background
through every transparent pixel of the lion. No black box, no fringe.
- **Integration**: 97.32 s / 1080p / 30 fps / 2920-frame VSL composition
(`vsl-hf-reproduction/index.html`) with 13 alpha-tagged lion overlays
composited on top of ~13 opaque background clips — rendered end-to-end in
13m4s to a 79.6 MB MP4. Alpha preserved across all lion shots; no regression
on the opaque background videos.

## Non-goals

- No changes to the `--format webm` / `--format mov` transparent-output path.
Those already worked; this fix is about alpha on the _input_ side.
- No new CLI flag. The attribute-based API is consistent with existing
`data-has-audio`, `data-media-start`, etc.
- No engine-wide pix_fmt override — RGBA is applied only to the specific
videos that opt in or auto-detect as alpha-bearing.
1 change: 1 addition & 0 deletions packages/engine/src/index.ts
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,7 @@ export {
extractVideoMetadata,
extractAudioMetadata,
analyzeKeyframeIntervals,
pixelFormatHasAlpha,
type VideoMetadata,
type AudioMetadata,
type KeyframeAnalysis,
Expand Down
13 changes: 9 additions & 4 deletions packages/engine/src/services/frameCapture.ts
Original file line number Diff line number Diff line change
Expand Up @@ -239,11 +239,14 @@ export async function initializeSession(session: CaptureSession): Promise<void>
);
}

// Wait for all video elements to have loaded metadata (dimensions + duration)
// Without this, frame 0 captures videos at their 300x150 default size
// Wait for all video elements to have loaded metadata (dimensions + duration).
// Without this, frame 0 captures videos at their 300x150 default size.
// Videos opted into alpha extraction (data-alpha="true") are skipped —
// their source codec (e.g. PNG-in-AVI, yuva420p webm) may not be Chrome-
// playable, and the renderer will swap them for an <img> before capture.
const videosReady = await pollPageExpression(
page,
`document.querySelectorAll("video").length === 0 || Array.from(document.querySelectorAll("video")).every(v => v.readyState >= 1)`,
`(() => { const vids = Array.from(document.querySelectorAll("video")).filter(v => v.getAttribute("data-alpha") !== "true"); return vids.length === 0 || vids.every(v => v.readyState >= 1); })()`,
pageReadyTimeout,
);
if (!videosReady) {
Expand Down Expand Up @@ -318,11 +321,13 @@ export async function initializeSession(session: CaptureSession): Promise<void>

// Wait for all video elements to have loaded metadata (dimensions + duration).
// Without this, frame 0 captures videos at their 300x150 default size.
// Videos opted into alpha extraction (data-alpha="true") are skipped — see
// the matching comment in the screenshot branch above.
const videoDeadline =
Date.now() + (session.config?.playerReadyTimeout ?? DEFAULT_CONFIG.playerReadyTimeout);
while (Date.now() < videoDeadline) {
const videosReady = await page.evaluate(
`document.querySelectorAll("video").length === 0 || Array.from(document.querySelectorAll("video")).every(v => v.readyState >= 1)`,
`(() => { const vids = Array.from(document.querySelectorAll("video")).filter(v => v.getAttribute("data-alpha") !== "true"); return vids.length === 0 || vids.every(v => v.readyState >= 1); })()`,
);
if (videosReady) break;
await new Promise((r) => setTimeout(r, 100));
Expand Down
22 changes: 21 additions & 1 deletion packages/engine/src/services/videoFrameExtractor.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,33 @@ describe("parseVideoElements", () => {
);

expect(videos).toHaveLength(1);
expect(videos[0]).toEqual({
expect(videos[0]).toMatchObject({
id: "hero",
src: "clip.mp4",
start: 2,
end: 7,
mediaStart: 1.5,
hasAudio: true,
});
// hasAlpha defaults to undefined (auto-detect at extract time via ffprobe).
expect(videos[0]?.hasAlpha).toBeUndefined();
});

it('parses data-alpha="true" as an explicit alpha opt-in', () => {
const videos = parseVideoElements(
'<video id="lion" src="lion.avi" data-start="0" data-duration="5" data-alpha="true"></video>',
);

expect(videos).toHaveLength(1);
expect(videos[0]?.hasAlpha).toBe(true);
});

it('parses data-alpha="false" as an explicit opaque opt-out', () => {
const videos = parseVideoElements(
'<video id="opaque" src="clip.mp4" data-alpha="false"></video>',
);

expect(videos).toHaveLength(1);
expect(videos[0]?.hasAlpha).toBe(false);
});
});
60 changes: 55 additions & 5 deletions packages/engine/src/services/videoFrameExtractor.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ import { spawn } from "child_process";
import { existsSync, mkdirSync, readdirSync, rmSync } from "fs";
import { join } from "path";
import { parseHTML } from "linkedom";
import { extractVideoMetadata, type VideoMetadata } from "../utils/ffprobe.js";
import { extractVideoMetadata, pixelFormatHasAlpha, type VideoMetadata } from "../utils/ffprobe.js";
import { isHdrColorSpace as isHdrColorSpaceUtil } from "../utils/hdr.js";
import { downloadToTemp, isHttpUrl } from "../utils/urlDownloader.js";
import { runFfmpeg } from "../utils/runFfmpeg.js";
Expand All @@ -22,6 +22,14 @@ export interface VideoElement {
end: number;
mediaStart: number;
hasAudio: boolean;
/**
* True when the author opts into alpha-preserving extraction via
* `<video data-alpha="true">`. Undefined / "auto" falls back to ffprobe
* pix_fmt detection. When true, the frame extractor forces PNG output and
* adds `-pix_fmt rgba` so transparency survives the extract → inject →
* composite pipeline.
*/
hasAlpha?: boolean;
}

export interface ExtractedFrames {
Expand Down Expand Up @@ -71,6 +79,7 @@ export function parseVideoElements(html: string): VideoElement[] {
const durationAttr = el.getAttribute("data-duration");
const mediaStartAttr = el.getAttribute("data-media-start");
const hasAudioAttr = el.getAttribute("data-has-audio");
const alphaAttr = el.getAttribute("data-alpha");

const start = startAttr ? parseFloat(startAttr) : 0;
// Derive end from data-end → data-start+data-duration → Infinity (natural duration).
Expand All @@ -84,13 +93,22 @@ export function parseVideoElements(html: string): VideoElement[] {
end = Infinity; // no explicit bounds — play for the full natural video duration
}

// data-alpha parsing:
// "true" → force alpha extraction even if ffprobe pix_fmt is opaque
// "false" → force opaque extraction even if source is alpha-bearing
// absent / "auto" → auto-detect from ffprobe pix_fmt at extract time
let hasAlpha: boolean | undefined;
if (alphaAttr === "true") hasAlpha = true;
else if (alphaAttr === "false") hasAlpha = false;

videos.push({
id,
src,
start,
end,
mediaStart: mediaStartAttr ? parseFloat(mediaStartAttr) : 0,
hasAudio: hasAudioAttr === "true",
hasAlpha,
});
}

Expand All @@ -105,14 +123,30 @@ export async function extractVideoFramesRange(
options: ExtractionOptions,
signal?: AbortSignal,
config?: Partial<Pick<EngineConfig, "ffmpegProcessTimeout">>,
overrides?: {
/**
* When true, force alpha-preserving extraction: PNG output + `-pix_fmt rgba`
* in the FFmpeg invocation. When undefined, ffprobe-detected alpha from the
* source file is used as an auto-opt-in.
*/
hasAlpha?: boolean;
},
): Promise<ExtractedFrames> {
const ffmpegProcessTimeout = config?.ffmpegProcessTimeout ?? DEFAULT_CONFIG.ffmpegProcessTimeout;
const { fps, outputDir, quality = 95, format = "jpg" } = options;
const { fps, outputDir, quality = 95 } = options;

const videoOutputDir = join(outputDir, videoId);
if (!existsSync(videoOutputDir)) mkdirSync(videoOutputDir, { recursive: true });

const metadata = await extractVideoMetadata(videoPath);

// Alpha detection: explicit override (from data-alpha attribute) wins; else
// auto-detect from probed pix_fmt. When alpha is in play we switch to PNG +
// rgba so the injected <img> retains transparency for Chrome compositing.
const alphaAuto = pixelFormatHasAlpha(metadata.pixelFormat);
const hasAlpha = overrides?.hasAlpha ?? alphaAuto;
const format: "jpg" | "png" = hasAlpha ? "png" : (options.format ?? "jpg");

const framePattern = `frame_%05d.${format}`;
const outputPattern = join(videoOutputDir, framePattern);

Expand All @@ -124,21 +158,36 @@ export async function extractVideoFramesRange(
const isMacOS = process.platform === "darwin";

const args: string[] = [];
if (isHdr && isMacOS) {
if (isHdr && isMacOS && !hasAlpha) {
// VideoToolbox hwaccel drops alpha — skip for alpha-bearing sources.
args.push("-hwaccel", "videotoolbox");
}
args.push("-ss", String(startTime), "-i", videoPath, "-t", String(duration));

const vfFilters: string[] = [];
if (isHdr && isMacOS) {
if (isHdr && isMacOS && !hasAlpha) {
// VideoToolbox tone-maps during decode; force output to bt709 SDR format
vfFilters.push("format=nv12");
}
if (hasAlpha) {
// Explicitly request RGBA. Without this, FFmpeg's PNG encoder may pick
// rgb24 when the decoder emits yuva420p without a libvpx-aware path,
// silently dropping the alpha plane. `format=rgba` inserts a scaler /
// format filter that decodes alpha correctly for yuva* / rgba* inputs.
vfFilters.push("format=rgba");
}
vfFilters.push(`fps=${fps}`);
args.push("-vf", vfFilters.join(","));

args.push("-q:v", format === "jpg" ? String(Math.ceil((100 - quality) / 3)) : "0");
if (format === "png") args.push("-compression_level", "6");
if (format === "png") {
args.push("-compression_level", "6");
if (hasAlpha) {
// Force the PNG encoder to write RGBA, not RGB. Redundant with the
// vf `format=rgba` on well-behaved builds, but explicit for safety.
args.push("-pix_fmt", "rgba");
}
}
args.push("-y", outputPattern);

return new Promise((resolve, reject) => {
Expand Down Expand Up @@ -355,6 +404,7 @@ export async function extractAllVideoFrames(
options,
signal,
config,
{ hasAlpha: video.hasAlpha },
);

return { result };
Expand Down
Loading