fix(engine): auto-normalize VFR video inputs to CFR before frame extraction#360

Merged
jrusso1020 merged 6 commits into main from
fix/vfr-screen-recording-freeze
Apr 21, 2026
@jrusso1020

What

Auto-normalize variable-frame-rate (VFR) video inputs to constant-frame-rate (CFR) before the frame-extraction stage, eliminating the "frozen screen recording" class of bugs in rendered compositions.

Why

A user reported on X that their screen-recording scenes were freezing in hyperframes renders. Investigation:

  • macOS ScreenCaptureKit, QuickTime screen recordings, and phone videos all emit VFR by default (frames written only on content change).
  • The extractor at packages/engine/src/services/videoFrameExtractor.ts:142 runs ffmpeg -ss <start> -i <video> -t <dur> -vf fps=N on the input.
  • On VFR inputs, the fps filter exhibits two failure modes:
    1. Frame-count shortfall: for a 4-second segment at 30fps starting mid-file, output was ~90 frames instead of 120. FrameLookupTable.getFrameAtTime (line 440) returns null for out-of-range indices, so the compositor holds the last valid frame — the user sees a freeze.
    2. Duplicate-frame runs: 32-39% of output frames were duplicates on realistic VFR fixtures, because the fps filter snaps multiple outputs to the same nearest source frame.
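
The first failure mode can be reproduced in miniature. The sketch below is not the engine's `FrameLookupTable`; `getFrameAtTime` and `renderTimeline` here are simplified stand-ins that assume frames are addressed by a rounded index and that the compositor keeps showing the last valid frame when lookup returns null.

```typescript
// Hypothetical stand-in for the engine's lookup: extracted frames are
// addressed by index, and anything outside the extracted range yields null.
function getFrameAtTime(frames: string[], timeSec: number, fps: number): string | null {
  const index = Math.round(timeSec * fps);
  if (index < 0 || index >= frames.length) return null;
  return frames[index] ?? null;
}

// Walk the output timeline; when lookup fails, keep showing the last
// valid frame (the assumed compositor behavior that reads as a freeze).
function renderTimeline(frames: string[], durationSec: number, fps: number): string[] {
  const shownFrames: string[] = [];
  let last: string | null = null;
  const total = Math.round(durationSec * fps);
  for (let i = 0; i < total; i++) {
    const frame = getFrameAtTime(frames, i / fps, fps);
    if (frame !== null) last = frame;
    if (last !== null) shownFrames.push(last);
  }
  return shownFrames;
}

// 4 s at 30 fps needs 120 frames, but only 90 were extracted.
const extracted = Array.from({ length: 90 }, (_, i) => `frame-${i}`);
const shown = renderTimeline(extracted, 4, 30);
// How many output slots show the final extracted frame.
const frozen = shown.filter((f) => f === "frame-89").length;
```

With 90 extracted frames, every output slot from index 90 onward repeats `frame-89`, which is the held frame the user perceives as a frozen recording.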

The engine already detects VFR via metadata.isVFR (packages/engine/src/utils/ffprobe.ts:257) but never acted on it — the compiler only logged a warning telling users to manually re-encode.
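
This PR does not show how `metadata.isVFR` is computed. One common heuristic, sketched here purely as an assumption, compares ffprobe's `r_frame_rate` and `avg_frame_rate` stream fields and flags the stream when they diverge. `parseRational`, `looksVfr`, and the 5% tolerance are hypothetical names and values, not taken from `ffprobe.ts`.

```typescript
// Parse an ffprobe rational such as "30000/1001" or "120/1" into a number.
function parseRational(value: string): number {
  const [num, den = "1"] = value.split("/");
  const n = Number(num);
  const d = Number(den);
  return d === 0 ? NaN : n / d;
}

// Hypothetical heuristic: treat a stream as VFR when the nominal rate
// (r_frame_rate) and the measured average (avg_frame_rate) differ by more
// than a relative tolerance. The tolerance value is an assumption.
function looksVfr(rFrameRate: string, avgFrameRate: string, tolerance = 0.05): boolean {
  const r = parseRational(rFrameRate);
  const avg = parseRational(avgFrameRate);
  if (!Number.isFinite(r) || !Number.isFinite(avg) || r === 0) return false;
  return Math.abs(r - avg) / r > tolerance;
}
```

Under this heuristic a clip reporting `r_frame_rate=120` with an average near 36 fps would be flagged, while a 30000/1001 CFR source would not.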

How

Mirrored the existing SDR→HDR normalization pattern in the same file. When metadata.isVFR === true, re-encode the used segment to CFR before extraction:

ffmpeg -ss <mediaStart> -i <video> -t <duration> \
       -fps_mode cfr -r <targetFps> \
       -c:v libx264 -preset fast -crf 18 \
       -c:a copy -y <normalized.mp4>

Scoping to [mediaStart, mediaStart+duration] rather than full-file means a 30-second clip cut from a 60-minute screen recording pays ~1s of transcode cost, not 18s.
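
For illustration, the command above could be assembled as an argument list like this. `CfrOptions` and `buildCfrArgs` are hypothetical names; only the flags themselves come from this PR, and the actual `convertVfrToCfr` implementation may differ.

```typescript
// Hypothetical options bag for the CFR preflight.
interface CfrOptions {
  input: string;
  output: string;
  mediaStart: number; // seconds into the source where the used segment begins
  duration: number;   // seconds of source actually used by the composition
  targetFps: number;
}

// Builds the ffmpeg argument list for re-encoding only the used segment
// to constant frame rate. Placing -ss before -i seeks on the input side.
function buildCfrArgs(o: CfrOptions): string[] {
  return [
    "-ss", String(o.mediaStart),
    "-i", o.input,
    "-t", String(o.duration),
    "-fps_mode", "cfr",
    "-r", String(o.targetFps),
    "-c:v", "libx264", "-preset", "fast", "-crf", "18",
    "-c:a", "copy",
    "-y", o.output,
  ];
}

const exampleArgs = buildCfrArgs({
  input: "in.mov",
  output: "out.mp4",
  mediaStart: 12.5,
  duration: 4,
  targetFps: 30,
});
```

Passing the arguments as an array (for example to `child_process.spawn`) avoids shell quoting issues with file paths.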

Benchmarked locally on a synthesized VFR fixture (10s of testsrc2 with ~40% frames dropped, sparse keyframes every 10s):

| Strategy | Dupe rate | Frame-count shortfall |
| --- | --- | --- |
| Baseline (before) | 32-39% | 25% shortfall on mid-segments |
| Flag changes only (-fps_mode cfr, -r vs -vf fps=N, accurate seek) | ~same | still short |
| CFR preflight (this PR) | 1.7-6% | perfect frame count |

The flag-only tweaks were tested and insufficient — the fps filter's VFR handling is the underlying issue, not the seek mode or output-rate flag. Pre-encoding is the fix experiment-framework already uses internally for the same reason (see worker/celery/movio/utils/ffmpeg_utils.py reencode_video_if_potentially_vfr).
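
The PR does not say how the dupe rate was measured. One plausible way, sketched here as an assumption, is to hash each extracted frame and count frames identical to their predecessor. `duplicateRate` and `longestRun` are hypothetical helpers that operate on precomputed per-frame hashes.

```typescript
// Fraction of frames whose hash equals the immediately preceding frame's hash.
function duplicateRate(frameHashes: string[]): number {
  if (frameHashes.length < 2) return 0;
  let dupes = 0;
  for (let i = 1; i < frameHashes.length; i++) {
    if (frameHashes[i] === frameHashes[i - 1]) dupes++;
  }
  return dupes / frameHashes.length;
}

// Length of the longest run of consecutive identical frames.
function longestRun(frameHashes: string[]): number {
  let best = 0;
  let run = 0;
  let prev: string | undefined;
  for (const h of frameHashes) {
    run = h === prev ? run + 1 : 1;
    prev = h;
    if (run > best) best = run;
  }
  return best;
}
```

The rate captures the overall share of repeated frames, while the longest run is closer to what a viewer experiences as a visible freeze.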

The compiler warning that used to tell users to manually re-encode VFR videos is downgraded from console.warn to console.info since the engine now handles it; the message still mentions the pre-encode command for users who want to skip the per-render transcode cost.

Test plan

  • Unit tests added: packages/engine/src/services/videoFrameExtractor.test.ts
    • Detects the fixture as VFR.
    • Regression test: mid-file segment produces expected frame count (120 @ 4s × 30fps, previously ~90).
    • Full-file case produces expected frame count (300 @ 10s × 30fps).
  • All 311 existing engine tests still pass.
  • Typecheck, lint, format clean.
  • Manual validation against the user's actual screen recording (once Miguel gets the repro file from the reporter).

— Rames Jusso

…action

Screen recordings (macOS ScreenCaptureKit, QuickTime, phone videos) are
commonly variable-frame-rate. When such inputs hit the extractor's
`-ss <start> -i <video> -t <dur> -vf fps=N` pipeline, the fps filter
can emit fewer frames than requested — for a 4-second 30fps segment
starting mid-file, the output was ~90 frames instead of 120.

`FrameLookupTable.getFrameAtTime` returns null for out-of-range indices,
so the compositor held the last valid frame and the user perceived the
video as freezing. This matches the bug report from an X community post
where a user said "all of them freezes" on their screen recording scenes.

The engine already detects VFR via `metadata.isVFR` in ffprobe.ts but
never acted on it — the compiler only logged a warning. This change
mirrors the existing SDR→HDR normalization pattern: when a source is
detected as VFR, re-encode only the used segment with
`-fps_mode cfr -r <fps> -preset fast -crf 18` before extraction.

Scoping the re-encode to `[mediaStart, mediaStart+duration]` means a
30-second clip cut from a 60-minute screen recording pays ~1s of
transcode cost, not 18s. Benchmarked locally:

  Baseline (current):         32-39% duplicate frames, 25% frame-count
                              shortfall on mid-file segments.
  Tier 1 (flag changes only): ~same — fps filter issue is not flag-fixable.
  Tier 2 (CFR preflight):     1.7-6% duplicate frames, correct frame
                              count in every scenario tested.

The compiler warning that previously told users to manually re-encode
is downgraded to `console.info` since the engine now handles it.

— Rames Jusso
- Drop the `vfrNormDirCreated` flag; `mkdirSync({recursive:true})` is
  idempotent and cheap.
- Don't re-wrap the `VFR→CFR conversion failed` prefix — `convertVfrToCfr`
  already throws a message with that label; adding it again in the catch
  produced "VFR→CFR conversion failed: VFR→CFR conversion failed (exit 1)".
- Shorten the Phase 2b header comment; the function docstring above
  `convertVfrToCfr` already explains the failure modes and rationale.
- Note which frame windows the VFR fixture's select filter drops so the
  magic numbers are scannable.
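
The doubled prefix is easy to reproduce. In this sketch `convertVfrToCfrStub` and `describeFailure` are hypothetical stand-ins, not the engine's functions: the thrower already labels its message, so the caller passes the message through instead of prepending the label again.

```typescript
const PREFIX = "VFR→CFR conversion failed";

// Stand-in for a converter that already labels its own errors.
function convertVfrToCfrStub(exitCode: number): void {
  if (exitCode !== 0) throw new Error(`${PREFIX} (exit ${exitCode})`);
}

// Caller-side handling: surface the message as thrown, without re-wrapping.
function describeFailure(run: () => void): string | null {
  try {
    run();
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const failureMessage = describeFailure(() => convertVfrToCfrStub(1));
```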

No behavior change; 311/311 engine tests still pass.

— Rames Jusso

@miguel-heygen miguel-heygen left a comment

I'd add a regression test just in case

jrusso1020 and others added 4 commits April 21, 2026 18:01
Adds a describe block that synthesizes a VFR fixture via ffmpeg and asserts
the extractor produces the expected frame count (no shortfall) and no long
runs of duplicate frames — the user-visible "frozen screen recording"
symptom. Covers both a mid-file segment and the full-file case.

Guarded with describe.skipIf(!HAS_FFMPEG) because the CI Test job on
ubuntu-24.04 and the Windows test-windows job don't install ffmpeg. The
producer-level regression test in packages/producer/tests/vfr-screen-recording/
runs inside Dockerfile.test (which has ffmpeg) and is the primary CI signal
for this bug; these unit tests are supplementary coverage for local runs and
any ffmpeg-equipped CI environment.
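
How `HAS_FFMPEG` is derived is not shown in this PR. A minimal sketch, assuming the guard simply probes for an `ffmpeg` binary on `PATH`, could look like the following; `hasFfmpeg` is a hypothetical helper.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical probe: true when `ffmpeg -version` can be spawned and exits 0.
// spawnSync sets `error` when the binary cannot be launched (e.g. ENOENT).
function hasFfmpeg(): boolean {
  const result = spawnSync("ffmpeg", ["-version"], { stdio: "ignore" });
  return result.error === undefined && result.status === 0;
}

const HAS_FFMPEG = hasFfmpeg();
```

The resulting boolean can then feed a guard such as `describe.skipIf(!HAS_FFMPEG)`, so the suite is reported as skipped rather than failed on runners without ffmpeg.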

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

End-to-end CI regression coverage for PR #360 via the existing
regression-harness: renders a 3s composition containing a real macOS
ScreenCaptureKit clip (r_frame_rate=120, avg≈36fps) seeked to
mediaStart=1, then PSNR-compares against a committed output.mp4.

Fixture src/clip.mp4 (108 KB) is a 5-second excerpt downscaled to 480×332
with -fps_mode passthrough to preserve the VFR timestamps. Content is the
public hyperframes OSS repo root page — see NOTICE.md for provenance.

With the fix applied, all 100 PSNR checkpoints pass. With the fix reverted,
66 of 100 fail (PSNR drops from ~43 dB to ~20 dB in the duplicate-frame
windows). Tagged "regression,video,vfr" so it runs in the fast shard
of .github/workflows/regression.yml automatically.
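
For context on the dB figures, PSNR for 8-bit samples is conventionally 10·log10(255² / MSE). The sketch below applies that standard definition to two raw sample buffers; it is not the regression harness's implementation, which is not shown in this PR.

```typescript
// PSNR in dB between two equally sized buffers of 8-bit samples.
// Identical buffers have zero error, so PSNR is reported as Infinity.
function psnr(a: Uint8Array, b: Uint8Array): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("buffers must be non-empty and of equal length");
  }
  let sumSquared = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    sumSquared += diff * diff;
  }
  const mse = sumSquared / a.length;
  return mse === 0 ? Infinity : 10 * Math.log10((255 * 255) / mse);
}
```

Because the scale is logarithmic, a drop from roughly 43 dB to roughly 20 dB corresponds to a mean squared error about 200 times larger, which is consistent with comparing against a different frame rather than minor encoder noise.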

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The committed golden output.mp4 was initially rendered on the host machine;
CI runs the renderer inside Dockerfile.test with a different Chrome +
ffmpeg build, producing pixel-level drift that failed PSNR at 54/100
checkpoints (~20 dB vs 41 dB in the VFR sparse-content windows). Both
renders are valid — the VFR source has inherent sampling ambiguity in
static segments, and different Chrome/ffmpeg builds make different valid
choices.

Regenerated the baseline via `bun run docker:test:update vfr-screen-recording`
so it matches the Docker environment CI actually uses. Matches the flow
the existing sub-composition-video, hdr-pq, etc. baselines were captured
with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Hit this 2026-04-21 with the vfr-screen-recording regression test:
the host-generated output.mp4 baseline tripped 54/100 PSNR checkpoints in CI
because of Chrome + ffmpeg drift between the host and Dockerfile.test.

Document the `bun run --cwd packages/producer docker:test:update <name>`
flow so future contributors don't repeat the mistake.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jrusso1020 jrusso1020 merged commit ffc0682 into main Apr 21, 2026
25 checks passed