feat(core): add emitPerformanceMetric bridge for runtime telemetry #393

Merged: vanceingalls merged 1 commit into main from `perf/x-1-emit-performance-metric` on Apr 22, 2026
Conversation

@vanceingalls (Collaborator) commented Apr 21, 2026

Summary

Extend the runtime analytics bridge with a numeric performance metric channel. Hosts subscribe via the existing postMessage transport (one bridge, two channels) and aggregate per-session p50 / p95 for scrub latency, sustained fps, dropped frames, decoder count, composition load time, and media sync drift before forwarding to their observability pipeline.

This is the foundation other perf tooling sits on — the player itself emits the events; player-side aggregation and flush land in a follow-up.

Why

Step X-1 of the player perf proposal. Today there is no way for an embedding host to learn that scrub latency spiked, that a composition took 3 s to load, or that the media-sync loop is running 200 ms behind real time. The only signals are anecdotal user reports.

A single shared bridge keeps the runtime → host surface area minimal: hosts that already wire up the analytics channel get perf for free, and hosts that don't aren't paying for it.

What changed

  • New emitPerformanceMetric(name, value, tags?) helper in @hyperframes/core that forwards a { type: "performance-metric", name, value, tags } envelope through the existing analytics postMessage transport.
  • Six initial metric names defined in the proposal:
    • scrub_latency_ms — wall-clock from seek() call to first paint at the new frame.
    • playback_fps — sustained rAF cadence during play.
    • dropped_frames — count of >25 ms gaps within a play window.
    • decoder_count — number of concurrently-decoding video elements.
    • composition_load_ms — navigation-start to player-ready.
    • media_sync_drift_ms — drift between expected and actual decoder time.
  • Each emit also writes a performance.mark() with { value, tags } on detail, so the same numbers surface in the DevTools Performance panel's User Timing track for local debugging without instrumenting the host.
  • Zero PostHog (or any other analytics SDK) dependency in core — the host decides where to forward the events.
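The helper described above can be sketched as follows. The envelope shape, the `performance.mark()` mirror, and the double-guarded no-op behavior come from this PR; the transport lookup (`setAnalyticsTransport`) and the `RuntimePerformanceTags` shape are illustrative assumptions, not the actual `@hyperframes/core` internals.

```typescript
// Sketch of emitPerformanceMetric. The transport registration is a
// hypothetical stand-in for the existing analytics postMessage bridge.
type RuntimePerformanceTags = Record<string, string | number>;

interface PerformanceMetricEnvelope {
  type: "performance-metric";
  name: string;
  value: number;
  tags?: RuntimePerformanceTags;
}

let transport: ((envelope: PerformanceMetricEnvelope) => void) | null = null;

export function setAnalyticsTransport(
  t: ((envelope: PerformanceMetricEnvelope) => void) | null
): void {
  transport = t;
}

export function emitPerformanceMetric(
  name: string,
  value: number,
  tags?: RuntimePerformanceTags
): void {
  const envelope: PerformanceMetricEnvelope = {
    type: "performance-metric",
    name,
    value,
    ...(tags ? { tags } : {}),
  };

  // Mirror into the DevTools User Timing track, guarded so a missing or
  // throwing performance.mark can never affect playback.
  try {
    if (typeof performance !== "undefined" && performance.mark) {
      performance.mark(name, { detail: { value, tags } });
    }
  } catch {
    /* telemetry must never throw */
  }

  // Forward through the analytics transport; no-op when no host is wired up.
  try {
    transport?.(envelope);
  } catch {
    /* host-side errors are swallowed */
  }
}
```

Both sinks are wrapped independently, so a host transport that throws still leaves the local `performance.mark()` trail intact, and vice versa.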

Test plan

  • Unit tests cover the envelope shape, the performance.mark mirror, and the no-op path when no host has wired up the bridge.
  • Manual: verified marks appear in the User Timing track when scrubbing the studio preview.

Stack

Step X-1 of the player perf proposal. Foundation for the perf gate (P0-1a/b/c) — the perf scenarios in this stack instrument these same channels for CI measurement.

Extends the runtime analytics bridge with a numeric performance metric channel
for scrub latency, sustained fps, dropped frames, decoder count, composition
load time, and media sync drift. Metrics flow through the existing
postMessage transport (one bridge, two channels) so hosts can aggregate per
session (p50/p95) and forward to their observability pipeline.

Also writes performance.mark() with value+tags on detail so metrics surface
in the DevTools Performance panel's User Timing track for local debugging.

No PostHog dependency in core. Player-side aggregation and flush land in a
follow-up PR per the player-perf proposal.
@jrusso1020 (Collaborator) left a comment

Clean bridge. Additive, no breaking changes, everything downstream of postMessage is guarded. Test coverage is exactly what I'd want to see for a telemetry path — nulled transport, throwing transport, throwing performance.mark, zero/negative values, tag normalization, and a DevTools User Timing assertion when the environment supports it. Nice.

The double-guard (both performance.mark and postMessage wrapped in try/catch) is the right shape for runtime code that must never affect playback. One small non-blocking observation: the RuntimePerformanceTags type duplicates RuntimeAnalyticsProperties — may be worth collapsing to a shared alias in a followup if any third shape gets added, but carving them apart now is also fine since the semantics genuinely differ.

Approved.

Rames Jusso

@vanceingalls (Collaborator, Author)

@jrusso1020 — thanks for the careful read. One observation, intentionally deferred:

> the RuntimePerformanceTags type duplicates RuntimeAnalyticsProperties — may be worth collapsing to a shared alias in a followup

Agreed it's worth revisiting. The two shapes share keys today but the semantics genuinely diverge — RuntimeAnalyticsProperties carries event-shaped attributes (cardinality-bounded for analytics warehouses), while RuntimePerformanceTags carries measurement-shaped attributes (used as DevTools User Timing detail and as PerformanceObserver-friendly key/value pairs). Folding them into a shared alias today would force one shape to inherit the other's constraints. Tracked as follow-up; will collapse once a third consumer arrives and the common subset is forced.
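The divergence can be made concrete with a sketch. These definitions are hypothetical (the real ones live in `@hyperframes/core` and may differ); the string-only constraint on the analytics side is an assumption derived from "cardinality-bounded for analytics warehouses".

```typescript
// Event-shaped attributes: assumed string-valued, cardinality-bounded,
// destined for an analytics warehouse.
type RuntimeAnalyticsProperties = Record<string, string>;

// Measurement-shaped attributes: numeric values are first-class because
// they flow into performance.mark() detail and PerformanceObserver-friendly
// key/value pairs.
type RuntimePerformanceTags = Record<string, string | number>;

// A shared alias would force one side to inherit the other's constraints:
// either analytics accepts raw numbers, or perf tags must be stringified.
const analyticsProps: RuntimeAnalyticsProperties = { surface: "studio" };
const perfTags: RuntimePerformanceTags = { surface: "studio", decoder_count: 3 };
```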

Nothing else outstanding here.

@vanceingalls vanceingalls merged commit f9863ab into main Apr 22, 2026
25 checks passed
@vanceingalls vanceingalls mentioned this pull request Apr 22, 2026
vanceingalls added a commit that referenced this pull request Apr 23, 2026
## Summary

Adds **scenario 06: live-playback parity** — the third and final tranche of the P0-1 perf-test buildout (`p0-1a` infra → `p0-1b` fps/scrub/drift → this).

The scenario plays the `gsap-heavy` fixture, freezes it mid-animation, screenshots the live frame, then synchronously seeks the same player back to that exact timestamp and screenshots the reference. The two PNGs are diffed with `ffmpeg -lavfi ssim` and the resulting average SSIM is emitted as `parity_ssim_min`. Baseline gate: **SSIM ≥ 0.95**.

This pins the player's two frame-production paths (the runtime's animation loop vs. `_trySyncSeek`) to each other visually, so any future drift between scrub and playback fails CI instead of silently shipping.

## Motivation

`<hyperframes-player>` produces frames two different ways:

1. **Live playback** — the runtime's animation loop advances the GSAP timeline frame-by-frame.
2. **Synchronous seek** (`_trySyncSeek`, landed in #397) — for same-origin embeds, the player calls into the iframe runtime's `seek()` directly and asks for a specific time.

These paths must agree. If they don't — different rounding, different sub-frame sampling, different state ordering — scrubbing a paused composition shows different pixels than a paused-during-playback frame at the same time. That's a class of bug that only surfaces visually, never in unit tests, and only at specific timestamps where many things are mid-flight.

`gsap-heavy` is a 10s composition with 60 tiles each running a staggered 4s out-and-back tween. At t=5.0s a large fraction of those tiles are mid-flight, so the rendered frame has many distinct, position-sensitive pixels — the worst-case input for any sub-frame disagreement. If the two paths produce identical pixels here, they'll produce identical pixels everywhere that matters.

## What changed

- **`packages/player/tests/perf/scenarios/06-parity.ts`** — new scenario (~340 lines). Owns capture, seek, screenshot, SSIM, artifact persistence, and aggregation.
- **`packages/player/tests/perf/index.ts`** — register `parity` as a scenario id, default-runs = 3, dispatch to `runParity`, include in the default scenario list.
- **`packages/player/tests/perf/perf-gate.ts`** — extend `PerfBaseline` with `paritySsimMin`.
- **`packages/player/tests/perf/baseline.json`** — `paritySsimMin: 0.95`.
- **`.github/workflows/player-perf.yml`** — add a `parity` shard (3 runs) to the matrix alongside `load` / `fps` / `scrub` / `drift`.

## How the scenario works

The hard part is making the two captures land on the *exact same timestamp* without trusting `postMessage` round-trips or arbitrary `setTimeout` settling.

1. **Install an iframe-side rAF watcher** before issuing `play()`. The watcher polls `__player.getTime()` every animation frame and, the first time `getTime() >= 5.0`, calls `__player.pause()` *from inside the same rAF tick*. `pause()` is synchronous (it calls `timeline.pause()`), so the timeline freezes at exactly that `getTime()` value with no postMessage round-trip. The watcher's Promise resolves with that frozen value as the canonical `T_actual` for the run.
2. **Confirm `isPlaying() === true`** via `frame.waitForFunction` before awaiting the watcher. Without this, the test can hang if `play()` hasn't kicked the timeline yet.
3. **Wait for paint** — two `requestAnimationFrame` ticks on the host page. The first flushes pending style/layout, the second guarantees a painted compositor commit. Same paint-settlement pattern as `packages/producer/src/parity-harness.ts`.
4. **Screenshot the live frame** — `page.screenshot({ type: "png" })`.
5. **Synchronously seek to `T_actual`** — call `el.seek(capturedTime)` on the host page. The player's public `seek()` calls `_trySyncSeek` which (same-origin) calls `__player.seek()` synchronously, so no postMessage await is needed. The runtime's deterministic `seek()` rebuilds frame state at exactly the requested time.
6. **Wait for paint** again, screenshot the reference frame.
7. **Diff with ffmpeg** — `ffmpeg -hide_banner -i reference.png -i actual.png -lavfi ssim -f null -`. ffmpeg writes per-channel + overall SSIM to stderr; we parse the `All:` value, clamp at 1.0 (ffmpeg occasionally reports 1.000001 on identical inputs), and treat it as the run's score.
8. **Persist artifacts** under `tests/perf/results/parity/run-N/` (`actual.png`, `reference.png`, `captured-time.txt`) so CI can upload them and so a failed run is locally reproducible. Directory is already gitignored via the existing `packages/player/tests/perf/results/` rule.
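Step 7's stderr parsing can be sketched as below. The `All:` summary line and the clamp-at-1.0 behavior come from the description above; the exact log prefix depends on the ffmpeg build, and `parseSsim` is an illustrative name, not necessarily the scenario's actual helper.

```typescript
// ffmpeg's ssim filter writes a summary to stderr, e.g.:
//   [Parsed_ssim_0 @ 0x...] SSIM Y:0.998 U:0.999 V:0.999 All:0.998742 (29.0)
export function parseSsim(stderr: string): number {
  const match = stderr.match(/All:([\d.]+)/);
  if (!match) {
    throw new Error("ffmpeg ssim output missing 'All:' summary");
  }
  // ffmpeg occasionally reports values like 1.000001 on identical inputs,
  // so clamp before treating the value as the run's score.
  return Math.min(parseFloat(match[1]), 1.0);
}
```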

### Aggregation

`min()` across runs, **not** mean. We want the *worst observed* parity to pass the gate so a single bad run can't get masked by averaging. Both per-run scores and the aggregate are logged.
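In sketch form (the `paritySsimMin` key matches `baseline.json` as described above; the function names are illustrative, not the perf-gate's actual API):

```typescript
// Worst observed score across runs: one bad run must fail the gate rather
// than be averaged away by good runs.
export function aggregateParity(runScores: number[]): number {
  if (runScores.length === 0) throw new Error("no parity runs recorded");
  return Math.min(...runScores);
}

export function passesParityGate(
  runScores: number[],
  baseline: { paritySsimMin: number }
): boolean {
  return aggregateParity(runScores) >= baseline.paritySsimMin;
}
```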

### Output metric

| name              | direction        | baseline             |
|-------------------|------------------|----------------------|
| `parity_ssim_min` | higher-is-better | `paritySsimMin: 0.95` |

With deterministic rendering enabled in the runner, identical pixels produce SSIM very close to 1.0; the 0.95 threshold leaves headroom for legitimate fixture-level noise (font hinting, GPU compositor variance) while still catching any real disagreement between the two paths.

## Test plan

- `bun run player:perf -- --scenarios=parity --runs=3` locally on `gsap-heavy` — passes with SSIM ≈ 0.999 across all 3 runs.
- Inspected `results/parity/run-1/actual.png` and `reference.png` side-by-side — visually identical.
- Inspected `captured-time.txt` to confirm `T_actual` lands just past 5.0s (within one frame).
- Sanity test: temporarily forced a 1-frame offset between live and reference capture; SSIM dropped well below 0.95 as expected, confirming the threshold catches real drift.
- CI: `parity` shard added alongside the existing `load` / `fps` / `scrub` / `drift` shards; same `measure`-mode / artifact-upload / aggregation flow.
- `bunx oxlint` and `bunx oxfmt --check` clean on the new scenario.

## Stack

This is the top of the perf stack:

1. #393 `perf/x-1-emit-performance-metric` — performance.mark() emission
2. #394 `perf/p1-1-share-player-styles-via-adopted-stylesheets` — adopted stylesheets
3. #395 `perf/p1-2-scope-media-mutation-observer` — scoped MutationObserver
4. #396 `perf/p1-4-coalesce-mirror-parent-media-time` — coalesce currentTime writes
5. #397 `perf/p3-1-sync-seek-same-origin` — synchronous seek path (the path this PR pins)
6. #398 `perf/p3-2-srcdoc-composition-switching` — srcdoc switching
7. #399 `perf/p0-1a-perf-test-infra` — server, runner, perf-gate, CI
8. #400 `perf/p0-1b-perf-tests-for-fps-scrub-drift` — fps / scrub / drift scenarios
9. **#401 `perf/p0-1c-live-playback-parity-test` ← you are here**

With this PR landed the perf harness covers all five proposal scenarios: `load`, `fps`, `scrub`, `drift`, `parity`.