feat(core): add emitPerformanceMetric bridge for runtime telemetry #393

Merged: vanceingalls merged 1 commit into main from `perf/x-1-emit-performance-metric` on Apr 22, 2026
Conversation

@vanceingalls (Collaborator) commented Apr 21, 2026

Summary

Extend the runtime analytics bridge with a numeric performance metric channel. Hosts subscribe via the existing postMessage transport (one bridge, two channels) and aggregate per-session p50 / p95 for scrub latency, sustained fps, dropped frames, decoder count, composition load time, and media sync drift before forwarding to their observability pipeline.

This is the foundation other perf tooling sits on — the player itself emits the events; player-side aggregation and flush land in a follow-up.

Why

Step X-1 of the player perf proposal. Today there is no way for an embedding host to learn that scrub latency spiked, that a composition took 3 s to load, or that the media-sync loop is running 200 ms behind real time. The only signals are anecdotal user reports.

A single shared bridge keeps the runtime → host surface area minimal: hosts that already wire up the analytics channel get perf for free, and hosts that don't aren't paying for it.

What changed

  • New emitPerformanceMetric(name, value, tags?) helper in @hyperframes/core that forwards a { type: "performance-metric", name, value, tags } envelope through the existing analytics postMessage transport.
  • Six initial metric names defined in the proposal:
    • scrub_latency_ms — wall-clock from seek() call to first paint at the new frame.
    • playback_fps — sustained rAF cadence during play.
    • dropped_frames — count of >25 ms gaps within a play window.
    • decoder_count — number of concurrently-decoding video elements.
    • composition_load_ms — navigation-start to player-ready.
    • media_sync_drift_ms — drift between expected and actual decoder time.
  • Each emit also writes a performance.mark() with { value, tags } on detail, so the same numbers surface in the DevTools Performance panel's User Timing track for local debugging without instrumenting the host.
  • Zero PostHog (or any other analytics SDK) dependency in core — the host decides where to forward the events.
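The helper described above can be sketched as follows. The envelope shape, the `performance.mark()` mirror, and the double-guarded no-op behavior come from this PR; the transport lookup (`setAnalyticsTransport`) and the `RuntimePerformanceTags` shape are illustrative assumptions, not the actual `@hyperframes/core` internals.

```typescript
// Sketch of emitPerformanceMetric. The transport registration is a
// hypothetical stand-in for the existing analytics postMessage bridge.
type RuntimePerformanceTags = Record<string, string | number>;

interface PerformanceMetricEnvelope {
  type: "performance-metric";
  name: string;
  value: number;
  tags?: RuntimePerformanceTags;
}

let transport: ((envelope: PerformanceMetricEnvelope) => void) | null = null;

export function setAnalyticsTransport(
  t: ((envelope: PerformanceMetricEnvelope) => void) | null
): void {
  transport = t;
}

export function emitPerformanceMetric(
  name: string,
  value: number,
  tags?: RuntimePerformanceTags
): void {
  const envelope: PerformanceMetricEnvelope = {
    type: "performance-metric",
    name,
    value,
    ...(tags ? { tags } : {}),
  };

  // Mirror into the DevTools User Timing track, guarded so a missing or
  // throwing performance.mark can never affect playback.
  try {
    if (typeof performance !== "undefined" && performance.mark) {
      performance.mark(name, { detail: { value, tags } });
    }
  } catch {
    /* telemetry must never throw */
  }

  // Forward through the analytics transport; no-op when no host is wired up.
  try {
    transport?.(envelope);
  } catch {
    /* host-side errors are swallowed */
  }
}
```

Both sinks are wrapped independently, so a host transport that throws still leaves the local `performance.mark()` trail intact, and vice versa.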

Test plan

  • Unit tests cover the envelope shape, the performance.mark mirror, and the no-op path when no host has wired up the bridge.
  • Manual: verified marks appear in the User Timing track when scrubbing the studio preview.

Stack

Step X-1 of the player perf proposal. Foundation for the perf gate (P0-1a/b/c) — the perf scenarios in this stack instrument these same channels for CI measurement.

Extends the runtime analytics bridge with a numeric performance metric channel
for scrub latency, sustained fps, dropped frames, decoder count, composition
load time, and media sync drift. Metrics flow through the existing
postMessage transport (one bridge, two channels) so hosts can aggregate per
session (p50/p95) and forward to their observability pipeline.

Also writes performance.mark() with value+tags on detail so metrics surface
in the DevTools Performance panel's User Timing track for local debugging.

No PostHog dependency in core. Player-side aggregation and flush land in a
follow-up PR per the player-perf proposal.
@jrusso1020 (Collaborator) left a comment

Clean bridge. Additive, no breaking changes, everything downstream of postMessage is guarded. Test coverage is exactly what I'd want to see for a telemetry path — nulled transport, throwing transport, throwing performance.mark, zero/negative values, tag normalization, and a DevTools User Timing assertion when the environment supports it. Nice.

The double-guard (both performance.mark and postMessage wrapped in try/catch) is the right shape for runtime code that must never affect playback. One small non-blocking observation: the RuntimePerformanceTags type duplicates RuntimeAnalyticsProperties — may be worth collapsing to a shared alias in a followup if any third shape gets added, but carving them apart now is also fine since the semantics genuinely differ.

Approved.

Rames Jusso

@vanceingalls (Collaborator, Author)

@jrusso1020 — thanks for the careful read. One observation, intentionally deferred:

> the RuntimePerformanceTags type duplicates RuntimeAnalyticsProperties — may be worth collapsing to a shared alias in a followup

Agreed it's worth revisiting. The two shapes share keys today but the semantics genuinely diverge — RuntimeAnalyticsProperties carries event-shaped attributes (cardinality-bounded for analytics warehouses), while RuntimePerformanceTags carries measurement-shaped attributes (used as DevTools User Timing detail and as PerformanceObserver-friendly key/value pairs). Folding them into a shared alias today would force one shape to inherit the other's constraints. Tracked as follow-up; will collapse once a third consumer arrives and the common subset is forced.
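The divergence can be made concrete with a sketch. These definitions are hypothetical (the real ones live in `@hyperframes/core` and may differ); the string-only constraint on the analytics side is an assumption derived from "cardinality-bounded for analytics warehouses".

```typescript
// Event-shaped attributes: assumed string-valued, cardinality-bounded,
// destined for an analytics warehouse.
type RuntimeAnalyticsProperties = Record<string, string>;

// Measurement-shaped attributes: numeric values are first-class because
// they flow into performance.mark() detail and PerformanceObserver-friendly
// key/value pairs.
type RuntimePerformanceTags = Record<string, string | number>;

// A shared alias would force one side to inherit the other's constraints:
// either analytics accepts raw numbers, or perf tags must be stringified.
const analyticsProps: RuntimeAnalyticsProperties = { surface: "studio" };
const perfTags: RuntimePerformanceTags = { surface: "studio", decoder_count: 3 };
```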

Nothing else outstanding here.

@vanceingalls vanceingalls merged commit f9863ab into main Apr 22, 2026
25 checks passed
@vanceingalls vanceingalls mentioned this pull request Apr 22, 2026
vanceingalls added a commit that referenced this pull request Apr 23, 2026
## Summary

Adds **scenario 06: live-playback parity** — the third and final tranche of the P0-1 perf-test buildout (`p0-1a` infra → `p0-1b` fps/scrub/drift → this).

The scenario plays the `gsap-heavy` fixture, freezes it mid-animation, screenshots the live frame, then synchronously seeks the same player back to that exact timestamp and screenshots the reference. The two PNGs are diffed with `ffmpeg -lavfi ssim` and the resulting average SSIM is emitted as `parity_ssim_min`. Baseline gate: **SSIM ≥ 0.95**.

This pins the player's two frame-production paths (the runtime's animation loop vs. `_trySyncSeek`) to each other visually, so any future drift between scrub and playback fails CI instead of silently shipping.

## Motivation

`<hyperframes-player>` produces frames two different ways:

1. **Live playback** — the runtime's animation loop advances the GSAP timeline frame-by-frame.
2. **Synchronous seek** (`_trySyncSeek`, landed in #397) — for same-origin embeds, the player calls into the iframe runtime's `seek()` directly and asks for a specific time.

These paths must agree. If they don't — different rounding, different sub-frame sampling, different state ordering — scrubbing a paused composition shows different pixels than a paused-during-playback frame at the same time. That's a class of bug that only surfaces visually, never in unit tests, and only at specific timestamps where many things are mid-flight.

`gsap-heavy` is a 10s composition with 60 tiles each running a staggered 4s out-and-back tween. At t=5.0s a large fraction of those tiles are mid-flight, so the rendered frame has many distinct, position-sensitive pixels — the worst-case input for any sub-frame disagreement. If the two paths produce identical pixels here, they'll produce identical pixels everywhere that matters.

## What changed

- **`packages/player/tests/perf/scenarios/06-parity.ts`** — new scenario (~340 lines). Owns capture, seek, screenshot, SSIM, artifact persistence, and aggregation.
- **`packages/player/tests/perf/index.ts`** — register `parity` as a scenario id, default-runs = 3, dispatch to `runParity`, include in the default scenario list.
- **`packages/player/tests/perf/perf-gate.ts`** — extend `PerfBaseline` with `paritySsimMin`.
- **`packages/player/tests/perf/baseline.json`** — `paritySsimMin: 0.95`.
- **`.github/workflows/player-perf.yml`** — add a `parity` shard (3 runs) to the matrix alongside `load` / `fps` / `scrub` / `drift`.

## How the scenario works

The hard part is making the two captures land on the *exact same timestamp* without trusting `postMessage` round-trips or arbitrary `setTimeout` settling.

1. **Install an iframe-side rAF watcher** before issuing `play()`. The watcher polls `__player.getTime()` every animation frame and, the first time `getTime() >= 5.0`, calls `__player.pause()` *from inside the same rAF tick*. `pause()` is synchronous (it calls `timeline.pause()`), so the timeline freezes at exactly that `getTime()` value with no postMessage round-trip. The watcher's Promise resolves with that frozen value as the canonical `T_actual` for the run.
2. **Confirm `isPlaying() === true`** via `frame.waitForFunction` before awaiting the watcher. Without this, the test can hang if `play()` hasn't kicked the timeline yet.
3. **Wait for paint** — two `requestAnimationFrame` ticks on the host page. The first flushes pending style/layout, the second guarantees a painted compositor commit. Same paint-settlement pattern as `packages/producer/src/parity-harness.ts`.
4. **Screenshot the live frame** — `page.screenshot({ type: "png" })`.
5. **Synchronously seek to `T_actual`** — call `el.seek(capturedTime)` on the host page. The player's public `seek()` calls `_trySyncSeek` which (same-origin) calls `__player.seek()` synchronously, so no postMessage await is needed. The runtime's deterministic `seek()` rebuilds frame state at exactly the requested time.
6. **Wait for paint** again, screenshot the reference frame.
7. **Diff with ffmpeg** — `ffmpeg -hide_banner -i reference.png -i actual.png -lavfi ssim -f null -`. ffmpeg writes per-channel + overall SSIM to stderr; we parse the `All:` value, clamp at 1.0 (ffmpeg occasionally reports 1.000001 on identical inputs), and treat it as the run's score.
8. **Persist artifacts** under `tests/perf/results/parity/run-N/` (`actual.png`, `reference.png`, `captured-time.txt`) so CI can upload them and so a failed run is locally reproducible. Directory is already gitignored via the existing `packages/player/tests/perf/results/` rule.
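Step 7's stderr parsing can be sketched as below. The `All:` summary line and the clamp-at-1.0 behavior come from the description above; the exact log prefix depends on the ffmpeg build, and `parseSsim` is an illustrative name, not necessarily the scenario's actual helper.

```typescript
// ffmpeg's ssim filter writes a summary to stderr, e.g.:
//   [Parsed_ssim_0 @ 0x...] SSIM Y:0.998 U:0.999 V:0.999 All:0.998742 (29.0)
export function parseSsim(stderr: string): number {
  const match = stderr.match(/All:([\d.]+)/);
  if (!match) {
    throw new Error("ffmpeg ssim output missing 'All:' summary");
  }
  // ffmpeg occasionally reports values like 1.000001 on identical inputs,
  // so clamp before treating the value as the run's score.
  return Math.min(parseFloat(match[1]), 1.0);
}
```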

### Aggregation

`min()` across runs, **not** mean. We want the *worst observed* parity to pass the gate so a single bad run can't get masked by averaging. Both per-run scores and the aggregate are logged.
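In sketch form (the `paritySsimMin` key matches `baseline.json` as described above; the function names are illustrative, not the perf-gate's actual API):

```typescript
// Worst observed score across runs: one bad run must fail the gate rather
// than be averaged away by good runs.
export function aggregateParity(runScores: number[]): number {
  if (runScores.length === 0) throw new Error("no parity runs recorded");
  return Math.min(...runScores);
}

export function passesParityGate(
  runScores: number[],
  baseline: { paritySsimMin: number }
): boolean {
  return aggregateParity(runScores) >= baseline.paritySsimMin;
}
```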

### Output metric

| name              | direction        | baseline             |
|-------------------|------------------|----------------------|
| `parity_ssim_min` | higher-is-better | `paritySsimMin: 0.95` |

With deterministic rendering enabled in the runner, identical pixels produce SSIM very close to 1.0; the 0.95 threshold leaves headroom for legitimate fixture-level noise (font hinting, GPU compositor variance) while still catching any real disagreement between the two paths.

## Test plan

- `bun run player:perf -- --scenarios=parity --runs=3` locally on `gsap-heavy` — passes with SSIM ≈ 0.999 across all 3 runs.
- Inspected `results/parity/run-1/actual.png` and `reference.png` side-by-side — visually identical.
- Inspected `captured-time.txt` to confirm `T_actual` lands just past 5.0s (within one frame).
- Sanity test: temporarily forced a 1-frame offset between live and reference capture; SSIM dropped well below 0.95 as expected, confirming the threshold catches real drift.
- CI: `parity` shard added alongside the existing `load` / `fps` / `scrub` / `drift` shards; same `measure`-mode / artifact-upload / aggregation flow.
- `bunx oxlint` and `bunx oxfmt --check` clean on the new scenario.

## Stack

This is the top of the perf stack:

1. #393 `perf/x-1-emit-performance-metric` — performance.mark() emission
2. #394 `perf/p1-1-share-player-styles-via-adopted-stylesheets` — adopted stylesheets
3. #395 `perf/p1-2-scope-media-mutation-observer` — scoped MutationObserver
4. #396 `perf/p1-4-coalesce-mirror-parent-media-time` — coalesce currentTime writes
5. #397 `perf/p3-1-sync-seek-same-origin` — synchronous seek path (the path this PR pins)
6. #398 `perf/p3-2-srcdoc-composition-switching` — srcdoc switching
7. #399 `perf/p0-1a-perf-test-infra` — server, runner, perf-gate, CI
8. #400 `perf/p0-1b-perf-tests-for-fps-scrub-drift` — fps / scrub / drift scenarios
9. **#401 `perf/p0-1c-live-playback-parity-test` ← you are here**

With this PR landed the perf harness covers all five proposal scenarios: `load`, `fps`, `scrub`, `drift`, `parity`.