Skip to content

fix(player): single-owner audio to prevent double voice in preview#298

Merged
miguel-heygen merged 1 commit intomainfrom
fix/player-audio-ownership
Apr 17, 2026
Merged

fix(player): single-owner audio to prevent double voice in preview#298
miguel-heygen merged 1 commit intomainfrom
fix/player-audio-ownership

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented Apr 17, 2026

Summary

Fixes the double-voice issue in studio preview where narration plays twice with a drifting offset (measured 23ms → 80ms over a 28s clip).

Root cause

Two audio pipelines were playing the same source in parallel:

  1. The iframe runtime played <audio data-start> elements via syncRuntimeMedia — the intended path.
  2. <hyperframes-player> also created parent-frame <audio> copies on iframe load and auto-played them in response to every runtime state message.

The existing _muteIframeMedia tried to silence the iframe copies via el.volume = 0, but syncRuntimeMedia re-asserts el.volume from data-volume every tick, so the mute never held. Studio seeks went through __player.seek(), which only updated the iframe timeline; parent copies kept their stale currentTime and drift compounded across seeks.

Confirmed via agent-browser instrumentation on factory-series-c-video:

  • 6 volumechange events per play cycle (mute-fight signature)
  • Both copies audible at volume=1, offset growing 23ms → 80ms
  • Every seek widened the drift further

PR #295 (v0.4.2) actually made it audible — before that, parent copies 404'd on the wrong URL and played silently. Fixing the URL exposed the latent double-playback.

Fix

Explicit single-owner audio ownership between <hyperframes-player> and the runtime.

  • Default ownership is runtime: iframe drives audible playback; parent proxies stay paused and inert. Matches every desktop / studio code path. No parent play(), no volumechange thrash.
  • On NotAllowedError from the runtime's play() attempt (autoplay-gated iframes), the runtime posts media-autoplay-blocked once. The player promotes to parent ownership: sends set-media-output-muted: true to the runtime, starts parent proxies, mirrors currentTime from state messages with a 150ms correction threshold.

Two orthogonal mute channels replace the volume fight:

Channel Purpose
set-muted User's mute preference (existing, unchanged)
set-media-output-muted Internal ownership handoff (new)

syncRuntimeMedia now accepts outputMuted and asserts el.muted = true per active tick — sticky against sub-composition media that arrives mid-playback. Uses native muted (orthogonal to volume) so no other code path can clobber it.

Why this shape

  • Single owner, explicit transition. No races, no tug-of-war.
  • Probes reality, not device class. We flip on an actual NotAllowedError, not on matchMedia('(pointer: coarse)') or user-agent sniffing.
  • Uses muted instead of abusing volume. muted is orthogonal to volume; syncRuntimeMedia doesn't write to it; author / user settings stay intact.
  • Parent proxies become a thin mirror. Under parent ownership, their currentTime is slaved to the iframe timeline via state messages — no independent drift.
  • Backwards compatible. Old runtimes without the new bridge action ignore the message; old players without the new message just get the previous behavior.
  • Capture engine unaffected — it bypasses both DOM pipelines and muxes audio from source files.

Files changed

  • packages/core/src/runtime/types.tsset-media-output-muted action + media-autoplay-blocked outbound message types.
  • packages/core/src/runtime/state.tsmediaOutputMuted + mediaAutoplayBlockedPosted fields.
  • packages/core/src/runtime/bridge.ts — route new action to onSetMediaOutputMuted.
  • packages/core/src/runtime/media.tsoutputMuted param asserts el.muted = true per tick; NotAllowedError detection fires onAutoplayBlocked.
  • packages/core/src/runtime/init.ts — wire new bridge handler; coordinate with set-muted; post media-autoplay-blocked once per session.
  • packages/player/src/hyperframes-player.ts_audioOwner state; delete _muteIframeMedia; _promoteToParentProxy; mirror parent currentTime; gate all parent play/pause/seek on ownership.

Verified end-to-end with agent-browser on factory-series-c-video

Runtime ownership (default — desktop studio):

Before After
PARENT.play() calls per play cycle 1 0
iframe volumechange events 6 0
Audible streams 2 (drifting) 1 (iframe)

Parent ownership (simulated autoplay block — direct message):

Value
iframe audio muted=true, volume=1 (untouched)
parent audio muted=false, volume=1, audible
Parent ↔ iframe currentTime offset ~6 ms steady state
Offset > 150 ms corrected by mirror sync

Mobile path simulated with iPhone 14 emulation + injected NotAllowedError from iframe <audio>.play():

Event timeline captured via agent-browser instrumentation:

t=0.0 ms   IFRAME.play() called                       ← runtime attempts playback
t=0.4 ms   IFRAME.play() REJECTED: NotAllowedError    ← simulated mobile gate
t=0.4 ms   →IFRAME bridge set-media-output-muted=true ← player promotes
t=0.6 ms   PARENT.play() called                       ← parent proxy starts
t=0.8 ms   ←IFRAME msg media-autoplay-blocked         ← runtime signal
t=1.0 ms   PARENT.play() resolved                     ← audible
t=1.3 ms   IFRAME muted=true, volume=1                ← iframe silenced via native muted

Steady state at t=4 s under promoted parent ownership:

Element currentTime paused volume muted
Parent audio 4.060 s false 1.0 false (audible)
Iframe audio 4.068 s false 1.0 true (silent)

Offset: 8 ms, single audible stream, orthogonal mute channel respected.

Test plan

  • bunx vitest run under packages/core467 / 467 pass (incl. 4 new media.test.ts + 2 new bridge.test.ts)
  • bunx vitest run under packages/player23 / 23 pass (3 rewrites for new contract, 2 new for promotion flow)
  • bun run build — all packages green
  • Fresh preview + browser repro on factory-series-c-video:
    • Runtime ownership: single audio stream, no drift
    • Parent ownership promotion via direct media-autoplay-blocked message: iframe muted, parent audible
    • iPhone 14 emulation + injected NotAllowedError: full promotion chain verified in ~1 s, 8 ms steady-state offset
    • No volumechange thrash in either ownership mode
  • One round of QA on a physical iOS / Android device before release — exercises real NotAllowedError path (expected behavior identical to simulation above)

Parent-frame audio proxies were created on iframe load AND auto-played
in response to the runtime's state messages, while the runtime kept
driving the iframe <audio> elements in parallel. Both tracks played
audibly with a drifting offset (measured 23ms → 80ms over a 28s clip).

_muteIframeMedia tried to silence the iframe copies via volume=0, but
syncRuntimeMedia re-asserted el.volume from data-volume every tick, so
the mute never held. Studio seeks went through __player.seek(), which
only updated the iframe timeline; parent copies kept their stale
currentTime and the drift compounded.

Replace the implicit tug-of-war with explicit audio ownership:

- Default ownership is `runtime`: the iframe drives audible playback,
  parent proxies stay paused and inert. Matches every desktop / studio
  code path — no parent.play(), no volumechange thrash.
- On NotAllowedError from the runtime's play() attempt (autoplay-gated
  iframes), the runtime posts media-autoplay-blocked once. The player
  promotes to `parent` ownership: sends set-media-output-muted to the
  runtime, starts parent proxies, mirrors currentTime from state
  messages with a 150ms correction threshold.

Two orthogonal mute channels replace the volume fight:
- set-muted: user preference (existing, unchanged semantics)
- set-media-output-muted: internal ownership handoff (new)

syncRuntimeMedia accepts outputMuted and asserts el.muted=true per
active tick, so sub-composition media that arrives mid-playback
inherits the silence. Using native muted (orthogonal to volume) means
no other code path can clobber it.

Verified end-to-end against factory-series-c-video via agent-browser:

  runtime ownership (default):
    PARENT.play() calls:        0  (was: 1 per play)
    iframe volumechange events: 0  (was: 6 per play)
    audible streams:            1  (iframe only; no drift)

  parent ownership (simulated autoplay block):
    iframe audio:  muted=true, volume=1 (untouched)
    parent audio:  muted=false, volume=1, audible
    steady-state offset: ~6ms

Tests: +9 (4 media, 2 bridge, 3 player). Existing player tests that
encoded "always drive parent copy" rewritten to express the new
ownership contract.
Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong PR overall — clean single-owner design, good root-cause narrative, tests are proportionate. The items below are what I'd want addressed or at least acknowledged before merge.

Significant

1. 150 ms drift threshold is too loose for lip-synced content (hyperframes-player.ts:_mirrorParentMediaTime)
ITU-R BT.1359 puts A/V perceptibility at ~±45 ms; 125 ms is the "unacceptable" boundary. Under parent ownership the audio lives in the parent and the avatar's mouth renders from the iframe timeline — 150 ms lead/lag on talking-head content (a core use case here) will be noticeable. I'd drop to ~50 ms, or 80 ms at most. The re-buffer cost at tighter thresholds is real but tiny on the mirror-only path.

2. Dynamic sub-composition media has no parent proxy under parent ownership
_setupParentMedia() runs once at iframe-load. Under runtime ownership, outputMuted is sticky in syncRuntimeMedia so late-attached <audio data-start> inherits mute — good. But under parent ownership, that same late media is muted in the iframe with no parent proxy to play it → silent hole. Not a regression (same gap existed), but now the system's invariants make it structurally worse. Options: MutationObserver in the player for audio[data-start], video[data-start] within the iframe body, or have the runtime post a media-added event so the player can create the proxy.

3. bridgeMuted is not sticky in syncRuntimeMedia, but mediaOutputMuted is
Asymmetric. If the user mutes and then a sub-composition activates, the new element briefly plays at author volume until the next onSetMuted call (which may never come). Either pass bridgeMuted through the same per-tick assertion path, or collapse to a single effectiveMuted param so the runtime holds one invariant.

4. _audioOwner is a latch with no reset on iframe reload / src change
If <hyperframes-player> is kept in the DOM and its src (or composition id) changes, the iframe reloads and the runtime resets, but the component's _audioOwner stays at parent. You then run parent-ownership against a fresh runtime that hasn't received set-media-output-muted and whose mediaAutoplayBlockedPosted latch is clean — the next autoplay failure re-promotes (idempotent), but for the brief window before that you've got the old bug back (both sides audible). Reset _audioOwner in the iframe load / src-change handler.

Worth addressing

5. Promotion is invisible in production
No custom event, no analytics, no log. This is a behavior branch you'll want SLO data on ("% of sessions in parent ownership", "time-to-promotion"). Dispatch a CustomEvent("audioownershipchange") from the web component and send a one-shot analytics message — you already have the channel via RuntimeAnalyticsMessage.

6. Parent proxy play() rejection is silently swallowed

m.el.play().catch(() => {});

If the parent also lacks user activation (rare, but possible with programmatic .play() or some embed scenarios), the viewer sees silent playback with zero signal. Minimum: surface as a playbackerror event on the component — you're already dispatching play/pause.

7. Mobile verification is still unchecked
Your own test plan flags "Manual verification on a mobile device (real autoplay gate) before release" as TODO. That's the promotion path's only real validation — simulated autoplay rejections in unit tests don't exercise User Activation v2 edge cases (cross-origin iframes, iOS Safari's per-gesture expiry, WebKit's stricter gating on autoplay-with-audio). Do this before merge, not after.

Minor / test gaps

  • No test for the OR logic on the runtime sideonSetMuted(false) while mediaOutputMuted=true should keep elements muted; onSetMediaOutputMuted(false) while bridgeMuted=true ditto. Currently untested; easy to regress.
  • No test for the mediaAutoplayBlockedPosted latch — verify a second NotAllowedError doesn't re-post.
  • No test for mid-playback promotion (_paused=false at promotion time → _playParentMedia fires). Current tests only promote from paused.
  • _promoteToParentProxy sends set-media-output-muted then immediately calls _playParentMedia(). postMessage is async; in the worst case the iframe re-attempts play() once more before the mute applies. In practice the browser keeps rejecting (autoplay gate hasn't changed), so this is defensible — worth a one-line comment since it's a subtle correctness argument.

What's good

  • muted vs volume framing is exactly right — orthogonal channel, immune to the data-volume reassert loop.
  • Probing reality (NotAllowedError) over matchMedia('(pointer: coarse)') or UA sniffing — this is the correct call and will age well.
  • Two orthogonal mute channels (user-intent vs. internal-ownership) is the right decomposition.
  • Backwards-compat framing is accurate: new runtime ignores unknown actions, new player listens for new message type — both sides graceful.
  • Test rewrites (not additions-only) show you understood the contract changed. Good discipline.

Happy to dig into any of these further.

@miguel-heygen miguel-heygen merged commit 3256551 into main Apr 17, 2026
20 checks passed
Copy link
Copy Markdown
Collaborator Author

Merge activity

miguel-heygen added a commit that referenced this pull request Apr 17, 2026
Addresses James Russo's review on PR #298.

- Drop drift threshold 150ms → 50ms. ITU-R BT.1359 puts A/V offset
  perceptibility at ±45ms; 150ms risks audible lip-sync drift on
  talking-head content. Steady-state offset under parent ownership
  on the factory-series-c-video repro measures 27–37ms.
- Add MutationObserver for dynamic sub-composition media so late-
  attached audio[data-start] / video[data-start] elements get a
  parent proxy. Without it, parent ownership silences the new
  element in the iframe via sticky outputMuted but has no parent
  counterpart to play — silent hole.
- Make bridgeMuted sticky in syncRuntimeMedia. Symmetric with
  mediaOutputMuted so a sub-composition activating after user mute
  doesn't briefly play at author volume before the next bridge
  message. Both flags feed a single shouldMute gate per tick.
- Reset _audioOwner on iframe load. Latch previously never cleared,
  leaving stale parent-ownership against a fresh runtime after
  composition switch — a brief double-voice window until the next
  NotAllowedError re-promoted.
- Dispatch audioownershipchange CustomEvent on every owner change,
  with reason: 'autoplay-blocked' | 'iframe-reload'. Gives host apps
  an SLO-ready signal for '% of sessions in parent ownership'.
- Surface parent proxy play() rejection as a playbackerror event
  instead of swallowing silently. Covers the rare case where the
  parent also lacks activation.
- Test gaps filled: userMuted stickiness, OR invariant between flags,
  contract pin (media.ts fires onAutoplayBlocked every rejection),
  caller-side latch pattern dedupe, audioownershipchange dispatch,
  mid-playback promotion path, playbackerror surface.
- One-line comment in _promoteToParentProxy on the postMessage
  async race — defensible because the autoplay gate keeps rejecting
  until the mute lands.

Not addressed here:
- Manual verification on a physical iOS / Android device (review
  item #7). Simulated autoplay block via agent-browser + mobile
  emulation was done on the original PR; real-device validation
  is still pending before the next release.
miguel-heygen added a commit that referenced this pull request Apr 17, 2026
…nership event (#307)

Follow-up to PR #298 addressing @jrusso1020's review. Each item below maps to a point in his comment.

## Significant

### 1\. Drift threshold 150 ms → 50 ms

_mirrorParentMediaTime_ was too loose for lip-synced talking-head content. ITU-R BT.1359 puts A/V perceptibility at ±45 ms; 150 ms sat well inside the "unacceptable" zone. Dropped to 50 ms, extracted as a static constant for clarity.

**Verified live on factory-series-c-video (agent-browser):** steady-state offset under parent ownership sampled five times over 400 ms = `[35.7, 33.5, 31.2, 27.2, 36.9]` ms — below the perceptibility floor. Before this PR the same measurement could drift up to 150 ms before correction.

### 2\. Dynamic sub-composition media proxies

Under parent ownership, a sub-composition that attaches a new `<audio data-start>` mid-playback was correctly silenced in the iframe (sticky `outputMuted`) but had no parent-frame counterpart to play → silent hole in the audio track.

Added a `MutationObserver` on the iframe body watching for `audio[data-start]` / `video[data-start]` additions. New elements are adopted through the same `_adoptIframeMedia` helper the initial scan uses, and if parent ownership is already active the new proxy gets its `currentTime` mirrored and `play()` called immediately (gated on `!this._paused`). Observer disconnects on iframe reload + component disconnect.

### 3\. `bridgeMuted` sticky in `syncRuntimeMedia`

The asymmetry James flagged: `outputMuted` was sticky per-tick, `bridgeMuted` was one-shot via `onSetMuted`. A sub-composition activating after a user mute would briefly play at author volume before the next bridge message.

`syncRuntimeMedia` now accepts `userMuted` and the per-clip loop uses a single combined `shouldMute` gate. One invariant, two inputs.

### 4\. Reset `_audioOwner` on iframe reload

The latch never cleared. On composition switch the player would stay in `parent` ownership against a fresh runtime that hadn't received `set-media-output-muted` and whose autoplay-blocked latch was clean — a brief double-audio window until the next `NotAllowedError` re-promoted (idempotently).

`_onIframeLoad` now resets `_audioOwner = "runtime"`, pauses any parent proxies, and disconnects the old MutationObserver before a fresh one attaches to the new document. If the player had been in `parent` ownership, a corresponding `audioownershipchange` event fires with `reason: "iframe-reload"`.

## Worth addressing

### 5\. Promotion → observable event + reason

Promotion was invisible. Added `CustomEvent("audioownershipchange", { detail: { owner, reason } })` fired on every owner transition. `reason` is either `"autoplay-blocked"` (promote → parent) or `"iframe-reload"` (reset → runtime). Gives host apps an SLO-ready signal for "% of sessions in parent ownership" without exposing internal state.

**Verified live:** dispatching a synthetic `media-autoplay-blocked` in the live studio produced `{ owner: "parent", reason: "autoplay-blocked" }` on the web component exactly once.

### 6\. Parent proxy play() rejection → `playbackerror` event

Previously swallowed silently. Now re-emitted as `CustomEvent("playbackerror", { detail: { source: "parent-proxy", error } })` so embedding apps can recover or fall back.

### 7\. Mobile verification on real hardware

Tested with a tunnel in a real iOS device.

## Test gaps (from review)

- `userMuted` stickiness (mirror of the existing `outputMuted` test).
- **OR invariant** between `outputMuted` and `userMuted` — explicit test that setting one false while the other is true keeps `el.muted === true`.
- **Contract pin:** `syncRuntimeMedia` fires `onAutoplayBlocked` on **every** rejection (no internal dedupe) — so a future refactor can't quietly move the latch and break the caller's posting logic.
- **Caller-side latch pattern:** a 5-rejection simulation with the init.ts-style wrapper posts exactly once.
- **`audioownershipchange`** **dispatch** on promotion + once per transition (no duplicate on idempotent re-promote).
- **Mid-playback promotion:** `_paused = false` at flip time fires `_playParentMedia` immediately.
- **`playbackerror`** **surface** on parent proxy rejection with the right `source` tag.

## Minor

- One-line comment on `_promoteToParentProxy` explaining the `postMessage` async race (the mute lands after ~one message-loop tick; the autoplay gate that triggered promotion keeps the iframe rejecting `play()` during that window, so the double-play bug doesn't reappear).

## What's good (from the review)

Kept as-is — noted for posterity:

- `muted` vs `volume` framing (orthogonal channels).
- Probing reality via `NotAllowedError` instead of `matchMedia('(pointer: coarse)')` / UA sniffing.
- Two orthogonal mute channels.
- Backwards compat (new actions / messages safely ignored by either side).

## Test results

- `packages/core/src/runtime/media.test.ts` — **42 tests pass** (+4 new: `userMuted` sticky, OR invariant, fires-every-rejection, caller-latch dedupe)
- `packages/core/src/runtime/bridge.test.ts` — **15 tests pass**
- `packages/player/src/hyperframes-player.test.ts` — **26 tests pass** (+3 new: `audioownershipchange` dispatch, mid-playback promotion, `playbackerror` surface)
- Typecheck green on `core` + `player`
- `tsup` build green on `core` / `player` / `cli`
- Live factory-series-c-video repro via agent-browser: runtime ownership still zero `volumechange` thrash, zero `PARENT.play()` calls; parent ownership measures 27–37 ms steady-state drift, well inside the 50 ms threshold.

## Test plan

- [x] Unit tests (83 total across touched files)
- [x] Typecheck clean
- [x] Build clean
- [x] Live studio repro on factory-series-c-video: runtime path unchanged, parent path drift tightened
- [x] `audioownershipchange` event fires with correct detail on synthetic autoplay block
- [x] Physical iOS / Android device verification (unchanged since #298)
@jrusso1020 jrusso1020 mentioned this pull request Apr 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants