fix(player): single-owner audio to prevent double voice in preview#298
fix(player): single-owner audio to prevent double voice in preview#298miguel-heygen merged 1 commit intomainfrom
Conversation
Parent-frame audio proxies were created on iframe load AND auto-played
in response to the runtime's state messages, while the runtime kept
driving the iframe <audio> elements in parallel. Both tracks played
audibly with a drifting offset (measured 23ms → 80ms over a 28s clip).
_muteIframeMedia tried to silence the iframe copies via volume=0, but
syncRuntimeMedia re-asserted el.volume from data-volume every tick, so
the mute never held. Studio seeks went through __player.seek(), which
only updated the iframe timeline; parent copies kept their stale
currentTime and the drift compounded.
Replace the implicit tug-of-war with explicit audio ownership:
- Default ownership is `runtime`: the iframe drives audible playback,
parent proxies stay paused and inert. Matches every desktop / studio
code path — no parent.play(), no volumechange thrash.
- On NotAllowedError from the runtime's play() attempt (autoplay-gated
iframes), the runtime posts media-autoplay-blocked once. The player
promotes to `parent` ownership: sends set-media-output-muted to the
runtime, starts parent proxies, mirrors currentTime from state
messages with a 150ms correction threshold.
Two orthogonal mute channels replace the volume fight:
- set-muted: user preference (existing, unchanged semantics)
- set-media-output-muted: internal ownership handoff (new)
syncRuntimeMedia accepts outputMuted and asserts el.muted=true per
active tick, so sub-composition media that arrives mid-playback
inherits the silence. Using native muted (orthogonal to volume) means
no other code path can clobber it.
Verified end-to-end against factory-series-c-video via agent-browser:
runtime ownership (default):
PARENT.play() calls: 0 (was: 1 per play)
iframe volumechange events: 0 (was: 6 per play)
audible streams: 1 (iframe only; no drift)
parent ownership (simulated autoplay block):
iframe audio: muted=true, volume=1 (untouched)
parent audio: muted=false, volume=1, audible
steady-state offset: ~6ms
Tests: +9 (4 media, 2 bridge, 3 player). Existing player tests that
encoded "always drive parent copy" rewritten to express the new
ownership contract.
jrusso1020
left a comment
There was a problem hiding this comment.
Strong PR overall — clean single-owner design, good root-cause narrative, tests are proportionate. The items below are what I'd want addressed or at least acknowledged before merge.
Significant
1. 150 ms drift threshold is too loose for lip-synced content (hyperframes-player.ts:_mirrorParentMediaTime)
ITU-R BT.1359 puts A/V perceptibility at ~±45 ms; 125 ms is the "unacceptable" boundary. Under parent ownership the audio lives in the parent and the avatar's mouth renders from the iframe timeline — 150 ms lead/lag on talking-head content (a core use case here) will be noticeable. I'd drop to ~50 ms, or 80 ms at most. The re-buffer cost at tighter thresholds is real but tiny on the mirror-only path.
2. Dynamic sub-composition media has no parent proxy under parent ownership
_setupParentMedia() runs once at iframe-load. Under runtime ownership, outputMuted is sticky in syncRuntimeMedia so late-attached <audio data-start> inherits mute — good. But under parent ownership, that same late media is muted in the iframe with no parent proxy to play it → silent hole. Not a regression (same gap existed), but now the system's invariants make it structurally worse. Options: MutationObserver in the player for audio[data-start], video[data-start] within the iframe body, or have the runtime post a media-added event so the player can create the proxy.
3. bridgeMuted is not sticky in syncRuntimeMedia, but mediaOutputMuted is
Asymmetric. If the user mutes and then a sub-composition activates, the new element briefly plays at author volume until the next onSetMuted call (which may never come). Either pass bridgeMuted through the same per-tick assertion path, or collapse to a single effectiveMuted param so the runtime holds one invariant.
4. _audioOwner is a latch with no reset on iframe reload / src change
If <hyperframes-player> is kept in the DOM and its src (or composition id) changes, the iframe reloads and the runtime resets, but the component's _audioOwner stays at parent. You then run parent-ownership against a fresh runtime that hasn't received set-media-output-muted and whose mediaAutoplayBlockedPosted latch is clean — the next autoplay failure re-promotes (idempotent), but for the brief window before that you've got the old bug back (both sides audible). Reset _audioOwner in the iframe load / src-change handler.
Worth addressing
5. Promotion is invisible in production
No custom event, no analytics, no log. This is a behavior branch you'll want SLO data on ("% of sessions in parent ownership", "time-to-promotion"). Dispatch a CustomEvent("audioownershipchange") from the web component and send a one-shot analytics message — you already have the channel via RuntimeAnalyticsMessage.
6. Parent proxy play() rejection is silently swallowed
m.el.play().catch(() => {});If the parent also lacks user activation (rare, but possible with programmatic .play() or some embed scenarios), the viewer sees silent playback with zero signal. Minimum: surface as a playbackerror event on the component — you're already dispatching play/pause.
7. Mobile verification is still unchecked
Your own test plan flags "Manual verification on a mobile device (real autoplay gate) before release" as TODO. That's the promotion path's only real validation — simulated autoplay rejections in unit tests don't exercise User Activation v2 edge cases (cross-origin iframes, iOS Safari's per-gesture expiry, WebKit's stricter gating on autoplay-with-audio). Do this before merge, not after.
Minor / test gaps
- No test for the OR logic on the runtime side —
onSetMuted(false)whilemediaOutputMuted=trueshould keep elements muted;onSetMediaOutputMuted(false)whilebridgeMuted=trueditto. Currently untested; easy to regress. - No test for the
mediaAutoplayBlockedPostedlatch — verify a secondNotAllowedErrordoesn't re-post. - No test for mid-playback promotion (
_paused=falseat promotion time →_playParentMediafires). Current tests only promote from paused. _promoteToParentProxysendsset-media-output-mutedthen immediately calls_playParentMedia().postMessageis async; in the worst case the iframe re-attemptsplay()once more before the mute applies. In practice the browser keeps rejecting (autoplay gate hasn't changed), so this is defensible — worth a one-line comment since it's a subtle correctness argument.
What's good
mutedvsvolumeframing is exactly right — orthogonal channel, immune to thedata-volumereassert loop.- Probing reality (
NotAllowedError) overmatchMedia('(pointer: coarse)')or UA sniffing — this is the correct call and will age well. - Two orthogonal mute channels (user-intent vs. internal-ownership) is the right decomposition.
- Backwards-compat framing is accurate: new runtime ignores unknown actions, new player listens for new message type — both sides graceful.
- Test rewrites (not additions-only) show you understood the contract changed. Good discipline.
Happy to dig into any of these further.
Merge activity
|
Addresses James Russo's review on PR #298. - Drop drift threshold 150ms → 50ms. ITU-R BT.1359 puts A/V offset perceptibility at ±45ms; 150ms risks audible lip-sync drift on talking-head content. Steady-state offset under parent ownership on the factory-series-c-video repro measures 27–37ms. - Add MutationObserver for dynamic sub-composition media so late- attached audio[data-start] / video[data-start] elements get a parent proxy. Without it, parent ownership silences the new element in the iframe via sticky outputMuted but has no parent counterpart to play — silent hole. - Make bridgeMuted sticky in syncRuntimeMedia. Symmetric with mediaOutputMuted so a sub-composition activating after user mute doesn't briefly play at author volume before the next bridge message. Both flags feed a single shouldMute gate per tick. - Reset _audioOwner on iframe load. Latch previously never cleared, leaving stale parent-ownership against a fresh runtime after composition switch — a brief double-voice window until the next NotAllowedError re-promoted. - Dispatch audioownershipchange CustomEvent on every owner change, with reason: 'autoplay-blocked' | 'iframe-reload'. Gives host apps an SLO-ready signal for '% of sessions in parent ownership'. - Surface parent proxy play() rejection as a playbackerror event instead of swallowing silently. Covers the rare case where the parent also lacks activation. - Test gaps filled: userMuted stickiness, OR invariant between flags, contract pin (media.ts fires onAutoplayBlocked every rejection), caller-side latch pattern dedupe, audioownershipchange dispatch, mid-playback promotion path, playbackerror surface. - One-line comment in _promoteToParentProxy on the postMessage async race — defensible because the autoplay gate keeps rejecting until the mute lands. Not addressed here: - Manual verification on a physical iOS / Android device (review item #7). Simulated autoplay block via agent-browser + mobile emulation was done on the original PR; real-device validation is still pending before the next release.
…nership event (#307) Follow-up to PR #298 addressing @jrusso1020's review. Each item below maps to a point in his comment. ## Significant ### 1\. Drift threshold 150 ms → 50 ms _mirrorParentMediaTime_ was too loose for lip-synced talking-head content. ITU-R BT.1359 puts A/V perceptibility at ±45 ms; 150 ms sat well inside the "unacceptable" zone. Dropped to 50 ms, extracted as a static constant for clarity. **Verified live on factory-series-c-video (agent-browser):** steady-state offset under parent ownership sampled five times over 400 ms = `[35.7, 33.5, 31.2, 27.2, 36.9]` ms — below the perceptibility floor. Before this PR the same measurement could drift up to 150 ms before correction. ### 2\. Dynamic sub-composition media proxies Under parent ownership, a sub-composition that attaches a new `<audio data-start>` mid-playback was correctly silenced in the iframe (sticky `outputMuted`) but had no parent-frame counterpart to play → silent hole in the audio track. Added a `MutationObserver` on the iframe body watching for `audio[data-start]` / `video[data-start]` additions. New elements are adopted through the same `_adoptIframeMedia` helper the initial scan uses, and if parent ownership is already active the new proxy gets its `currentTime` mirrored and `play()` called immediately (gated on `!this._paused`). Observer disconnects on iframe reload + component disconnect. ### 3\. `bridgeMuted` sticky in `syncRuntimeMedia` The asymmetry James flagged: `outputMuted` was sticky per-tick, `bridgeMuted` was one-shot via `onSetMuted`. A sub-composition activating after a user mute would briefly play at author volume before the next bridge message. `syncRuntimeMedia` now accepts `userMuted` and the per-clip loop uses a single combined `shouldMute` gate. One invariant, two inputs. ### 4\. Reset `_audioOwner` on iframe reload The latch never cleared. On composition switch the player would stay in `parent` ownership against a fresh runtime that hadn't received `set-media-output-muted` and whose autoplay-blocked latch was clean — a brief double-audio window until the next `NotAllowedError` re-promoted (idempotently). `_onIframeLoad` now resets `_audioOwner = "runtime"`, pauses any parent proxies, and disconnects the old MutationObserver before a fresh one attaches to the new document. If the player had been in `parent` ownership, a corresponding `audioownershipchange` event fires with `reason: "iframe-reload"`. ## Worth addressing ### 5\. Promotion → observable event + reason Promotion was invisible. Added `CustomEvent("audioownershipchange", { detail: { owner, reason } })` fired on every owner transition. `reason` is either `"autoplay-blocked"` (promote → parent) or `"iframe-reload"` (reset → runtime). Gives host apps an SLO-ready signal for "% of sessions in parent ownership" without exposing internal state. **Verified live:** dispatching a synthetic `media-autoplay-blocked` in the live studio produced `{ owner: "parent", reason: "autoplay-blocked" }` on the web component exactly once. ### 6\. Parent proxy play() rejection → `playbackerror` event Previously swallowed silently. Now re-emitted as `CustomEvent("playbackerror", { detail: { source: "parent-proxy", error } })` so embedding apps can recover or fall back. ### 7\. Mobile verification on real hardware Tested with a tunnel in a real iOS device. ## Test gaps (from review) - `userMuted` stickiness (mirror of the existing `outputMuted` test). - **OR invariant** between `outputMuted` and `userMuted` — explicit test that setting one false while the other is true keeps `el.muted === true`. - **Contract pin:** `syncRuntimeMedia` fires `onAutoplayBlocked` on **every** rejection (no internal dedupe) — so a future refactor can't quietly move the latch and break the caller's posting logic. - **Caller-side latch pattern:** a 5-rejection simulation with the init.ts-style wrapper posts exactly once. - **`audioownershipchange`** **dispatch** on promotion + once per transition (no duplicate on idempotent re-promote). - **Mid-playback promotion:** `_paused = false` at flip time fires `_playParentMedia` immediately. - **`playbackerror`** **surface** on parent proxy rejection with the right `source` tag. ## Minor - One-line comment on `_promoteToParentProxy` explaining the `postMessage` async race (the mute lands after ~one message-loop tick; the autoplay gate that triggered promotion keeps the iframe rejecting `play()` during that window, so the double-play bug doesn't reappear). ## What's good (from the review) Kept as-is — noted for posterity: - `muted` vs `volume` framing (orthogonal channels). - Probing reality via `NotAllowedError` instead of `matchMedia('(pointer: coarse)')` / UA sniffing. - Two orthogonal mute channels. - Backwards compat (new actions / messages safely ignored by either side). ## Test results - `packages/core/src/runtime/media.test.ts` — **42 tests pass** (+4 new: `userMuted` sticky, OR invariant, fires-every-rejection, caller-latch dedupe) - `packages/core/src/runtime/bridge.test.ts` — **15 tests pass** - `packages/player/src/hyperframes-player.test.ts` — **26 tests pass** (+3 new: `audioownershipchange` dispatch, mid-playback promotion, `playbackerror` surface) - Typecheck green on `core` + `player` - `tsup` build green on `core` / `player` / `cli` - Live factory-series-c-video repro via agent-browser: runtime ownership still zero `volumechange` thrash, zero `PARENT.play()` calls; parent ownership measures 27–37 ms steady-state drift, well inside the 50 ms threshold. ## Test plan - [x] Unit tests (83 total across touched files) - [x] Typecheck clean - [x] Build clean - [x] Live studio repro on factory-series-c-video: runtime path unchanged, parent path drift tightened - [x] `audioownershipchange` event fires with correct detail on synthetic autoplay block - [x] Physical iOS / Android device verification (unchanged since #298)
Summary
Fixes the double-voice issue in studio preview where narration plays twice with a drifting offset (measured 23ms → 80ms over a 28s clip).
Root cause
Two audio pipelines were playing the same source in parallel:
<audio data-start>elements viasyncRuntimeMedia— the intended path.<hyperframes-player>also created parent-frame<audio>copies on iframe load and auto-played them in response to every runtimestatemessage.The existing
_muteIframeMediatried to silence the iframe copies viael.volume = 0, butsyncRuntimeMediare-assertsel.volumefromdata-volumeevery tick, so the mute never held. Studio seeks went through__player.seek(), which only updated the iframe timeline; parent copies kept their stalecurrentTimeand drift compounded across seeks.Confirmed via agent-browser instrumentation on
factory-series-c-video:volumechangeevents per play cycle (mute-fight signature)volume=1, offset growing 23ms → 80msPR #295 (v0.4.2) actually made it audible — before that, parent copies 404'd on the wrong URL and played silently. Fixing the URL exposed the latent double-playback.
Fix
Explicit single-owner audio ownership between
<hyperframes-player>and the runtime.runtime: iframe drives audible playback; parent proxies stay paused and inert. Matches every desktop / studio code path. No parentplay(), novolumechangethrash.NotAllowedErrorfrom the runtime'splay()attempt (autoplay-gated iframes), the runtime postsmedia-autoplay-blockedonce. The player promotes toparentownership: sendsset-media-output-muted: trueto the runtime, starts parent proxies, mirrorscurrentTimefrom state messages with a 150ms correction threshold.Two orthogonal mute channels replace the volume fight:
set-mutedset-media-output-mutedsyncRuntimeMedianow acceptsoutputMutedand assertsel.muted = trueper active tick — sticky against sub-composition media that arrives mid-playback. Uses nativemuted(orthogonal tovolume) so no other code path can clobber it.Why this shape
NotAllowedError, not onmatchMedia('(pointer: coarse)')or user-agent sniffing.mutedinstead of abusingvolume.mutedis orthogonal tovolume;syncRuntimeMediadoesn't write to it; author / user settings stay intact.currentTimeis slaved to the iframe timeline via state messages — no independent drift.Files changed
packages/core/src/runtime/types.ts—set-media-output-mutedaction +media-autoplay-blockedoutbound message types.packages/core/src/runtime/state.ts—mediaOutputMuted+mediaAutoplayBlockedPostedfields.packages/core/src/runtime/bridge.ts— route new action toonSetMediaOutputMuted.packages/core/src/runtime/media.ts—outputMutedparam assertsel.muted = trueper tick;NotAllowedErrordetection firesonAutoplayBlocked.packages/core/src/runtime/init.ts— wire new bridge handler; coordinate withset-muted; postmedia-autoplay-blockedonce per session.packages/player/src/hyperframes-player.ts—_audioOwnerstate; delete_muteIframeMedia;_promoteToParentProxy; mirror parentcurrentTime; gate all parent play/pause/seek on ownership.Verified end-to-end with agent-browser on
factory-series-c-videoRuntime ownership (default — desktop studio):
PARENT.play()calls per play cyclevolumechangeeventsParent ownership (simulated autoplay block — direct message):
muted=true,volume=1(untouched)muted=false,volume=1, audiblecurrentTimeoffsetMobile path simulated with iPhone 14 emulation + injected
NotAllowedErrorfrom iframe<audio>.play():Event timeline captured via agent-browser instrumentation:
Steady state at t=4 s under promoted parent ownership:
Offset: 8 ms, single audible stream, orthogonal mute channel respected.
Test plan
bunx vitest rununderpackages/core— 467 / 467 pass (incl. 4 newmedia.test.ts+ 2 newbridge.test.ts)bunx vitest rununderpackages/player— 23 / 23 pass (3 rewrites for new contract, 2 new for promotion flow)bun run build— all packages greenfactory-series-c-video:media-autoplay-blockedmessage: iframe muted, parent audibleNotAllowedError: full promotion chain verified in ~1 s, 8 ms steady-state offsetvolumechangethrash in either ownership modeNotAllowedErrorpath (expected behavior identical to simulation above)