Skip to content

fix(engine,producer): preserve template-wrapped sub-composition media offsets#476

Merged
miguel-heygen merged 4 commits intomainfrom
fix/template-subcomposition-media-offset
Apr 24, 2026
Merged

fix(engine,producer): preserve template-wrapped sub-composition media offsets#476
miguel-heygen merged 4 commits intomainfrom
fix/template-subcomposition-media-offset

Conversation

@miguel-heygen
Copy link
Copy Markdown
Collaborator

@miguel-heygen miguel-heygen commented Apr 24, 2026

Problem

Template-wrapped sub-compositions could still lose correct parent timing during render in more than one place.

In the validated repros, a host sub-composition starting after the intro (and in one follow-up repro, starting at 20s after earlier compositions) contained scene-local media inside it. On the broken paths:

  • template-wrapped media could be missed during compile and scheduled at raw scene-local time
  • already-correct first-pass offsets could be clobbered during recompileWithResolutions()
  • even after those two fixes, the browser-metadata reconcile step in executeRenderJob() could still overwrite a compiled global end with a scene-local data-end from the inlined DOM, clipping the tail off late-start sub-composition media

What this fixes

Template-wrapped media discovery

  • parseVideoElements, parseImageElements, and parseAudioElements now unwrap a single top-level <template> wrapper before scraping media
  • the unwrap helper is DOM-based, not regex-based, so it avoids the CodeQL backtracking warning and only unwraps the exact single-wrapper shape we want
  • multiple sibling templates or other top-level content are left untouched instead of being rewritten heuristically

Offset preservation after duration resolution

  • recompileWithResolutions() now preserves the first-pass sub-composition media arrays when the already-inlined HTML no longer contains [data-composition-src] hosts
  • that prevents correctly offset media metadata from being overwritten by scene-local media parsed from the merged DOM

Browser metadata reconciliation in the compiled time origin

  • browser-discovered media can still report scene-local data-start / data-end from the merged DOM after inlining
  • the producer now reprojects browser end values into the compiled element's time origin before reconciling them back into composition.videos / composition.audios
  • this prevents late-start sub-composition media from getting truncated back to a scene-local end during the probe phase

Regression coverage

  • adds focused engine tests for the template unwrap helper
  • adds producer regression coverage for both the initial compile path and the post-inline recompileWithResolutions() path
  • adds producer regression coverage for late-start host compositions (t≈20) with scene-local media inside them
  • adds producer unit coverage for the browser-end reprojection helper used by the reconcile path

Root cause

There were three distinct renderer failures behind the bug:

1. Template contents were invisible to the media scrapers

parseSubCompositions() reads raw sub-composition HTML and applies the host offset to discovered media. But the engine media helpers were querying the parsed document directly, and linkedom follows browser semantics here: top-level <template> contents live in a DocumentFragment, so querySelectorAll() never saw those <video> / <audio> / <img> nodes.

That meant template-wrapped sub-compositions could silently produce zero discovered media during the first pass.

2. The duration-resolution recompile could clobber already-correct offsets

After the browser resolves composition durations, recompileWithResolutions() reparses the already-inlined HTML. By that point the original [data-composition-src] hosts are gone, so parseSubCompositions() legitimately returns no nested media.

The old code still rebuilt the deduped media arrays from the merged DOM, which let scene-local media parsed from the inlined HTML overwrite the correctly offset first-pass metadata.

3. The browser probe reconcile path mixed two timing coordinate systems

discoverMediaFromBrowser() reads data-start / data-end directly from the live DOM after sub-compositions are already inlined. For nested media, those attributes can still be scene-local even though the compiled metadata has already been offset into the parent host timeline.

The old reconcile path compared those values directly and overwrote existing.end whenever the numbers differed. For a late-start sub-composition, that could replace a correct global end like 25.5 with a scene-local end like 5.5, cutting the clip off during render.

Verification

Local checks

  • bun test packages/engine/src/utils/htmlTemplate.test.ts
  • bun test packages/producer/src/services/htmlCompiler.test.ts
  • bunx vitest run packages/producer/src/services/renderOrchestrator.test.ts
  • bun run --filter @hyperframes/engine test
  • bun run --filter @hyperframes/engine typecheck
  • bun run --filter @hyperframes/producer typecheck
  • bunx oxlint packages/engine/src/utils/htmlTemplate.ts packages/engine/src/utils/htmlTemplate.test.ts packages/producer/src/services/renderOrchestrator.ts packages/producer/src/services/renderOrchestrator.test.ts packages/producer/src/services/htmlCompiler.test.ts
  • bunx oxfmt --check packages/engine/src/utils/htmlTemplate.ts packages/engine/src/utils/htmlTemplate.test.ts
  • bun run build:producer

Render / browser verification

Verified against two local repros:

  1. Early offset repro

    • host starts at 2s
    • child media is scene-local 0-4s
    • compiled render summary keeps the child video/audio at start: 2
    • browser verification via agent-browser confirmed the 2.2s frame still shows the child clip active in the host timeline
  2. Late offset repro

    • earlier compositions run first, then the target host starts at 20s
    • child media starts scene-local at 1.5s and should remain visible through 24.5s
    • compiled render summary keeps the child video/audio at start: 21.5, end: 25.5
    • browser verification via agent-browser confirmed the 24.5s frame still shows the late clip visible, which is the exact tail-clipping case the old reconcile path could break

Notes

  • the /tmp/hf-pr475-repro and /tmp/hf-pr476-late-offset-repro projects plus their browser-proof artifacts are verification-only and are not part of this PR
  • this PR stays narrowly scoped to sub-composition media timing across compile, recompile, and browser probe reconciliation; it does not broaden into general sub-composition HTML normalization beyond the single-wrapper case

Comment thread packages/engine/src/utils/htmlTemplate.ts Fixed
@miguel-heygen miguel-heygen marked this pull request as ready for review April 24, 2026 16:01
@MuTsunTsai
Copy link
Copy Markdown
Contributor

Apparently I'm working on the same issue in #475

Copy link
Copy Markdown
Collaborator

@jrusso1020 jrusso1020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: approve — real silent-misrender bug, reproduced against main, fix cleanly addresses both root causes

Local verification

Bug reproduced on main. Cherry-picked the new htmlCompiler.test.ts cases onto main (3b8de7a5) and ran them:

(fail) template-wrapped sub-composition media offsets > offsets template-wrapped media to the host start during compile
expect(compiled.videos[0]?.start).toBe(2);
Expected: 2
Received: 0
(fail) template-wrapped sub-composition media offsets > preserves first-pass media offsets when durations are resolved after inlining

That's the real failure mode: a <video> authored scene-local at data-start="0" inside a template-wrapped sub-composition with host data-start="2" is being scheduled at t=0s instead of t=2s. In a render, the child video collides with the intro on the timeline — exactly what the PR description says.

Fix verified against 5d23706e:

  • bun test packages/engine/src/utils/htmlTemplate.test.ts7/7 pass
  • bun test packages/producer/src/services/htmlCompiler.test.ts30/30 pass (including both new regression cases)
  • bun run --filter @hyperframes/engine test490/490 pass
  • bun run --filter @hyperframes/engine typecheck → clean
  • bun run --filter @hyperframes/producer typecheck → clean

Staff review — architecture + correctness

The PR identifies TWO distinct root causes and fixes each at the precise site. Not a shotgun fix.

Root cause 1 — <template> contents invisible to media scrapers. LinkeDOM follows spec: <template> contents live in a DocumentFragment that querySelectorAll() on the outer document never traverses. Sub-compositions in this codebase commonly wrap their body in a single top-level <template> (see the skill contract — every sub-composition is a <template> wrapping a <div data-composition-id=…>). So parseVideoElements / parseImageElements / parseAudioElements silently returned [] for the most common sub-composition shape in the codebase. That's a huge correctness-scoped bug that likely was masked because many renders happened to use non-template sub-composition shapes or used root-level media.

The unwrapTemplate helper is precisely scoped. Reading htmlTemplate.ts:

  • DOM-based, not regex-based. Miguel explicitly called out avoiding the CodeQL backtracking warning from a regex approach. Good — once you're in LinkeDOM, use LinkeDOM.
  • Only unwraps the exact single-wrapper shape. getSingleMeaningfulChild walks past whitespace-only text nodes (nodeType 3) and comments (nodeType 8) but returns null on ANY of: multiple element children, mixed text + elements, no element children. If the walk doesn't find exactly one meaningful element AND that element isn't <template>, the helper returns the input HTML unchanged. That defensive shape means: multi-sibling-templates, mixed content, empty fragments all pass through untouched.
  • Short-circuits cheaply. The string-level !lowered.includes("<template") || !lowered.includes("</template>") skip avoids LinkeDOM parsing for any HTML that doesn't plausibly contain a template. Zero cost on the common path.
  • Wrapping strategy is correct. parseHTMLContent detects <!doctype|<html to decide whether to wrap fragments in a synthetic <!doctype html><html><head></head><body>…</body></html>, so both full-document and fragment inputs go through one codepath. LinkeDOM's parseHTML then gives a body we can walk.
  • template.innerHTML ?? html — defensive fallback. In practice innerHTML returns "" for empty templates (the test confirms this), and since "" is defined, ?? returns the empty string, not html. Correct.

Nitty correctness check on each test case in htmlTemplate.test.ts — all 7 encode meaningful invariants:

  • returns the input unchanged when there is no template wrapper — short-circuit works
  • unwraps a bare top-level template fragment — the happy path
  • unwraps a full document whose body only contains a template — full-doc shape works
  • returns the input unchanged when the closing template tag is missing — malformed input is passthrough, not a crash
  • returns an empty string for an empty template — matches the ?? semantics, not a passthrough
  • preserves nested templates inside the outer wrapper — only the OUTER wrapper unwraps
  • leaves multiple sibling templates unchanged — doesn't heuristically pick one

That's a solid coverage profile for a string→string helper.

Root cause 2 — recompileWithResolutions was clobbering first-pass offsets. The change:

const hasSubMedia = subVideos.length > 0 || subAudios.length > 0 || subImages.length > 0;
const videos = hasSubMedia ? dedupeElementsById([...mainVideos, ...subVideos]) : compiled.videos;

The semantics: after the browser resolves composition durations, recompileWithResolutions reparses the already-inlined HTML. By this point the original [data-composition-src] hosts are gone (they got inlined into the main HTML), so parseSubCompositions legitimately returns empty. The bug was that mainVideos re-parsed from the inlined DOM would now find the scene-local video at its SCENE-LOCAL data-start="0" — losing the host offset that was correctly applied in the first pass. The fix preserves the first-pass compiled.videos when there's no second-pass sub-media.

The correctness argument: between the two passes, the set of media elements doesn't change — duration resolution just replaces data-composition-duration on hosts. So preserving the first-pass arrays when no fresh sub-media is discovered is safe — we aren't dropping any legitimately-new elements.

Non-blocking observations

  1. unwrapTemplate lowercases for the short-circuit but the DOM check uses tagName === "TEMPLATE". LinkeDOM normalizes tagName to uppercase, which matches the comparison — correct. No issue.

  2. BOM handling. html.trimStart().toLowerCase() handles  as a whitespace code point in most engines (String.prototype.trim treats it as whitespace per ES2019). If an author source-saved a <template> as UTF-8-BOM, the check still works. Minor trivia; not an action item.

  3. Documentation for the hasSubMedia preservation. The conditional is subtle and the change is 4 lines. A single-line comment at the site explaining "after inlining, re-parsing the merged DOM loses the sub-composition offsets — preserve first-pass arrays when no fresh sub-media is discovered" would save the next reader the archaeology. Not a blocker; tests and commit message cover it.

  4. Parallel fix for parseSubCompositions? The PR applies unwrapTemplate to the three media parsers in engine/services/audioMixer.ts and engine/services/videoFrameExtractor.ts. If anywhere else in the engine uses parseHTML(rawSubCompositionHtml) and then querySelectorAll against a template-wrapped fragment, it would have the same bug. parseSubCompositions itself handles the host-level traversal via its own regex + offset logic and doesn't hit this issue. Out of scope; flagging for future awareness.

CI state

On 5d23706e — all green except 2 regression-shards (styles-b, styles-e) still in-progress. Completed green: Format / Typecheck / Lint / Test / Build / Test: runtime contract / Smoke / Tests on windows-latest / Render on windows-latest / CodeQL / player-perf / regression-shards (render-compat, styles-a, styles-g). Perf suite skipped on this PR (Detect changes gated it out as non-perf-relevant).

Ship it.

Review by Rames Jusso

@MuTsunTsai
Copy link
Copy Markdown
Contributor

MuTsunTsai commented Apr 24, 2026

Nice catch on the DOM-based unwrap — that's cleaner than the regex I used in #475.

One case I think is still broken after this PR: the browser-metadata reconciliation in executeRenderJob (renderOrchestrator.ts) overwrites existing.end whenever |existing.end - el.end| > 0.0001. For sub-composition media, existing.end is the compiled value (scene-local end + host offset, so e.g. 89.94), and el.end comes from the inlined DOM where the clip's data-end is still scene-local (85.52). The 4.417 gap triggers the reconcile and clips the tail off every sub-composition video/audio at render time.

I hit this in my render: the host was data-start="4.417" and the last ~4.4s of every clip was cut. Only fixing the parse / recompile paths (the two in this PR) isn't enough — the reconciler undoes it.

Repro just needs a host with data-start > 0 + a scene-local clip that runs to the end of the scene, then check whether the final audio.aac contains the full clip or is short.

I have a fix for this in #475 as a second commit (link to commit). Happy to extract it into a separate PR on top of this one if that's cleaner.

@miguel-heygen miguel-heygen merged commit 267ffd3 into main Apr 24, 2026
37 checks passed
Copy link
Copy Markdown
Collaborator Author

Merge activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants