[AUDIO_WORKLET] Optimised output buffer copy by cwoffenden · Pull Request #24891 · emscripten-core/emscripten

cwoffenden · 2025-08-08T18:22:51Z

A reworking of #22753, which "improves the copy back from the audio worklet's heap to JS by 7-12x depending on the browser." From the previous description:

Since we pass in the stack for the worklet from the caller's heap, its address doesn't change. And since the render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers do not change either. This optimisation adds one-time subarray views and replaces the float-by-float copy with a simple set() per channel (per output).
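As a rough illustration of the change described above (not the actual code in `src/audio_worklet.js`), the sketch below compares a float-by-float copy with a single `set()` per channel from views created once. The names `heapF32`, `outputBase`, `outputViews`, `copySlow` and `copyFast` are hypothetical, and a plain `Float32Array` stands in for the wasm heap.

```javascript
// Hypothetical stand-ins: a fake "heap" and Web Audio-style output arrays.
const QUANTUM = 128;                     // render quantum size in samples (fixed after creation)
const CHANNELS = 2;
const heapF32 = new Float32Array(1024);  // stands in for the float view of the wasm heap
const outputBase = 256;                  // assumed fixed float offset of the output buffers

// One-time setup: a subarray view per channel, sharing the heap's memory.
const outputViews = [];
for (let ch = 0; ch < CHANNELS; ch++) {
  const start = outputBase + ch * QUANTUM;
  outputViews.push(heapF32.subarray(start, start + QUANTUM));
}

// Old approach: copy each sample individually.
function copySlow(channels) {
  for (let ch = 0; ch < channels.length; ch++) {
    let src = outputBase + ch * QUANTUM;
    for (let i = 0; i < QUANTUM; i++) {
      channels[ch][i] = heapF32[src++];
    }
  }
}

// New approach: one set() per channel from the pre-built view.
function copyFast(channels) {
  for (let ch = 0; ch < channels.length; ch++) {
    channels[ch].set(outputViews[ch]);
  }
}

// Fake some "rendered" audio in the heap, then copy it out both ways.
for (let i = 0; i < CHANNELS * QUANTUM; i++) {
  heapF32[outputBase + i] = i / 1000;
}
const a = [new Float32Array(QUANTUM), new Float32Array(QUANTUM)];
const b = [new Float32Array(QUANTUM), new Float32Array(QUANTUM)];
copySlow(a);
copyFast(b);
```

Both functions should produce identical output. The difference is that `copyFast` does no per-sample indexing in JS, and the views are only valid for as long as the buffer positions stay fixed.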

The existing interactive tests (written for the original PR) can be run for comparison:

test/runner interactive.test_audio_worklet_stereo_io
test/runner interactive.test_audio_worklet_2x_stereo_io
test/runner interactive.test_audio_worklet_mono_io
test/runner interactive.test_audio_worklet_2x_hard_pan_io
test/runner interactive.test_audio_worklet_params_mixing
test/runner interactive.test_audio_worklet_memory_growth
test/runner interactive.test_audio_worklet_hard_pans

These test various input/output arrangements as well as parameters (parameters are interesting because, depending on the browser, the sizes change as the params move from static to varying).

The original benchmark of the extracted copy is still valid:

https://wip.numfum.com/cw/2024-10-29/index.html

This is tested with 32- and 64-bit wasm (which required a reordering of how structs and data were stored to avoid alignment issues).

Some explanations:

- Fixed-position output buffer views are created once in the `WasmAudioWorkletProcessor` constructor
- Stack allocations for the `process()` call are split into aligned struct data (see the comments) and audio/param data
- The struct writes are simplified by this splitting of data
- `ASSERTIONS` are used to ensure everything fits and correctly aligns
- The tests account for size changes in the params, which can vary from a single float to 128 floats (a single float nicely showing up any 8-byte alignment issues for wasm64)
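To show why a single-float param can expose alignment problems, here is a minimal sketch under stated assumptions: structs are 16 bytes and need 8-byte alignment (as a struct holding a 64-bit pointer would under wasm64), and the functions `interleavedOffsets` and `splitOffsets` are hypothetical. It is not the layout code from `src/audio_worklet.js`, only an illustration of interleaving structs with data versus splitting them.

```javascript
const BYTES_PER_FLOAT = 4;
const STRUCT_SIZE = 16;   // assumed struct size with a 64-bit pointer
const STRUCT_ALIGN = 8;   // assumed alignment required by such a struct

// Interleaved layout: each struct is immediately followed by its float data.
function interleavedOffsets(base, floatCounts) {
  const structOffsets = [];
  let p = base;
  for (const n of floatCounts) {
    structOffsets.push(p);
    p += STRUCT_SIZE + n * BYTES_PER_FLOAT;
  }
  return structOffsets;
}

// Split layout: all structs first (from an aligned base), then all float data.
function splitOffsets(base, floatCounts) {
  const structOffsets = floatCounts.map((_, i) => base + i * STRUCT_SIZE);
  const dataOffsets = [];
  let p = base + floatCounts.length * STRUCT_SIZE;
  for (const n of floatCounts) {
    dataOffsets.push(p);
    p += n * BYTES_PER_FLOAT;
  }
  return { structOffsets, dataOffsets };
}

const allAligned = (offsets, align) => offsets.every((o) => o % align === 0);

// A static param (1 float), a varying param (128 floats), another static one.
const floatCounts = [1, 128, 1];
const base = 1024; // hypothetical 8-byte aligned start address

const naive = interleavedOffsets(base, floatCounts);
const split = splitOffsets(base, floatCounts);

console.log('interleaved structs aligned:', allAligned(naive, STRUCT_ALIGN));
console.log('split structs aligned:', allAligned(split.structOffsets, STRUCT_ALIGN));
console.log('split data float-aligned:', allAligned(split.dataOffsets, BYTES_PER_FLOAT));
```

With the interleaved layout, the 4 bytes of the single-float param push the next struct to an address that is not a multiple of 8. With the split layout, the structs stay aligned regardless of the param sizes, and the float data only needs 4-byte alignment.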

~~Future improvements: the output views are sequential, so instead of individual views covering each channel, a view could span as many channels as needed, with a single `set()` being enough for all outputs.~~

src/audio_worklet.js

This takes into account multiple multi-speaker outputs (the spec labels 18).

juj

LGTM, the logic looks solid now.

I had a couple of minor comments, but those are cosmetic. LGTM either way, this looks good to land to me.

cwoffenden · 2025-08-21T08:19:55Z

> LGTM, the logic looks solid now.

Thanks!

juj · 2025-08-21T08:39:11Z

One thing that comes to mind that may be worth probing: the TLS section is allocated at the end of the stack (it should be exclusive of the stack size though).

So you may try to place a global variable `__thread int foo = 42;` in the test `test/webaudio/audioworklet_params_mixing.c`, and add an `assert(foo == 42);` to verify that its value never changes during the test. This assert would go inside the audio worklet's process callback.

This will verify that the allocation from stack top won't stomp on TLS variables.

cwoffenden · 2025-08-21T08:55:00Z

> One thing that comes to mind that may be worth probing

[snip]

I can add this as a separate PR. Internals like this were one of my worries when starting this last year (most of those worries went away once it was working, but I'm still missing insights like this).

juj · 2025-08-21T09:32:09Z

Yeah that sounds good.

cwoffenden · 2025-08-21T17:08:42Z

Changes in as per @juj's suggestions. ~~CI is breaking but it's not related.~~

cwoffenden · 2025-08-22T10:11:34Z

> Yeah that sounds good.

#25024 tests this.

cwoffenden · 2025-08-25T11:52:44Z

@sbc100 if we can land this in the next few days, I'll have time in the coming weeks to prep for the WebAudio 1.1 API and how I'd like to extend the current API to enable support (I don't want to merge this later).

juj · 2025-08-25T12:06:56Z

I'm happy with the PR, and code size increase is ok for the feature. I'll defer landing though to Sam, since he's still got a yellow dot on the review.

cwoffenden · 2025-08-26T14:59:33Z

All done from me.

cwoffenden · 2025-08-30T08:09:00Z

Could one of you land this? Two ticks, so I’m guessing we’re all good?

juj · 2025-08-30T08:46:16Z

Well it will be my pleasure. Thank you so much for the great effort in this - it is a superb performance optimization.

cwoffenden · 2025-08-30T08:54:06Z

> it is a superb performance optimization

Thanks!

…e#24931)

This adds an interactive test to force growing the heap during playback:

```
test/runner interactive.test_audio_worklet_memory_growth
```

Tested with `interactive64` and `interactive_2gb` (for `interactive64_4gb` the heap is already at the browser's max in testing so can't be grown).

It works by alloc'ing and leaking 2/3 of the current size until it can no longer do so. Emscripten regrows its wasm memory in the process, invalidating any data views (see emscripten-core#24891).

**Edit: test can now grow from both the main and audio thread.**
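The view invalidation mentioned above can be reproduced outside the browser. The following is a minimal sketch, assuming a non-shared `WebAssembly.Memory` for simplicity (audio worklets in Emscripten run with shared memory, where the old buffer is not detached but `memory.buffer` should still return a different object after growth). `outputPtr` and `createView` are hypothetical names, not taken from `src/audio_worklet.js`.

```javascript
// One page is 64 KiB; allow growth up to four pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const QUANTUM = 128;
const outputPtr = 4096; // hypothetical byte address of an output buffer

// Build a fixed-position view over the current buffer.
function createView(mem) {
  return new Float32Array(mem.buffer, outputPtr, QUANTUM);
}

let viewBuffer = memory.buffer; // remember which buffer the view was built on
let view = createView(memory);
view[0] = 0.5;

// Growing replaces memory.buffer; with non-shared memory the old one is detached.
memory.grow(1);
const staleLength = view.length; // the old view no longer covers any data

// Cheap identity check before each use, recreating the view only when needed.
const needsRefresh = viewBuffer !== memory.buffer;
if (needsRefresh) {
  viewBuffer = memory.buffer;
  view = createView(memory);
}

console.log('stale view length:', staleLength);
console.log('refreshed view length:', view.length, 'first sample:', view[0]);
```

Comparing buffer identity is a cheap check to make on every render call, so the cost of recreating views is only paid when the heap has actually grown.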

) Built on emscripten-core#24931 (it touches the same file, but a rebase after merge will fix this).

This adds hard-panned audio files to test that the left and right channels don't get flipped with any changes to the audio worklet code (relevant for emscripten-core#24891, which changes how the copies are performed).

```
test/runner interactive.test_audio_worklet_hard_pans
```

The bass track is hard-left (with its right muted), drums are right.

cwoffenden force-pushed the cw-aw-optimised-copy branch from 5a70474 to 99f4c12 Compare August 8, 2025 18:23

cwoffenden marked this pull request as draft August 8, 2025 18:23

cwoffenden force-pushed the cw-aw-optimised-copy branch 2 times, most recently from e19d4fd to 348932f Compare August 12, 2025 16:17

cwoffenden marked this pull request as ready for review August 12, 2025 19:54

cwoffenden force-pushed the cw-aw-optimised-copy branch from 2313240 to 748c167 Compare August 13, 2025 08:30

sbc100 requested a review from juj August 13, 2025 15:16

juj reviewed Aug 13, 2025

cwoffenden marked this pull request as draft August 13, 2025 17:31

cwoffenden mentioned this pull request Aug 14, 2025

[AUDIO_WORKLET] Add interactive heap growing test NFC #24931

Merged

cwoffenden force-pushed the cw-aw-optimised-copy branch 2 times, most recently from 4cba21c to 998ac6b Compare August 14, 2025 14:45

sbc100 reviewed Aug 14, 2025

cwoffenden added 14 commits August 14, 2025 20:37

Work-in-progress (fails with wasm64)

2bce44e

Minor docs

17e4415

Minor docs

f4710d5

Minor docs

85fe599

Clarification

09cdfed

Temp workaround for (pos.) unaligned structs

62cb5d9

Code size updates

c06c3c4

Simplify struct fills, fix wasm64

52e6ee4

Code size

c7359d5

Increased the max buffer count

2bb992b

This takes into account multiple multi-speaker outputs (the spec labels 18).

Code size changes

50a4ce6

Recreate views as heap grows

e6a834f

Use ptrToString for address

f3bf6b2

Code size

5186686

cwoffenden force-pushed the cw-aw-optimised-copy branch from b57d103 to 5186686 Compare August 14, 2025 18:37

cwoffenden added 2 commits August 15, 2025 12:47

Don't test for heap changes if memory growth is not enabled

0cc9647

Code size

2c550c3

juj reviewed Aug 20, 2025

juj reviewed Aug 20, 2025

juj approved these changes Aug 20, 2025

cwoffenden added 4 commits August 21, 2025 17:19

Simplify output view array size tracking

2288d88

Micro-opt to calculate the stack bytes once

c42bc56

Code size

b931338

Merge branch 'main' into cw-aw-optimised-copy

4532568

cwoffenden added 2 commits August 21, 2025 23:24

Merge branch 'main' into cw-aw-optimised-copy

958eb87

Merge branch 'main' into cw-aw-optimised-copy

d9e17a5

sbc100 approved these changes Aug 25, 2025

cwoffenden added 3 commits August 26, 2025 16:48

Changes following reviews

4b92679

Merge branch 'main' into cw-aw-optimised-copy

675e069

Added more docs

5362735

juj merged commit f78366c into emscripten-core:main Aug 30, 2025
30 checks passed

cwoffenden deleted the cw-aw-optimised-copy branch August 30, 2025 08:57
