[AUDIO_WORKLET] Optimised output buffer copy #24891
juj merged 27 commits into emscripten-core:main
This takes into account multiple multi-speaker outputs (the spec labels 18).
juj left a comment:
LGTM, the logic looks solid now.
I had a couple of minor comments, but those are cosmetic. LGTM either way, this looks good to land to me.
Thanks! |
|
One thing that comes to mind that may be worth probing: the TLS section is allocated at the end of the stack (it should be exclusive of the stack size, though). So you may try to place a global variable [...] This will verify that the allocation from stack top won't stomp on TLS variables. |
[snip] I can add this as a separate PR, internals like this were one of my worries when starting this last year (most of those worries went away once it was working, but I'm still missing insights like this). |
|
Yeah that sounds good. |
|
Changes in as per @juj's suggestions. |
#25024 tests this. |
|
@sbc100 if we can land this in the next days, I'll have time in the next weeks to prep for the WebAudio 1.1 API and how I'd like to extend the current API to enable support (I don't want to merge this later). |
|
I'm happy with the PR, and code size increase is ok for the feature. I'll defer landing though to Sam, since he's still got a yellow dot on the review. |
|
All done from me. |
|
Could one of you land this? Two ticks, so I’m guessing we’re all good? |
|
Well it will be my pleasure. Thank you so much for the great effort in this - it is a superb performance optimization. |
Thanks! |
…e#24931)

This adds an interactive test to force growing the heap during playback:

```
test/runner interactive.test_audio_worklet_memory_growth
```

Tested with `interactive64` and `interactive_2gb` (for `interactive64_4gb` the heap is already at the browser's max in testing so can't be grown). It works by alloc'ing and leaking 2/3 of the current size until it can no longer do so. Emscripten regrows its wasm memory in the process, invalidating any data views (see emscripten-core#24891).

**Edit: test can now grow from both the main and audio thread.**
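To illustrate why growth matters for cached views, here is a minimal sketch, not the test itself. It assumes a plain non-shared `WebAssembly.Memory` with arbitrary sizes; Emscripten's real worklet setup may differ (for example by using shared memory, where stale views behave differently). `FRAMES` and `refreshView` are hypothetical names.

```javascript
// Assumption: non-shared memory, 1 page (64 KiB) initially; sizes are arbitrary.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const FRAMES = 128; // one render quantum of floats

// A view created once over the current buffer.
let view = new Float32Array(memory.buffer, 0, FRAMES);
view[0] = 0.5;

// Growing a non-shared memory detaches the old ArrayBuffer,
// so the cached view now reports a length of zero.
memory.grow(1);
const staleLength = view.length;

// Hypothetical helper: re-create the view if the backing buffer changed.
function refreshView(current) {
  if (current.buffer !== memory.buffer) {
    return new Float32Array(memory.buffer, 0, FRAMES);
  }
  return current;
}
view = refreshView(view);
```

The memory contents survive the grow, so the refreshed view sees the value written before it.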
)

Built on emscripten-core#24931 (it touches the same file, but a rebase after merge will fix this).

This adds hard-panned audio files to test that the left and right channels don't get flipped with any changes to the audio worklet code (relevant for emscripten-core#24891, which changes how the copies are performed).

```
test/runner interactive.test_audio_worklet_hard_pans
```

The bass track is hard-left (with its right muted), drums are right.
A reworking of emscripten-core#22753, which "improves the copy back from the audio worklet's heap to JS by 7-12x depending on the browser." From the previous description:

Since we pass in the stack for the worklet from the caller's heap, its address doesn't change. And since the render quantum size doesn't change after the audio worklet creation, the stack positions for the audio buffers do not change either. This optimisation adds one-time subarray views and replaces the float-by-float copy with a simple `set()` per channel (per output).

The existing interactive tests (written for the original PR) can be run for comparison:

```
test/runner interactive.test_audio_worklet_stereo_io
test/runner interactive.test_audio_worklet_2x_stereo_io
test/runner interactive.test_audio_worklet_mono_io
test/runner interactive.test_audio_worklet_2x_hard_pan_io
test/runner interactive.test_audio_worklet_params_mixing
test/runner interactive.test_audio_worklet_memory_growth
test/runner interactive.test_audio_worklet_hard_pans
```

These test various input/output arrangements as well as parameters (parameters are interesting because, depending on the browser, the sizes change as the params move from static to varying).

The original benchmark of the extracted copy is still valid:

https://wip.numfum.com/cw/2024-10-29/index.html

This is tested with 32- and 64-bit wasm (which required a reordering of how structs and data were stored to avoid alignment issues).
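The difference between the two copy strategies can be sketched roughly as follows. This is an illustration only, not the PR's code: `heapF32` stands in for the wasm heap, and `outputPtr`, `outputViews`, `copyPerSample` and `copyWithSet` are hypothetical names with made-up values.

```javascript
const QUANTUM = 128;                     // render quantum size in frames (assumed fixed)
const heapF32 = new Float32Array(1024);  // stand-in for the wasm heap (assumption)
const outputPtr = 256;                   // hypothetical float index of the output data
const numChannels = 2;

// One-time setup: a fixed-position subarray view per output channel.
const outputViews = [];
for (let ch = 0; ch < numChannels; ch++) {
  const start = outputPtr + ch * QUANTUM;
  outputViews.push(heapF32.subarray(start, start + QUANTUM));
}

// Old approach: copy every float individually on each process() call.
function copyPerSample(output) {
  for (let ch = 0; ch < numChannels; ch++) {
    for (let i = 0; i < QUANTUM; i++) {
      output[ch][i] = heapF32[outputPtr + ch * QUANTUM + i];
    }
  }
}

// New approach: one set() per channel from the cached view.
function copyWithSet(output) {
  for (let ch = 0; ch < numChannels; ch++) {
    output[ch].set(outputViews[ch]);
  }
}
```

Both functions should fill `output` identically; the second avoids recomputing offsets and indexing the heap float by float on every render quantum.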
Some explanations:

- Fixed-position output buffer views are created once in the `WasmAudioWorkletProcessor` constructor
- Stack allocations for the `process()` call are split into aligned struct data (see the comments) and audio/param data
- The struct writes are simplified by this splitting of data
- `ASSERTIONS` are used to ensure everything fits and correctly aligns
- The tests account for size changes in the params, which can vary from a single float to 128 floats (a single float nicely showing up any 8-byte alignment issues for wasm64)

~~Future improvements: the output views are sequential, so instead of being individual views covering each channel the views could cover one to however-many-views needed, with a single `set()` being enough for all outputs.~~
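As a rough sketch of why splitting struct data from float data helps with alignment, the layout arithmetic might look like the following. The ordering, sizes and the names `alignUp` and `planLayout` are assumptions for illustration and are not taken from the Emscripten source.

```javascript
// Round `offset` up to the next multiple of `alignment`.
function alignUp(offset, alignment) {
  return Math.ceil(offset / alignment) * alignment;
}

// Hypothetical planner: pointer-aligned struct bytes first, then 4-byte float data.
// `floatCounts` lists how many floats each audio/param block needs.
function planLayout(base, structBytes, floatCounts, pointerSize) {
  const structStart = alignUp(base, pointerSize);
  let cursor = alignUp(structStart + structBytes, 4);
  const dataOffsets = floatCounts.map((count) => {
    const at = cursor;
    cursor += count * 4; // sizeof(float)
    return at;
  });
  // Mirrors the spirit of an ASSERTIONS check: everything must be aligned.
  if (structStart % pointerSize !== 0 || dataOffsets.some((o) => o % 4 !== 0)) {
    throw new Error("misaligned layout");
  }
  return { structStart, dataOffsets, end: cursor };
}

// A single-float param followed by a full 128-float block, with 8-byte pointers (wasm64).
const layout64 = planLayout(4, 24, [1, 128], 8);
```

Because the structs sit together in their own aligned region, a 4-byte param block cannot push a later 8-byte struct field onto an odd boundary, which is the kind of issue the single-float case exposes.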