Add opt-in RNNoise noise suppression with preset controls #3772
LucaPisl wants to merge 27 commits into
Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
The rnnoise C library (xiph/rnnoise) is compiled to WebAssembly inside @jitsi/rnnoise-wasm and bundled verbatim into the Element Call web app bundle. The rnnoise BSD 3-Clause license requires that its copyright notice be reproduced in documentation or other materials provided with binary distributions. @jitsi/rnnoise-wasm does not carry a NOTICE file nor embed the upstream BSD notice in its generated JS/WASM artefacts, so Element Call, as the distributor of the binary, must supply the attribution itself.
Add THIRD_PARTY_NOTICES at the repository root containing:
- xiph/rnnoise BSD 3-Clause notice (Mozilla, Jean-Marc Valin, Xiph.Org Foundation, Mark Borgerding)
- @jitsi/rnnoise-wasm Apache 2.0 notice (ESTOS GmbH, BlueJimp SARL)
Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
- Fix copyright header in RNNoiseProcessor.ts (was "2025 New Vector Ltd.", all other files in the feature use "2026 Element Creations Ltd.")
- Pin @jitsi/rnnoise-wasm to exact version 0.2.1 (was "^0.2.1") to prevent unexpected upstream WASM changes being pulled automatically
- Add double-init guard to RNNoiseProcessor.init(): tears down existing nodes before re-initialising so callers need not explicitly destroy first
- Add post-await destroyed check in init() to abort cleanly if a concurrent destroy() ran during worklet registration
- Add pendingWorkletRegistrations WeakMap mutex in ensureWorkletRegistered() to prevent concurrent addModule() calls on the same AudioContext
- Add comment to createWorkletCode() warning that it is the test-harness copy of the worklet and must be kept in sync with RNNoiseWorkletModule.ts
- Add three new unit tests covering: double-init node cleanup, concurrent ensureWorkletRegistered() deduplication, concurrent init+destroy safety
Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
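The pendingWorkletRegistrations guard named in the commit above is not shown on this page. A minimal sketch of the idea, assuming the map stores one registration promise per context: `WorkletHost` is a hypothetical stand-in for the part of `AudioContext` involved, and the retry-on-failure behaviour is an assumption rather than something the commit states.

```typescript
// Hypothetical stand-in for the slice of AudioContext that matters here.
interface WorkletHost {
  audioWorklet: { addModule(url: string): Promise<void> };
}

// In-flight or completed registrations, keyed by context. A WeakMap lets the
// entry be garbage-collected together with its context.
const pendingWorkletRegistrations = new WeakMap<WorkletHost, Promise<void>>();

async function ensureWorkletRegistered(ctx: WorkletHost, url: string): Promise<void> {
  let pending = pendingWorkletRegistrations.get(ctx);
  if (pending === undefined) {
    pending = ctx.audioWorklet.addModule(url).catch((error: unknown) => {
      // Assumption: forget failed attempts so a later call can retry.
      pendingWorkletRegistrations.delete(ctx);
      throw error;
    });
    pendingWorkletRegistrations.set(ctx, pending);
  }
  // Concurrent callers all await the same promise, so addModule() runs once.
  return pending;
}
```

Two overlapping calls for the same context share one `addModule()` call, which is the property the new deduplication unit test is described as checking.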
The @typescript-eslint/promise-function-async rule requires functions that return Promise values to be declared async. While the runtime behaviour is equivalent, adding async keeps the linter clean and makes the intent explicit.
Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
Hi, and thanks for your contribution. Have you explored https://github.com/mezonai/mezon-noise-suppression as an alternative, far superior speech enhancement solution? The integration should also be much more straightforward.
Hi! I didn't explore it. I've been tracking the aforementioned issue for a while now and had my eyes set on RNNoise, mainly due to the discussions in the issue. I have to admit that I haven't really explored alternatives (except for DeepFilterNet, which, as mentioned in the issue, is not viable due to size constraints). I figured RNNoise is a good fit for element-call, seeing as it's used by Jitsi too. As for the implementation, it's mainly a lot of rewriting to ensure consistency, which I didn't bother to squash. This is meant more as a building block for noise suppression in Element, as it's something that's been lacking for a while now.
This adds opt-in enhanced noise suppression powered by RNNoise — a small recurrent neural network trained specifically for speech denoising. It sits alongside (and replaces, when active) the browser's native WebRTC noise suppression, and is exposed to users as a toggle in the Audio settings tab.
Why RNNoise
The browser's built-in noise suppression is decent but varies a lot across browsers and operating systems. RNNoise runs consistently as an AudioWorklet, processes audio at the same point in the pipeline regardless of platform, and is tunable. The main tradeoff is a small bundle size increase and ~10ms of additional latency — both of which I think are acceptable for a feature like this.
What changed
Settings
A new `rnnoiseNoiseSuppression` boolean setting (off by default) appears in the Audio tab, following the same pattern as the background blur toggle in the Video tab. Alongside it there's a preset selector (Conservative, Balanced, or Strong) which controls how aggressively RNNoise attenuates non-speech audio. The setting and the chosen preset both persist across sessions via localStorage.
When the browser doesn't support AudioWorklet (old Safari, some mobile WebViews), the toggle renders as disabled with an explanatory note. The feature is completely inert in those environments: no processor is loaded, no errors are thrown.
Audio pipeline
The processor is implemented as a `TrackProcessor<Track.Kind.Audio>` using LiveKit's existing processor API, the same mechanism used for background blur on video. This means LiveKit handles all the track lifecycle plumbing: the processor just receives the raw `MediaStreamTrack` and an `AudioContext`, builds a small Web Audio graph (source → AudioWorkletNode → destination), and returns the processed track.
Inside the worklet, a ring buffer bridges the 128-sample blocks that AudioWorklet delivers to the 480-sample frames that RNNoise expects (480 samples at 48 kHz = 10 ms). Multi-channel input is downmixed to mono by averaging all channels before processing, since RNNoise is mono-only. The preset system layers a gain envelope on top of RNNoise's VAD probability output: during detected noise segments, gain is smoothly ramped down to a preset floor, then released back up when speech resumes.
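To make the block-size bridging concrete, here is a rough sketch of a mono downmix and a ring buffer that turns 128-sample blocks into 480-sample frames. `downmixToMono` and `FrameAccumulator` are illustrative names, not taken from the PR, and the overwrite-oldest overflow policy is an assumption.

```typescript
const RNNOISE_FRAME = 480; // samples per RNNoise frame (10 ms at 48 kHz)

/** Averages all channels of one block into a single mono block. */
function downmixToMono(channels: Float32Array[]): Float32Array {
  const length = channels.length > 0 ? channels[0].length : 0;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

/** Ring buffer that accepts small blocks and yields fixed-size frames. */
class FrameAccumulator {
  private readonly buffer: Float32Array;
  private readonly frameSize: number;
  private writeIndex = 0;
  private readIndex = 0;
  private available = 0;

  constructor(frameSize: number = RNNOISE_FRAME, capacity: number = frameSize * 4) {
    this.frameSize = frameSize;
    this.buffer = new Float32Array(capacity);
  }

  push(block: Float32Array): void {
    for (let i = 0; i < block.length; i++) {
      this.buffer[this.writeIndex] = block[i];
      this.writeIndex = (this.writeIndex + 1) % this.buffer.length;
      if (this.available < this.buffer.length) {
        this.available++;
      } else {
        // Assumption: when full, drop the oldest sample instead of blocking.
        this.readIndex = (this.readIndex + 1) % this.buffer.length;
      }
    }
  }

  /** Returns the next full frame, or null until enough samples have arrived. */
  pop(): Float32Array | null {
    if (this.available < this.frameSize) return null;
    const frame = new Float32Array(this.frameSize);
    for (let i = 0; i < this.frameSize; i++) {
      frame[i] = this.buffer[(this.readIndex + i) % this.buffer.length];
    }
    this.readIndex = (this.readIndex + this.frameSize) % this.buffer.length;
    this.available -= this.frameSize;
    return frame;
  }
}
```

With 128-sample input, the first frame becomes available after the fourth block (512 samples), leaving 32 samples queued for the next one. A real worklet would likely reuse a preallocated frame rather than allocate inside `pop()`, since allocation on the audio thread is best avoided.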
RNNoise expects 48 kHz audio. If the AudioContext reports a different sample rate, the processor refuses to initialize and logs a one-time warning — audio falls through to native suppression unchanged. This is an edge case in practice (Chrome and Firefox both default to 48 kHz) but it's worth being explicit about.
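A sketch of what such a guard could look like, assuming a module-level flag is enough to make the warning one-time. `canRunRnnoise` and the injected `warn` callback are hypothetical names used for illustration only.

```typescript
const RNNOISE_SAMPLE_RATE = 48_000;
let warnedAboutSampleRate = false;

/**
 * Returns true when RNNoise can process audio at the given rate.
 * Logs at most one warning per page load for unsupported rates.
 */
function canRunRnnoise(
  sampleRate: number,
  warn: (message: string) => void = (message) => console.warn(message),
): boolean {
  if (sampleRate === RNNOISE_SAMPLE_RATE) return true;
  if (!warnedAboutSampleRate) {
    warnedAboutSampleRate = true;
    warn(
      `RNNoise needs ${RNNOISE_SAMPLE_RATE} Hz audio, but the AudioContext runs at ` +
        `${sampleRate} Hz; keeping native noise suppression.`,
    );
  }
  return false;
}
```

A caller would check `canRunRnnoise(audioContext.sampleRate)` before building the audio graph and return the original track untouched when it is false.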
Native noise suppression interaction
When RNNoise is active and the browser supports it, the WebRTC constraint `noiseSuppression` is set to `false` on the microphone track. Running both simultaneously tends to produce audible artifacts and wastes CPU. When RNNoise is toggled on or off, the microphone track is restarted with the updated constraint before the processor is applied or removed, which ensures the WebRTC layer is always consistent with what's happening in the audio graph.
If the processor fails to initialize for any reason, the setting is automatically turned off and `noiseSuppression` is re-enabled, so the user is never left with noise suppression disabled and no replacement active.
Wiring
The processor is managed by the `Publisher` class, which already owns the LiveKit track lifecycle. It observes both the RNNoise enabled setting and the preset setting reactively, and serializes all processor operations through a promise queue to avoid race conditions during rapid toggling. When a new microphone track appears (e.g. after a device switch), if RNNoise is enabled, the track is restarted with the updated suppression policy and the processor is re-applied.
WASM package
The WASM binary comes from `@jitsi/rnnoise-wasm` (version 0.2.1), which ships a synchronous build with the WASM inlined as base64. This avoids needing a separate `.wasm` fetch from within the worklet scope, which would be awkward to configure. The sync build is what Jitsi Meet uses in production.
`THIRD_PARTY_NOTICES` has been updated with the BSD (rnnoise) and Apache 2.0 (@jitsi/rnnoise-wasm) license notices.
Tests
Unit tests cover the full processor lifecycle — init, destroy, idempotent double-destroy, restart, double-init without explicit destroy, concurrent init+destroy race, worklet module deduplication per AudioContext, the 44.1 kHz bypass, worklet load failures, and the stereo/multi-channel downmix policy. The noise suppression policy function has its own tests for all three input combinations. The SettingsModal has tests for rendering, the unsupported-browser state, and localStorage persistence. The Publisher integration tests cover enabling/disabling the processor, the automatic fallback on setup failure, the native suppression policy on track restart, and preset propagation to an already-active processor.
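The policy function itself is not shown in this description. Going by the rule stated under "Native noise suppression interaction", it might reduce to something like the sketch below. The function name and the two-boolean input shape are assumptions.

```typescript
// Hypothetical input shape; the PR's real policy function may take different arguments.
interface SuppressionInputs {
  rnnoiseEnabled: boolean;
  audioWorkletSupported: boolean;
}

/** Returns the value to use for the `noiseSuppression` media track constraint. */
function nativeNoiseSuppression(inputs: SuppressionInputs): boolean {
  // Native suppression is switched off only when RNNoise will actually run.
  return !(inputs.rnnoiseEnabled && inputs.audioWorkletSupported);
}
```

The automatic fallback on setup failure then amounts to flipping `rnnoiseEnabled` back to false and restarting the track with the recomputed constraint.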
On the Playwright side, two end-to-end scenarios were added to the sticky call spec: a rejoin-after-disconnect stability test and a microphone device-switch stability test, both with RNNoise enabled. These run on Chromium with fake media devices and check that no RNNoise or AudioWorklet errors appear in the console across the scenario.
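Rejoins and device switches both trigger the apply/remove sequence that the Publisher serializes through a promise queue. A minimal sketch of such a queue follows. `SerialQueue` is an illustrative name, and the PR's actual implementation may differ.

```typescript
/** Runs async operations strictly one after another, in submission order. */
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(operation: () => Promise<T>): Promise<T> {
    // Start this operation only after everything queued before it has settled.
    const result = this.tail.then(operation);
    // Swallow the outcome on the internal chain so one failure does not
    // block later operations; the caller still sees it via `result`.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

Rapid toggling then becomes a series of `queue.enqueue(() => applyProcessor())` and `queue.enqueue(() => removeProcessor())` calls (both hypothetical helpers), each of which sees the state left by the one before it.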
Known limitations
48 kHz only. If the AudioContext is at a different sample rate, RNNoise is skipped (with a one-time logged warning) and native suppression is used. A future improvement would be to add a resampler, but that adds complexity and the 48 kHz case covers essentially all desktop browsers.
~10 ms added latency. The 480-sample frame size at 48 kHz introduces a 10 ms pipeline delay. This is inherent to RNNoise and not something we can avoid without modifying the model. For a voice call this is imperceptible.
Bundle size. `@jitsi/rnnoise-wasm`'s sync build adds roughly 200–250 KB to the bundle (base64-encoded WASM). The worklet module is only fetched when the processor is first initialized (it's not loaded at startup), so this doesn't affect initial page load time for users who haven't enabled the feature. It does affect the bundle for users who have.
No resampling. Closely related to the 48 kHz limitation above. If we want to support non-standard sample rates in the future, we'd need to add a resampler node in the audio graph.
Preset selector is always visible when enabled. The preset could arguably default to hidden with an "advanced" disclosure, but I kept it visible since three options is not a lot of UI and the labels are self-explanatory.
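For a sense of what the three presets change, here is a sketch of a VAD-driven gain envelope of the kind described under "Audio pipeline". The floor values, threshold, and smoothing rates are invented for illustration and are not the PR's numbers.

```typescript
type Preset = "conservative" | "balanced" | "strong";

// Illustrative values only; the PR's real floors are not given in this description.
const PRESET_FLOOR: Record<Preset, number> = {
  conservative: 0.5,
  balanced: 0.25,
  strong: 0.1,
};
const VAD_THRESHOLD = 0.5; // assumed speech/noise decision point
const ATTACK = 0.2; // fraction of the remaining gap closed per frame when ducking
const RELEASE = 0.5; // fraction closed per frame when speech returns

class GainEnvelope {
  private gain = 1;
  private readonly floor: number;

  constructor(preset: Preset) {
    this.floor = PRESET_FLOOR[preset];
  }

  /** Takes the VAD probability for one frame and returns the gain to apply to it. */
  next(vadProbability: number): number {
    const target = vadProbability >= VAD_THRESHOLD ? 1 : this.floor;
    const rate = target > this.gain ? RELEASE : ATTACK;
    // Exponential smoothing toward the target avoids audible gain jumps.
    this.gain += (target - this.gain) * rate;
    return this.gain;
  }
}
```

A stronger preset only lowers the floor the gain settles at during noise. The smoothing keeps the transitions gradual in both directions.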
Notes
Adds opt-in enhanced noise suppression powered by RNNoise (AudioWorklet + WASM). Includes a three-level preset system (Conservative / Balanced / Strong), automatic native suppression coordination, and graceful fallback for unsupported browsers or initialization failures.
Additional notes
AI was used in writing and documentation. More testing is needed, but everything LGTM on my local dev environment. This PR is in reference to issue #714