Add opt-in RNNoise noise suppression with preset controls #3772

Open
LucaPisl wants to merge 27 commits into element-hq:livekit from LucaPisl:lucapisl/rnnoise-noise-suppression


LucaPisl commented Mar 3, 2026

This adds opt-in enhanced noise suppression powered by RNNoise, a small recurrent neural network trained specifically for speech denoising. When enabled, it replaces the browser's native WebRTC noise suppression. It is exposed to users as a toggle in the Audio settings tab.

Why RNNoise

The browser's built-in noise suppression is decent, but its quality varies considerably across browsers and operating systems. RNNoise runs as an AudioWorklet, so it processes audio at the same point in the pipeline regardless of platform, and its aggressiveness is tunable. The main tradeoffs are a small bundle size increase and ~10 ms of additional latency, both of which I think are acceptable for a feature like this.

What changed

Settings

A new rnnoiseNoiseSuppression boolean setting (off by default) appears in the Audio tab, following the same pattern as the background blur toggle in the Video tab. Alongside it there's a preset selector — Conservative, Balanced, or Strong — which controls how aggressively RNNoise attenuates non-speech audio. The setting and the chosen preset both persist across sessions via localStorage.

When the browser doesn't support AudioWorklet (old Safari, some mobile WebViews), the toggle renders as disabled with an explanatory note. The feature is completely inert in those environments — no processor is loaded, no errors are thrown.
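A minimal sketch of the kind of capability check that could drive the disabled state of the toggle. The helper name `isAudioWorkletSupported` and the injectable `scope` parameter are assumptions for illustration, not names taken from this PR.

```typescript
// Hypothetical helper: reports whether the AudioWorklet API is available.
// `scope` defaults to the global object and is injectable so the check can
// be exercised outside a browser.
function isAudioWorkletSupported(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  return (
    typeof scope["AudioContext"] === "function" &&
    typeof scope["AudioWorkletNode"] === "function"
  );
}
```

The settings UI could call this once and render the toggle as disabled when it returns `false`, without ever loading the processor.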

Audio pipeline

The processor is implemented as a TrackProcessor<Track.Kind.Audio> using LiveKit's existing processor API, the same mechanism used for background blur on video. This means LiveKit handles all the track lifecycle plumbing — the processor just receives the raw MediaStreamTrack and an AudioContext, builds a small Web Audio graph (source → AudioWorkletNode → destination), and returns the processed track.

Inside the worklet, a ring buffer bridges the 128-sample blocks that AudioWorklet delivers to the 480-sample frames that RNNoise expects (480 samples at 48 kHz = 10 ms). Multi-channel input is downmixed to mono by averaging all channels before processing — RNNoise is mono-only. The preset system layers a gain envelope on top of RNNoise's VAD probability output: during detected noise segments, gain is smoothly ramped down to a preset floor, then released back up when speech resumes.
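The three steps described above (mono downmix, 128-to-480 sample framing, and a gain envelope driven by the VAD probability) can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: the class and function names, the preset gain floors, the VAD threshold, and the smoothing factor are all hypothetical values chosen for the example.

```typescript
const FRAME_SIZE = 480; // one RNNoise frame: 10 ms at 48 kHz

// Averages all channels of one render block into a single mono block.
function downmixToMono(channels: Float32Array[]): Float32Array {
  const length = channels[0]?.length ?? 0;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) mono[i] += channel[i] / channels.length;
  }
  return mono;
}

// Collects incoming blocks (128 samples in an AudioWorklet) into 480-sample frames.
class FrameAccumulator {
  private readonly buffer = new Float32Array(FRAME_SIZE);
  private filled = 0;

  // Appends a block and returns every full frame it completes.
  push(block: Float32Array): Float32Array[] {
    const frames: Float32Array[] = [];
    let offset = 0;
    while (offset < block.length) {
      const count = Math.min(FRAME_SIZE - this.filled, block.length - offset);
      this.buffer.set(block.subarray(offset, offset + count), this.filled);
      this.filled += count;
      offset += count;
      if (this.filled === FRAME_SIZE) {
        frames.push(this.buffer.slice()); // copy, because the buffer is reused
        this.filled = 0;
      }
    }
    return frames;
  }
}

// Assumed preset floors; the PR does not state the actual numbers.
const PRESET_GAIN_FLOORS = { conservative: 0.5, balanced: 0.25, strong: 0.1 } as const;

// Ramps gain toward the preset floor during noise and back to 1 during speech.
class GainEnvelope {
  private gain = 1;
  private readonly gainFloor: number;
  private readonly vadThreshold: number;
  private readonly smoothing: number;

  constructor(gainFloor: number, vadThreshold = 0.5, smoothing = 0.2) {
    this.gainFloor = gainFloor;
    this.vadThreshold = vadThreshold;
    this.smoothing = smoothing;
  }

  // Called once per frame with the VAD probability reported for that frame.
  next(vadProbability: number): number {
    const target = vadProbability >= this.vadThreshold ? 1 : this.gainFloor;
    this.gain += (target - this.gain) * this.smoothing;
    return this.gain;
  }
}
```

Because 480 is not a multiple of 128, a frame completes on some blocks and not on others, so the output side needs a matching buffer to hand back exactly 128 samples per callback.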

RNNoise expects 48 kHz audio. If the AudioContext reports a different sample rate, the processor refuses to initialize and logs a one-time warning — audio falls through to native suppression unchanged. This is an edge case in practice (Chrome and Firefox both default to 48 kHz) but it's worth being explicit about.
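A sketch of a sample-rate guard with a one-time warning, assuming the check is a small closure. The name `createSampleRateGuard` and the message text are hypothetical; only the 48 kHz requirement and the warn-once behaviour come from the description above.

```typescript
const REQUIRED_SAMPLE_RATE = 48_000;

// Returns a checker that reports whether RNNoise may run at `sampleRate`,
// calling `warn` at most once over its lifetime.
function createSampleRateGuard(
  warn: (message: string) => void,
): (sampleRate: number) => boolean {
  let warned = false;
  return (sampleRate: number): boolean => {
    if (sampleRate === REQUIRED_SAMPLE_RATE) return true;
    if (!warned) {
      warned = true;
      warn(`RNNoise requires ${REQUIRED_SAMPLE_RATE} Hz but got ${sampleRate} Hz; skipping`);
    }
    return false;
  };
}
```

A processor could call the returned function with `audioContext.sampleRate` during init and return early when it yields `false`.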

Native noise suppression interaction

When RNNoise is active and the browser supports it, the WebRTC constraint noiseSuppression is set to false on the microphone track. Running both simultaneously tends to produce audible artifacts and wastes CPU. When RNNoise is toggled on or off, the microphone track is restarted with the updated constraint before the processor is applied or removed — this ensures the WebRTC layer is always consistent with what's happening in the audio graph.

If the processor fails to initialize for any reason, the setting is automatically turned off and noiseSuppression is re-enabled, so the user is never left with noise suppression disabled and no replacement active.
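The coordination rule in this section reduces to a small pure function. The sketch below assumes two boolean inputs (whether the user enabled RNNoise and whether the browser supports it); the function and field names are hypothetical and may differ from the policy function in the PR.

```typescript
interface NoiseSuppressionPolicy {
  useRNNoise: boolean;
  // Value for the `noiseSuppression` constraint on the microphone track.
  nativeNoiseSuppression: boolean;
}

// Native suppression is switched off only when RNNoise will actually run,
// so the user is never left without any suppression.
function resolveNoiseSuppression(
  rnnoiseEnabled: boolean,
  rnnoiseSupported: boolean,
): NoiseSuppressionPolicy {
  const useRNNoise = rnnoiseEnabled && rnnoiseSupported;
  return { useRNNoise, nativeNoiseSuppression: !useRNNoise };
}
```

On an initialization failure, the same function can be re-evaluated with `rnnoiseEnabled` set to `false`, which restores native suppression.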

Wiring

The processor is managed by the Publisher class, which already owns the LiveKit track lifecycle. It observes both the RNNoise enabled setting and the preset setting reactively, and serializes all processor operations through a promise queue to avoid race conditions during rapid toggling. When a new microphone track appears (e.g. after a device switch), if RNNoise is enabled, the track is restarted with the updated suppression policy and the processor is re-applied.
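One common way to serialize operations through a promise queue is to chain each new operation onto the tail of the previous one. The sketch below illustrates that pattern; the `OperationQueue` class is a hypothetical name and is not necessarily how the Publisher implements it.

```typescript
class OperationQueue {
  private tail: Promise<void> = Promise.resolve();

  // Runs `operation` only after every previously enqueued operation has settled.
  enqueue<T>(operation: () => Promise<T>): Promise<T> {
    const result = this.tail.then(operation);
    // Swallow the outcome on the internal chain so one failure
    // does not block the operations queued behind it.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

With this shape, rapid on/off toggles become an ordered sequence of set-processor and stop-processor calls, and each caller still observes the result or error of its own operation.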

WASM package

The WASM binary comes from @jitsi/rnnoise-wasm (version 0.2.1), which ships a synchronous build with the WASM inlined as base64. This avoids needing a separate .wasm fetch from within the worklet scope, which would be awkward to configure. The sync build is what Jitsi Meet uses in production. THIRD_PARTY_NOTICES has been updated with the BSD (rnnoise) and Apache 2.0 (@jitsi/rnnoise-wasm) license notices.

Tests

Unit tests cover the full processor lifecycle — init, destroy, idempotent double-destroy, restart, double-init without explicit destroy, concurrent init+destroy race, worklet module deduplication per AudioContext, the 44.1 kHz bypass, worklet load failures, and the stereo/multi-channel downmix policy. The noise suppression policy function has its own tests for all three input combinations. The SettingsModal has tests for rendering, the unsupported-browser state, and localStorage persistence. The Publisher integration tests cover enabling/disabling the processor, the automatic fallback on setup failure, the native suppression policy on track restart, and preset propagation to an already-active processor.
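The per-AudioContext worklet deduplication exercised by these tests could look roughly like the sketch below. It uses a structural `WorkletHost` type in place of a real `AudioContext` so it runs outside a browser. The retry-after-failure behaviour is an assumption, and the names are illustrative rather than copied from the PR.

```typescript
// Minimal structural stand-in for the part of AudioContext used here.
interface WorkletHost {
  audioWorklet: { addModule(url: string): Promise<void> };
}

// One in-flight or completed registration per context; entries are
// garbage-collected together with their context.
const registrations = new WeakMap<WorkletHost, Promise<void>>();

function ensureWorkletRegistered(context: WorkletHost, moduleUrl: string): Promise<void> {
  let pending = registrations.get(context);
  if (pending === undefined) {
    pending = context.audioWorklet.addModule(moduleUrl).catch((error: unknown) => {
      registrations.delete(context); // assumption: allow a retry after a failed load
      throw error;
    });
    registrations.set(context, pending);
  }
  return pending;
}
```

Concurrent callers on the same context share one `addModule()` call, while a different context triggers its own registration.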

On the Playwright side, two end-to-end scenarios were added to the sticky call spec: a rejoin-after-disconnect stability test and a microphone device-switch stability test, both with RNNoise enabled. These run on Chromium with fake media devices and check that no RNNoise or AudioWorklet errors appear in the console across the scenario.

Known limitations

48 kHz only. If the AudioContext is at a different sample rate, RNNoise is skipped with a one-time warning and native suppression is used. A future improvement would be to add a resampler, but that adds complexity and the 48 kHz case covers essentially all desktop browsers.

~10 ms added latency. The 480-sample frame size at 48 kHz introduces a 10 ms pipeline delay. This is inherent to RNNoise and not something we can avoid without modifying the model. For a voice call this is imperceptible.

Bundle size. @jitsi/rnnoise-wasm's sync build adds roughly 200–250 KB (base64-encoded WASM). The worklet module is only fetched when the processor is first initialized, not at startup, so it doesn't affect initial page load time for users who haven't enabled the feature. Users who have enabled it pay that download cost when the processor initializes.

No resampling. Closely related to the 48 kHz limitation above. If we want to support non-standard sample rates in the future, we'd need to add a resampler node in the audio graph.

Preset selector is always visible when enabled. The preset could arguably default to hidden with an "advanced" disclosure, but I kept it visible since three options is not a lot of UI and the labels are self-explanatory.

Notes

Adds opt-in enhanced noise suppression powered by RNNoise (AudioWorklet + WASM). Includes a three-level preset system (Conservative / Balanced / Strong), automatic native suppression coordination, and graceful fallback for unsupported browsers or initialization failures.

Additional notes

AI was used in writing and documentation. More testing is needed, but everything works in my local dev environment. This PR references issue #714.

[Screenshots: Element_pr1, Element_pr2]

LucaPisl added 27 commits March 3, 2026 00:54
Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
…ting

Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
The rnnoise C library (xiph/rnnoise) is compiled to WebAssembly inside
@jitsi/rnnoise-wasm and bundled verbatim into the Element Call web app
bundle. The rnnoise BSD 3-Clause license requires that its copyright
notice be reproduced in documentation or other materials provided with
binary distributions.

@jitsi/rnnoise-wasm does not carry a NOTICE file nor embed the upstream
BSD notice in its generated JS/WASM artefacts, so Element Call — as the
distributor of the binary — must supply the attribution itself.

Add THIRD_PARTY_NOTICES at the repository root containing:
- xiph/rnnoise BSD 3-Clause notice (Mozilla, Jean-Marc Valin,
  Xiph.Org Foundation, Mark Borgerding)
- @jitsi/rnnoise-wasm Apache 2.0 notice (ESTOS GmbH, BlueJimp SARL)

Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
- Fix copyright header in RNNoiseProcessor.ts (was "2025 New Vector Ltd.",
  all other files in the feature use "2026 Element Creations Ltd.")
- Pin @jitsi/rnnoise-wasm to exact version 0.2.1 (was "^0.2.1") to
  prevent unexpected upstream WASM changes being pulled automatically
- Add double-init guard to RNNoiseProcessor.init(): tears down existing
  nodes before re-initialising so callers need not explicitly destroy first
- Add post-await destroyed check in init() to abort cleanly if a
  concurrent destroy() ran during worklet registration
- Add pendingWorkletRegistrations WeakMap mutex in ensureWorkletRegistered()
  to prevent concurrent addModule() calls on the same AudioContext
- Add comment to createWorkletCode() warning that it is the test-harness
  copy of the worklet and must be kept in sync with RNNoiseWorkletModule.ts
- Add three new unit tests covering: double-init node cleanup, concurrent
  ensureWorkletRegistered() deduplication, concurrent init+destroy safety

Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
The @typescript-eslint/promise-function-async rule requires functions
that return Promise values to be declared async. While the runtime
behaviour is equivalent, adding async keeps the linter clean and makes
the intent explicit.

Signed-off-by: LucaPisl <luca.pislaru@gmail.com>
LucaPisl requested a review from a team as a code owner March 3, 2026 21:58
LucaPisl requested a review from toger5 March 3, 2026 21:58
fkwp (Contributor) commented Mar 3, 2026

Hi, and thanks for your contribution.

Have you explored https://github.com/mezonai/mezon-noise-suppression as an alternative and far superior speech enhancement solution? The integration should also be much more straightforward.

LucaPisl (Author) commented Mar 3, 2026

Hi! I didn't explore it. I've been tracking the aforementioned issue for a while now and had my eyes set on RNNoise, mainly due to the discussions in the issue. I have to admit that I haven't really explored alternatives (except for DeepFilterNet, which, as mentioned in the issue, is not viable due to size constraints). I figured RNNoise is a good fit for element-call, seeing as it's used by Jitsi too.

As for the implementation, it's mainly a lot of rewriting to ensure consistency, which I didn't bother to squash. This is meant more as a building block for noise suppression in Element, as it's something that's been lacking for a while now.
