Merged
21 commits
78c8a5c
feat(musica): structure-first audio separation via dynamic mincut
claude Apr 6, 2026
f4b5c7f
refactor(musica/crowd): use DynamicGraph for local + global graphs
claude Apr 6, 2026
fa217ef
enhance(musica/lanczos): add batch_lanczos with cross-frame alignment
claude Apr 6, 2026
a8a49ce
enhance(musica/hearing_aid): improve binaural pipeline with mincut re…
claude Apr 6, 2026
ad15840
docs(musica): comprehensive README with benchmarks and competitive an…
claude Apr 6, 2026
f1f84e7
feat(musica): add 6 enhancement modules — 55 tests passing
claude Apr 6, 2026
88c81b7
feat(musica/wasm): add browser demo with drag-and-drop separation UI
claude Apr 6, 2026
dda313e
feat(musica): HEARmusica — Rust hearing aid DSP framework (Tympan port)
claude Apr 6, 2026
3181df5
feat(musica): 8-part benchmark suite + HEARmusica pipeline benchmarks
claude Apr 6, 2026
46a1ffe
feat(musica): add enhanced separator, evaluation module, and adaptive…
claude Apr 6, 2026
4ffd2a8
feat(musica): add candle-whisper transcription integration (ADR-144)
claude Apr 6, 2026
24d522e
feat(musica): add real audio evaluation with public domain WAV files
claude Apr 6, 2026
a5e656b
perf(musica): optimize critical hot loops across 5 modules
claude Apr 6, 2026
88b09bc
feat(musica): add advanced SOTA separator with Wiener filtering, casc…
claude Apr 6, 2026
2725ff7
fix(musica): adaptive quality selection in advanced separator
claude Apr 6, 2026
c1f8922
feat(musica): add instantaneous frequency graph edges for close-tone …
claude Apr 6, 2026
ebc93e7
refactor(musica): best-of-resolutions strategy replaces lossy mask in…
claude Apr 6, 2026
bdab2db
feat(musica): multi-exponent Wiener search and energy-balanced qualit…
claude Apr 6, 2026
4b6592a
feat(musica): SOTA push — 8 major improvements across all modules
claude Apr 6, 2026
2657235
feat(musica): terminal visualizer, weight optimization, multi-source …
claude Apr 7, 2026
06d97e5
feat(musica): STFT padding, Lanczos batch improvements, WASM bridge c…
Apr 8, 2026
242 changes: 242 additions & 0 deletions docs/adr/ADR-143-hearmusica-tympan-rust-port.md
@@ -0,0 +1,242 @@
# ADR-143: HEARmusica — High-Fidelity Rust Port of Tympan Open-Source Hearing Aid

## Status
Accepted

## Date
2026-04-06

## Context

Tympan is an MIT-licensed open-source hearing aid platform built on Arduino/Teensy (ARM Cortex-M7, 600 MHz). Its `AudioStream_F32` abstraction provides a block-graph processing pipeline with ~20 DSP algorithms including WDRC compression, feedback cancellation, and biquad filtering.

The musica project already implements graph-based audio separation (Fiedler vector + dynamic mincut) with sub-millisecond latency. Combining Tympan's proven hearing aid DSP chain with musica's separation engine yields a capability no commercial hearing aid currently offers: explainable, graph-based source separation integrated into a complete hearing aid pipeline.

### Why Rust?

| Concern | Tympan (C++) | HEARmusica (Rust) |
|---------|-------------|-------------------|
| Memory safety | Manual (buffer overruns possible) | Compile-time guaranteed |
| Concurrency | Interrupt-based (race conditions possible) | Ownership model prevents data races |
| Targets | Teensy only | Embedded (`no_std`), WASM, desktop, cloud |
| Regulatory | Hard to formally verify | Ownership + type system aids certification |
| Performance | Good (ARM CMSIS-DSP) | Equal or better (LLVM auto-vectorization) |

### Why Not Fork OpenMHA?

OpenMHA has 80+ plugins and NAL-NL2 fitting — far richer algorithm library. However:
- **AGPL v3 license** — any derivative must be open-sourced, killing commercial products
- **Complex architecture** — AC variables, template plugins, JACK dependency fight Rust's ownership model
- **200K+ LOC** — porting is impractical; clean-room reimplementation required for any algorithm

Tympan's MIT license and simple `update()` pattern make it the right porting target.

## Decision

Create **HEARmusica** as a Rust hearing aid DSP framework within the musica example crate, porting Tympan's core blocks with a Rust-idiomatic `AudioProcessor` trait and integrating musica's graph-based separation as a first-class processing block.

### Architecture

```
┌─────────────────────────────────────────────────────┐
│                 HEARmusica Pipeline                 │
├─────────────────────────────────────────────────────┤
│                                                     │
│  Input (L/R mic)                                    │
│        │                                            │
│        ▼                                            │
│  ┌──────────┐   ┌───────────┐   ┌─────────────────┐ │
│  │ Biquad   │──▶│ Feedback  │──▶│ Graph Separator │ │
│  │ Prefilter│   │ Canceller │   │ (Fiedler+MinCut)│ │
│  └──────────┘   └───────────┘   └───────┬─────────┘ │
│                                         │           │
│                      ┌──────────────────┴───┐       │
│                      ▼                      ▼       │
│                  [speech]               [noise]     │
│                      │                              │
│                      ▼                              │
│            ┌──────────────────┐                     │
│            │ Multi-Band WDRC  │                     │
│            │ Compressor       │                     │
│            └────────┬─────────┘                     │
│                     │                               │
│                     ▼                               │
│            ┌──────────────────┐                     │
│            │ Audiogram Gain   │                     │
│            │ (NAL-R/half-gain)│                     │
│            └────────┬─────────┘                     │
│                     │                               │
│                     ▼                               │
│            ┌──────────────────┐                     │
│            │ Limiter/Output   │                     │
│            └────────┬─────────┘                     │
│                     │                               │
│                     ▼                               │
│              Output (L/R)                           │
│                                                     │
└─────────────────────────────────────────────────────┘
```

### Core Trait (Tympan `AudioStream_F32` → Rust)

```rust
/// Audio processing block — the fundamental unit of HEARmusica.
/// Maps to Tympan's AudioStream_F32 with Rust ownership semantics.
pub trait AudioProcessor: Send {
    /// Configure for given sample rate and block size.
    /// Called once before processing starts (maps to OpenMHA's prepare()).
    fn prepare(&mut self, sample_rate: f32, block_size: usize);

    /// Process one block of audio in-place.
    /// MUST be real-time safe: no allocation, no locks, no syscalls.
    fn process(&mut self, block: &mut AudioBlock);

    /// Release resources (maps to OpenMHA's release()).
    fn release(&mut self) {}

    /// Human-readable name for debugging and replay logging.
    fn name(&self) -> &str;

    /// Current latency contribution in samples.
    fn latency_samples(&self) -> usize { 0 }
}
```

### AudioBlock

```rust
/// Stereo audio block — the data unit passed between processors.
pub struct AudioBlock {
    pub left: Vec<f32>,
    pub right: Vec<f32>,
    pub sample_rate: f32,
    pub block_size: usize,
    pub metadata: BlockMetadata,
}

pub struct BlockMetadata {
    pub frame_index: u64,
    pub timestamp_us: u64,
    pub speech_mask: Option<Vec<f32>>,    // Set by separator
    pub noise_estimate: Option<Vec<f32>>, // Set by noise estimator
}
```
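A minimal sketch of a concrete block implementing the trait — a flat gain stage. The trait and a trimmed `AudioBlock` are repeated here so the snippet stands alone; `GainDb` is an illustrative name, not the crate's actual `GainProcessor`.

```rust
// Trimmed copies of the definitions above, for a self-contained example.
pub struct AudioBlock {
    pub left: Vec<f32>,
    pub right: Vec<f32>,
    pub sample_rate: f32,
    pub block_size: usize,
}

pub trait AudioProcessor: Send {
    fn prepare(&mut self, sample_rate: f32, block_size: usize);
    fn process(&mut self, block: &mut AudioBlock);
    fn name(&self) -> &str;
    fn latency_samples(&self) -> usize { 0 }
}

/// Flat gain in dB. The dB → linear conversion happens once in `prepare`,
/// so `process` is just a multiply — allocation-free and real-time safe.
pub struct GainDb {
    gain_db: f32,
    gain_lin: f32,
}

impl GainDb {
    pub fn new(gain_db: f32) -> Self {
        Self { gain_db, gain_lin: 1.0 }
    }
}

impl AudioProcessor for GainDb {
    fn prepare(&mut self, _sample_rate: f32, _block_size: usize) {
        // linear = 10^(dB/20)
        self.gain_lin = 10f32.powf(self.gain_db / 20.0);
    }

    fn process(&mut self, block: &mut AudioBlock) {
        for s in block.left.iter_mut().chain(block.right.iter_mut()) {
            *s *= self.gain_lin;
        }
    }

    fn name(&self) -> &str { "gain_db" }
}
```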

### Processing Blocks (Tympan Port)

| Block | Tympan Source | Rust Module | Key Algorithm |
|-------|-------------|-------------|---------------|
| `BiquadFilter` | `AudioFilterBiquad_F32` | `filter.rs` | IIR biquad (low/high/band/notch/allpass/peaking/shelf) |
| `WDRCompressor` | `AudioEffectCompressor_F32` | `compressor.rs` | Multi-band WDRC with attack/release/ratio/knee |
| `FeedbackCanceller` | `AudioEffectFeedbackCancel_F32` | `feedback.rs` | Normalized LMS adaptive filter |
| `GainProcessor` | `AudioEffectGain_F32` | `gain.rs` | Linear/dB gain + audiogram-shaped frequency response |
| `DelayLine` | `AudioEffectDelay_F32` | `delay.rs` | Sample-accurate circular buffer delay |
| `Mixer` | `AudioMixer_F32` | `mixer.rs` | Weighted sum of N inputs |
| `Limiter` | (custom) | `limiter.rs` | Brick-wall limiter with lookahead |
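To make the `BiquadFilter` row concrete, here is a low-pass biquad using the standard RBJ "Audio EQ Cookbook" coefficient formulas with a transposed direct-form II kernel. Struct and field names are illustrative, not taken from `filter.rs`.

```rust
/// One second-order IIR section (biquad), normalized so a0 = 1.
pub struct Biquad {
    b0: f32, b1: f32, b2: f32, a1: f32, a2: f32,
    z1: f32, z2: f32, // transposed direct-form II state
}

impl Biquad {
    /// Low-pass section: cutoff `fc` Hz at sample rate `fs`, quality `q`
    /// (RBJ cookbook formulas).
    pub fn lowpass(fs: f32, fc: f32, q: f32) -> Self {
        let w0 = 2.0 * std::f32::consts::PI * fc / fs;
        let (sin_w0, cos_w0) = w0.sin_cos();
        let alpha = sin_w0 / (2.0 * q);
        let a0 = 1.0 + alpha;
        Self {
            b0: ((1.0 - cos_w0) / 2.0) / a0,
            b1: (1.0 - cos_w0) / a0,
            b2: ((1.0 - cos_w0) / 2.0) / a0,
            a1: (-2.0 * cos_w0) / a0,
            a2: (1.0 - alpha) / a0,
            z1: 0.0,
            z2: 0.0,
        }
    }

    /// Process one sample (transposed direct form II).
    pub fn tick(&mut self, x: f32) -> f32 {
        let y = self.b0 * x + self.z1;
        self.z1 = self.b1 * x - self.a1 * y + self.z2;
        self.z2 = self.b2 * x - self.a2 * y;
        y
    }
}
```

A quick sanity property: a low-pass section has unity DC gain, so feeding a constant input must converge to that constant.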

### Novel Blocks (Musica Integration)

| Block | Module | Key Algorithm |
|-------|--------|---------------|
| `GraphSeparator` | `separator_block.rs` | Fiedler vector + dynamic mincut from musica |
| `BinauralEnhancer` | Uses `hearing_aid.rs` | ILD/IPD/IC features + speech scoring |
| `NeuralRefiner` | Uses `neural_refine.rs` | Tiny MLP mask refinement |

### Pipeline Runner

```rust
pub struct Pipeline {
    blocks: Vec<Box<dyn AudioProcessor>>,
    sample_rate: f32,
    block_size: usize,
}

impl Pipeline {
    pub fn new(sample_rate: f32, block_size: usize) -> Self;
    pub fn add(&mut self, block: Box<dyn AudioProcessor>);
    pub fn prepare(&mut self);
    pub fn process_block(&mut self, block: &mut AudioBlock);
    pub fn total_latency_samples(&self) -> usize;
    pub fn total_latency_ms(&self) -> f32;
}
```
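The signatures above imply a straightforward implementation: prepare and process fan out over the blocks in order, and latency is the sum of per-block contributions. A self-contained sketch (with trimmed trait/block definitions; the crate's actual bodies may differ):

```rust
pub struct AudioBlock {
    pub left: Vec<f32>,
    pub right: Vec<f32>,
}

pub trait AudioProcessor: Send {
    fn prepare(&mut self, sample_rate: f32, block_size: usize);
    fn process(&mut self, block: &mut AudioBlock);
    fn name(&self) -> &str;
    fn latency_samples(&self) -> usize { 0 }
}

pub struct Pipeline {
    blocks: Vec<Box<dyn AudioProcessor>>,
    sample_rate: f32,
    block_size: usize,
}

impl Pipeline {
    pub fn new(sample_rate: f32, block_size: usize) -> Self {
        Self { blocks: Vec::new(), sample_rate, block_size }
    }

    pub fn add(&mut self, block: Box<dyn AudioProcessor>) {
        self.blocks.push(block);
    }

    pub fn prepare(&mut self) {
        for b in &mut self.blocks {
            b.prepare(self.sample_rate, self.block_size);
        }
    }

    pub fn process_block(&mut self, block: &mut AudioBlock) {
        // Blocks run in insertion order — the graph is a simple chain.
        for b in &mut self.blocks {
            b.process(block);
        }
    }

    pub fn total_latency_samples(&self) -> usize {
        self.blocks.iter().map(|b| b.latency_samples()).sum()
    }

    pub fn total_latency_ms(&self) -> f32 {
        self.total_latency_samples() as f32 / self.sample_rate * 1000.0
    }
}

/// Stand-in processor reporting a fixed latency, for demonstration only.
struct Delay { n: usize }

impl AudioProcessor for Delay {
    fn prepare(&mut self, _: f32, _: usize) {}
    fn process(&mut self, _block: &mut AudioBlock) {}
    fn name(&self) -> &str { "delay" }
    fn latency_samples(&self) -> usize { self.n }
}
```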

### File Structure

```
docs/examples/musica/src/hearmusica/
├── mod.rs             — Module root, re-exports, Pipeline struct
├── block.rs           — AudioProcessor trait, AudioBlock, BlockMetadata
├── compressor.rs      — Multi-band WDRC compressor
├── feedback.rs        — NLMS adaptive feedback canceller
├── filter.rs          — Biquad IIR filter (all standard types)
├── gain.rs            — Gain processor + audiogram fitting (NAL-R)
├── limiter.rs         — Brick-wall output limiter
├── delay.rs           — Sample-accurate delay line
├── mixer.rs           — Weighted N-input mixer
├── separator_block.rs — Graph separator as AudioProcessor
└── presets.rs         — Pre-built pipeline configurations
```

### Preset Pipelines

```rust
/// Standard hearing aid: prefilter → feedback cancel → WDRC → audiogram gain → limiter
pub fn standard_hearing_aid(audiogram: &Audiogram) -> Pipeline;

/// Speech-in-noise: prefilter → feedback cancel → graph separator → WDRC (speech only) → gain → limiter
pub fn speech_in_noise(audiogram: &Audiogram) -> Pipeline;

/// Music mode: prefilter → wideband gentle compression → gain → limiter (minimal processing)
pub fn music_mode(audiogram: &Audiogram) -> Pipeline;

/// Maximum clarity: prefilter → feedback cancel → graph separator → neural refine → WDRC → gain → limiter
pub fn maximum_clarity(audiogram: &Audiogram) -> Pipeline;
```

## Performance Targets

| Metric | Target | Tympan Reference |
|--------|--------|-----------------|
| Block latency | < 0.5 ms per block | ~1.3 ms (block 16 @ 24 kHz) |
| Total pipeline latency | < 4 ms | 5.7 ms measured |
| Memory usage | < 64 KB working set | ~50 KB on Teensy |
| Binary size (WASM) | < 200 KB | N/A |
| Sample rates | 8-96 kHz | 8-96 kHz |
| Block sizes | 16-256 samples | 1-128 samples |

## Testing Strategy

1. **Unit tests per block** — Verify frequency response, gain curves, convergence
2. **Pipeline integration tests** — End-to-end with synthetic signals
3. **Latency validation** — Every block stays within budget
4. **Preset validation** — Each preset processes without clipping or artifacts
5. **Comparison test** — Same input through Tympan WDRC params vs HEARmusica, verify SDR > 30 dB
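The SDR criterion in the comparison test can be computed directly: signal power over error power, in dB. A sketch of that check (function name is illustrative):

```rust
/// Signal-to-distortion ratio in dB between a reference signal and a
/// processed signal of equal length. Higher is better; identical signals
/// give a very large value (error power is clamped to avoid log(∞)).
pub fn sdr_db(reference: &[f32], processed: &[f32]) -> f32 {
    assert_eq!(reference.len(), processed.len());
    let sig: f32 = reference.iter().map(|x| x * x).sum();
    let err: f32 = reference
        .iter()
        .zip(processed)
        .map(|(r, p)| (r - p) * (r - p))
        .sum();
    10.0 * (sig / err.max(f32::MIN_POSITIVE)).log10()
}
```

In the comparison test, `reference` would be Tympan's WDRC output and `processed` HEARmusica's, with the assertion `sdr_db(...) > 30.0`.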

## Consequences

### Positive
- MIT-licensed Rust hearing aid DSP — first of its kind
- Runs everywhere (MCU, WASM, desktop) from single codebase
- Graph-based separation integrated as native pipeline block
- Fully auditable for FDA/CE regulatory compliance
- Sub-millisecond block processing enables ultra-low-latency configurations

### Negative
- Initial algorithm library is smaller than OpenMHA (8 blocks vs 30+ plugins)
- No hardware board (depends on external audio I/O)
- Beamforming requires multi-mic arrays (not in scope for v1)

### Risks
- WDRC parameter tuning requires audiological expertise
- Real-world validation needs clinical testing with hearing-impaired users
- Feedback cancellation convergence depends on acoustic coupling

## References

- Tympan Library: https://github.com/Tympan/Tympan_Library (MIT)
- OpenAudio ArduinoLibrary: https://github.com/chipaudette/OpenAudio_ArduinoLibrary
- ANSI S3.22 Hearing Aid Testing Standard
- NAL-R Prescription Rule (Byrne & Dillon, 1986)
- WDRC: Villchur (1973), compression ratios and kneepoints
- NLMS Adaptive Filtering: Haykin, Adaptive Filter Theory
126 changes: 126 additions & 0 deletions docs/adr/ADR-144-candle-whisper-musica-transcription.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# ADR-144: Candle-Whisper Integration with Musica for Pure-Rust Transcription

## Status
Accepted

## Date
2026-04-06

## Context

Musica performs audio source separation via dynamic mincut graph partitioning, producing clean per-source audio tracks. The natural next step is transcription — converting separated speech to text. Current transcription systems (Whisper, Deepgram) suffer significant accuracy degradation with overlapping speakers and background noise:

- **Clean speech**: ~5% WER (Word Error Rate)
- **2 overlapping speakers**: ~25-35% WER
- **Cocktail party (4+ speakers + noise)**: ~40-60% WER

By separating sources first with Musica, then transcribing each clean track independently, we can maintain near-clean-speech accuracy even in challenging scenarios.

### Why candle-whisper over whisper-rs?

| Criterion | candle-whisper | whisper-rs |
|-----------|---------------|------------|
| **Language** | Pure Rust | C++ FFI bindings |
| **Build** | `cargo build` only | Needs C++ compiler + cmake |
| **Dependencies** | candle-core/nn/transformers | whisper.cpp (compiled from source) |
| **Cross-compile** | Easy (pure Rust) | Hard (C++ toolchain per target) |
| **WASM potential** | Possible via candle WASM | Not feasible (C++ FFI) |
| **Inference speed** | 1.5-3x slower on CPU | Fastest (GGML optimized) |
| **GPU support** | CUDA + Metal via features | CUDA + Metal + CoreML |
| **Alignment** | Matches Musica's zero-C-dep philosophy | External C++ dependency |

**Decision**: Use candle-whisper for architectural purity. The speed penalty is acceptable because:
1. Musica's separation is the bottleneck, not transcription
2. The `tiny` model (39M params) runs 5-10x real-time even via candle on CPU
3. Pure Rust enables WASM deployment for browser-based transcription
4. No cmake/C++ build complexity

## Decision

Integrate candle-whisper as an optional feature (`transcribe`) in Musica, providing:

1. **TranscriberConfig** — model size, language, task (transcribe/translate), beam size
2. **Transcriber** — loads Whisper model via candle, accepts `&[f32]` PCM at 16kHz
3. **TranscriptionResult** — segments with text, timestamps, confidence
4. **Pipeline integration** — `separate_and_transcribe()` combining Musica + Whisper
5. **Before/after benchmark** — measures SNR improvement and simulated WER reduction

### Architecture

```
                    ┌──────────────────┐
Raw Mixed Audio ──▶ │ Musica Separator │
                    │  (graph mincut)  │
                    └──┬────┬────┬───┬─┘
                       │    │    │   │
                  Speaker1  │    │   └──▶ Noise (discard)
                       │ Speaker2│
                       │    │ Speaker3
                       ▼    ▼    ▼
                    ┌──────────────────┐
                    │  candle-whisper  │
                    │   (per-track)    │
                    └──┬────┬────┬─────┘
                       │    │    │
                       ▼    ▼    ▼
              Transcript per speaker
           with timestamps + confidence
```

### Audio Format Flow

```
Musica output: Vec<f64> (any sample rate)
→ resample to 16kHz if needed
→ cast f64 → f32
→ pad/trim to 30-second chunks
→ feed to Whisper encoder
→ decode tokens → text segments
```
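The format flow above can be sketched end-to-end. This uses naive linear interpolation for the resampling step purely for illustration — the crate's own Lanczos resampler (or a windowed-sinc filter) would be used in practice — and the function name is hypothetical:

```rust
const WHISPER_SR: usize = 16_000;
const WHISPER_CHUNK: usize = 30 * WHISPER_SR; // 480_000 samples = 30 s

/// Musica output (f64, any rate) → one Whisper-ready 30-second f32 chunk.
pub fn to_whisper_chunk(samples: &[f64], src_rate: usize) -> Vec<f32> {
    // 1) Resample to 16 kHz (linear interpolation, illustration only).
    let ratio = src_rate as f64 / WHISPER_SR as f64;
    let out_len = (samples.len() as f64 / ratio) as usize;
    let mut out: Vec<f32> = (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let j = pos as usize;
            let frac = pos - j as f64;
            let a = samples[j];
            let b = *samples.get(j + 1).unwrap_or(&a);
            // 2) f64 → f32 cast happens at the output boundary.
            (a + (b - a) * frac) as f32
        })
        .collect();
    // 3) Pad with silence, or trim, to exactly 30 seconds.
    out.resize(WHISPER_CHUNK, 0.0);
    out
}
```

Longer recordings would be split into successive 30-second chunks before this step; `Vec::resize` handles both the pad and trim cases.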

### Feature Flag Design

```toml
[features]
transcribe = ["candle-core", "candle-nn", "candle-transformers"]
```

When `transcribe` is disabled, the module compiles as a stub with the same public API but returns a "candle not available" error. This keeps the base Musica build lightweight.
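One way to realize that stub is a pair of `cfg`-gated definitions sharing the same signature, so callers compile identically either way. This is a sketch — the error type and function name are hypothetical, not musica's actual API:

```rust
/// Error type shared by both build configurations.
#[derive(Debug)]
pub struct TranscriptionError(pub String);

/// Real path: compiled only when the `transcribe` feature is enabled;
/// this is where the candle-whisper encoder/decoder would run.
#[cfg(feature = "transcribe")]
pub fn transcribe(pcm_16k: &[f32]) -> Result<String, TranscriptionError> {
    let _ = pcm_16k;
    unimplemented!("candle-backed transcription")
}

/// Stub path: same public signature, but always reports that the
/// candle backend was not compiled in.
#[cfg(not(feature = "transcribe"))]
pub fn transcribe(_pcm_16k: &[f32]) -> Result<String, TranscriptionError> {
    Err(TranscriptionError(
        "candle not available: rebuild with `--features transcribe`".into(),
    ))
}
```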

## Consequences

### Positive
- Pure Rust end-to-end: capture → separate → transcribe → index
- No C/C++ toolchain required
- WASM-deployable transcription pipeline
- Dramatically improved transcription accuracy via pre-separation
- Optional dependency — doesn't bloat base build

### Negative
- candle inference ~1.5-3x slower than whisper.cpp on CPU
- Model weights must be downloaded at runtime (~75 MB for tiny, ~140 MB for base)
- candle ecosystem less mature than PyTorch/whisper.cpp
- Large dependency tree when enabled (~50 crates)

### Mitigations
- Default to `tiny` model for real-time use cases
- Cache model weights locally after first download
- GPU acceleration via `cuda`/`metal` feature flags when available
- Benchmark to validate acceptable latency

## Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| WER (clean, tiny model) | <8% | Baseline Whisper tiny accuracy |
| WER (separated track) | <12% | After Musica separation |
| WER (raw mixed, no separation) | >30% | Demonstrates improvement |
| Inference RTF (tiny, CPU) | <0.2x | 5x faster than real-time |
| Separation + transcription latency | <5s per 30s audio | End-to-end |
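The RTF rows reduce to simple arithmetic — real-time factor is processing time divided by audio duration, so RTF < 1.0 means faster than real time:

```rust
/// Real-time factor: seconds of processing per second of audio.
pub fn rtf(processing_secs: f64, audio_secs: f64) -> f64 {
    processing_secs / audio_secs
}
```

The end-to-end target (< 5 s for 30 s of audio) implies an overall RTF below ~0.167, which sits inside the < 0.2x inference budget for the tiny model.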

## References

- [candle](https://github.com/huggingface/candle) — HuggingFace's minimalist Rust ML framework
- [candle-whisper example](https://github.com/huggingface/candle/tree/main/candle-examples/examples/whisper)
- [OpenAI Whisper](https://github.com/openai/whisper) — Original model
- ADR-143: HEARmusica Tympan Rust Port