Audio Enhancer

Enhance audio quality lost from lossy compression using AI-powered super-resolution. Transform low-bitrate audio (YouTube 127kbps opus/webm) into high-quality FLAC/opus with synthetically generated high frequencies and improved depth.

What It Does

Original FLAC (studio) → 127kbps opus (YouTube) → Audio Enhancer → High-quality FLAC

The pipeline uses AudioSR (neural super-resolution) to synthesize high-frequency content lost during compression. Unlike true reconstruction (which would require recovering the original data), AudioSR generates plausible-sounding high frequencies using patterns learned from training data. The result then passes through a final mastering stage (soft limiting, optional loudness normalization, and dithering) for clean output.

Note: Lossy compression permanently discards high-frequency data. AudioSR "hallucinates" new content that sounds natural, much as AI image upscaling generates plausible details rather than recovering the original pixels.

Features

Super Resolution - Neural network (AudioSR) generates synthetic high frequencies to replace those lost in compression
Denoising - Optional noise reduction for noisy recordings (noisereduce + GPU acceleration)
Harmonic Enhancement - Optional harmonic exciter and stereo widening
Final Mastering - Soft limiter, optional loudness normalization, dithering
AMD ROCm Support - GPU acceleration on AMD RX 6000/7000 series (tested on RX 6600)

Requirements

Python 3.10-3.12
FFmpeg (for audio extraction)
AMD GPU with ROCm 6.2+ or NVIDIA GPU with CUDA
Or Docker (for containerized usage)

Installation

Native Installation

# Clone repository with submodules
git clone --recurse-submodules https://github.com/enrell/audio-enhancer.git
cd audio-enhancer

# Install with uv (recommended)
uv sync

# Set up AudioSR environment (separate venv due to numpy conflicts)
./scripts/setup_audiosr.sh

Docker Installation

git clone --recurse-submodules https://github.com/enrell/audio-enhancer.git
cd audio-enhancer

# Build for your GPU (auto-detected)
./scripts/docker-run.sh --help

# Or build manually for specific GPU
docker compose --profile rocm build   # AMD GPU
docker compose --profile cuda build   # NVIDIA GPU
docker compose --profile cpu build    # CPU only

Usage

CLI

# Basic usage - super resolution + mastering (default)
uv run audio-enhancer input.opus -o output.flac

# With optional stages
uv run audio-enhancer input.mp3 -o output.flac --denoise --harmonic

# Enable loudness normalization
uv run audio-enhancer input.opus -o output.flac --normalize

# Output formats: flac, wav, ogg, opus
uv run audio-enhancer input.webm -o output.opus --format opus

GUI

uv run audio-enhancer --gui

Docker

# Auto-detect GPU and process file
./scripts/docker-run.sh input.opus output.flac

# With extra options
./scripts/docker-run.sh input.mp3 output.flac --denoise --normalize

# Or use docker compose directly (place files in input/ folder)
mkdir -p input output
cp myfile.opus input/
docker compose --profile rocm run --rm audio-enhancer-rocm /input/myfile.opus -o /output/myfile.flac

Options

Flag	Description
`--denoise`	Enable denoising (for noisy recordings)
`--harmonic`	Enable harmonic enhancement
`--normalize`	Enable loudness normalization (-14 LUFS)
`--no-super-res`	Skip super-resolution stage
`--format FORMAT`	Output format: flac, wav, ogg, opus (default: flac)
`--sample-rate RATE`	Output sample rate in Hz (default: 48000)
`--gui`	Launch GUI mode
`--info`	Show GPU and system info
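The flags above map naturally onto Python's standard argparse module. The parser below is a hypothetical reconstruction from this table and the usage examples only; the real parser in src/audio_enhancer/main.py may use different destination names, defaults, or validation.

```python
import argparse

# Hypothetical reconstruction of the documented CLI; not the project's main.py.
FORMATS = ("flac", "wav", "ogg", "opus")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="audio-enhancer")
    # Input is optional so that --gui and --info can run without a file.
    parser.add_argument("input", nargs="?", help="input audio or video file")
    parser.add_argument("-o", dest="output", help="output file path")
    parser.add_argument("--denoise", action="store_true", help="enable denoising")
    parser.add_argument("--harmonic", action="store_true", help="enable harmonic enhancement")
    parser.add_argument("--normalize", action="store_true", help="normalize loudness to -14 LUFS")
    # store_false: super-resolution is on unless --no-super-res is given.
    parser.add_argument("--no-super-res", dest="super_res", action="store_false",
                        help="skip the super-resolution stage")
    parser.add_argument("--format", choices=FORMATS, default="flac", help="output format")
    parser.add_argument("--sample-rate", type=int, default=48000, help="output sample rate in Hz")
    parser.add_argument("--gui", action="store_true", help="launch the GUI")
    parser.add_argument("--info", action="store_true", help="show GPU and system info")
    return parser
```

For example, `build_parser().parse_args(["in.opus", "-o", "out.flac", "--no-super-res"])` yields `super_res=False` with every other stage at its documented default.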

Pipeline Stages

1. Extraction

Extracts audio from any FFmpeg-supported format (mp3, opus, webm, ogg, etc.) and resamples it to the target sample rate.
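A minimal sketch of this step, assuming the stage shells out to the `ffmpeg` binary. The flags used (`-y`, `-i`, `-vn`, `-ar`, `-c:a pcm_f32le`) are standard FFmpeg options, but the exact command and intermediate format the project uses are assumptions.

```python
import subprocess


def ffmpeg_extract_cmd(src: str, dst: str, sample_rate: int = 48000) -> list[str]:
    """Build an FFmpeg command that decodes `src`, drops any video stream,
    resamples to `sample_rate`, and writes 32-bit float PCM WAV to `dst`."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst without prompting
        "-i", src,                # any container/codec FFmpeg can decode
        "-vn",                    # ignore video (e.g. in .webm downloads)
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_f32le",      # float PCM avoids extra quantization before processing
        dst,
    ]


def extract(src: str, dst: str, sample_rate: int = 48000) -> None:
    # check=True raises CalledProcessError if FFmpeg exits with an error.
    subprocess.run(ffmpeg_extract_cmd(src, dst, sample_rate), check=True)
```

Decoding to float PCM once, up front, lets every later stage work on plain sample arrays regardless of the input codec.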

2. Super Resolution (AudioSR)

Neural network (latent diffusion model) that generates high-frequency content lost during lossy compression. AudioSR synthesizes plausible high frequencies based on patterns learned from high-quality training audio; it cannot recover the original data that the codec permanently discarded.

Uses DDIM sampling with configurable steps and guidance scale
Falls back to DSP-based bandwidth extension (harmonic generation) if AudioSR fails or runs out of VRAM
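The fallback behavior can be sketched as follows, assuming AudioSR runs in its own interpreter (the separate venv created by setup_audiosr.sh) through the audiosr_env/run_audiosr.py wrapper. The interpreter path, the wrapper's positional arguments, and the `--ddim-steps`/`--guidance-scale` flags are hypothetical, as are the default values; `dsp_fallback` stands in for the DSP bandwidth extension.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Hypothetical paths: the AudioSR venv's interpreter and the wrapper script.
AUDIOSR_PYTHON = Path("audiosr_env/.venv/bin/python")
AUDIOSR_SCRIPT = Path("audiosr_env/run_audiosr.py")


def super_resolve(src: str, dst: str, dsp_fallback: Callable[[str, str], None],
                  ddim_steps: int = 50, guidance_scale: float = 3.5,
                  python: Path = AUDIOSR_PYTHON) -> str:
    """Try AudioSR in a subprocess; on any failure, run the DSP fallback.

    Returns "audiosr" or "dsp" to report which path produced `dst`.
    """
    cmd = [str(python), str(AUDIOSR_SCRIPT), src, dst,
           "--ddim-steps", str(ddim_steps),          # hypothetical wrapper flag
           "--guidance-scale", str(guidance_scale)]  # hypothetical wrapper flag
    try:
        # A separate process keeps AudioSR's numpy pin out of the main venv,
        # and an out-of-memory crash surfaces as a non-zero exit code.
        subprocess.run(cmd, check=True, capture_output=True)
        return "audiosr"
    except (OSError, subprocess.CalledProcessError):
        # OSError: interpreter missing; CalledProcessError: AudioSR failed (e.g. OOM).
        dsp_fallback(src, dst)
        return "dsp"
```

Catching the failure at the process boundary means a crash inside AudioSR cannot take down the main pipeline.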

3. Denoising (Optional)

GPU-accelerated noise reduction using noisereduce. Processes audio in chunks to manage VRAM usage.
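Chunked processing needs care at the seams, or each boundary can click. Below is a pure-Python sketch of overlap-add with linear crossfades. The chunk and overlap sizes are illustrative, and the real stage presumably operates on GPU tensors rather than lists; `fn` stands for whatever per-chunk processing is applied (here it would be the denoiser).

```python
from typing import Callable, Sequence


def process_in_chunks(
    x: Sequence[float],
    fn: Callable[[list[float]], list[float]],
    chunk: int = 480_000,  # 10 s at 48 kHz (illustrative)
    overlap: int = 4_800,  # 100 ms crossfade (illustrative)
) -> list[float]:
    """Apply `fn` to overlapping chunks of `x` and crossfade the seams.

    `fn` must return a list of the same length as its input.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("need 0 <= overlap < chunk")
    n = len(x)
    out = [0.0] * n
    weight = [0.0] * n
    start = 0
    while start < n:
        end = min(start + chunk, n)
        y = fn(list(x[start:end]))
        length = end - start
        for i in range(length):
            w = 1.0
            if start > 0 and i < overlap:          # fade in, except on the first chunk
                w = min(w, (i + 1) / (overlap + 1))
            if end < n and i >= length - overlap:  # fade out, except on the last chunk
                w = min(w, (length - i) / (overlap + 1))
            out[start + i] += w * y[i]
            weight[start + i] += w
        if end == n:
            break
        start += chunk - overlap
    # Divide by the summed weights so an identity `fn` returns `x` unchanged.
    return [o / w for o, w in zip(out, weight)]
```

Only one chunk (plus its overlap) needs to be resident at a time, which is what keeps peak VRAM bounded regardless of track length.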

4. Harmonic Enhancement (Optional)

Harmonic exciter (soft saturation for warmth)
Transient shaping
High-shelf EQ for "air"
Stereo widening
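Two of these effects can be sketched in a few lines: a tanh soft-saturation exciter and mid/side stereo widening. This is an illustrative sketch, not the project's harmonic.py; the `drive`, `mix`, and `width` parameters are hypothetical. A typical exciter saturates only a high-passed band; this sketch saturates the full band for brevity. Transient shaping and the high-shelf EQ are not shown.

```python
import math


def excite(x: list[float], drive: float = 2.0, mix: float = 0.2) -> list[float]:
    """Blend a tanh-saturated copy into the dry signal.

    tanh is an odd function, so it adds odd harmonics; dividing by tanh(drive)
    keeps a full-scale input at full scale in the wet path.
    """
    norm = math.tanh(drive)
    return [(1 - mix) * s + mix * math.tanh(drive * s) / norm for s in x]


def widen(left: list[float], right: list[float],
          width: float = 1.2) -> tuple[list[float], list[float]]:
    """Mid/side widening: width > 1 boosts the side (L-R) component."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r) * width
        out_l.append(mid + side)
        out_r.append(mid - side)
    return out_l, out_r
```

Because widening only scales the side signal, mono content (L = R) passes through untouched, which keeps the output mono-compatible.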

5. Final Mastering

Soft-knee limiter (prevents clipping)
Optional loudness normalization to -14 LUFS
TPDF dithering for bit-depth reduction

GPU Support

AMD ROCm

Tested on RX 6600 (8GB VRAM). Uses chunked processing to fit within VRAM limits.

# Check GPU detection
uv run audio-enhancer --info
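PyTorch's ROCm builds expose AMD GPUs through the same `torch.cuda` API, so `torch.cuda.is_available()` alone cannot distinguish ROCm from CUDA. `torch.version.hip` can, because it is a version string on ROCm builds and `None` on CUDA builds. The sketch below is a hypothetical illustration of that check, not the project's gpu.py, and the returned dictionary is an assumed format.

```python
def detect_backend() -> dict:
    """Report which PyTorch GPU backend is usable: "rocm", "cuda", or "cpu"."""
    try:
        import torch
    except ImportError:
        return {"backend": "cpu", "device": None, "reason": "torch not installed"}
    if not torch.cuda.is_available():
        return {"backend": "cpu", "device": None, "reason": "no GPU visible to torch"}
    # ROCm builds reuse torch.cuda; torch.version.hip is set only on those builds.
    backend = "rocm" if getattr(torch.version, "hip", None) else "cuda"
    return {"backend": backend, "device": torch.cuda.get_device_name(0), "reason": None}
```

A "cpu" result on a machine with an AMD card usually points to a CUDA (or CPU-only) PyTorch wheel, or to a GPU that needs the HSA_OVERRIDE_GFX_VERSION workaround described under Troubleshooting.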

NVIDIA CUDA

Should work with CUDA-enabled GPUs. Install PyTorch with CUDA instead of ROCm:

uv pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

Project Structure

audio-enhancer/
├── src/audio_enhancer/
│   ├── core.py              # Pipeline orchestrator
│   ├── gpu.py               # GPU detection (ROCm/CUDA)
│   ├── gui.py               # Tkinter GUI
│   ├── main.py              # CLI entry point
│   └── pipeline/
│       ├── base.py          # Base classes
│       ├── extraction.py    # Audio extraction (FFmpeg)
│       ├── denoise.py       # Noise reduction
│       ├── super_resolution.py  # AudioSR / DSP fallback
│       ├── harmonic.py      # Harmonic enhancement
│       └── mastering.py     # Final mastering + export
├── audiosr_env/             # AudioSR runner script
│   └── run_audiosr.py       # AudioSR subprocess wrapper
├── scripts/
│   ├── setup_audiosr.sh     # AudioSR venv setup script
│   └── docker-run.sh        # Docker auto-detect runner
├── Dockerfile               # Multi-GPU Docker build
├── docker-compose.yml       # Docker Compose profiles
└── pyproject.toml
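The split between core.py (orchestrator) and the pipeline/ modules suggests a simple "ordered list of optional stages" design. Below is a hypothetical sketch of that idea; `Stage`, `run_pipeline`, and `default_stages` are illustrative names, not the contents of core.py or base.py, and the stage bodies are placeholders. The order follows the numbered Pipeline Stages section, with extraction happening before the loop and export inside mastering.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: a real pipeline would pass NumPy arrays or tensors.
Audio = list[float]
StageFn = Callable[[Audio, int], Audio]


@dataclass
class Stage:
    name: str
    fn: StageFn
    enabled: bool = True


def run_pipeline(audio: Audio, sample_rate: int,
                 stages: list[Stage]) -> tuple[Audio, list[str]]:
    """Run enabled stages in order; return the audio and the names that ran."""
    ran = []
    for stage in stages:
        if stage.enabled:
            audio = stage.fn(audio, sample_rate)
            ran.append(stage.name)
    return audio, ran


def passthrough(audio: Audio, sample_rate: int) -> Audio:
    return audio  # placeholder for a real stage implementation


def default_stages(super_res: bool = True, denoise: bool = False,
                   harmonic: bool = False) -> list[Stage]:
    # Flags mirror --no-super-res, --denoise and --harmonic; mastering always runs.
    return [
        Stage("super_resolution", passthrough, super_res),
        Stage("denoise", passthrough, denoise),
        Stage("harmonic", passthrough, harmonic),
        Stage("mastering", passthrough, True),
    ]
```

With the defaults, only super-resolution and mastering run, matching the "super resolution + mastering (default)" CLI example.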

Troubleshooting

AudioSR runs out of VRAM

AudioSR requires ~7GB VRAM. If it fails, the pipeline automatically falls back to DSP-based bandwidth extension.
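If you would rather skip a doomed AudioSR attempt than wait for it to fail, you can query free VRAM first with `torch.cuda.mem_get_info`, which returns `(free, total)` in bytes and also works on ROCm builds. This pre-check is an assumption, not something the pipeline is documented to do. The 7 GiB threshold only approximates the "~7GB" figure above, and enough free memory at check time does not guarantee the run will fit.

```python
AUDIOSR_MIN_VRAM_BYTES = 7 * 1024**3  # approximation of the ~7GB figure above


def enough_vram_for_audiosr(min_bytes: int = AUDIOSR_MIN_VRAM_BYTES) -> bool:
    """Return True if GPU 0 currently reports at least `min_bytes` free."""
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    free, _total = torch.cuda.mem_get_info(0)
    return free >= min_bytes
```

On an 8GB card such as the RX 6600, the desktop compositor alone can push free memory below the threshold, so closing other GPU applications may matter.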

"No module named 'pkg_resources'"

cd audiosr_env && uv pip install setuptools

ROCm not detected

Ensure ROCm is installed and HSA_OVERRIDE_GFX_VERSION is set for your GPU:

export HSA_OVERRIDE_GFX_VERSION=10.3.0  # For RX 6600
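If you drive the pipeline from your own Python script instead of a shell, the same override can be set from Python. The safest place is before torch is imported, so that the HIP runtime sees the variable when it initializes. The RX 6600 is a gfx1032 part, and 10.3.0 tells ROCm to use its gfx1030 kernels. The warning below is a defensive addition, not part of the project.

```python
import os
import sys
import warnings

if "torch" in sys.modules:
    # The runtime may already have read the variable, so a late change can be ignored.
    warnings.warn("torch already imported; HSA_OVERRIDE_GFX_VERSION may not take effect")

# setdefault keeps any value already exported in the shell.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "10.3.0")  # RX 6600 (gfx1032)

# Import torch only after this point.
```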

License

MIT
