SepACap: Source Separation For A Cappella Music

Model Repository: The checkpoint and configuration file of our model are available on Hugging Face at https://huggingface.co/Tino3141/sepacap

Model Description

We address multi-singer separation in a cappella music where the number of active singers varies. We use power set-based data augmentation to expand training data and introduce SepACap, an adaptation of SepReformer with periodic activations and a composite loss function that handles silent stems. On the JaCappella dataset, our approach achieves state-of-the-art performance for both full-ensemble and subset separation scenarios.
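To make the power set idea concrete, here is a minimal sketch of how training mixtures could be enumerated from a set of isolated stems. It is an illustration under stated assumptions, not the training pipeline from the paper: the function name `power_set_mixtures`, the `min_size` parameter, and the toy sample values are all hypothetical, and real stems would be waveforms rather than three-sample lists. Stems left out of a subset get an all-zero target, which is the "silent stem" case the composite loss has to handle.

```python
from itertools import combinations


def power_set_mixtures(stems, min_size=1):
    """Yield (subset, mixture, targets) for every subset with at least min_size stems.

    `stems` maps a stem name to a list of samples; all lists share one length.
    This is an illustrative helper, not part of the repository.
    """
    names = sorted(stems)
    length = len(next(iter(stems.values())))
    for size in range(min_size, len(names) + 1):
        for subset in combinations(names, size):
            # The mixture is the sample-wise sum of the selected stems.
            mixture = [sum(samples) for samples in zip(*(stems[n] for n in subset))]
            # Stems outside the subset become silent (all-zero) targets.
            targets = {
                n: list(stems[n]) if n in subset else [0.0] * length
                for n in names
            }
            yield subset, mixture, targets


# Toy stems with three samples each (hypothetical values).
toy_stems = {
    "soprano": [0.25, 0.5, 0.25],
    "alto": [0.0, 0.125, 0.0],
    "bass": [0.5, 0.5, 0.5],
}
examples = list(power_set_mixtures(toy_stems))
print(len(examples))  # 2**3 - 1 = 7 non-empty subsets
```

With the seven stems listed in the Quick Start below, the same enumeration would give 2^7 - 1 = 127 non-empty subsets per song. Whether single-stem subsets are used for training is not stated here, which is why `min_size` is left as a parameter.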

Getting Started

Installation

# Clone the repository
git clone https://github.com/Tino3141/Separator.git
cd Separator

# Install dependencies
pip install -r requirements.txt

Quick Start

import torch
import torchaudio
from src.model import Model
from src.utils import util_system

# 1. Load the configuration
config = util_system.parse_yaml("configs/modelMusicSep.yaml")["config"]

# 2. Initialize the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(**config["model"]).to(device)

# 3. Load checkpoint
checkpoint = torch.load("path/to/checkpoint.pth", map_location=device)
model.load_state_dict(checkpoint["model_state"], strict=False)
model.eval()

# 4. Load and process audio
audio, sr = torchaudio.load("input_audio.wav")

# Resample if needed (model expects 24 kHz for music separation)
if sr != 24000:
    resampler = torchaudio.transforms.Resample(sr, 24000)
    audio = resampler(audio)

# Ensure mono, then drop the channel dimension (the batch dimension is added in step 5)
if audio.shape[0] > 1:
    audio = audio.mean(dim=0, keepdim=True)
audio = audio.squeeze(0)  # Remove channel dimension

# 5. Inference
with torch.no_grad():
    audio_input = audio.to(device).unsqueeze(0)  # Add batch dimension
    separated_sources, aux_outputs = model(audio_input)

# 6. Save separated sources
# separated_sources is a list of tensors, one per source
stem_names = ['alto', 'bass', 'finger_snap', 'lead_vocal',
              'soprano', 'tenor', 'vocal_percussion']

for i, stem_name in enumerate(stem_names):
    output_audio = separated_sources[i].cpu().squeeze()
    torchaudio.save(
        f"output_{stem_name}.wav",
        output_audio.unsqueeze(0),  # Add channel dimension
        24000  # Sample rate
    )
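Because step 3 loads the checkpoint with strict=False, parameter names that do not match between the model and the checkpoint are skipped without raising an error, and the affected layers keep their random initialization. The sketch below shows one way to check for such mismatches by comparing key names. The helper name `diff_state_keys` and the toy key names are hypothetical and not part of the repository.

```python
def diff_state_keys(model_keys, checkpoint_keys):
    """Report parameter names present on only one side.

    Illustrative helper, not part of the repository.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        # Expected by the model but absent from the checkpoint.
        "missing": sorted(model_keys - checkpoint_keys),
        # Present in the checkpoint but unknown to the model.
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Toy key names for illustration only.
report = diff_state_keys(
    ["encoder.weight", "decoder.weight"],
    ["encoder.weight", "decoder.bias"],
)
print(report)
```

With the objects from the Quick Start, the call would be diff_state_keys(model.state_dict().keys(), checkpoint["model_state"].keys()). PyTorch exposes the same information directly: the value returned by load_state_dict has missing_keys and unexpected_keys fields. If either list is non-empty, the separated stems may not reflect the trained model.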

Command-Line Inference

For batch processing, use the evaluation script:

python scripts/evalSepReformer.py \
    --config configs/modelMusicSep.yaml \
    --checkpoint path/to/checkpoint.pth \
    --dataset Tino3141/jaCappellaPowerTest \
    --split test_p10 \
    --input_rate 48000 \
    --model_rate 24000 \
    --csv output_metrics.csv
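Once the script has written output_metrics.csv, a quick summary of the results can be computed with the standard library. The column layout of that file is not documented here, so the sketch below makes only two assumptions: the file has a header row, and metric columns contain values that parse as numbers. The function name `summarize_metrics` and the sample columns `track`, `stem`, and `sdr` are hypothetical.

```python
import csv
import io
from statistics import mean


def summarize_metrics(csv_file):
    """Return the mean of every column whose cells parse as numbers.

    Illustrative helper; makes no assumption about column names.
    """
    columns = {}
    for row in csv.DictReader(csv_file):
        for name, value in row.items():
            try:
                number = float(value)
            except (TypeError, ValueError):
                continue  # Skip text cells such as track or stem names.
            columns.setdefault(name, []).append(number)
    return {name: mean(values) for name, values in columns.items()}


# In-memory sample standing in for the real file (hypothetical columns).
sample = io.StringIO("track,stem,sdr\nsong1,alto,8.0\nsong1,bass,10.0\n")
summary = summarize_metrics(sample)
print(summary)  # {'sdr': 9.0}
```

For the real file, open it with open("output_metrics.csv", newline="") and pass the file object to the function.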

Citation

If you use this model in your research, please cite:

@misc{lanzendörfer2025sourceseparationcappellamusic,
      title={Source Separation for A Cappella Music}, 
      author={Luca A. Lanzendörfer and Constantin Pinkl and Florian Grötschla},
      year={2025},
      eprint={2509.26580},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2509.26580}, 
}

Links

Model Repository: https://huggingface.co/Tino3141/sepacap
