SepACap: Source Separation For A Cappella Music

[Figure: SepACap model architecture]

Model Repository: The checkpoint and configuration file for our model are available on Hugging Face.
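A minimal sketch of fetching the files with the huggingface_hub library is shown below; the repo id and file names are placeholders, so check the model page for the actual values:

import torch
from huggingface_hub import hf_hub_download

# Hypothetical repo id and file names -- substitute the ones from the model page
checkpoint_path = hf_hub_download(
    repo_id="ETH-DISCO/SepACap",
    filename="checkpoint.pth",
)
config_path = hf_hub_download(
    repo_id="ETH-DISCO/SepACap",
    filename="modelMusicSep.yaml",
)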

Model Description

We address multi-singer separation in a cappella music, where the number of active singers varies. We use power-set-based data augmentation to expand the training data and introduce SepACap, an adaptation of SepReformer with periodic activations and a composite loss function that handles silent stems. On the JaCappella dataset, our approach achieves state-of-the-art performance in both full-ensemble and subset separation scenarios.
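The two core ideas can be illustrated with a short sketch. This is not the paper's exact pipeline or loss: the helper names, the silence threshold, and the energy-penalty weight below are illustrative assumptions.

from itertools import chain, combinations

import torch

def powerset_mixtures(stems: torch.Tensor):
    """Power-set augmentation sketch. stems: [num_stems, samples].
    Every non-empty subset of the stems becomes one training mixture;
    stems outside the subset are kept as silent targets."""
    num_stems = stems.shape[0]
    subsets = chain.from_iterable(
        combinations(range(num_stems), k) for k in range(1, num_stems + 1)
    )
    for subset in subsets:
        mask = torch.zeros(num_stems, 1)
        mask[list(subset)] = 1.0
        targets = stems * mask              # absent stems become silence
        yield targets.sum(dim=0), targets   # (mixture, per-stem targets)

def composite_loss(estimate, target, eps=1e-8, silence_weight=0.1):
    """One way to handle silent stems (an assumption, not the paper's exact
    formulation): SI-SDR on active stems, an energy penalty on silent ones."""
    if target.abs().max() < eps:            # silent stem: penalize output energy
        return silence_weight * estimate.pow(2).mean()
    scale = (estimate * target).sum() / (target.pow(2).sum() + eps)
    projection = scale * target
    noise = estimate - projection
    si_sdr = 10 * torch.log10(projection.pow(2).sum() / (noise.pow(2).sum() + eps))
    return -si_sdr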

Getting Started

Installation

# Clone the repository
git clone https://github.com/Tino3141/Separator.git
cd Separator

# Install dependencies
pip install -r requirements.txt

Quick Start

import torch
import torchaudio
from src.model import Model
from src.utils import util_system

# 1. Load the configuration
config = util_system.parse_yaml("configs/modelMusicSep.yaml")["config"]

# 2. Initialize the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(**config["model"]).to(device)

# 3. Load checkpoint
checkpoint = torch.load("path/to/checkpoint.pth", map_location=device)
model.load_state_dict(checkpoint["model_state"], strict=False)
model.eval()

# 4. Load and process audio
audio, sr = torchaudio.load("input_audio.wav")

# Resample if needed (model expects 24kHz for music separation)
if sr != 24000:
    resampler = torchaudio.transforms.Resample(sr, 24000)
    audio = resampler(audio)

# Ensure mono and add batch dimension
if audio.shape[0] > 1:
    audio = audio.mean(dim=0, keepdim=True)
audio = audio.squeeze(0)  # Remove channel dimension

# 5. Inference
with torch.no_grad():
    audio_input = audio.to(device).unsqueeze(0)  # Add batch dimension
    separated_sources, aux_outputs = model(audio_input)

# 6. Save separated sources
# separated_sources is a list of tensors, one per source
stem_names = ['alto', 'bass', 'finger_snap', 'lead_vocal',
              'soprano', 'tenor', 'vocal_percussion']

for i, stem_name in enumerate(stem_names):
    output_audio = separated_sources[i].cpu().squeeze()
    torchaudio.save(
        f"output_{stem_name}.wav",
        output_audio.unsqueeze(0),  # Add channel dimension
        24000  # Sample rate
    )

Command-Line Inference

For batch processing and evaluation, use the evaluation script:

python scripts/evalSepReformer.py \
    --config configs/modelMusicSep.yaml \
    --checkpoint path/to/checkpoint.pth \
    --dataset Tino3141/jaCappellaPowerTest \
    --split test_p10 \
    --input_rate 48000 \
    --model_rate 24000 \
    --csv output_metrics.csv

Citation

If you use this model in your research, please cite:

@misc{lanzendörfer2025sourceseparationcappellamusic,
      title={Source Separation for A Cappella Music}, 
      author={Luca A. Lanzendörfer and Constantin Pinkl and Florian Grötschla},
      year={2025},
      eprint={2509.26580},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2509.26580}, 
}
