Model Repository: The checkpoint and configuration file for our model are available on Hugging Face.
We address multi-singer separation in a cappella music where the number of active singers varies. We use power set-based data augmentation to expand training data and introduce SepACap, an adaptation of SepReformer with periodic activations and a composite loss function that handles silent stems. On the JaCappella dataset, our approach achieves state-of-the-art performance for both full-ensemble and subset separation scenarios.
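The power set-based augmentation mentioned above can be illustrated with a minimal sketch. It assumes the seven JaCappella stem names used in the inference example, plain Python lists as stand-ins for audio, and all-zero targets for inactive stems. The helper names `stem_subsets` and `make_example` are hypothetical and not part of this repository; the actual training pipeline may select or weight subsets differently.

```python
from itertools import combinations

# Stem names as used in the inference example of this README.
STEMS = ["alto", "bass", "finger_snap", "lead_vocal",
         "soprano", "tenor", "vocal_percussion"]


def stem_subsets(stems, min_size=1):
    """Yield every subset of `stems` with at least `min_size` members."""
    for k in range(min_size, len(stems) + 1):
        yield from combinations(stems, k)


def make_example(stem_audio, active):
    """Build one (mixture, targets) pair from the active stems.

    `stem_audio` maps stem name -> list of samples (all the same length).
    Stems not listed in `active` get all-zero (silent) targets
    (an assumption made for this sketch).
    """
    n = len(next(iter(stem_audio.values())))
    targets = {
        name: (list(stem_audio[name]) if name in active else [0.0] * n)
        for name in stem_audio
    }
    # The mixture is the sample-wise sum of the (possibly silent) targets.
    mixture = [sum(targets[name][t] for name in targets) for t in range(n)]
    return mixture, targets


subsets = list(stem_subsets(STEMS))
print(len(subsets))  # 2**7 - 1 = 127 non-empty subsets
```

Each song can thus contribute up to 127 distinct mixtures instead of one, and every mixture with fewer than seven active stems carries silent targets that the loss has to handle.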
# Clone the repository
git clone https://github.com/Tino3141/Separator.git
cd Separator
# Install dependencies
pip install -r requirements.txt

import torch
import torchaudio
from src.model import Model
from src.utils import util_system
# 1. Load the configuration
config = util_system.parse_yaml("configs/modelMusicSep.yaml")["config"]
# 2. Initialize the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Model(**config["model"]).to(device)
# 3. Load checkpoint
checkpoint = torch.load("path/to/checkpoint.pth", map_location=device)
model.load_state_dict(checkpoint["model_state"], strict=False)
model.eval()
# 4. Load and process audio
audio, sr = torchaudio.load("input_audio.wav")
# Resample if needed (model expects 24kHz for music separation)
if sr != 24000:
    resampler = torchaudio.transforms.Resample(sr, 24000)
    audio = resampler(audio)
# Ensure mono and add batch dimension
if audio.shape[0] > 1:
    audio = audio.mean(dim=0, keepdim=True)
audio = audio.squeeze(0) # Remove channel dimension
# 5. Inference
with torch.no_grad():
    audio_input = audio.to(device).unsqueeze(0)  # Add batch dimension
    separated_sources, aux_outputs = model(audio_input)
# 6. Save separated sources
# separated_sources is a list of tensors, one per source
stem_names = ['alto', 'bass', 'finger_snap', 'lead_vocal',
              'soprano', 'tenor', 'vocal_percussion']
for i, stem_name in enumerate(stem_names):
    output_audio = separated_sources[i].cpu().squeeze()
    torchaudio.save(
        f"output_{stem_name}.wav",
        output_audio.unsqueeze(0),  # Add channel dimension
        24000  # Sample rate
    )

For batch processing, use the evaluation script:
python scripts/evalSepReformer.py \
    --config configs/modelMusicSep.yaml \
    --checkpoint path/to/checkpoint.pth \
    --dataset Tino3141/jaCappellaPowerTest \
    --split test_p10 \
    --input_rate 48000 \
    --model_rate 24000 \
    --csv output_metrics.csv

If you use this model in your research, please cite:
@misc{lanzendörfer2025sourceseparationcappellamusic,
      title={Source Separation for A Cappella Music},
      author={Luca A. Lanzendörfer and Constantin Pinkl and Florian Grötschla},
      year={2025},
      eprint={2509.26580},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2509.26580},
}

- Model Repository: https://huggingface.co/Tino3141/sepacap
