# Build the Spatial-CLAP encoder and load the released pre-trained weights.
from model import CLAPEncoder

model = CLAPEncoder()
model.load_pretrained()
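Once loaded, the encoder maps audio and text into a shared embedding space. The following is a purely hypothetical sketch of downstream use; encode_audio and encode_text are illustrative names, not confirmed methods of CLAPEncoder:

import numpy as np

# Hypothetical usage: the method names below are illustrative, not the repository's API.
audio_emb = model.encode_audio("data/wav/example.wav")    # assumed helper
text_emb = model.encode_text("a dog barks to the left")   # assumed helper
similarity = np.dot(audio_emb, text_emb) / (
    np.linalg.norm(audio_emb) * np.linalg.norm(text_emb)
)  # cosine similarity in the shared embedding space

The captions of AudioCaps 2.0 are included as a submodule, so just run the following command: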
git submodule update --init --recursive
cd data
python3 remove_cr.py
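Judging by its name, remove_cr.py presumably normalizes Windows-style line endings in the caption files; a minimal sketch of that operation, with the file locations as an assumption:

from pathlib import Path

# Rewrite each caption CSV with Unix line endings (illustrative sketch;
# the actual files handled by remove_cr.py are an assumption).
for csv_path in Path("data").glob("*.csv"):
    text = csv_path.read_bytes().replace(b"\r\n", b"\n")
    csv_path.write_bytes(text)

Place the wav files in data/wav.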
You can find the download request link on the AudioCaps GitHub page.
The RIR dataset is generated via simulation in this project:
cd data/rir_generator
python3 main.py
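The script is not reproduced here, but room impulse response simulation typically looks like the following sketch using pyroomacoustics (the room geometry, microphone layout, and parameters are assumptions, not the values used by main.py):

import numpy as np
import pyroomacoustics as pra

# Simulate a shoebox room and compute RIRs from one source to a two-mic array.
room = pra.ShoeBox([6.0, 5.0, 3.0], fs=16000,
                   materials=pra.Material(0.3), max_order=17)
room.add_source([2.0, 3.5, 1.5])
mic_locs = np.c_[[3.0, 2.0, 1.5], [3.2, 2.0, 1.5]]  # shape (3, n_mics)
room.add_microphone_array(pra.MicrophoneArray(mic_locs, room.fs))
room.compute_rir()
rir = room.rir[0][0]  # impulse response from source 0 to mic 0

For event labels used in pre-training, download the labels from the AudioSet page and place them under data/audioset as follows: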
data
└── audioset
├── balanced_train_segments.csv
├── eval_segments.csv
└── unbalanced_train_segments.csv
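These CSVs follow AudioSet's published format: three comment lines beginning with #, then rows of YouTube ID, start time, end time, and a quoted, comma-separated list of label IDs. A minimal parsing sketch for reference:

import csv

def load_segments(path):
    """Parse an AudioSet segments CSV into (ytid, start, end, label_ids) tuples."""
    with open(path) as f:
        reader = csv.reader(
            (line for line in f if not line.startswith("#")),  # skip header comments
            skipinitialspace=True,
        )
        return [(y, float(s), float(e), labels.split(","))
                for y, s, e, labels in reader]

segments = load_segments("data/audioset/balanced_train_segments.csv")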
Then, generate the tag data:
cd data/event_label
python3 get_info.py
python3 convert_to_tag.py
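Presumably, convert_to_tag.py maps AudioSet machine IDs to human-readable tags; the usual way to do this is via the official class_labels_indices.csv, as in this sketch (the file location and its role in the script are assumptions):

import csv

# Map AudioSet machine IDs (e.g. "/m/09x0r") to display names.
with open("class_labels_indices.csv") as f:
    mid_to_name = {row["mid"]: row["display_name"] for row in csv.DictReader(f)}

print(mid_to_name["/m/09x0r"])  # -> "Speech"

Download the monaural CLAP model: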
mkdir -p data/ckpt
cd data/ckpt
wget https://huggingface.co/lukewys/laion_clap/resolve/main/music_speech_audioset_epoch_15_esc_89.98.pt
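To sanity-check the download, the checkpoint can be loaded with the laion_clap package; per the LAION-CLAP documentation, this checkpoint pairs with the HTSAT-base audio model (this check is a suggestion, not part of this repository's pipeline):

import laion_clap

# Load the monaural CLAP checkpoint with the LAION-CLAP package.
model = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')
model.load_ckpt('data/ckpt/music_speech_audioset_epoch_15_esc_89.98.pt')

We pre-train the spatial information encoder using the sound event localization and detection (SELD) task.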
cd pretrain_spatial_encoder
python3 train.py

Next, train CLAP with the following command:
python3 train.py

If you use Spatial-CLAP in your research, please cite the following paper:
@article{seki2025spatial,
  title={Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions},
  author={Seki, Kentaro and Okamoto, Yuki and Yamaoka, Kouei and Saito, Yuki and Takamichi, Shinnosuke and Saruwatari, Hiroshi},
  journal={arXiv preprint arXiv:2509.14785},
  year={2025}
}