End2End sound localization model

This is a fork of bingo-todd's repository trying to replicate the results of the following paper: P. Vecchiotti, N. Ma, S. Squartini, and G. J. Brown, “End-to-end binaural sound localisation from the raw waveform,” in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 451–455.

Only WaveLoc-GTF, one of the two models proposed in the paper, is implemented.

bingo-todd reported slightly different results from those presented in the paper (see below). This fork also includes some very minor changes with respect to bingo-todd's version, which were necessary to make the code run correctly, as well as a fix for a bug where the training dataset leaked in some cases.

The results obtained here are slightly different from those reported by bingo-todd. This could be due to the minor code changes, but, given how small the changes are, it could also be due to differences in the architecture used for training.

Model

Requirements

You need:

- Python 3.7
- BasicTools (from bingo-todd's other repository; included here as a submodule)
- The dependencies listed in the requirements.txt file (e.g., TensorFlow 1.14, pysofar, etc.)

To train and evaluate, you also need:

- A copy of the TIMIT dataset (you have to obtain this yourself)
- The Surrey RealRoomBRIRs dataset (included here as a submodule)

Installation

Download this repository recursively:

git clone --recursive https://github.com/enzodesena/WaveLoc.git
cd WaveLoc

Dev container quick start (Cursor / VS Code)

This repository includes a .devcontainer/ setup so you can run the project inside Docker while keeping your editor on the host.

1. Install and start Docker Desktop.
2. Open the WaveLoc folder in Cursor or VS Code.
3. Run "Dev Containers: Reopen in Container" from the Command Palette.
4. Wait for the initial build and dependency install to finish.

Notes:

- The dev container uses python:3.7-bullseye.
- On Apple Silicon, it runs as linux/amd64 for compatibility with this project.
- After opening in the container, terminals and Python execution run inside Docker.

Manual setup

Start Docker (Apple Silicon users only)

If you haven't done so already, install Docker (this requires Homebrew):

brew install --cask docker
open -a docker

Then start a Docker container using Python 3.7 (make sure you are within the WaveLoc folder):

docker run --platform linux/amd64 -it --rm \
  -v "$PWD":/work \
  -w /work \
  python:3.7-bullseye bash

Download the data and extract TIMIT

Obtain the TIMIT dataset and unzip it into WaveLoc/data/external/darpa-timit-acousticphonetic-continuous-speech so that the files end up under, for instance, WaveLoc/data/external/darpa-timit-acousticphonetic-continuous-speech/data/Test/DR1/... .
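Before starting the long dataset generation step, it may help to check that TIMIT landed in the expected place. The following is a minimal sketch, not part of the repository: the function name timit_layout_ok is hypothetical, and it only checks for the data/Test/DR1 folder mentioned above, assuming it is run from the WaveLoc folder.

```python
from pathlib import Path

# Location named in the instructions above, relative to the WaveLoc folder.
TIMIT_ROOT = Path("data/external/darpa-timit-acousticphonetic-continuous-speech")


def timit_layout_ok(repo_root="."):
    """Return True if <repo_root>/<TIMIT_ROOT>/data/Test/DR1 is a directory.

    Hypothetical helper for illustration; it does not inspect the audio files.
    """
    return (Path(repo_root) / TIMIT_ROOT / "data" / "Test" / "DR1").is_dir()


if __name__ == "__main__":
    if timit_layout_ok():
        print("TIMIT layout looks as expected")
    else:
        print("TIMIT not found under", TIMIT_ROOT)
```

If the check fails, the most likely cause is an extra or missing directory level created when unzipping.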

Install dependencies (not needed if using dev container)

apt-get update
apt-get install -y pkg-config libhdf5-dev libnetcdf-dev gcc g++ gfortran
conda create -n waveloc python=3.7  # Not needed if using Docker
conda activate waveloc              # Not needed if using Docker
pip install -r requirements.txt

Generating dataset and training model

(cd gen_dataset && ./run.sh)  # This takes a few hours
python train_mct.py           # This also takes a few hours

Training

Dataset

BRIR

The Surrey binaural room impulse response (BRIR) database, which includes an anechoic room and four reverberant rooms.

Room	A	B	C	D
RT_60 (s)	0.32	0.47	0.68	0.89
DDR (dB)	6.09	5.31	8.82	6.12

Sound source (TIMIT database) sentences per azimuth

Train	Validate	Evaluate
24	6	15

Multi-conditional training (MCT)

For each reverberant room under test, the model is trained on the remaining three reverberant rooms plus the anechoic room.
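The leave-one-room-out scheme described above can be sketched as follows. This is an illustrative enumeration only: the room labels and the function name mct_splits are assumptions, not identifiers taken from train_mct.py.

```python
REVERBERANT_ROOMS = ["A", "B", "C", "D"]
ANECHOIC = "Anechoic"  # label chosen for illustration


def mct_splits():
    """Map each held-out reverberant room to the rooms used for training."""
    splits = {}
    for test_room in REVERBERANT_ROOMS:
        # Train on the anechoic room plus every reverberant room except the held-out one.
        train_rooms = [ANECHOIC] + [r for r in REVERBERANT_ROOMS if r != test_room]
        splits[test_room] = train_rooms
    return splits


for test_room, train_rooms in mct_splits().items():
    print(f"evaluate on room {test_room}: train on {', '.join(train_rooms)}")
```

Each of the four models therefore never sees the room it is evaluated on during training.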

Training curves

Evaluation

Root mean square error (RMSE) is used as the performance metric. For each reverberant room, the evaluation was performed three times to obtain more stable results, and the test dataset was regenerated each time.

Since the binaural signal is fed directly to the model without extra preprocessing, and speech may contain short pulses, the localization result is reported per chunk rather than per frame. Each chunk consists of 25 consecutive frames.
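A chunk-level evaluation along these lines could look like the sketch below. It is not the code in evaluate_mct.py: the function names are hypothetical, and averaging the frame-level scores within a chunk before picking the azimuth is an assumption, since the text above only states that results are reported per 25-frame chunk.

```python
import math

CHUNK_SIZE = 25  # frames per chunk, as stated above


def chunk_estimates(frame_probs, azimuths, chunk_size=CHUNK_SIZE):
    """Average frame-level scores over each full chunk and pick the best azimuth.

    frame_probs: list of per-frame score lists, one score per candidate azimuth.
    azimuths: candidate azimuths in degrees, in the same order as the scores.
    Trailing frames that do not fill a whole chunk are dropped (assumption).
    """
    estimates = []
    for start in range(0, len(frame_probs) - chunk_size + 1, chunk_size):
        chunk = frame_probs[start:start + chunk_size]
        # zip(*chunk) yields one tuple of scores per azimuth across the chunk.
        mean_scores = [sum(col) / chunk_size for col in zip(*chunk)]
        best = max(range(len(azimuths)), key=mean_scores.__getitem__)
        estimates.append(azimuths[best])
    return estimates


def rmse(estimates, targets):
    """Root mean square error between estimated and true azimuths (degrees)."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(estimates))


# Toy example: three candidate azimuths, two chunks of 25 frames each.
azimuths = [-5, 0, 5]
frames = [[0.1, 0.8, 0.1]] * 22 + [[0.1, 0.1, 0.8]] * 3 + [[0.1, 0.2, 0.7]] * 25
estimates = chunk_estimates(frames, azimuths)
print(estimates, rmse(estimates, [0, 0]))
```

In the toy example, the three outlier frames in the first chunk do not change its estimate, which is the motivation for reporting per chunk rather than per frame.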

Paper results vs original

Reverberant room	A	B	C	D
Results of this repository	1.7	2.0	1.0	2.7
bingo-todd's results	1.5	2.0	1.4	2.7
Results in the paper	1.5	3.0	1.7	3.5
