This is a fork of bingo-todd's repository, which attempts to replicate the results of the following paper: P. Vecchiotti, N. Ma, S. Squartini, and G. J. Brown, “End-to-end binaural sound localisation from the raw waveform,” in 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 451–455.
Only WaveLoc-GTF is implemented, which is one of the two models proposed in the paper.
bingo-todd reported slightly different results from those presented in the paper (see below). This fork also includes some very minor changes with respect to bingo-todd's version, which were necessary to make the code run correctly, as well as a fix for a bug where the training dataset leaked in some cases.
The results obtained here are slightly different from those reported by bingo-todd. This could be due to the minor code changes, but, given how small the changes are, it could also be due to differences in the architecture used for training.
You need:
- Python 3.7
- BasicTools (from bingo-todd's other repository; included here as a submodule)
- The dependencies listed in `requirements.txt` (e.g., TensorFlow 1.14, pysofar, etc.)
- To train and evaluate, you also need:
  - A copy of the TIMIT dataset (you have to obtain this yourself)
  - The Surrey RealRoomBRIRs dataset (included here as a submodule)

Download this repository recursively:

    git clone --recursive https://github.com/enzodesena/WaveLoc.git
    cd WaveLoc

This repository includes a `.devcontainer/` setup so you can run the project inside Docker while keeping your editor on the host.
- Install and start Docker Desktop.
- Open the `WaveLoc` folder in Cursor or VS Code.
- Run `Dev Containers: Reopen in Container` from the Command Palette.
- Wait for the initial build and dependency install to finish.

Notes:
- The dev container uses `python:3.7-bullseye`.
- On Apple Silicon, it runs as `linux/amd64` for compatibility with this project.
- After opening in the container, terminals and Python execution run inside Docker.

If you have not already done so, install Docker (you will need Homebrew installed to run this):

    brew install --cask docker
    open -a docker

Then start a Docker container using Python 3.7 (make sure you are in the `WaveLoc` folder):


    docker run --platform linux/amd64 -it --rm \
      -v "$PWD":/work \
      -w /work \
      python:3.7-bullseye bash

Obtain the TIMIT dataset and unzip it into `WaveLoc/data/external/darpa-timit-acousticphonetic-continuous-speech` so that the files end up in, for instance, `WaveLoc/data/external/darpa-timit-acousticphonetic-continuous-speech/data/Test/DR1/...`.
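
Before starting the long dataset-generation step, it may help to confirm that TIMIT was unpacked where the instructions above expect it. The following is a minimal sketch, not part of the repository: `timit_layout_ok` is a hypothetical helper, and it only checks for the `data/Test/DR1` folder named above, assuming it is run from the `WaveLoc` folder.

```python
from pathlib import Path

# Path taken from the instructions above, relative to the WaveLoc folder.
TIMIT_ROOT = Path("data/external/darpa-timit-acousticphonetic-continuous-speech")


def timit_layout_ok(repo_root="."):
    """Return True if <repo_root>/<TIMIT_ROOT>/data/Test/DR1 is a directory.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    return (Path(repo_root) / TIMIT_ROOT / "data" / "Test" / "DR1").is_dir()


if __name__ == "__main__":
    if timit_layout_ok():
        print("TIMIT found at the expected path")
    else:
        print("TIMIT not found under", TIMIT_ROOT)
```

If the check fails, the most likely cause is an extra or missing folder level introduced when unzipping.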

Install the system packages and Python dependencies:

    apt-get update
    apt-get install -y pkg-config libhdf5-dev libnetcdf-dev gcc g++ gfortran
    conda create -n waveloc python=3.7 # Not needed if using Docker
    conda activate waveloc # Not needed if using Docker
    pip install -r requirements.txt

Then generate the dataset and train the model:

    (cd gen_dataset && ./run.sh) # This takes a few hours
    python train_mct.py # This also takes a few hours

- BRIR

  Surrey binaural room impulse response (BRIR) database, which includes an anechoic room and 4 reverberant rooms.

  | Room | A | B | C | D |
  |---|---|---|---|---|
  | RT_60 (s) | 0.32 | 0.47 | 0.68 | 0.89 |
  | DRR (dB) | 6.09 | 5.31 | 8.82 | 6.12 |

- Sound source (TIMIT database), sentences per azimuth

  | Train | Validate | Evaluate |
  |---|---|---|
  | 24 | 6 | 15 |

For each reverberant room under evaluation, the other 3 reverberant rooms and the anechoic room are used for training.
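
The train/test split described above is a leave-one-room-out scheme. A minimal sketch of it follows; the function name `training_rooms` and the label `"Anechoic"` are illustrative assumptions and do not necessarily match the identifiers used in this repository's scripts.

```python
# Room labels as used in this README; "Anechoic" is an illustrative label.
REVERB_ROOMS = ["A", "B", "C", "D"]
ANECHOIC = "Anechoic"


def training_rooms(test_room):
    """Return the rooms used for training when `test_room` is held out."""
    if test_room not in REVERB_ROOMS:
        raise ValueError(f"unknown reverberant room: {test_room!r}")
    # The anechoic room is always used for training, plus the 3 other rooms.
    return [ANECHOIC] + [room for room in REVERB_ROOMS if room != test_room]


for room in REVERB_ROOMS:
    print(f"test on room {room}: train on {training_rooms(room)}")
```

Each of the 4 reverberant rooms is therefore evaluated with a model that never saw that room during training.
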
Training curves

Root mean square error (RMSE) is used as the performance metric. For each reverberant room, the evaluation was performed 3 times to obtain more stable results, and the test dataset was regenerated each time.

Since the binaural signal is fed directly to the model without extra preprocessing, and there may be short pulses in speech, the localization result is reported per chunk rather than per frame. Each chunk consists of 25 consecutive frames.

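
The chunk-level scoring described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's evaluation code: it assumes the model outputs one posterior vector over azimuth classes per frame, and that a chunk's estimate is the azimuth with the highest posterior averaged over its 25 frames. The names `chunk_estimates` and `rmse`, and the toy 3-class azimuth grid, are hypothetical.

```python
import math

FRAMES_PER_CHUNK = 25  # chunk length stated in the text above


def chunk_estimates(frame_posteriors, azimuths, frames_per_chunk=FRAMES_PER_CHUNK):
    """Average per-frame posteriors over each full chunk, then take the argmax.

    frame_posteriors: list of per-frame lists, one probability per azimuth class.
    Averaging followed by argmax is an assumption about how chunks are scored.
    A trailing incomplete chunk is dropped.
    """
    estimates = []
    n_full = len(frame_posteriors) // frames_per_chunk * frames_per_chunk
    for start in range(0, n_full, frames_per_chunk):
        chunk = frame_posteriors[start:start + frames_per_chunk]
        mean = [sum(column) / len(chunk) for column in zip(*chunk)]
        estimates.append(azimuths[mean.index(max(mean))])
    return estimates


def rmse(estimates, truths):
    """Root mean square error between estimated and true azimuths."""
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: 3 azimuth classes, 2 chunks of 25 frames each.
azimuths = [-5, 0, 5]
frames = [[0.1, 0.8, 0.1]] * 22 + [[0.0, 0.1, 0.9]] * 3 + [[0.1, 0.2, 0.7]] * 25
estimates = chunk_estimates(frames, azimuths)
print(estimates, rmse(estimates, [0, 5]))
```

In the toy data, 3 frames of the first chunk point strongly at the wrong azimuth, yet the chunk average still favours the correct one, which is the motivation for reporting results per chunk.
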
| Reverberant room | A | B | C | D |
|---|---|---|---|---|
| Results of this repository | 1.7 | 2.0 | 1.0 | 2.7 |
| bingo-todd's results | 1.5 | 2.0 | 1.4 | 2.7 |
| Results in the paper | 1.5 | 3.0 | 1.7 | 3.5 |

