A visual Voice Activity Detection (VAD) system based on PyTorch. The system operates on the output of 3DI: it takes the canonical landmarks produced by 3DI and detects the moments where the person in the video is speaking and the moments where they are silent.
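To make the approach concrete, below is a minimal, hypothetical sketch of the inference idea: a short window of canonical landmark coordinates is flattened and passed through a small PyTorch classifier that outputs a speaking/silent decision. The class name, landmark count, window length, and layer sizes are all illustrative assumptions, not the model actually used in this repository.

```python
# Illustrative sketch only: the actual architecture, landmark count, and
# window length used by this repository may differ.
import torch
import torch.nn as nn

class TinyVAD(nn.Module):
    """Hypothetical classifier: a window of canonical landmarks -> speaking/silent."""
    def __init__(self, n_landmarks=51, window=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                                  # (B, window, n_landmarks*3) -> (B, window*n_landmarks*3)
            nn.Linear(window * n_landmarks * 3, 128),
            nn.ReLU(),
            nn.Linear(128, 2),                             # logits for {silent, speaking}
        )

    def forward(self, x):
        return self.net(x)

# One window of 3D canonical landmarks: 32 frames, 51 points x 3 coordinates
dummy = torch.randn(1, 32, 51 * 3)
probs = TinyVAD()(dummy).softmax(dim=-1)
print("P(speaking) =", probs[0, 1].item())
```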
You can immediately test the method by running

```
python demo.py
```

It is advised that you first create and activate a virtual environment:

```
python3 -m venv env
source ./env/bin/activate
```

The required pip packages can then be installed by running

```
pip install -r requirements.txt
```

The code can be run on a single video through the script VAD.py, which takes three required parameters:

```
python VAD.py --file_lmks=#LMKS_FILE# --file_video_in=#INPUT_VIDEO# --file_video_out=#OUTPUT_VIDEO#
```
As seen, the code requires three arguments: the canonical landmarks file produced by 3DI (see here), the input video path, and the output video path. An example command is:

```
python VAD.py --file_lmks=./data/test/input/CNN1.canonical_lmks --file_video_in=./data/test/input/CNN1.mp4 --file_video_out=./data/test/CNN1_output.mp4
```
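If you need to process many videos, one option is to call VAD.py in a loop via subprocess, using only the three documented arguments. The directory layout and file naming below are hypothetical:

```python
# Batch-processing sketch: loops over paired landmark/video files and calls VAD.py.
# The ./data/batch layout is a hypothetical example, not part of this repository.
import subprocess
from pathlib import Path

in_dir, out_dir = Path("./data/batch"), Path("./data/batch_out")
out_dir.mkdir(parents=True, exist_ok=True)

for lmks in sorted(in_dir.glob("*.canonical_lmks")):
    video = lmks.with_suffix(".mp4")   # assumes the video shares the landmark file's stem
    if not video.exists():
        continue
    subprocess.run([
        "python", "VAD.py",
        f"--file_lmks={lmks}",
        f"--file_video_in={video}",
        f"--file_video_out={out_dir / video.name}",
    ], check=True)
```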
There is also a script, VAD_dyadic.py, that runs on the two videos of a dyadic interaction and merges the produced results (e.g., see the video at the top of this README file). To run a demo of VAD_dyadic.py, you can run

```
python demo.py
```
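VAD_dyadic.py handles merging the two annotated outputs itself; if you want to reproduce a similar side-by-side composition manually, one possibility is ffmpeg's hstack filter (an independent sketch, not necessarily what VAD_dyadic.py does internally):

```python
# Hypothetical post-hoc merge of two annotated output videos into one
# side-by-side video via ffmpeg's hstack filter (requires ffmpeg on PATH;
# the two inputs must share the same height). File names are placeholders.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "left_output.mp4",    # annotated video of participant 1 (placeholder name)
    "-i", "right_output.mp4",   # annotated video of participant 2 (placeholder name)
    "-filter_complex", "hstack=inputs=2",
    "merged_output.mp4",
], check=True)
```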
To perform training, you first need to download the pre-processed training data from this link: https://sariyanidi.com/wp-content/uploads/2023/11/VAD_train_data.zip. This dataset contains canonicalized landmark files obtained by processing the (publicly available) Visual Voice Activity Detection dataset released by Guy et al. (INRIA).
Then, you need to unzip the data into the data folder so that the following directories are created and populated:

```
./data/VAD_train_data/raw/pos_class
./data/VAD_train_data/raw/neg_class
```
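As a quick sanity check after unzipping, you can count the files in the two class directories, e.g.:

```python
# Sanity-check sketch: counts the landmark files in each class directory
# after the training data has been unzipped.
from pathlib import Path

for cls in ("pos_class", "neg_class"):
    d = Path("./data/VAD_train_data/raw") / cls
    n = sum(1 for p in d.iterdir() if p.is_file())
    print(f"{cls}: {n} files")
```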
Then, you need to run the following script (once) to prepare the data for training:

```
python create_training_data.py
```
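For intuition only: preparing such data for training typically amounts to slicing each variable-length landmark sequence into fixed-length windows. The sketch below illustrates that general idea; it is not the actual logic of create_training_data.py, and the window, stride, and feature sizes are made up:

```python
# Illustration of windowing a landmark sequence into training examples;
# NOT the actual logic of create_training_data.py.
import numpy as np

def make_windows(seq, window=32, stride=8):
    """Slice a (T, D) landmark sequence into overlapping (window, D) examples."""
    starts = list(range(0, len(seq) - window + 1, stride))
    if not starts:
        return np.empty((0, window, seq.shape[1]))
    return np.stack([seq[s:s + window] for s in starts])

# e.g., a 10-second clip at 30 fps with 51 landmarks x 3 coordinates per frame
windows = make_windows(np.random.randn(300, 153))
print(windows.shape)  # (34, 32, 153)
```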
Finally, you can simply run the train.py script:

```
python train.py
```

Your trained model should be saved to the directory ./models/checkpoints.
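To reuse the trained model, a checkpoint can be loaded back with torch.load; in the sketch below, the checkpoint file name and the model class are placeholders (the actual names depend on the code in this repository):

```python
# Sketch of reloading a trained checkpoint for inference; the checkpoint
# file name and the model class are placeholders, not this repository's API.
import torch

model = TinyVAD()  # placeholder class from the sketch earlier in this README
state = torch.load("./models/checkpoints/vad_model.pt", map_location="cpu")
model.load_state_dict(state)
model.eval()
```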
We thank Guy et al. from INRIA for collecting and publicly releasing the Visual Voice Activity Detection dataset.
