A simple neural network to compute mouth shape from audio. Data and idea taken from Magicboomliu/Viseme-Classification.
This project is a complete rewrite of, and improvement over, Magicboomliu/Viseme-Classification, with better data resampling, cleanup and code quality. It is also in active development to improve upon the original accuracy.
- Download the DataSet folder from Viseme-Classification/DataSet and put it into the root folder
- Run prepare_data.py. This will generate DataSet/mel.npy and DataSet/label.npy, the dataset used for training
- Run baseline.py to get a baseline accuracy for low-effort models (using sklearn, to evaluate initial performance for different ML algorithms)
- Run train.py to generate model.pth, which is trained with PyTorch
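The order of the steps above matters, since each script relies on files produced by the previous one. As a minimal sketch (not part of this repository), the helper below reports which step is still pending by checking which output files exist. The function name `next_step` and the `STEPS` table are hypothetical, and the assumption that train.py needs the two .npy files is inferred from the list above. baseline.py is not tracked because no output file is documented for it.

```python
from pathlib import Path

# (command, files the command is expected to produce).
# File names are taken from the steps above; the table itself is an
# illustrative assumption, not something the repository ships.
STEPS = [
    ("python prepare_data.py", ["DataSet/mel.npy", "DataSet/label.npy"]),
    ("python train.py", ["model.pth"]),
]


def next_step(root="."):
    """Return the next action to take, or None if every output exists."""
    root = Path(root)
    # The raw dataset folder has to be downloaded by hand first.
    if not (root / "DataSet").is_dir():
        return "download DataSet from Viseme-Classification/DataSet"
    # Walk the steps in order and stop at the first missing output.
    for command, outputs in STEPS:
        if not all((root / name).is_file() for name in outputs):
            return command
    return None


if __name__ == "__main__":
    step = next_step()
    print(step if step is not None else "all outputs present")
```

Running it from the project root prints the command to run next, or a short message once model.pth and both .npy files are in place.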
For me, the baseline (neural network) yields about 62% accuracy on the test set:
Train Accuracy: 0.6464355131983196
Test Accuracy: 0.624623871614844
The PyTorch model yields a stable accuracy of around 63% to 64% on the test set (the difference is statistically significant):
Epoch: 600, Loss: 1.7374, Test Accuracy: 0.6387, Test Loss: 2.1781
Lowest Loss: 2.1706
Best test Accuracy: 0.6387
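Whether a gap of this size is statistically significant depends on how many test samples there are, which is not stated here. The sketch below shows one way to check it with a two-sided two-proportion z-test, using only the two test accuracies quoted above. The test-set sizes in the loop are hypothetical placeholders, and the function name is illustrative.

```python
import math


def two_proportion_z_test(acc_a, acc_b, n_a, n_b):
    """Two-sided z-test for the difference between two accuracies.

    Treats each accuracy as a proportion of correct predictions over
    n_a and n_b independent samples. Returns (z, p_value).
    """
    # Pooled accuracy under the null hypothesis of no difference.
    pooled = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (acc_b - acc_a) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


baseline_acc = 0.624623871614844  # sklearn baseline, test accuracy from the log
pytorch_acc = 0.6387              # PyTorch model, best test accuracy from the log

# Hypothetical test-set sizes; substitute the real number of test samples.
for n_test in (1_000, 10_000, 100_000):
    z, p = two_proportion_z_test(baseline_acc, pytorch_acc, n_test, n_test)
    print(f"n_test={n_test}: z={z:.2f}, p={p:.4f}")
```

Because both models are presumably scored on the same test samples, a paired test such as McNemar's would be the more appropriate choice, but it needs per-sample predictions from both models rather than the two summary accuracies.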