Speech Emotion Recognition (SER) Application: Audio-based Neural Network for Affective Multiclass Analysis and Labelling using Artificial Intelligence [ANNAMALAI]
This project provides a simple graphical interface for uploading `.mp3` or `.wav` speech audio files and predicting the emotional content of each five-second segment.
Read our report here.
First and foremost, download the model weights; without them, the GUI cannot analyze your uploaded speech audio file.
- Download `final_model.zip` from our Microsoft Teams.
- Move `final_model.zip` to the `gui/` directory, if it is not downloaded and saved there already.
- Extract the contents of `final_model.zip` into the current `gui/` directory.
- Once the extraction is complete, there should be `best_meta_ffnn_model.pt`, `checkpoint-22112/`, and `checkpoint-55280/`. These files and folders contain the model weights.
- You may delete `final_model.zip`.
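The extract-and-clean-up steps above can also be done from Python with the stdlib `zipfile` module. The sketch below runs against a throwaway stand-in archive so it is self-contained; in practice, point `archive` at your real `final_model.zip` and `dest` at `gui/`:

```python
import os
import tempfile
import zipfile

# Build a throwaway stand-in for final_model.zip so this sketch is runnable;
# the file names mirror the expected artifacts, the contents are dummies.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "final_model.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("best_meta_ffnn_model.pt", b"dummy weights")
    zf.writestr("checkpoint-22112/placeholder", "")
    zf.writestr("checkpoint-55280/placeholder", "")

# Extract everything into the destination directory (gui/ in the real setup).
dest = os.path.join(workdir, "gui")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(dest)

# After extraction, the expected files and folders should be present...
expected = ["best_meta_ffnn_model.pt", "checkpoint-22112", "checkpoint-55280"]
print(all(os.path.exists(os.path.join(dest, name)) for name in expected))  # → True

# ...and the archive itself can be deleted.
os.remove(archive)
```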
Follow the steps below to set up and run the project.
It is recommended to use a virtual environment to avoid dependency conflicts.
- Install virtualenv if not already installed: `pip install virtualenv`
- Create a new virtual environment named `venv`: `virtualenv venv`
- Activate the virtual environment: `source venv/bin/activate`
- Install project dependencies: `pip install -r requirements.txt`

Once the environment is ready, follow these steps to launch the GUI:
1. Ensure you're inside the virtual environment. If not, activate it using: `source venv/bin/activate`
2. From the project root directory, navigate to the GUI folder: `cd gui`
3. Start the application: `python app.py`
4. On first run, wait up to 1 minute for the backend to initialize.
5. **Important:** Clear any saved data and cookies for `localhost` in your browser settings.
6. Open `index.html` in your browser (e.g., drag-and-drop it into your browser or use File > Open).
7. Upload a `.mp3` or `.wav` speech audio file, and click Submit to process and view the predicted emotions for five-second segments.
8. To analyze another file, refresh the page, then repeat step 7.
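The app reports emotions per five-second segment; computing the segment boundaries for a clip can be sketched with the stdlib `wave` module (a simplified illustration — the actual logic in `app.py` may differ):

```python
import io
import wave

def five_second_segments(wav_bytes, segment_seconds=5.0):
    """Return (start, end) times in seconds for each segment of a WAV clip."""
    with wave.open(io.BytesIO(wav_bytes)) as wf:
        duration = wf.getnframes() / wf.getframerate()
    bounds = []
    start = 0.0
    while start < duration:
        bounds.append((start, min(start + segment_seconds, duration)))
        start += segment_seconds
    return bounds

# Build a 12-second silent mono WAV in memory so the sketch is self-contained.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)     # 16-bit samples
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000 * 12)

print(five_second_segments(buf.getvalue()))
# → [(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]
```

Note the final segment is shorter than five seconds when the clip length is not a multiple of five.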
When you are done, deactivate the virtual environment: `deactivate`

If you wish to replicate our experiments and/or work on the combined dataset we worked on, you may download it.
- Download `dataset.zip` here.
- Move `dataset.zip` to the `data/` directory, if it is not downloaded and saved there already. All our data preprocessing notebooks assume that `dataset.zip` is located there.
- Extract the contents of `dataset.zip` into the current `data/` directory.
- You may delete `dataset.zip`.
```
ai_project_ser/
│
├── classical_models/     # Our experiments/iterations using classical machine learning models
├── cnn/                  # Our experiments/iterations using convolutional neural networks
├── data/                 # Datasets required for conducting our experiments
├── ffnn/                 # Our experiments/iterations using feed-forward neural networks
├── gui/
│   ├── app.py            # Flask backend
│   ├── index.html        # Frontend GUI
│   └── ...               # Additional frontend assets
├── README.md             # This file
├── requirements.txt      # Python dependencies
├── transformer_models/   # Our experiments/iterations using transformer models
└── venv/                 # Virtual environment (created after setup)
```
- For best results, use Google Chrome or Firefox.
- If you're running the app for the first time, initialization may take a little longer due to backend setup.
- Ensure your uploaded audio is clear and within a reasonable duration for better emotion detection accuracy.