
Speech Emotion Recognition (SER) Application: Audio-based Neural Network for Affective Multiclass Analysis and Labelling using Artificial Intelligence [ANNAMALAI]

This project provides a simple graphical interface for uploading .mp3 or .wav speech audio files and predicting the emotion expressed in each five-second segment.

Read our report here.

⬇️ Download the Model

Before anything else, download the model weights; without them, the GUI cannot analyze your uploaded speech audio file.

  1. Download final_model.zip from our Microsoft Teams.
  2. Move final_model.zip to the gui/ directory, if it is not downloaded and saved there already.
  3. Extract the contents of final_model.zip into the current gui/ directory.
  4. Once the extraction is complete, there should be best_meta_ffnn_model.pt, checkpoint-22112/ and checkpoint-55280/. These files and folders contain the model weights.
  5. You may delete final_model.zip.
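To double-check the extraction before launching the GUI, a small script like the following can list any expected weight files that are missing from gui/. This is just a sketch, not part of the repo, and it assumes you run it from the project root:

```python
from pathlib import Path

# Expected contents of gui/ after extracting final_model.zip
EXPECTED = ["best_meta_ffnn_model.pt", "checkpoint-22112", "checkpoint-55280"]

def missing_weights(gui_dir="gui"):
    """Return the expected weight files/folders that are absent from gui_dir."""
    gui = Path(gui_dir)
    return [name for name in EXPECTED if not (gui / name).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing from gui/:", ", ".join(missing))
    else:
        print("All model weights found.")
```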

🔧 Setup Instructions

Follow the steps below to set up and run the project.

Set up the Python Virtual Environment

It is recommended to use a virtual environment to avoid dependency conflicts.

  1. Install virtualenv if it is not already installed:
pip install virtualenv
  2. Create a new virtual environment named venv:
virtualenv venv
  3. Activate the virtual environment:
source venv/bin/activate
  4. Install project dependencies:
pip install -r requirements.txt
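If you want to confirm the virtual environment is actually active before installing dependencies, Python itself can tell you. A quick sketch using only the standard library:

```python
import sys

def in_virtualenv():
    """True when running inside a venv/virtualenv.

    In a virtual environment, sys.prefix points at the environment while
    sys.base_prefix still points at the base interpreter, so they differ.
    """
    base = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base

if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```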

🚀 Using the GUI

Once the environment is ready, follow these steps to launch the GUI:

  1. Ensure you're inside the virtual environment.
    If not, activate it using:

    source venv/bin/activate
  2. From the project root directory, navigate to the GUI folder:

    cd gui
  3. Start the application:

    python app.py
  4. On first run, wait up to 1 minute for the backend to initialize.

  5. Important: Clear any saved data and cookies from localhost in your browser settings.

  6. Open index.html in your browser (e.g., drag-and-drop into your browser or use File > Open).

  7. Upload a .mp3 or .wav speech audio file, and click Submit to process and view the predicted emotions for five-second segments.

  8. To analyze another file, refresh the page, then repeat step 7.
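For context on what "five-second segments" means: the backend reports one prediction per consecutive five-second window of the uploaded audio. The sketch below illustrates that segmentation for .wav input using the stdlib wave module; it is an illustration only, not the app's actual code:

```python
import wave

def five_second_segments(path, seg_seconds=5):
    """Yield raw audio frames from a .wav file in consecutive 5-second chunks.

    The final chunk may be shorter than seg_seconds if the audio length is
    not an exact multiple of the segment duration.
    """
    with wave.open(path, "rb") as wf:
        frames_per_seg = wf.getframerate() * seg_seconds
        while True:
            chunk = wf.readframes(frames_per_seg)
            if not chunk:
                break
            yield chunk
```

A 12-second clip, for example, yields three segments: two full five-second windows plus a trailing two-second remainder.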


🔚 Exiting the Virtual Environment

When you are done:

deactivate

⬇️ Download the Dataset

If you wish to replicate our experiments or work with the combined dataset we used, you can download it.

  1. Download dataset.zip here.
  2. Move dataset.zip to the data/ directory, if it is not downloaded and saved there already. All our data preprocessing notebooks assume that dataset.zip is located there.
  3. Extract the contents of dataset.zip into the current data/ directory.
  4. You may delete dataset.zip.
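Steps 2–4 can also be scripted. The sketch below uses the stdlib zipfile module and assumes dataset.zip has already been moved into data/:

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path="data/dataset.zip", dest="data"):
    """Extract dataset.zip into data/ and remove the archive afterwards."""
    zp = Path(zip_path)
    with zipfile.ZipFile(zp) as zf:
        zf.extractall(dest)   # step 3: unpack into data/
    zp.unlink()               # step 4: delete the archive
```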

📁 Project Structure

ai_project_ser/
│
├── classical_models/     # Experiments/iterations using classical machine learning models
├── cnn/                  # Experiments/iterations using convolutional neural networks
├── data/                 # Datasets required for conducting our experiments
├── ffnn/                 # Experiments/iterations using feed-forward neural networks
├── gui/
│   ├── app.py            # Flask backend
│   ├── index.html        # Frontend GUI
│   └── ...               # Additional frontend assets
├── README.md             # This file
├── requirements.txt      # Python dependencies
├── transformer_models/   # Experiments/iterations using transformer models
└── venv/                 # Virtual environment (created after setup)

💡 Notes

  • For best results, use Google Chrome or Firefox.
  • If you're running the app for the first time, initialization may take a little longer due to backend setup.
  • Ensure your uploaded audio is clear and within a reasonable duration for better emotion detection accuracy.

About

50.021 Artificial Intelligence Project on Speech Emotion Recognition
