This repository provides a comprehensive solution for real-time **speech-to-text**, sentiment analysis, and English-to-French translation.

*Figure: High-level workflow of the application, including speech-to-text, sentiment analysis, and translation.*

---

## Key Highlights

**From-Scratch Implementation**: Complete Transformer architecture built from the ground up, demonstrating deep understanding of attention mechanisms, positional encodings, and encoder-decoder architectures.

**Production-Ready Pipeline**: End-to-end system integrating speech recognition, sentiment classification, and neural machine translation in a single application.

**Research-Grade Code**: Clean, well-documented implementation suitable for educational purposes and research experimentation.

**Hyperparameter Optimization**: Automated tuning with Optuna for both sentiment and translation models.

---
## Architecture

### Translation Transformer Model

The English-to-French translation system implements a **Transformer architecture built from scratch**. Rather than using pre-trained models or high-level APIs, this implementation provides full control over each component, from multi-head attention mechanisms to positional encodings.

![Transformer Architecture](docs/images/translation_transformer.jpeg)

*Figure: Detailed architecture of the Transformer model showing encoder-decoder structure with multi-head attention mechanisms.*
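The heart of any from-scratch Transformer is scaled dot-product attention. The sketch below is illustrative NumPy, not the repository's actual API; function and variable names are assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (..., seq_q, seq_k)
    if mask is not None:
        # Positions where mask is False are suppressed before the softmax.
        scores = np.where(mask, scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

# Toy example: one query attending over three keys of dimension 4.
Q = np.random.randn(1, 4)
K = np.random.randn(3, 4)
V = np.random.randn(3, 4)
out, w = scaled_dot_product_attention(Q, K, V)
```

Multi-head attention runs this operation in parallel over several learned projections of Q, K, and V, then concatenates the results.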

### Sentiment Analysis Model

The sentiment classifier uses a Bidirectional LSTM architecture:
- Embedding layer for word representations
- Bidirectional LSTM layers for capturing context from both directions
- Dense layers with dropout for classification
- Binary output (positive/negative sentiment)
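The layers listed above can be sketched in Keras as follows; the layer sizes shown are illustrative defaults, not the repository's tuned values:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_sentiment_model(vocab_size=20000, embed_dim=128, lstm_units=64):
    # Embedding -> stacked BiLSTM -> dense head with dropout -> sigmoid output
    model = tf.keras.Sequential([
        layers.Embedding(vocab_size, embed_dim),
        layers.Bidirectional(layers.LSTM(lstm_units, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # binary positive/negative
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_sentiment_model()
```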

---

## Features

### Speech Processing
- **Real-time Speech-to-Text**: Audio capture and transcription using Vosk library
- **English Language Support**: Optimized for US English accent (vosk-model-en-us-0.22)
- **Downloadable Transcripts**: Export recognized text as `.txt` files
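Vosk's recognizer returns results as JSON strings, so the pipeline needs a small parsing step before the text can be displayed or exported. A minimal sketch (the helper name is illustrative; the Vosk calls in the comment require the `vosk` package and a downloaded model):

```python
import json

def extract_text(result_json: str) -> str:
    """Vosk's Result()/FinalResult() return JSON; pull out the transcript."""
    return json.loads(result_json).get("text", "")

# Typical Vosk usage (requires vosk and vosk-model-en-us-0.22):
#   from vosk import Model, KaldiRecognizer
#   model = Model("vosk-model-en-us-0.22")
#   rec = KaldiRecognizer(model, 16000)
#   if rec.AcceptWaveform(audio_chunk):
#       print(extract_text(rec.Result()))

print(extract_text('{"text": "hello world"}'))  # → hello world
```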

### Sentiment Analysis
- **Bidirectional LSTM Architecture**: Deep learning model with embedding and recurrent layers
- **TensorFlow Text Processing**: Efficient tokenization and vectorization with `TextVectorization`
- **Binary Classification**: Positive/negative sentiment prediction
- **Hyperparameter Optimization**: Automated tuning with Optuna
- **Alternative Architectures**: Optional BERT-based models for comparison

### English-to-French Translation
- **From-Scratch Transformer Implementation**: Full encoder-decoder architecture built without pre-trained models
- **Custom Multi-Head Attention**: Manually implemented attention mechanisms with configurable heads
- **Positional Encoding**: Hand-crafted sinusoidal position embeddings
- **BLEU Score Evaluation**: Translation quality metrics for model assessment
- **Flexible Architecture**: Easily configurable dimensions, layers, and attention heads
- **Model Persistence**: Save and load trained models for inference
- **Real-time Integration**: Seamless connection with speech-to-text pipeline
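The hand-crafted sinusoidal encodings mentioned above follow the formulation from "Attention Is All You Need": even dimensions use sine, odd dimensions use cosine, with geometrically spaced wavelengths. A minimal NumPy sketch (assumes an even `d_model`; names are illustrative):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
       PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))"""
    positions = np.arange(seq_len)[:, None]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(50, 128)  # added to token embeddings before the encoder
```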

### Interactive Web Application
- **Dash Framework**: Responsive web interface for real-time interaction
- **Live Processing**: Instant speech recognition, sentiment analysis, and translation
- **Visual Feedback**: Clear display of recognized text, sentiment, and translations
- **Export Functionality**: Download transcripts for offline use

---

## Performance

Current model performance on test datasets:

| Model | Metric | Score |
|-------|--------|-------|
| Sentiment Analysis (BiLSTM) | Test Accuracy | 95.00% |
| Translation (Transformer) | Test Accuracy | 67.26% |
| Translation (Transformer) | BLEU Score | 0.52 |
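BLEU, as reported in the table, is a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. A simplified, unsmoothed sentence-level sketch (the repository may well use a library implementation such as NLTK's `sentence_bleu` instead):

```python
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty. Simplified illustration (no smoothing)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_mean)

cand = "the cat sat on the mat today".split()
ref = "the cat sat on the mat".split()
score = simple_bleu(cand, ref)
```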

**Note on Model Status**: These models were **built from scratch as educational implementations** of the underlying architectures. The Transformer implementation provides a complete, working example of the attention mechanism without relying on pre-trained models or high-level abstractions. While they demonstrate solid understanding of these architectures, they are not optimized for production deployment. For production use, consider:
- Training on larger datasets (millions of examples)
- Increasing model capacity (more layers, larger dimensions)
- Extended training duration with learning rate scheduling
- Ensemble methods and model distillation

---

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

---

## Citation

If you use this project in your research or work, please cite:

```bibtex
@software{sentiment_translation_2025,
author = {Vachier, Jeremy},
title = {Sentiment Analysis and Translation},
year = {2025},
url = {https://github.com/jvachier/Sentiment_Analysis}
}
```