A Python-based application that translates audio files from one language to another using Facebook's M2M100 translation model.
This project provides an audio translation pipeline that:
- Converts audio files to text (speech-to-text)
- Translates the text to a target language using M2M100
- Supports multiple language pairs
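Speech recognizers and M2M100 name languages differently. The Google Web Speech API behind SpeechRecognition's `recognize_google` expects locale tags such as `en-US`, while M2M100 uses short codes such as `en` or `fr`. The sketch below is not taken from `main.py` and only illustrates one way to bridge the two. `to_m2m100_code`, `check_pair`, and the small `SUPPORTED` set are hypothetical, and the set lists only a handful of M2M100's 100 languages.

```python
"""Map speech-recognition locale tags to M2M100 language codes (illustrative sketch)."""
from typing import Tuple

# Hypothetical subset of M2M100's language codes; the real model covers 100 languages.
SUPPORTED = {"ar", "de", "en", "es", "fr", "hi", "it", "ja", "ko", "pt", "ru", "zh"}


def to_m2m100_code(locale_tag: str) -> str:
    """Reduce a tag such as 'en-US' or 'pt_BR' to a bare code such as 'en' or 'pt'."""
    code = locale_tag.replace("_", "-").split("-")[0].lower()
    if code not in SUPPORTED:
        raise ValueError("unsupported language: %r" % locale_tag)
    return code


def check_pair(source_tag: str, target_tag: str) -> Tuple[str, str]:
    """Validate a source/target pair and return both as M2M100 codes."""
    src, tgt = to_m2m100_code(source_tag), to_m2m100_code(target_tag)
    if src == tgt:
        raise ValueError("source and target languages are the same")
    return src, tgt


if __name__ == "__main__":
    print(check_pair("en-US", "fr-FR"))  # ('en', 'fr')
```

Validating the pair up front gives a clear error before the multi-gigabyte model is loaded.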
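Key features:
- Multilingual translation with M2M100, which translates directly between any pair of its 100 languages
- Translation with pre-trained models downloaded from Hugging Face
- Easy-to-use Python interface
- Support for various audio formats: SpeechRecognition reads WAV, AIFF, and FLAC directly, and other formats such as MP3 can be converted with pydub and FFmpeg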
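SpeechRecognition's `AudioFile` only reads WAV, AIFF, and FLAC, so other formats have to be converted first. The following is a minimal sketch, assuming the conversion is done with pydub (one of this project's dependencies) and that writing the WAV file next to the input is acceptable. `needs_conversion` and `ensure_readable` are hypothetical helpers, not functions from `main.py`.

```python
"""Convert audio to a format SpeechRecognition can read (illustrative sketch)."""
import os

# Extensions that speech_recognition.AudioFile reads natively; others are converted first.
NATIVE_EXTENSIONS = {".wav", ".aif", ".aiff", ".aifc", ".flac"}


def needs_conversion(path: str) -> bool:
    """Return True if the file extension is not one SpeechRecognition reads directly."""
    return os.path.splitext(path)[1].lower() not in NATIVE_EXTENSIONS


def ensure_readable(path: str) -> str:
    """Return a path SpeechRecognition can open, converting to WAV with pydub if needed."""
    if not needs_conversion(path):
        return path
    # Imported lazily so the check above works without pydub; decoding MP3 etc. needs FFmpeg.
    from pydub import AudioSegment

    wav_path = os.path.splitext(path)[0] + ".wav"
    AudioSegment.from_file(path).export(wav_path, format="wav")
    return wav_path
```

For example, `ensure_readable("talk.mp3")` would write `talk.wav` and return that path, while `ensure_readable("talk.flac")` returns the input unchanged.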
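Requirements:
- Python 3.7 or higher (current releases of torch and transformers require a newer Python, so check their installation pages)
- pip (Python package manager)
- Internet connection (for the first-time model download)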
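The interpreter and pip requirements can be checked before installing anything. This is a small sketch, assuming you run it with the same interpreter you plan to use for the project. `prerequisite_problems` is a hypothetical helper, and the version floor simply mirrors the minimum stated above.

```python
"""Check this README's prerequisites before installing anything (illustrative sketch)."""
import importlib.util
import sys
from typing import List, Sequence

MINIMUM_PYTHON = (3, 7)  # minimum stated in this README


def prerequisite_problems(version: Sequence[int] = tuple(sys.version_info[:2])) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if tuple(version[:2]) < MINIMUM_PYTHON:
        problems.append("Python %d.%d or higher is required" % MINIMUM_PYTHON)
    # pip is importable as a module when it is installed for this interpreter.
    if importlib.util.find_spec("pip") is None:
        problems.append("pip is not available for this interpreter")
    return problems


if __name__ == "__main__":
    issues = prerequisite_problems()
    print("\n".join(issues) if issues else "Prerequisites look fine")
```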
Installation:
- Clone or download this repository:
git clone <your-repository-url>
cd ML_PROJECT
- Install the required dependencies (M2M100's tokenizer also needs sentencepiece):
pip install transformers torch torchaudio
pip install sentencepiece
pip install SpeechRecognition
pip install pydub
- (Optional) Install FFmpeg, which pydub needs to decode formats other than WAV, such as MP3:
# Windows (using Chocolatey)
choco install ffmpeg
# macOS (using Homebrew)
brew install ffmpeg
# Linux (Debian/Ubuntu)
sudo apt-get install ffmpeg
Project structure:
ML_PROJECT/
├── main.py # Main application file
├── static/ # Static files (CSS, JS, images)
├── templates/ # HTML templates
│ └── index.html # Web interface
├── Voice-Translation-using-... # Additional documentation
└── README.md # This file
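Before running `main.py`, you can confirm that the packages and FFmpeg are visible to the active interpreter. This is a sketch with hypothetical helper names. `sentencepiece` is included because `M2M100Tokenizer` depends on it. Note that import names can differ from pip names: the `SpeechRecognition` package is imported as `speech_recognition`.

```python
"""Verify the project's packages and FFmpeg are available (illustrative sketch)."""
import importlib.util
import shutil

# Import names, not pip names; sentencepiece is needed by M2M100Tokenizer.
REQUIRED_MODULES = ["transformers", "torch", "torchaudio", "sentencepiece", "speech_recognition", "pydub"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the modules the current interpreter cannot find."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def ffmpeg_available() -> bool:
    """FFmpeg is optional, but pydub needs it to decode formats such as MP3."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing packages:", ", ".join(missing) if missing else "none")
    print("FFmpeg on PATH:", ffmpeg_available())
```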
Usage:
- Activate your Python environment:
# If using conda (replace base with your environment's name if you created one)
conda activate base
# If using venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate  # Windows
- Run the application:
python main.py
- The application will:
- Load the M2M100 tokenizer
- Load the translation model (slow on the first run)
- Transcribe your audio input to text
- Output the translated text
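The steps above can be sketched with the libraries this project installs. This is not the code in `main.py`. It assumes the `facebook/m2m100_418M` checkpoint and SpeechRecognition's free Google Web Speech recognizer, which needs an internet connection. `load_translator`, `transcribe`, and `translate` are hypothetical names, while the transformers and SpeechRecognition calls follow those libraries' documented usage.

```python
"""End-to-end sketch: speech-to-text with SpeechRecognition, then translation with M2M100."""
from typing import Tuple

MODEL_NAME = "facebook/m2m100_418M"  # assumed checkpoint; main.py may use a different one


def load_translator(model_name: str = MODEL_NAME) -> Tuple[object, object]:
    """Load the M2M100 tokenizer and model (downloaded on the first call)."""
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def transcribe(audio_path: str, locale_tag: str = "en-US") -> str:
    """Convert a WAV/AIFF/FLAC file to text with the Google Web Speech API."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return recognizer.recognize_google(audio, language=locale_tag)


def translate(text: str, src: str, tgt: str, tokenizer, model) -> str:
    """Translate text between M2M100 language codes such as 'en' and 'fr'."""
    tokenizer.src_lang = src  # tells the tokenizer which source-language token to add
    encoded = tokenizer(text, return_tensors="pt")
    # Forcing the first generated token to the target language selects the output language.
    generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id(tgt))
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Typical use would be `tokenizer, model = load_translator()` once at startup, then `translate(transcribe("sample.wav", "en-US"), "en", "fr", tokenizer, model)` per file. Loading the model once and reusing it avoids paying the load cost on every request.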
On the first run, the model will be downloaded automatically (approximately 2-3 GB). This is a one-time process.
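If you want to see where that download is stored, or move it to a larger disk, the location follows huggingface_hub's cache settings. The sketch below approximates the documented default lookup (`HF_HUB_CACHE`, then `$HF_HOME/hub`, then `~/.cache/huggingface/hub`). `hf_hub_cache` is a hypothetical helper, and older transformers versions also honor `TRANSFORMERS_CACHE`, which this sketch ignores.

```python
"""Locate the Hugging Face cache that holds the downloaded model (illustrative sketch)."""
import os
from typing import Mapping, Optional


def hf_hub_cache(env: Optional[Mapping[str, str]] = None) -> str:
    """Approximate huggingface_hub's default cache lookup from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return env["HF_HUB_CACHE"]
    # HF_HOME defaults to $XDG_CACHE_HOME/huggingface, or ~/.cache/huggingface.
    default_home = env.get("XDG_CACHE_HOME") or os.path.join(os.path.expanduser("~"), ".cache")
    hf_home = env.get("HF_HOME") or os.path.join(default_home, "huggingface")
    return os.path.join(os.path.expanduser(hf_home), "hub")


if __name__ == "__main__":
    print(hf_hub_cache())
```

Setting `HF_HOME` before running `python main.py` would redirect the 2-3 GB download to another directory.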