akshara12code/Audio-text-translation

Voice Translation using Gen AI

A Python-based application that translates audio files from one language to another using Facebook's M2M100 translation model.

📋 Description

This project provides an audio translation pipeline that:

  • Converts audio files to text (speech-to-text)

  • Translates the text to a target language using M2M100

  • Supports multiple language pairs
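The speech-to-text stage of the pipeline above could look roughly like the sketch below. It is not taken from `main.py`: the helper names (`needs_conversion`, `to_wav`, `transcribe`, `audio_to_text`) are hypothetical, and the choice of the Google Web Speech backend via `recognize_google` is an assumption, since the README does not say which recognizer the project uses. The sketch converts formats that SpeechRecognition cannot read directly into WAV with pydub, then transcribes the result.

```python
import tempfile
from pathlib import Path

# Formats that SpeechRecognition's AudioFile can open without conversion.
NATIVE_FORMATS = {".wav", ".aif", ".aiff", ".flac"}


def needs_conversion(path: str) -> bool:
    """Return True if the file must be converted to WAV before transcription."""
    return Path(path).suffix.lower() not in NATIVE_FORMATS


def to_wav(src_path: str, wav_path: str) -> str:
    """Convert an FFmpeg-readable audio file to 16 kHz mono WAV."""
    from pydub import AudioSegment  # imported lazily; requires pydub and FFmpeg

    audio = AudioSegment.from_file(src_path)
    audio = audio.set_frame_rate(16000).set_channels(1)
    audio.export(wav_path, format="wav")
    return wav_path


def transcribe(wav_path: str, language: str = "en-US") -> str:
    """Transcribe a WAV/AIFF/FLAC file to text."""
    import speech_recognition as sr  # imported lazily; requires SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    # Assumption: Google Web Speech backend (needs an internet connection).
    return recognizer.recognize_google(audio, language=language)


def audio_to_text(path: str, language: str = "en-US") -> str:
    """Run conversion (if needed) followed by transcription."""
    if not needs_conversion(path):
        return transcribe(path, language)
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = str(Path(tmp) / "converted.wav")
        to_wav(path, wav_path)
        return transcribe(wav_path, language)
```

The heavy imports sit inside the functions so that the module can be loaded, and the format check used, even before pydub and SpeechRecognition are installed.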


🚀 Features

  • Multi-language support with M2M100 model
  • High-quality translation using pre-trained AI models
  • Easy-to-use Python interface
  • Supports various audio formats

📦 Prerequisites

  • Python 3.7 or higher
  • pip (Python package manager)
  • Internet connection (for first-time model download)

🔧 Installation

  1. Clone or download this repository:
git clone <your-repository-url>
cd ML_PROJECT
  2. Install the required dependencies:
pip install transformers torch torchaudio sentencepiece
pip install SpeechRecognition
pip install pydub
  3. (Optional) Install FFmpeg. pydub needs it to read formats other than WAV, such as MP3:
# Windows (using Chocolatey)
choco install ffmpeg

# macOS (using Homebrew)
brew install ffmpeg

# Linux
sudo apt-get install ffmpeg
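To confirm that the FFmpeg install above is visible to Python, a small check along these lines may help. The function name `ffmpeg_path` is hypothetical; the check only looks for an `ffmpeg` executable on `PATH` and does not verify its version or codecs.

```python
import shutil
from typing import Optional


def ffmpeg_path() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = ffmpeg_path()
    print(f"FFmpeg found at {path}" if path else "FFmpeg not found on PATH")
```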

📁 Project Structure

ML_PROJECT/
├── main.py                          # Main application file
├── static/                          # Static files (CSS, JS, images)
├── templates/                       # HTML templates
│   └── index.html                  # Web interface
├── Voice-Translation-using-...     # Additional documentation
└── README.md                        # This file

🎯 Usage

Basic Usage

  1. Activate your Python environment:
# If using conda
conda activate base

# If using venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
  2. Run the application:
python main.py
  3. The application will:
    • Load the M2M100 tokenizer
    • Load the translation model (may take time on first run)
    • Process your audio input
    • Output translated text
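The tokenizer, model, and translation steps listed above can be sketched with the Hugging Face `transformers` API as follows. This is an illustration under assumptions, not the contents of `main.py`: the function names `load_m2m100` and `translate` are hypothetical, and the checkpoint name `facebook/m2m100_418M` is a guess, since the README does not say which M2M100 checkpoint the project loads. `M2M100Tokenizer` also needs the `sentencepiece` package to be installed.

```python
def load_m2m100(model_name: str = "facebook/m2m100_418M"):
    """Load the tokenizer first, then the model, in the order listed above.

    Assumption: the checkpoint name is illustrative; the project may use another.
    """
    # Imported lazily because loading transformers and torch is slow.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(model_name)
    model = M2M100ForConditionalGeneration.from_pretrained(model_name)
    return tokenizer, model


def translate(text: str, src_lang: str, tgt_lang: str, tokenizer, model) -> str:
    """Translate text between two M2M100 language codes such as "en" and "fr"."""
    tokenizer.src_lang = src_lang  # tells the tokenizer which language tag to prepend
    encoded = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **encoded,
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.get_lang_id(tgt_lang),
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

A typical call would be `tokenizer, model = load_m2m100()` followed by `translate("Hello, world", "en", "fr", tokenizer, model)`. Passing the tokenizer and model in as arguments means they are loaded once and reused across many translations.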

First Run

On the first run, the model is downloaded automatically (approximately 2-3 GB) and cached locally. This is a one-time process; later runs load the model from the cache.
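To see whether the one-time download has already happened, one option is to look for the model's folder in the Hugging Face cache. The sketch below assumes a recent `transformers`/`huggingface_hub` release, where the cache defaults to `~/.cache/huggingface/hub` and can be moved with the `HF_HUB_CACHE` or `HF_HOME` environment variables. The helper names and the checkpoint name `facebook/m2m100_418M` are assumptions, not taken from the project. A folder being present does not prove the download completed.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Resolve the Hugging Face hub cache directory from the environment."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]) / "hub"
    xdg_cache = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg_cache) / "huggingface" / "hub"


def is_cached(model_name: str) -> bool:
    """Return True if a cache folder exists for the given model repository."""
    # Hub cache folders are named like "models--facebook--m2m100_418M".
    folder = "models--" + model_name.replace("/", "--")
    return (hub_cache_dir() / folder).is_dir()


if __name__ == "__main__":
    name = "facebook/m2m100_418M"  # assumption: illustrative checkpoint name
    print(f"{name} cached: {is_cached(name)}")
```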
