Connect the World, Frame by Frame

English｜简体中文｜繁體中文｜日本語｜Español｜Русский｜Français

🌟 Overview (Try VL Now!)

VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.

Key features:

🎥 YouTube video download via yt-dlp
🎙️ Word-level and Low-illusion subtitle recognition with MLX-Whisper (Mac) or WhisperX
📝 NLP and AI-powered subtitle segmentation
📚 Custom + AI-generated terminology for coherent translation
🔄 3-step Translate-Reflect-Adaptation for cinematic quality
✅ Netflix-standard, Single-line subtitles Only
🗣️ Dubbing with GPT-SoVITS, Azure, OpenAI, and more
🚀 One-click startup and processing in Streamlit
🌍 Multi-language support in Streamlit UI
📝 Detailed logging with progress resumption
🎵 Enhanced audio processing with pydub for better audio splitting

Difference from similar projects: Single-line subtitles only, superior translation quality, seamless dubbing experience

🎥 Demo

Dual Subtitles

trans.mp4

Cosy2 Voice Clone

dubbing.mp4

GPT-SoVITS with my voice

sovits.mp4

Language Support

Input Language Support(more to come):

*Chinese uses a separate punctuation-enhanced whisper model, for now...

Translation supports all languages, while dubbing language depends on the chosen TTS method.

🔄 Recent Updates

Improved Installation: Added error handling to prevent initialization failures on first install
Better Unicode Support: Fixed Chinese and other non-ASCII character handling in translation prompts
Enhanced Term Extraction: Improved proper noun translation accuracy
Audio Processing: Upgraded to pydub for more reliable audio splitting
Mac Optimization: Migrated to MLX-Whisper and Pyannote-audio for significantly faster performance on Apple Silicon.
Filler Word Removal: Automatically recognizes and filters verbal tics like "um", "uh", "right" in transcriptions.
UI Improvements: Added JSON format support toggle in LLM settings and one-click startup scripts.

Installation

Meet any problem? Chat with our free online AI agent here to help you.

Note: FFmpeg is required. Please install it via Homebrew:

macOS: brew install ffmpeg (via Homebrew)

Clone the repository

git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo

Install dependencies (requires conda)

bash run_installer.sh

Start the application

streamlit run st.py

APIs

VideoLingo supports OpenAI-Like API format and various TTS interfaces:

LLM: claude-3-5-sonnet, gpt-4.1, deepseek-v3, gemini-2.0-flash, ... (sorted by performance, be cautious with gemini-2.5-flash...)
Whisper: Run MLX-Whisper locally (recommended for Mac), or use ElevenLabs ASR API.
TTS: azure-tts, openai-tts, siliconflow-fishtts, fish-tts, GPT-SoVITS, edge-tts, *custom-tts(You can modify your own TTS in custom_tts.py!)

Note: VideoLingo works with 302.ai - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!

Important: For multi-character diarization, you must:

Create a Hugging Face Access Token.

Accept terms for pyannote/speaker-diarization-3.1 and pyannote/segmentation-3.0.

Enter your token in the Streamlit sidebar or config.yaml.

For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: English | 中文

Current Limitations

Whisper transcription performance may be affected by video background noise. For videos with loud background music, please enable Voice Separation Enhancement.
Using weaker models can lead to errors during processes due to strict JSON format requirements for responses (tried my best to prompt llm😊). If this error occurs, please delete the output folder and retry with a different LLM.
The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages.
Multi-character dubbing is now supported via Pyannote diarization (experimental).

📄 License

This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:

MLX-Whisper, pyannote-audio, whisperX, yt-dlp, json_repair, BELLE

📬 Contact Me

Submit Issues or Pull Requests on GitHub
DM me on Twitter: @Huanshere
Email me at: team@videolingo.io

⭐ Star History

If you find VideoLingo helpful, please give me a ⭐️!

Name		Name	Last commit message	Last commit date
Latest commit History 970 Commits
.github/workflows		.github/workflows
.streamlit		.streamlit
batch		batch
core		core
docs		docs
tests		tests
translations		translations
.cursorrules		.cursorrules
.gitignore		.gitignore
.pip-constraints.txt		.pip-constraints.txt
LICENSE		LICENSE
OneKeyStart.sh		OneKeyStart.sh
README.md		README.md
config.yaml		config.yaml
custom_terms.xlsx		custom_terms.xlsx
install.py		install.py
requirements.txt		requirements.txt
run_installer.sh		run_installer.sh
setup.py		setup.py
st.py		st.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Connect the World, Frame by Frame

🌟 Overview (Try VL Now!)

🎥 Demo

Dual Subtitles

Cosy2 Voice Clone

GPT-SoVITS with my voice

Language Support

🔄 Recent Updates

Installation

APIs

Current Limitations

📄 License

📬 Contact Me

⭐ Star History

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Connect the World, Frame by Frame

🌟 Overview (Try VL Now!)

🎥 Demo

Dual Subtitles

Cosy2 Voice Clone

GPT-SoVITS with my voice

Language Support

🔄 Recent Updates

Installation

APIs

Current Limitations

📄 License

📬 Contact Me

⭐ Star History

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages