🌟 Overview (Try VL Now!)
VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.
Key features:
-
🎥 YouTube video download via yt-dlp
-
🎙️ Word-level and Low-illusion subtitle recognition with MLX-Whisper (Mac) or WhisperX
-
📝 NLP and AI-powered subtitle segmentation
-
📚 Custom + AI-generated terminology for coherent translation
-
🔄 3-step Translate-Reflect-Adaptation for cinematic quality
-
✅ Netflix-standard, Single-line subtitles Only
-
🗣️ Dubbing with GPT-SoVITS, Azure, OpenAI, and more
-
🚀 One-click startup and processing in Streamlit
-
🌍 Multi-language support in Streamlit UI
-
📝 Detailed logging with progress resumption
-
🎵 Enhanced audio processing with pydub for better audio splitting
Difference from similar projects: Single-line subtitles only, superior translation quality, seamless dubbing experience
trans.mp4 |
dubbing.mp4 |
sovits.mp4 |
Input Language Support(more to come):
🇺🇸 English 🤩 | 🇷🇺 Russian 😊 | 🇫🇷 French 🤩 | 🇩🇪 German 🤩 | 🇮🇹 Italian 🤩 | 🇪🇸 Spanish 🤩 | 🇯🇵 Japanese 😐 | 🇨🇳 Chinese* 😊
*Chinese uses a separate punctuation-enhanced whisper model, for now...
Translation supports all languages, while dubbing language depends on the chosen TTS method.
- Improved Installation: Added error handling to prevent initialization failures on first install
- Better Unicode Support: Fixed Chinese and other non-ASCII character handling in translation prompts
- Enhanced Term Extraction: Improved proper noun translation accuracy
- Audio Processing: Upgraded to pydub for more reliable audio splitting
- Mac Optimization: Migrated to MLX-Whisper and Pyannote-audio for significantly faster performance on Apple Silicon.
- Filler Word Removal: Automatically recognizes and filters verbal tics like "um", "uh", "right" in transcriptions.
- UI Improvements: Added JSON format support toggle in LLM settings and one-click startup scripts.
Meet any problem? Chat with our free online AI agent here to help you.
Note: FFmpeg is required. Please install it via Homebrew:
- macOS:
brew install ffmpeg(via Homebrew)
- Clone the repository
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo- Install dependencies (requires
conda)
bash run_installer.sh- Start the application
streamlit run st.pyVideoLingo supports OpenAI-Like API format and various TTS interfaces:
- LLM:
claude-3-5-sonnet,gpt-4.1,deepseek-v3,gemini-2.0-flash, ... (sorted by performance, be cautious with gemini-2.5-flash...) - Whisper: Run MLX-Whisper locally (recommended for Mac), or use ElevenLabs ASR API.
- TTS:
azure-tts,openai-tts,siliconflow-fishtts,fish-tts,GPT-SoVITS,edge-tts,*custom-tts(You can modify your own TTS in custom_tts.py!)
Note: VideoLingo works with 302.ai - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!
Important: For multi-character diarization, you must:
- Create a Hugging Face Access Token.
- Accept terms for pyannote/speaker-diarization-3.1 and pyannote/segmentation-3.0.
- Enter your token in the Streamlit sidebar or
config.yaml.
For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: English | 中文
-
Whisper transcription performance may be affected by video background noise. For videos with loud background music, please enable Voice Separation Enhancement.
-
Using weaker models can lead to errors during processes due to strict JSON format requirements for responses (tried my best to prompt llm😊). If this error occurs, please delete the
outputfolder and retry with a different LLM. -
The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages.
-
Multi-character dubbing is now supported via Pyannote diarization (experimental).
This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:
MLX-Whisper, pyannote-audio, whisperX, yt-dlp, json_repair, BELLE
- Submit Issues or Pull Requests on GitHub
- DM me on Twitter: @Huanshere
- Email me at: team@videolingo.io
If you find VideoLingo helpful, please give me a ⭐️!
