GenVoice-AI is a modular, plug-and-play framework for generating AI-driven voice and video content from text using Retrieval-Augmented Generation (RAG). It supports multiple providers like HeyGen and ElevenLabs, and is extensible for future integrations.
This project demonstrates how to:
- Use RAG (Retrieval-Augmented Generation) to extract meaningful information from documents
- Generate video scripts using OpenAI GPT / Mistral
- Convert scripts into professional videos using HeyGen’s API
- Document Indexing (FAISS / Chroma / Qdrant)
- RAG Chain (LangChain or custom logic)
- Script Generation via OpenAI or Mistral
- HeyGen API for video rendering
- Frontend: Streamlit, Gradio, a Python backend, or an API
- ✅ Text-to-Video with HeyGen
- ✅ Text-to-Speech with ElevenLabs
- 🧠 RAG-powered script generation (PDF/Text)
- 🛠️ CLI support for automation
- 🧩 Pluggable provider registry
- 📝 Logging with timestamps & color
- 🔊 Optional Speech-to-Text
- 🔐 Config via .env
genvoice-ai/
├── genvoice/
│ ├── cli.py
│ ├── config.py
│ ├── logger.py
│ ├── utils.py
│ ├── registry.py
│ ├── speech_to_text.py
│ └── providers/
│     ├── heygen.py
│     └── elevenlabs.py
├── examples/
│ └── sample_script.txt
├── .env.example
├── requirements.txt
├── main.py
└── README.md
| Provider | Type | Description |
|---|---|---|
| HeyGen | Text → Video | Avatar-based video generation from script |
| ElevenLabs | Text → Speech | Natural voiceover generation from text |
# HeyGen Video Generation
python main.py --provider heygen --file examples/sample_script.txt
# ElevenLabs Audio Generation
python main.py --provider elevenlabs --file examples/sample_script.txt
Clone & Install
git clone https://github.com/supermldev/genvoice-ai.git
cd genvoice-ai
pip install -r requirements.txt
Configure .env
cp .env.example .env
# Fill in API keys, voice/avatar IDs, etc.
.env Configuration
# Common
OPENAI_API_KEY=your_openai_api_key
# HeyGen
HEYGEN_API_KEY=your_heygen_api_key
HEYGEN_AVATAR_ID=Angela-inTshirt-20220820
HEYGEN_VOICE_ID=1bd001e7e50f421d891986aad5158bc8
# ElevenLabs
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
# Logging
GENVOICE_LOG_FILE=genvoice.log
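One way to consume these settings is to check, before any API call, that the keys a given provider needs are present. The sketch below is an assumption about how such a check could look, not the contents of `config.py`. The names `REQUIRED_KEYS`, `parse_env_text`, and `provider_config` are hypothetical, and the hand-written parser only handles simple `KEY=value` lines.

```python
import os
from typing import Mapping

# Keys each provider needs, taken from the .env listing above.
REQUIRED_KEYS = {
    "heygen": ("HEYGEN_API_KEY", "HEYGEN_AVATAR_ID", "HEYGEN_VOICE_ID"),
    "elevenlabs": ("ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"),
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def provider_config(provider: str, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the settings for one provider, failing early if any are missing."""
    if provider not in REQUIRED_KEYS:
        raise ValueError(f"Unknown provider: {provider}")
    missing = [key for key in REQUIRED_KEYS[provider] if not env.get(key)]
    if missing:
        raise KeyError(f"Missing settings for {provider}: {', '.join(missing)}")
    return {key: env[key] for key in REQUIRED_KEYS[provider]}
```

Passing the mapping explicitly, for example `provider_config("heygen", parse_env_text(open(".env").read()))`, keeps the check easy to test without touching the real environment.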
Script generation uses LangChain and OpenAI to summarize text or PDF input into a script:
examples/sample_script.txt → 🎬 video/audio script
All logs are saved to GENVOICE_LOG_FILE with timestamp + color output:
[2025-06-10 12:32:05] [INFO] 📬 API Response: ...
[2025-06-10 12:32:06] [ERROR] ❌ Failed to generate video
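The log format shown above can be reproduced with Python's standard `logging` module. The following is a minimal sketch, not the project's `logger.py`. It assumes that the file receives plain text and only the console output is colored, and the names `ColorFormatter` and `build_logger` are hypothetical.

```python
import logging

# ANSI color codes per level; levels not listed are printed uncolored.
COLORS = {"INFO": "\033[32m", "WARNING": "\033[33m", "ERROR": "\033[31m"}
RESET = "\033[0m"

LOG_FORMAT = "[%(asctime)s] [%(levelname)s] %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


class ColorFormatter(logging.Formatter):
    """Wrap the formatted line in an ANSI color chosen by level name."""

    def format(self, record: logging.LogRecord) -> str:
        line = super().format(record)
        color = COLORS.get(record.levelname, "")
        return f"{color}{line}{RESET}" if color else line


def build_logger(log_file: str, name: str = "genvoice") -> logging.Logger:
    """Create a logger that writes plain text to log_file and color to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.propagate = False     # keep messages out of the root logger

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(logging.Formatter(LOG_FORMAT, DATE_FORMAT))
    logger.addHandler(file_handler)

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(ColorFormatter(LOG_FORMAT, DATE_FORMAT))
    logger.addHandler(console_handler)
    return logger
```

A caller would typically pass the value of `GENVOICE_LOG_FILE` as `log_file`, for example `build_logger(os.environ.get("GENVOICE_LOG_FILE", "genvoice.log"))`.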
Create a new module under genvoice/providers/ and register it in registry.py:
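# registry.py
# Import every provider module referenced in the registry below.
from .providers import elevenlabs, heygen, mynewprovider

# Maps the --provider name to the function that handles it.
PROVIDER_REGISTRY = {
    "heygen": heygen.generate_video,
    "elevenlabs": elevenlabs.generate_video,
    "mynew": mynewprovider.generate_video,
}
genvoice-ai/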
│
├── app.py # Main CLI or runner
├── .env.example # Sample environment variables
├── requirements.txt
├── README.md
│
├── genvoice/
│ ├── core/ # Core logic
│ │ ├── document_loader.py
│ │ ├── vector_store.py
│ │ └── script_generator.py
│ │
│ ├── providers/ # Modular video/voice services
│ │ ├── heygen.py
│ │ └── elevenlabs.py # e.g., future text-to-speech module
│ │
│ ├── utils/ # Helpers (logging, polling, etc.)
│ └── config.py # Config loader and constants
│
└── examples/
    └── sample_script.txt
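With providers kept in their own modules as in the layout above, the runner only needs to map the `--provider` flag to a function. The sketch below shows one way the `--provider` and `--file` flags could be wired to a registry dictionary. It is an illustration under assumptions: the two provider functions are placeholders that call no external API, and `parse_args` and `run` are hypothetical names, not the project's actual entry points.

```python
import argparse
from typing import Callable, Optional, Sequence


def fake_heygen(script: str) -> str:
    # Placeholder: a real provider would call the HeyGen API here.
    return f"[heygen] would render a video from {len(script.split())} words"


def fake_elevenlabs(script: str) -> str:
    # Placeholder: a real provider would call the ElevenLabs API here.
    return f"[elevenlabs] would synthesize audio from {len(script.split())} words"


# Provider name -> handler, mirroring the registry pattern described earlier.
PROVIDER_REGISTRY: dict[str, Callable[[str], str]] = {
    "heygen": fake_heygen,
    "elevenlabs": fake_elevenlabs,
}


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="Generate voice or video from a script file.")
    parser.add_argument("--provider", required=True, choices=sorted(PROVIDER_REGISTRY))
    parser.add_argument("--file", required=True, help="Path to the script text file")
    return parser.parse_args(argv)


def run(provider: str, script: str) -> str:
    """Look up the provider and hand it the script text."""
    return PROVIDER_REGISTRY[provider](script)


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Read the script file and dispatch it to the chosen provider."""
    args = parse_args(argv)
    with open(args.file, encoding="utf-8") as fh:
        script = fh.read()
    print(run(args.provider, script))
    return 0
```

Because `choices` is derived from the registry, adding a new entry to the dictionary makes it available on the command line without touching the parser.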
• 🎓 Educational Explainers
• 🏢 Internal Corporate Training
• 📢 Marketing Demos
• 🧾 Document Summaries into Video
• HeyGen Video Support
• ElevenLabs Audio Support
• RAG Script Generation
• Provider Registry Pattern
• Webhook support
• Text + Audio → Final Video (FFmpeg)
• UI (Gradio/Streamlit)
MIT © SuperML.dev. Free to use and modify with credit.
Built with ❤️ by SuperML.dev
