NarrateAI-webui is a user-friendly tool that leverages the power of Kokoro TTS to convert your documents and books into high-quality audiobooks. With its intuitive Gradio web interface, you can effortlessly upload files in various formats and receive a polished audiobook ready for listening.
- 🚀 Upgraded TTS Engine: Now powered by Kokoro TTS (hexgrad/Kokoro-82M) for diverse voice options and high-quality audio synthesis.
- 📄 Expanded File Format Support: Convert a wider range of documents! We now support:
  - Plain Text (.txt)
  - PDF (.pdf)
  - EPUB (.epub)
  - Microsoft Word (.docx)
  - HTML (.html, .htm)
- ⚙️ Enhanced Customization via UI:
  - A dedicated "Settings" tab in the web UI allows you to easily select language, voice, speaking speed, and processing device (CPU/CUDA).
  - Changes are saved to config/config.json and the TTS engine re-initializes on the fly.
- 🔧 Flexible Configuration: The config/config.json file provides advanced control over Kokoro TTS defaults and available voice mappings.
- 🔄 Easy Updates: Keep your NarrateAI-webui up-to-date with the new update.bat script.
- Multi-Format Upload: Seamlessly upload your documents in TXT, PDF, EPUB, DOCX, and HTML formats.
- High-Quality Local TTS: Utilizes Kokoro TTS for efficient and excellent text-to-speech conversion directly on your machine.
- Intuitive Web Interface: A clean and simple Gradio UI for easy operation.
  - Audiobook Generator Tab: Upload your document and start the generation process.
  - Settings Tab: Fine-tune TTS parameters like language, specific voice, speech rate, and choose between CPU or GPU (CUDA) for processing.
- Customizable Audio Output:
  - Select preferred language and voice from available options.
  - Adjust speaking speed (0.1x to 2.0x).
- Automatic Output: Generated audiobooks are saved in .wav format directly into the outputs/ directory (e.g., outputs/your-file-name.wav).
- Detailed Logging: Comprehensive logs are stored in the logs/ directory for monitoring and troubleshooting.
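
As an illustration of the output naming described above, here is a minimal sketch of how an uploaded file name could map to a path under outputs/. The helper name `output_path_for` is hypothetical and not taken from the project's code, so the actual naming logic may differ.

```python
from pathlib import Path


def output_path_for(source_file: str, output_dir: str = "outputs") -> Path:
    """Map an uploaded document to a .wav path inside the output directory.

    Hypothetical helper for illustration; not the project's own function.
    """
    stem = Path(source_file).stem  # file name without directory or final extension
    return Path(output_dir) / f"{stem}.wav"


# prints outputs/your-file-name.wav
print(output_path_for("books/your-file-name.epub").as_posix())
```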
- RAM: 8GB minimum, 16GB recommended
- Free Disk Space: 1GB minimum (plus space for generated audiobooks and TTS models), 2GB+ recommended
- GPU Support: An NVIDIA GPU with CUDA support is recommended for optimal performance and faster processing. CPU-only mode is also available.
- OS Compatibility: Primarily developed and tested on Windows. Linux and macOS may work but have not been formally tested.
- Python 3.8+: Ensure Python is installed. Download from python.org. We recommend Python 3.12.x as used in the installer.
- CUDA Toolkit (Optional, for GPU acceleration): If you plan to use an NVIDIA GPU, install the appropriate CUDA Toolkit. Download from NVIDIA Developer. The installer provides options for different CUDA versions.
- Miniconda/Anaconda: A Conda environment is used for managing dependencies. Download Miniconda from docs.anaconda.com. Ensure you check "Add to PATH" during installation.
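
To check whether GPU acceleration is actually usable after setup, something like the sketch below may help. It assumes PyTorch is the framework doing the work (the installer offers a choice of PyTorch builds), and the function name `resolve_device` is hypothetical. `torch.cuda.is_available()` is the standard PyTorch call for this check.

```python
def resolve_device(requested: str = "cuda") -> str:
    """Return 'cuda' only if it was requested and is usable; otherwise 'cpu'.

    Hypothetical helper for illustration; not the project's own function.
    """
    if requested != "cuda":
        return "cpu"
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(resolve_device("cuda"))
```

If this prints cpu on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only or the CUDA driver is not visible to it.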
- Clone the Repository:

      git clone https://github.com/vancoder1/NarrateAI-webui.git
      cd NarrateAI-webui

- Run the Installer: Execute the install.bat script:

      .\install.bat

  This will:
  - Create a Conda environment named narrate.
  - Install Python.
  - Install all necessary dependencies from requirements.txt.
  - Prompt you to choose a PyTorch version (CUDA 12.8, CUDA 11.8, or CPU-only).
- Wait for the installation to complete. The script will guide you through the PyTorch selection.
- Start the Application: Run the start.bat script. This will activate the Conda environment and launch the Gradio web UI.

      .\start.bat

  The application will automatically open in your default web browser.
- Configure Settings (Optional but Recommended for First Use):
  - Navigate to the Settings tab.
  - Select your desired Language Code, Voice, Speed, and Device (CPU/CUDA).
  - Click "Update Settings". The available voices will update based on the selected language.
- Generate Audiobook:
  - Navigate to the Audiobook Generator tab.
  - Upload your document file (e.g., .txt, .pdf, .epub, .docx, .html).
  - The generation process will begin, showing progress updates.
  - Once completed, an audio player will appear with your generated audiobook, and the .wav file will be available in the outputs/ directory (e.g., outputs/your-book-title.wav).
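
Before uploading, you can check whether a file's type is in the supported list. The sketch below uses the extensions named in this README. The names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical, and the application may perform its own validation differently.

```python
from pathlib import Path

# Extensions taken from the supported-format list in this README.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".epub", ".docx", ".html", ".htm"}


def is_supported(file_name: str) -> bool:
    """Return True if the file's final extension is supported (case-insensitive)."""
    return Path(file_name).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ("Book.EPUB", "notes.md"):
    print(name, is_supported(name))
```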
The primary application settings can be managed through the Settings tab in the UI. These settings are persisted in config/config.json.
The config/config.json file stores:

    {
      "settings": {
        "kokoro_tts": {
          "lang_code": "a", // Default language code
          "voice": "af_heart", // Default voice
          "speed": 1.0, // Default speed
          "device": "cpu", // Default device ('cpu' or 'cuda')
          "language_voices_map": { // Defines available voices for language codes
            "a": ["af_heart", "af_bella", ...],
            "b": ["bf_emma", "bf_isabella", ...],
            // ... other language codes and their voices
          }
        }
      }
    }

You can manually edit this file for advanced configuration, but changes made through the UI will override these defaults.
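
If you edit the file by hand, a quick sanity check can catch a voice that does not belong to the selected language. The following is a minimal sketch, assuming the file on disk is plain JSON (the `//` comments above are explanatory and would not parse with Python's `json` module) and that the keys match the structure shown. The function name `load_kokoro_settings` is hypothetical, and the 0.1 to 2.0 speed range comes from the feature list in this README.

```python
import json


def load_kokoro_settings(path: str = "config/config.json") -> dict:
    """Read the kokoro_tts section and check it against language_voices_map.

    Hypothetical helper for illustration; not the project's own loader.
    """
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    tts = config["settings"]["kokoro_tts"]

    # The voice must be listed under the currently selected language code.
    voices = tts["language_voices_map"].get(tts["lang_code"], [])
    if tts["voice"] not in voices:
        raise ValueError(
            f"voice {tts['voice']!r} is not listed for language {tts['lang_code']!r}"
        )
    if not 0.1 <= tts["speed"] <= 2.0:
        raise ValueError("speed must be between 0.1 and 2.0")
    if tts["device"] not in ("cpu", "cuda"):
        raise ValueError("device must be 'cpu' or 'cuda'")
    return tts
```

Calling `load_kokoro_settings()` from the repository root returns the `kokoro_tts` dictionary, or raises `ValueError` with a message naming the first setting that failed the check.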
To update NarrateAI-webui to the latest version:
- Ensure you have git installed.
- Run the update.bat script:

      .\update.bat

This script will:
- Fetch the latest changes from the repository.
- Pull updates for your current branch.
- Upgrade dependencies based on requirements.txt.
Contributions are highly welcome! If you have ideas, suggestions, feature requests, or find bugs, please:
- Open an issue on the GitHub repository to discuss the change.
- Fork the repository, make your changes, and submit a pull request.
Please ensure your code follows the existing style.
This project is licensed under the Apache License 2.0. See the LICENSE file for full details.
- Gradio: For the easy-to-use Python library that helps build the web UI (Gradio GitHub).
- Kokoro TTS: The powerful text-to-speech engine used for audio generation (hexgrad/Kokoro-82M on Hugging Face).
- All the creators and maintainers of the various Python libraries used in this project (see requirements.txt).
For any questions, feedback, or issues, please open an issue on this GitHub repository. You can also reach out to ivanzaporozhets25@gmail.com.
Made with ❤️ by vancoder1