NarrateAI-webui: Audiobook Generator 🎧📚

NarrateAI-webui is a tool that uses Kokoro TTS to convert your documents and books into audiobooks, running locally on your machine. Through its Gradio web interface, you upload a file in one of the supported formats (TXT, PDF, EPUB, DOCX, or HTML) and receive a .wav audiobook ready for listening.

✨ What's New in this Major Update?

🚀 Upgraded TTS Engine: Now powered by Kokoro TTS (hexgrad/Kokoro-82M) for diverse voice options and high-quality audio synthesis.
📄 Expanded File Format Support: Convert a wider range of documents! We now support:
- Plain Text (.txt)
- PDF (.pdf)
- EPUB (.epub)
- Microsoft Word (.docx)
- HTML (.html, .htm)
⚙️ Enhanced Customization via UI:
- A dedicated "Settings" tab in the web UI allows you to easily select language, voice, speaking speed, and processing device (CPU/CUDA).
- Changes are saved to config/config.json and the TTS engine re-initializes on the fly.
🔧 Flexible Configuration: The config/config.json file provides advanced control over Kokoro TTS defaults and available voice mappings.
🔄 Easy Updates: Keep your NarrateAI-webui up-to-date with the new update.bat script.

🌟 Core Features

Multi-Format Upload: Seamlessly upload your documents in TXT, PDF, EPUB, DOCX, and HTML formats.
High-Quality Local TTS: Utilizes Kokoro TTS for efficient and excellent text-to-speech conversion directly on your machine.
Intuitive Web Interface: A clean and simple Gradio UI for easy operation.
- Audiobook Generator Tab: Upload your document and start the generation process.
- Settings Tab: Fine-tune TTS parameters like language, specific voice, speech rate, and choose between CPU or GPU (CUDA) for processing.
Customizable Audio Output:
- Select preferred language and voice from available options.
- Adjust speaking speed (0.1x to 2.0x).
Automatic Output: Generated audiobooks are saved in .wav format directly into the outputs/ directory (e.g., outputs/your-file-name.wav).
Detailed Logging: Comprehensive logs are stored in the logs/ directory for monitoring and troubleshooting.
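
As a rough illustration of the features above, the sketch below checks a file against the listed formats, derives the outputs/your-file-name.wav path, and keeps a speed value inside the 0.1x to 2.0x range. The names `SUPPORTED_EXTENSIONS`, `is_supported`, `output_path_for`, and `clamp_speed` are hypothetical and are not taken from the NarrateAI-webui source; the application may implement these steps differently.

```python
from pathlib import Path

# Extensions listed in this README; the real application may keep this list elsewhere.
SUPPORTED_EXTENSIONS = {".txt", ".pdf", ".epub", ".docx", ".html", ".htm"}


def is_supported(document: str) -> bool:
    """Return True if the file extension is one of the formats listed above."""
    return Path(document).suffix.lower() in SUPPORTED_EXTENSIONS


def output_path_for(document: str, output_dir: str = "outputs") -> Path:
    """Map an input document to <output_dir>/<file-name>.wav (hypothetical helper)."""
    return Path(output_dir) / (Path(document).stem + ".wav")


def clamp_speed(speed: float, low: float = 0.1, high: float = 2.0) -> float:
    """Keep a speaking speed inside the 0.1x to 2.0x range from the feature list."""
    return max(low, min(high, speed))
```

For example, `output_path_for("my-book.pdf")` points at `outputs/my-book.wav`, matching the naming pattern described above.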

🖥️ System Requirements

Minimum Requirements:

RAM: 8GB
Free Disk Space: 1GB (plus space for generated audiobooks and TTS models)

Recommended Requirements:

RAM: 16GB
Free Disk Space: 2GB+

Notes:

GPU Support: An NVIDIA GPU with CUDA support is recommended for optimal performance and faster processing. CPU-only mode is also available.
OS Compatibility: Primarily developed and tested on Windows. Linux and macOS may work but have not been formally tested.
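
If you are unsure whether CUDA is usable on your machine, a small check like the one below can help you decide which device to select. This is a minimal sketch: `pick_device` is a hypothetical helper, not part of NarrateAI-webui, and it assumes PyTorch is the library that provides CUDA support (as the installer's PyTorch selection suggests). It falls back to CPU when PyTorch is missing or reports no CUDA device.

```python
def pick_device(requested: str = "cuda") -> str:
    """Return "cuda" only if it was requested and PyTorch reports a usable CUDA device."""
    if requested != "cuda":
        return "cpu"
    try:
        import torch  # assumed to be installed in the Conda environment
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Suggested device: {pick_device()}")
```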

🚀 Installation

Prerequisites

Python 3.8+: Ensure Python is installed. Download from python.org. We recommend Python 3.12.x as used in the installer.
CUDA Toolkit (Optional, for GPU acceleration): If you plan to use an NVIDIA GPU, install the appropriate CUDA Toolkit. Download from NVIDIA Developer. The installer provides options for different CUDA versions.
Miniconda/Anaconda: A Conda environment is used for managing dependencies. Download Miniconda from docs.anaconda.com. Ensure you check "Add to PATH" during installation.

Steps

Clone the Repository:
```
git clone https://github.com/vancoder1/NarrateAI-webui.git
cd NarrateAI-webui
```

Run the Installer: Execute the install.bat script. This will:
- Create a Conda environment named narrate.
- Install Python.
- Install all necessary dependencies from requirements.txt.
- Prompt you to choose a PyTorch version (CUDA 12.8, CUDA 11.8, or CPU-only).
```
.\install.bat
```
Wait for the installation to complete. The script will guide you through the PyTorch selection.

📖 Usage

Start the Application: Run the start.bat script. This will activate the Conda environment and launch the Gradio web UI.
```
.\start.bat
```
The application will automatically open in your default web browser.
Configure Settings (Optional but Recommended for First Use):
- Navigate to the Settings tab.
- Select your desired Language Code, Voice, Speed, and Device (CPU/CUDA).
- Click "Update Settings". The available voices will update based on the selected language.
Generate Audiobook:
- Navigate to the Audiobook Generator tab.
- Upload your document file (e.g., .txt, .pdf, .epub, .docx, .html).
- The generation process will begin, showing progress updates.
- Once completed, an audio player will appear with your generated audiobook, and the .wav file will be available in the outputs/ directory (e.g., outputs/your-book-title.wav).

🔧 Configuration

The primary application settings can be managed through the Settings tab in the UI. These settings are persisted in config/config.json.

The config/config.json file stores the values shown below. The // comments and the ... placeholders are annotations for this README only; the file on disk must be plain JSON, which does not allow comments.
```
{
    "settings": {
        "kokoro_tts": {
            "lang_code": "a", // Default language code
            "voice": "af_heart", // Default voice
            "speed": 1.0, // Default speed
            "device": "cpu", // Default device ('cpu' or 'cuda')
            "language_voices_map": { // Defines available voices for language codes
                "a": ["af_heart", "af_bella", ...],
                "b": ["bf_emma", "bf_isabella", ...],
                // ... other language codes and their voices
            }
        }
    }
}
```

You can edit this file manually for advanced configuration, but any setting you later change through the Settings tab is saved back to the file and replaces the value you edited.
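
After a manual edit, it can be useful to check that the file still parses and that the values are consistent with each other. The sketch below is one way to do that under the structure described above. `load_tts_settings` and `validate_tts_settings` are hypothetical helpers, not functions from the NarrateAI-webui source, and the 0.1 to 2.0 speed range is taken from the feature list rather than from the application code.

```python
import json
from pathlib import Path


def load_tts_settings(path: str = "config/config.json") -> dict:
    """Read the settings.kokoro_tts section; raises if the file is not valid JSON."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)["settings"]["kokoro_tts"]


def validate_tts_settings(tts: dict) -> list:
    """Return a list of problems found; an empty list means the values look consistent."""
    problems = []

    # The selected voice should be listed under the selected language code.
    lang = tts.get("lang_code")
    voices = tts.get("language_voices_map", {}).get(lang, [])
    if tts.get("voice") not in voices:
        problems.append(f"voice {tts.get('voice')!r} is not listed for lang_code {lang!r}")

    if tts.get("device") not in ("cpu", "cuda"):
        problems.append(f"device must be 'cpu' or 'cuda', got {tts.get('device')!r}")

    speed = tts.get("speed")
    if not isinstance(speed, (int, float)) or not 0.1 <= speed <= 2.0:
        problems.append(f"speed should be a number between 0.1 and 2.0, got {speed!r}")

    return problems
```

Calling `validate_tts_settings(load_tts_settings())` from the repository root returns an empty list when the file passes these checks.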

🔄 Updating the Application

To update NarrateAI-webui to the latest version:

Ensure you have git installed.
Run the update.bat script:
```
.\update.bat
```
This script will:
- Fetch the latest changes from the repository.
- Pull updates for your current branch.
- Upgrade dependencies based on requirements.txt.
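
The contents of update.bat are not reproduced here, so the following is only an assumed equivalent of the three steps listed above, written as a small script that could also be run on systems where .bat files are unavailable. The exact git and pip invocations are assumptions; run it from the repository root with the narrate Conda environment activated.

```python
import subprocess
import sys

# Assumed equivalents of the three steps listed above; update.bat itself may differ.
UPDATE_COMMANDS = [
    ["git", "fetch"],
    ["git", "pull"],
    [sys.executable, "-m", "pip", "install", "--upgrade", "-r", "requirements.txt"],
]


def run_update(dry_run: bool = True) -> list:
    """Run the update commands in order, or only list them when dry_run is True."""
    for cmd in UPDATE_COMMANDS:
        if not dry_run:
            # check=True stops at the first failing step instead of continuing.
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in UPDATE_COMMANDS]


if __name__ == "__main__":
    for line in run_update(dry_run=True):
        print(line)
```

The default `dry_run=True` only prints the commands, so you can review them before calling `run_update(dry_run=False)`.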

🤝 Contributing

Contributions are highly welcome! If you have ideas, suggestions, feature requests, or find bugs, please:

Open an issue on the GitHub repository to discuss the change.
Fork the repository, make your changes, and submit a pull request.

Please ensure your code follows the existing style.

📜 License

This project is licensed under the Apache License 2.0. See the LICENSE file for full details.

🙏 Acknowledgements

Gradio: For the easy-to-use Python library that helps build the web UI (Gradio GitHub).
Kokoro TTS: The powerful text-to-speech engine used for audio generation (hexgrad/Kokoro-82M on Hugging Face).
All the creators and maintainers of the various Python libraries used in this project (see requirements.txt).

📬 Contact

For any questions, feedback, or issues, please open an issue on this GitHub repository. You can also reach out to ivanzaporozhets25@gmail.com.

Made with ❤️ by vancoder1
