This repository contains a Python script for a Telegram bot that uses NVIDIA's NeMo ASR models to transcribe voice messages. The bot can handle both locally stored models and pre-trained Hugging Face models.
- Purpose: Transcribe voice messages sent to the Telegram bot.
- Key Features:
  - Supports local NeMo ASR models and Hugging Face models.
  - Converts .ogg voice messages to .wav format for transcription.
  - Deletes temporary .ogg files after processing.
The bot requires a configuration file in the config/ directory. Use config.yaml as the default configuration file. Below is an example configuration:
# Path to the local model file. This will be used first if it exists.
model_path: '/path/to/alternate/model/if/needed.nemo'
# Name of the Hugging Face model to fall back on if the local model is not available.
hf_model_name: 'nvidia/stt_hy_fastconformer_hybrid_large_pc'
# Telegram bot token, used to authenticate the bot with the Telegram Bot API.
telegram_token: 'your_bot_token'

Notes:
- Replace /path/to/alternate/model/if/needed.nemo with the actual path to your local NeMo ASR model.
- Replace your_bot_token with the Telegram bot token.
- If you want to use a different configuration file (e.g., for another environment), create it in the config/ directory and update the config_path in the script accordingly.
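
The local-first, Hugging Face-fallback order described in the configuration comments can be illustrated with a minimal sketch. The function name `choose_model_source` is hypothetical and is not taken from the bot's script; it only decides which source to use and does not load anything. In NeMo, a local `.nemo` file is typically loaded with `ASRModel.restore_from` and a named pre-trained model with `ASRModel.from_pretrained`, but whether the bot uses exactly those calls is an assumption.

```python
import os
from typing import Tuple


def choose_model_source(config: dict) -> Tuple[str, str]:
    """Pick the model source, preferring the local file when it exists.

    Hypothetical helper for illustration; not part of the bot's script.
    Returns ("local", path) or ("huggingface", model_name).
    """
    model_path = config.get("model_path")
    # Use the local .nemo file only if the path is set and points to a real file.
    if model_path and os.path.isfile(model_path):
        return ("local", model_path)
    # Otherwise fall back to the Hugging Face model name.
    return ("huggingface", config["hf_model_name"])


# Example with the placeholder values from the sample configuration.
example_config = {
    "model_path": "/path/to/alternate/model/if/needed.nemo",
    "hf_model_name": "nvidia/stt_hy_fastconformer_hybrid_large_pc",
}
source, location = choose_model_source(example_config)
print(source, location)
```

Keeping the decision separate from the actual model loading makes the fallback easy to check without installing NeMo or downloading a model.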
- Use the run.sh script to automatically set up the Docker environment, build the Docker image, and start the container:

      ./run.sh

  This script will:
  - Build the Docker image.
  - Stop and remove any existing containers with the same name.
  - Start a new container for the bot.
- Monitor the logs to ensure the bot is running:

      docker logs -f armenian-asr-bot

- Interact with the bot via Telegram:
  - Send /start to the bot.
  - Send a voice message, and the bot will transcribe it.
- Starting the Bot:
  - Send /start to the bot to begin.
- Sending a Voice Message:
  - Send a voice message, and the bot will:
    - Download the .ogg file.
    - Convert it to .wav.
    - Transcribe the message using the NeMo ASR model.
    - Respond with the transcribed text.
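
The conversion step in the flow above could look roughly like the following sketch. It assumes the conversion is done by calling the `ffmpeg` command-line tool and that the model expects 16 kHz mono audio; the README does not state which tool or sample rate the bot actually uses, and the function names here are hypothetical.

```python
import subprocess
from typing import List


def build_ffmpeg_command(ogg_path: str, wav_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that converts an .ogg file to mono .wav.

    Hypothetical helper; the 16 kHz mono target is an assumption.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", ogg_path,           # input voice message
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        wav_path,
    ]


def convert_ogg_to_wav(ogg_path: str, wav_path: str) -> str:
    """Run ffmpeg and return the .wav path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(ogg_path, wav_path), check=True)
    return wav_path


# Show the command without running it, so ffmpeg is not required for this demo.
print(build_ffmpeg_command("voice.ogg", "voice.wav"))
```

Passing the command as a list rather than a shell string avoids quoting problems if a file path contains spaces.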
The workspace/ directory is used to store temporary user files (e.g., .ogg and .wav files). Subdirectories are created for each user based on their Telegram ID.
The bot automatically deletes .ogg files after conversion to .wav.
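
As a rough illustration of the per-user workspace layout and the .ogg cleanup, the sketch below creates one subdirectory per Telegram ID and removes the .ogg file once it is no longer needed. The helper names and the file name `voice.ogg` are assumptions made for this example, not names from the bot's script.

```python
import tempfile
from pathlib import Path


def user_workspace(base_dir: Path, telegram_id: int) -> Path:
    """Return the user's subdirectory under the workspace, creating it if missing."""
    user_dir = base_dir / str(telegram_id)
    user_dir.mkdir(parents=True, exist_ok=True)
    return user_dir


def remove_ogg(ogg_path: Path) -> None:
    """Delete the temporary .ogg file; do nothing if it is already gone."""
    ogg_path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+


# Demo in a throwaway directory so the real workspace/ is not touched.
with tempfile.TemporaryDirectory() as tmp:
    user_dir = user_workspace(Path(tmp) / "workspace", 123456)
    ogg_file = user_dir / "voice.ogg"
    ogg_file.write_bytes(b"")  # stand-in for a downloaded voice message
    remove_ogg(ogg_file)
    print(user_dir.name, ogg_file.exists())
```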
For issues or questions, please contact the repository maintainer.