The first open voice dataset for the Avar language (~800,000 speakers, Dagestan).
- Voice collection (reading mode + spontaneous speech)
- Two-step verification system
- Profanity detection
- Google Drive integration
- Role system (user/verifier/admin/super-admin)
- HuggingFace export
- Hosting: Railway.app
- Storage: Google Drive (2 TB)
- Libraries: pyTelegramBotAPI, google-api-python-client, pydub
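
The role system and two-step verification listed above could fit together roughly as in the sketch below. It is only an illustration under stated assumptions: "two-step" is taken to mean that a recording needs approvals from two distinct reviewers, and roles are assumed to be ordered by privilege. The names `Role`, `Recording`, and `approve` are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Role(IntEnum):
    # Assumption: roles are ordered, so a higher value implies more privileges.
    USER = 0
    VERIFIER = 1
    ADMIN = 2
    SUPER_ADMIN = 3


@dataclass
class Recording:
    recording_id: str
    # IDs of reviewers who approved this recording; a set ignores repeat votes.
    approvals: set = field(default_factory=set)

    @property
    def verified(self) -> bool:
        # Assumption: "two-step" means two distinct reviewers must approve.
        return len(self.approvals) >= 2


def approve(recording: Recording, reviewer_id: int, reviewer_role: Role) -> bool:
    """Record one approval and return whether the recording is now verified."""
    if reviewer_role < Role.VERIFIER:
        raise PermissionError("only verifiers and above may approve recordings")
    recording.approvals.add(reviewer_id)
    return recording.verified
```

In a bot handler, `approve(rec, message.from_user.id, role)` could be called once the reviewer's role has been looked up. Storing reviewer IDs in a set means a second vote from the same reviewer does not count as the second step.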
License: CC BY 4.0 (dataset) and MIT (code)