VoiceOps is an asynchronous audio processing and transcription framework built with Django REST Framework, Celery, Redis, and PostgreSQL.
It is designed for API-first media workflows where transcription cannot complete within a single request/response cycle. A client registers an audio asset, queues transcription, polls the task state, and then consumes the resulting transcript or summary. This makes it suitable for speech data operations, transcript generation, summarization pipelines, and preparation of training data for audio ML workflows. It is built for real-world audio processing workloads and already integrates with Google Cloud Storage (GCS) as the media ingestion layer.
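The polling step of that workflow can be sketched as a small client-side helper. This is a minimal illustration, not part of VoiceOps: `wait_for_task` and `fetch_state` are hypothetical names, and the state names assume the API reports Celery-style task states. The actual endpoints and state values are described in docs/api.md.

```python
import time
from typing import Callable

# Assumption: the API exposes Celery-style state names.
# Check docs/api.md for the values VoiceOps actually returns.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    fetch_state: Callable[[str], str],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state(task_id) until it reports a terminal state.

    fetch_state is a hypothetical callable supplied by the caller; it
    would typically wrap an HTTP GET against the task-state endpoint.
    Raises TimeoutError if no terminal state is seen within `timeout`
    seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state(task_id)
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} still in state {state!r} after {timeout}s"
            )
        sleep(interval)
```

Passing the state lookup in as a callable keeps the helper independent of any particular HTTP client, so the same loop works whether the caller uses `requests`, `httpx`, or a test stub.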
The current pipeline covers:
- asset registration
- queued transcription
- queued transcript summarization
- task-state polling
- queue separation for transcription and summarization workloads
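Queue separation of this kind is commonly expressed in Celery through task routing. The fragment below is a sketch under stated assumptions, not the project's actual configuration: the task paths and queue names are hypothetical, and it assumes the Celery app loads Django settings with the `CELERY` namespace so that `CELERY_TASK_ROUTES` maps to Celery's `task_routes` setting.

```python
# settings.py (sketch)
# Hypothetical task paths and queue names; VoiceOps may use different ones.
CELERY_TASK_ROUTES = {
    # Long-running, compute-heavy jobs go to their own queue.
    "voiceops.tasks.transcribe_asset": {"queue": "transcription"},
    # Summarization jobs are routed separately so they are not stuck
    # behind a backlog of transcription work.
    "voiceops.tasks.summarize_transcript": {"queue": "summarization"},
}
```

With routing like this, each workload can be served by a dedicated worker started with Celery's `-Q` option (for example `-Q transcription`), which allows the two worker pools to be scaled independently.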
Planned future work:
- embedding pipelines for transcripts and summaries
- semantic search and retrieval over transcript corpora
- dataset curation workflows for audio and speech ML
- richer orchestration for multi-stage post-processing
Documentation:
- Setup: docs/setup.md
- API reference: docs/api.md
- Business logic: docs/business-logic.md