A production-ready Django application for transcribing video and audio files using OpenAI's Whisper model, with containerization, async task processing, and comprehensive security features.
Video Transcriber is a modern, full-featured web application for transcribing video and audio content. It leverages OpenAI's Whisper, a state-of-the-art speech recognition model, combined with Docker containerization, PostgreSQL, Celery async tasks, and strict security controls to provide a reliable, scalable platform for media transcription.
- Multiple File Formats: MP4, MPEG, MOV, AVI, WebM, OGG, MP3, WAV, FLAC
- Long Video Support: Automatic chunking for videos over 15 minutes
- Configurable Models: Tiny, Base, Small (default), Medium, Large (accuracy vs. speed tradeoff)
- Timestamp Segments: Per-utterance transcripts with precise timing
- Multi-format Export: Download as TXT (plaintext) or SRT (subtitles for video players)
- Docker Containerization: Reproducible deployments with docker-compose
- PostgreSQL Database: Production-grade data persistence and concurrent access
- Celery Task Queue: Asynchronous transcription processing with Redis broker
- Graceful Error Handling: Automatic recovery from worker crashes and deleted records
- Persistent Model Cache: Pre-downloaded Whisper models survive container restarts
- User Authentication: Registration, login, password reset with email
- Authorization: Users can only access their own videos (prevents IDOR attacks)
- Brute-force Protection: Lock accounts after 5 failed login attempts (24-hour cooldown)
- CSRF Protection: All forms include CSRF tokens
- Secure Cookies: HTTPS-only cookies in production
- Input Validation: File type, size, and content-type validation
- Real-time Status Updates: AJAX polling shows transcription progress
- Pagination: Video list with 6 items per page
- Bootstrap UI: Responsive, mobile-friendly interface
- User-friendly Titles: Video filenames automatically used as titles
- Progress Tracking: Visual status indicators (Pending, Processing, Completed, Failed)
- Docker & Docker Compose (recommended for production)
- Python 3.12+ (for local development)
- FFmpeg (for audio extraction from video files)
- Clone the repository:
  git clone https://github.com/Co-vengers/video_transcriber.git
  cd video_transcriber
- Create environment configuration:
  cp video_transcriber/.env.example video_transcriber/.env
  # Edit video_transcriber/.env with your settings
- Build and start services:
  docker-compose up --build
  On ARM64 hosts (e.g., Apple Silicon or ARM-based Linux VMs), the Docker platform defaults to `linux/amd64` for better Python ML wheel compatibility. To run native ARM64 instead:
  DOCKER_PLATFORM=linux/arm64 docker compose up --build
- Access the application:
  - Web UI: http://localhost:8000
  - Admin: http://localhost:8000/admin (superuser required)
- Create admin user (optional, in another terminal):
  docker-compose exec web python manage.py createsuperuser
If you prefer running locally without Docker:
- Clone the repository:
  git clone https://github.com/Co-vengers/video_transcriber.git
  cd video_transcriber
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables:
  cp video_transcriber/.env.example video_transcriber/.env
  # Edit video_transcriber/.env with development settings (use SQLite for local dev)
- Run migrations:
  python manage.py migrate
- Create a superuser:
  python manage.py createsuperuser
- Start Redis (in a separate terminal):
  # Using Homebrew on macOS:
  brew services start redis
  # Or run directly:
  redis-server
- Start the Celery worker (in a separate terminal):
  cd video_transcriber
  celery -A video_transcriber worker --loglevel=info
- Start the Django development server:
  cd video_transcriber
  python manage.py runserver
- Access at http://localhost:8000
- Click "Register" to create a new account or "Login" with existing credentials
- Password reset available via email link
- Brute-force protection: Account locks after 5 failed attempts (24-hour cooldown)
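The lockout rule above (5 failed attempts, 24-hour cooldown) can be sketched in plain Python. This is only an illustration of the counting logic: the `LoginAttemptTracker` class and its in-memory dictionaries are hypothetical, and the project may well implement lockout differently (for example, through a Django package or a database table).

```python
from datetime import datetime, timedelta, timezone

MAX_FAILED_ATTEMPTS = 5               # threshold stated in this README
LOCKOUT_PERIOD = timedelta(hours=24)  # cooldown stated in this README


class LoginAttemptTracker:
    """Hypothetical in-memory tracker; a real deployment would persist this state."""

    def __init__(self):
        self._failures = {}   # username -> consecutive failed attempts
        self._locked_at = {}  # username -> time the lock started

    def is_locked(self, username, now=None):
        now = now or datetime.now(timezone.utc)
        locked_at = self._locked_at.get(username)
        if locked_at is None:
            return False
        if now - locked_at >= LOCKOUT_PERIOD:
            # Cooldown has elapsed: clear the lock and the failure counter.
            del self._locked_at[username]
            self._failures.pop(username, None)
            return False
        return True

    def record_failure(self, username, now=None):
        now = now or datetime.now(timezone.utc)
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= MAX_FAILED_ATTEMPTS:
            # Keep the original lock time if the account is already locked.
            self._locked_at.setdefault(username, now)

    def record_success(self, username):
        # A successful login resets the consecutive-failure counter.
        self._failures.pop(username, None)
```

A login view would call `is_locked` before checking credentials, then `record_failure` or `record_success` depending on the outcome.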
- Click "Upload Video" from main menu
- Select a video or audio file (max 500 MB)
- Supported formats: MP4, MPEG, MOV, AVI, WebM, OGG, MP3, WAV, FLAC
- Choose Whisper model size:
- Tiny: Fastest, ~39M parameters, basic accuracy
- Base: Balanced, ~74M parameters
- Small: Default, ~244M parameters, good accuracy/speed tradeoff
- Medium: Slower, ~769M parameters, better accuracy
- Large: Slowest, ~1.5B parameters, highest accuracy
- Click "Upload" to start transcription
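The upload constraints listed above (supported formats, 500 MB limit, five model sizes) could be checked along these lines. This is a sketch, not the project's `VideoUploadForm`: the function name `validate_upload` is hypothetical, 500 MB is assumed to mean 500 × 1024 × 1024 bytes, and only the extension is checked here, whereas the README also mentions MIME type validation.

```python
from pathlib import Path

# Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024
ALLOWED_EXTENSIONS = {
    ".mp4", ".mpeg", ".mov", ".avi", ".webm", ".ogg", ".mp3", ".wav", ".flac",
}
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def validate_upload(filename, size_bytes, model_size="small"):
    """Return a list of error messages; an empty list means the upload passes."""
    errors = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"Unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append("File exceeds the 500 MB limit")
    if model_size not in MODEL_SIZES:
        errors.append(f"Unknown model size: {model_size}")
    return errors
```

Returning a list of errors rather than raising on the first one lets a form show every problem to the user at once.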
- Videos appear in "Videos" list with status badge
- Status updates in real-time (Pending → Processing → Completed/Failed)
- Progress visible without page refresh
Once transcription completes, download in multiple formats:
- TXT: Plain text transcript (copy/paste friendly)
- SRT: SubRip format with timestamps (import into video players)
  - Format: `HH:MM:SS,mmm --> HH:MM:SS,mmm`
  - Compatible with VLC, YouTube, browser video players
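To make the SRT timestamp format concrete, here is a minimal sketch that renders timed segments as SubRip text. It assumes each segment is a dict with `start`, `end` (seconds), and `text` keys, which is the shape Whisper returns in `result["segments"]`. How this project stores segments is not stated here, and the function names are illustrative rather than taken from `exports.py`.

```python
def format_srt_timestamp(seconds):
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(cues)
```

Working in integer milliseconds avoids floating-point drift when splitting a time into hours, minutes, and seconds.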
- View all your transcribed videos with status
- Click video title to see full transcript and segments
- Delete videos to free up storage
- Videos are private (only you can see your transcripts)
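The privacy rule in the last bullet can be illustrated with a small, framework-free sketch. The `videos_by_id` mapping, the `owner_id` key, and the `NotFound` exception are hypothetical stand-ins for the database and an HTTP 404 response; they are not names from this project.

```python
class NotFound(Exception):
    """Stands in for an HTTP 404 response in this sketch."""


def get_owned_video(videos_by_id, video_id, requesting_user_id):
    """Return the video only if it exists and belongs to the requesting user."""
    video = videos_by_id.get(video_id)
    # A missing video and another user's video produce the same error,
    # so a caller cannot probe which IDs exist.
    if video is None or video["owner_id"] != requesting_user_id:
        raise NotFound(f"No video with id {video_id}")
    return video
```

In a Django view the same idea is commonly expressed as a lookup filtered by owner, for example `get_object_or_404(Video, pk=video_id, user=request.user)`; the model and field names in that call are assumptions.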
Access Django admin at /admin:
- Manage user accounts
- View/filter transcription jobs by status and date
- Search videos by title or username
- Monitor transcription history
| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Upload form |
| `/videos/` | GET | List user's videos (paginated) |
| `/videos/<id>/` | GET | View transcript and segments |
| `/videos/<id>/status/` | GET | JSON status (for AJAX polling) |
| `/videos/<id>/download/<fmt>/` | GET | Download transcript (fmt: txt or srt) |
| `/videos/<id>/delete/` | POST | Delete video |
| `/register/` | GET/POST | User registration |
| `/login/` | GET/POST | User login |
| `/logout/` | POST | User logout |
| `/password-reset/` | GET/POST | Password reset flow |
- Web Framework: Django 5.1.7 (Python web framework)
- Application Server: Gunicorn 23.0.0 (production WSGI server)
- Database: PostgreSQL 16 (production relational database)
- Message Broker: Redis 7 (in-memory message queue)
- Task Queue: Celery 5.4.0 (asynchronous job processing)
- ML/AI Engine: OpenAI Whisper 20240930 (speech-to-text)
- Containerization: Docker & docker-compose (reproducible deployments)
- Frontend: Bootstrap 5 (responsive CSS framework)
┌─────────────────────────────────────────────────────────────┐
│                       Docker Compose                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │   Web Service   │  │    Worker    │  │   Database   │    │
│  │   (Gunicorn)    │  │   (Celery)   │  │ (PostgreSQL) │    │
│  │    2 workers    │  │ concurrency=4│  │              │    │
│  └────────┬────────┘  └──────┬───────┘  └──────────────┘    │
│           │                  │                              │
│           └─────────┬────────┘                              │
│                     │                                       │
│             ┌───────▼────────┐                              │
│             │  Redis Broker  │                              │
│             │  (Task Queue)  │                              │
│             └────────────────┘                              │
│                                                             │
│  Persistent Volumes:                                        │
│  - whisper_cache: /cache/whisper (461MB model)              │
│  - postgres_data: PostgreSQL data                           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
- User uploads video → Web service saves to `/media/videos/`
- Task is queued → Gunicorn sends `process_transcription` task to Redis
- Worker picks up task → Celery worker loads Whisper model (cached)
- Chunked transcription → Long videos split into 10-minute chunks
- Results saved → Transcripts and segments stored in PostgreSQL
- Frontend updates → AJAX polling shows real-time status
- User downloads → Download as TXT or SRT subtitle format
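The chunking step in the flow above can be sketched as follows. The 15-minute threshold and 10-minute chunk length come from this README. The function names, and the idea of merging by shifting each chunk's segment times by the chunk's start offset, are assumptions about how such a pipeline typically works, not a description of `utils.py`.

```python
CHUNK_THRESHOLD_SECONDS = 15 * 60  # only media longer than this is chunked
CHUNK_LENGTH_SECONDS = 10 * 60     # length of each chunk


def plan_chunks(duration_seconds):
    """Return (start, end) pairs in seconds covering the whole recording."""
    if duration_seconds <= CHUNK_THRESHOLD_SECONDS:
        return [(0.0, float(duration_seconds))]
    chunks = []
    start = 0.0
    while start < duration_seconds:
        end = float(min(start + CHUNK_LENGTH_SECONDS, duration_seconds))
        chunks.append((start, end))
        start = end
    return chunks


def merge_chunk_segments(chunk_results):
    """Merge per-chunk segments into one timeline.

    chunk_results is a list of (chunk_start_seconds, segments) pairs, where
    each segment's times are relative to the start of its own chunk.
    """
    merged = []
    for chunk_start, segments in chunk_results:
        for segment in segments:
            merged.append({
                "start": segment["start"] + chunk_start,
                "end": segment["end"] + chunk_start,
                "text": segment["text"],
            })
    return merged
```

For a 25-minute file, `plan_chunks(1500)` yields three chunks: two of 10 minutes and a final one of 5 minutes.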
video_transcriber/
├── Dockerfile                     # Container image definition
├── docker-compose.yml             # Multi-service orchestration
├── requirements.txt               # Python dependencies
├── CHANGES_DOCUMENTATION.md       # Full changelog and rationale
│
├── video_transcriber/             # Django project config
│   ├── settings.py                # Django configuration
│   ├── urls.py                    # Main URL routing
│   ├── wsgi.py                    # WSGI application
│   ├── celery.py                  # Celery configuration
│   └── __init__.py                # Celery import
│
├── transcription/                 # Django app (main logic)
│   ├── models.py                  # Video model with ownership
│   ├── views.py                   # All HTTP view handlers
│   ├── urls.py                    # App URL patterns
│   ├── forms.py                   # VideoUploadForm with validation
│   ├── tasks.py                   # Celery transcription task
│   ├── admin.py                   # Django admin configuration
│   ├── utils.py                   # Whisper transcription utilities
│   ├── exports.py                 # TXT/SRT export functions
│   │
│   ├── management/
│   │   └── commands/
│   │       └── requeue_stale_transcriptions.py  # Stale task recovery
│   │
│   ├── migrations/                # Database schema versions
│   │   ├── 0001_initial.py
│   │   ├── 0002_video_user.py
│   │   ├── 0003_alter_video_file_alter_video_user.py
│   │   ├── 0004_video_status.py
│   │   ├── 0005_video_segments.py
│   │   └── 0006_fix_upload_to_path.py
│   │
│   ├── templates/                 # HTML templates
│   │   ├── base.html              # Navigation, Bootstrap layout
│   │   ├── upload.html            # Video upload form
│   │   ├── video_list.html        # Paginated video gallery
│   │   ├── video_detail.html      # Transcript viewer, download
│   │   └── auth/                  # Authentication templates
│   │       ├── login.html
│   │       ├── register.html
│   │       ├── password_reset.html
│   │       └── lockout.html
│   │
│   ├── static/                    # CSS, JS, fonts
│   │   └── transcription/
│   │       └── favicon.svg
│   │
│   └── tests.py                   # Unit tests
│
├── media/                         # User uploads (not in repo)
│   └── videos/
│
├── .env.example                   # Environment template
├── .python-version                # Python 3.12
├── .gitignore                     # Excluded files
└── README.md                      # This file
- ✅ User registration with password validation
- ✅ Secure password reset via email
- ✅ CSRF tokens on all forms
- ✅ Permission checks (users can only access their own videos)
- ✅ IDOR prevention (returns 404 if accessing others' content)
- ✅ Brute-force protection (5 failed attempts → 24-hour lockout)
- ✅ HTTPS-only cookies in production
- ✅ Secure cookie flags (HttpOnly, SameSite)
- ✅ Security headers (X-Frame-Options=DENY, X-Content-Type-Options=nosniff)
- ✅ SQL injection prevention (parameterized queries via ORM)
- ✅ Environment-based secrets (not in code)
- ✅ File type validation (extension + MIME type)
- ✅ File size limits (500 MB max)
- ✅ Input sanitization on all forms
- ✅ Secure file storage outside web root
- ✅ Graceful error handling (no crashes on deleted records)
- ✅ Task timeouts (1 hour hard, 55-minute soft)
- ✅ Auto-requeue on worker loss
- ✅ Automatic stale task recovery on startup
- ✅ Non-root worker process (nobody:nogroup)
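The task-reliability items in the checklist above (hard and soft timeouts, requeue on worker loss) map onto standard Celery options. The fragment below is a sketch of how they might appear in `settings.py`. It assumes the Celery app loads Django settings with the `CELERY` namespace, which this README does not confirm, and the project may set these options elsewhere, such as on the task decorator.

```python
# settings.py fragment (illustrative; values mirror the checklist above)

# Hard limit: the worker process running the task is terminated after 1 hour.
CELERY_TASK_TIME_LIMIT = 60 * 60

# Soft limit: SoftTimeLimitExceeded is raised inside the task after 55 minutes,
# leaving 5 minutes to mark the video as failed and clean up.
CELERY_TASK_SOFT_TIME_LIMIT = 55 * 60

# Acknowledge the message only after the task finishes, and requeue it if the
# worker process dies mid-task instead of dropping it.
CELERY_TASK_ACKS_LATE = True
CELERY_TASK_REJECT_ON_WORKER_LOST = True
```

Late acknowledgement means a task can run more than once, so the transcription task needs to tolerate being restarted for the same video.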
# Security
SECRET_KEY=your-secret-key-here
# Debug Mode (False in production)
DEBUG=True
# Allowed Hosts
ALLOWED_HOSTS=localhost,127.0.0.1,0.0.0.0
# Database
DB_ENGINE=postgres # or 'sqlite' for development
DB_NAME=video_transcriber
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=db # Docker service name
DB_PORT=5432
# Message Broker & Results
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0
# Stale Task Recovery
STALE_PROCESSING_MINUTES=45 # How old before marking as stale
RECOVERY_MODEL_SIZE=small # Model to use for requeue
For password reset emails:
EMAIL_BACKEND=django.core.mail.backends.smtp.EmailBackend
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
DEFAULT_FROM_EMAIL=noreply@example.com
- GPU Support: Using a CUDA GPU significantly improves transcription speed
- Model Selection:
  - `tiny` (39M params): 5-10x faster, lower accuracy
  - `base` (74M params): 2-5x faster, decent accuracy
  - `small` (244M params): Default, good balance
  - `medium` (769M params): Slower, better accuracy
  - `large` (1.5B params): Very slow, best accuracy
- Long Videos: Automatically chunked (no manual splitting needed)
- Batch Processing: Queue multiple uploads for parallel processing
- Horizontal Scale: Add more worker containers for higher throughput
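- Concurrent Limit: Currently `--pool=solo --concurrency=4` (adjust as needed; Celery's solo pool runs one task at a time, so the concurrency value only takes effect with a pool such as prefork)
- Database: PostgreSQL handles concurrent access safely
- Cache: Downloaded model files persist across container restarts via the `whisper_cache` volume, so they are not re-downloaded; each worker still loads the model into memory when it starts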
- Set `DEBUG=False` in `.env`
- Generate a strong `SECRET_KEY` (use `python -c "from django.core.management.utils import get_random_secret_key; print(get_random_secret_key())"`)
- Configure `ALLOWED_HOSTS` with your domain
- Set up HTTPS (nginx reverse proxy or cloud provider)
- Configure email backend for password resets
- Use strong database password
- Enable PostgreSQL backups
- Monitor worker health and logs
- Set up log aggregation (e.g., CloudWatch, ELK)
- Configure resource limits in docker-compose
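The first few checklist items depend on how `settings.py` reads the environment. A minimal sketch of that parsing, using only the standard library, is shown below. The helper names `env_bool`, `env_list`, and `load_settings` are hypothetical, and tying the secure-cookie flags to `DEBUG` is an assumption about how "HTTPS-only cookies in production" is implemented.

```python
import os


def env_bool(name, default=False):
    """Interpret common truthy strings ("1", "true", "yes", "on") as True."""
    value = os.environ.get(name, str(default))
    return value.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, default=""):
    """Split a comma-separated variable into a list, dropping empty items."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


def load_settings():
    """Build the security-related settings from environment variables."""
    debug = env_bool("DEBUG", default=False)
    secret_key = os.environ.get("SECRET_KEY", "")
    if not debug and not secret_key:
        # Fail fast rather than start production with an empty key.
        raise RuntimeError("SECRET_KEY must be set when DEBUG is False")
    return {
        "DEBUG": debug,
        "SECRET_KEY": secret_key,
        "ALLOWED_HOSTS": env_list("ALLOWED_HOSTS"),
        # Assumption: cookies are restricted to HTTPS whenever DEBUG is off.
        "SESSION_COOKIE_SECURE": not debug,
        "CSRF_COOKIE_SECURE": not debug,
    }
```

Defaulting `DEBUG` to off means a missing variable produces the safer production behavior rather than an exposed debug page.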
Videos stuck in "Processing" status:
# Manually requeue stale videos
docker-compose exec -T web python manage.py requeue_stale_transcriptions --minutes=0
Worker not picking up tasks:
# Check Celery worker logs
docker-compose logs -f worker
# Restart worker
docker-compose restart worker
Database connection errors:
# Check PostgreSQL is healthy
docker-compose exec db pg_isready -U postgres
# View database service logs
docker-compose logs db
Redis connection issues:
# Verify Redis is accessible
docker-compose exec redis redis-cli ping
# Should return: PONG
Model download stuck:
- First run downloads 461MB model (~70 seconds)
- Model is cached in persistent volume `whisper_cache:/cache/whisper`
- Subsequent runs load from cache (~5 seconds)
Run unit tests:
# With Docker
docker-compose exec -T web python manage.py test
# Locally
python manage.py test transcription
Test coverage includes:
- ✅ Authorization (IDOR prevention)
- ✅ AJAX status endpoint
- ✅ Chunked transcription merging
- ✅ Task deletion safety
- ✅ Form validation
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch: `git checkout -b feat/amazing-feature`
- Make your changes and add tests if applicable
- Commit with conventional messages: `git commit -m "feat: description"`
- Push to your fork: `git push origin feat/amazing-feature`
- Open a Pull Request against the `main` branch
# Create local development environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set up local .env with SQLite
cp video_transcriber/.env.example video_transcriber/.env
# Edit to use: DB_ENGINE=sqlite, remove CELERY_* vars for testing
# Run migrations
python manage.py migrate
# Run tests
python manage.py test
# Start development servers (in separate terminals)
redis-server
celery -A video_transcriber worker --loglevel=info
python manage.py runserver
This project is licensed under the MIT License - see the LICENSE file for details.
See CHANGES_DOCUMENTATION.md for detailed documentation of all changes, including:
- Infrastructure & containerization improvements
- Security hardening details
- Task reliability enhancements
- Feature additions and rationale
- Migration guide from previous version
See IMPROVEMENTS.md for planned future enhancements.
- OpenAI Whisper: State-of-the-art speech recognition
- Django: Web framework
- Celery: Async task queue
- PostgreSQL: Reliable database
- Bootstrap: Responsive CSS framework
- All contributors who have helped build and improve this tool
- GitHub: Co-vengers/video_transcriber
- Issues: GitHub Issues
- Team: Co-vengers
- Current Version: 2.0.0 (Production-ready)
- Python: 3.12+
- Django: 5.1.7
- Release Date: March 2026