SoundNote

A production-ready Django application for transcribing video and audio files using OpenAI's Whisper model, with containerization, async task processing, and comprehensive security features.

Overview

SoundNote (repository name: video_transcriber) is a web application for transcribing video and audio content. It combines OpenAI's Whisper speech recognition model with Docker containerization, PostgreSQL, Celery async tasks, and strict security controls to provide a reliable, scalable platform for media transcription.

Key Features

Core Transcription

Multiple File Formats: MP4, MPEG, MOV, AVI, WebM, OGG, MP3, WAV, FLAC
Long Video Support: Automatic chunking for videos over 15 minutes
Configurable Models: Tiny, Base, Small (default), Medium, Large (accuracy vs. speed tradeoff)
Timestamp Segments: Per-utterance transcripts with precise timing
Multi-format Export: Download as TXT (plaintext) or SRT (subtitles for video players)

Architecture & Reliability

Docker Containerization: Reproducible deployments with docker-compose
PostgreSQL Database: Production-grade data persistence and concurrent access
Celery Task Queue: Asynchronous transcription processing with Redis broker
Graceful Error Handling: Automatic recovery from worker crashes and deleted records
Persistent Model Cache: Pre-downloaded Whisper models survive container restarts

Security

User Authentication: Registration, login, password reset with email
Authorization: Users can only access their own videos (prevents IDOR attacks)
Brute-force Protection: Lock accounts after 5 failed login attempts (24-hour cooldown)
CSRF Protection: All forms include CSRF tokens
Secure Cookies: HTTPS-only cookies in production
Input Validation: File type, size, and content-type validation

User Experience

Real-time Status Updates: AJAX polling shows transcription progress
Pagination: Video list with 6 items per page
Bootstrap UI: Responsive, mobile-friendly interface
User-friendly Titles: Video filenames automatically used as titles
Progress Tracking: Visual status indicators (Pending, Processing, Completed, Failed)

Installation

Prerequisites

Docker & Docker Compose (recommended for production)
Python 3.12+ (for local development)
FFmpeg (for audio extraction from video files)

Quick Start with Docker (Recommended)

Clone the repository:
```
git clone https://github.com/Co-vengers/video_transcriber.git
cd video_transcriber
```
Create environment configuration:
```
cp video_transcriber/.env.example video_transcriber/.env
# Edit video_transcriber/.env with your settings
```

Build and start services:
```
docker-compose up --build
```
On ARM64 hosts (e.g., Apple Silicon or ARM-based Linux VMs), the Docker platform defaults to linux/amd64 for better compatibility with Python ML wheels. To run natively on ARM64 instead:
```
DOCKER_PLATFORM=linux/arm64 docker-compose up --build
```
Access the application:
- Web UI: http://localhost:8000
- Admin: http://localhost:8000/admin (superuser required)

Create admin user (optional, in another terminal). The command is interactive, so run it without -T to keep the terminal attached:
```
docker-compose exec web python manage.py createsuperuser
```

Local Development Setup

If you prefer running locally without Docker:

Clone the repository:
```
git clone https://github.com/Co-vengers/video_transcriber.git
cd video_transcriber
```
Create a virtual environment:
```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:
```
cp video_transcriber/.env.example video_transcriber/.env
# Edit video_transcriber/.env with development settings (use SQLite for local dev)
```

Run migrations:
```
python manage.py migrate
```
Create a superuser:
```
python manage.py createsuperuser
```

Start Redis (in a separate terminal):
```
# Using Homebrew on macOS:
brew services start redis
# Or run directly:
redis-server
```
Start the Celery worker (in a separate terminal):
```
cd video_transcriber
celery -A video_transcriber worker --loglevel=info
```
Start the Django development server:
```
cd video_transcriber
python manage.py runserver
```

Access at http://localhost:8000

Usage

Web Interface Workflow

1. Register / Login

Click "Register" to create a new account or "Login" with existing credentials
Password reset available via email link
Brute-force protection: Account locks after 5 failed attempts (24-hour cooldown)

2. Upload Video

Click "Upload Video" from the main menu
Select a video or audio file (max 500 MB)
Supported formats: MP4, MPEG, MOV, AVI, WebM, OGG, MP3, WAV, FLAC
Choose Whisper model size:
- Tiny: Fastest, ~39M parameters, basic accuracy
- Base: Balanced, ~74M parameters
- Small: Default, ~244M parameters, good accuracy/speed tradeoff
- Medium: Slower, ~769M parameters, better accuracy
- Large: Slowest, ~1.5B parameters, highest accuracy
Click "Upload" to start transcription

3. Monitor Progress

Videos appear in the "Videos" list with a status badge
Status updates in real-time (Pending → Processing → Completed/Failed)
Progress visible without page refresh
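
The status progression above can be read as a small state machine. The following is a minimal sketch, not the project's code: the names `Status`, `ALLOWED`, `can_transition`, and `status_payload` are hypothetical, and the JSON shape is an assumption about what a polling client might receive.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "Pending"
    PROCESSING = "Processing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Forward transitions as described in the README; Completed and Failed
# are terminal states with no outgoing transitions.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.COMPLETED, Status.FAILED},
    Status.COMPLETED: set(),
    Status.FAILED: set(),
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if moving from `current` to `new` is allowed."""
    return new in ALLOWED[current]


def status_payload(video_id: int, status: Status) -> dict:
    """Build a hypothetical JSON body for the status endpoint."""
    return {
        "id": video_id,
        "status": status.value,
        # A client can stop polling once the job reaches a terminal state.
        "done": not ALLOWED[status],
    }
```

A polling client would keep requesting the status until `done` is true, then reload or reveal the download links.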

4. Download Transcripts

Once transcription completes, download in multiple formats:

TXT: Plain text transcript (copy/paste friendly)
SRT: SubRip format with timestamps (import into video players)
- Format: HH:MM:SS,mmm --> HH:MM:SS,mmm
- Compatible with VLC, YouTube, browser video players
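
To make the SRT timestamp format concrete, here is a minimal sketch of how timed segments could be rendered as SRT cues. It assumes each segment is a dict with `start`, `end` (seconds), and `text` keys, which mirrors the segment shape Whisper returns; the function names are hypothetical and the project's own exports.py may work differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render an iterable of {'start', 'end', 'text'} dicts as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line, as SRT players expect.
    return "\n".join(cues)
```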

5. Manage Videos

View all your transcribed videos with status
Click video title to see full transcript and segments
Delete videos to free up storage
Videos are private (only you can see your transcripts)

Admin Interface

Access Django admin at /admin:

Manage user accounts
View/filter transcription jobs by status and date
Search videos by title or username
Monitor transcription history

API Endpoints (for developers)

| Endpoint | Method | Purpose |
|---|---|---|
| `/` | GET | Upload form |
| `/videos/` | GET | List user's videos (paginated) |
| `/videos/<id>/` | GET | View transcript and segments |
| `/videos/<id>/status/` | GET | JSON status (for AJAX polling) |
| `/videos/<id>/download/<fmt>/` | GET | Download transcript (fmt: txt or srt) |
| `/videos/<id>/delete/` | POST | Delete video |
| `/register/` | GET/POST | User registration |
| `/login/` | GET/POST | User login |
| `/logout/` | POST | User logout |
| `/password-reset/` | GET/POST | Password reset flow |

Architecture

Technology Stack

Web Framework: Django 5.1.7 (Python web framework)
Application Server: Gunicorn 23.0.0 (production WSGI server)
Database: PostgreSQL 16 (production relational database)
Message Broker: Redis 7 (in-memory message queue)
Task Queue: Celery 5.4.0 (asynchronous job processing)
ML/AI Engine: OpenAI Whisper 20240930 (speech-to-text)
Containerization: Docker & docker-compose (reproducible deployments)
Frontend: Bootstrap 5 (responsive CSS framework)

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Docker Compose                          │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │   Web Service   │  │   Worker     │  │   Database   │   │
│  │   (Gunicorn)    │  │  (Celery)    │  │(PostgreSQL)  │   │
│  │   2 workers     │  │ concurrency=4│  │              │   │
│  └────────┬────────┘  └──────┬───────┘  └──────────────┘   │
│           │                   │                              │
│           └──────────┬────────┘                              │
│                      │                                       │
│              ┌───────▼────────┐                              │
│              │  Redis Broker  │                              │
│              │  (Task Queue)  │                              │
│              └────────────────┘                              │
│                                                               │
│            Persistent Volumes:                               │
│            - whisper_cache: /cache/whisper (461MB model)     │
│            - postgres_data: PostgreSQL data                  │
│                                                               │
└─────────────────────────────────────────────────────────────┘

Data Flow

User uploads video → Web service saves to /media/videos/
Task is queued → Web service (Django running under Gunicorn) sends the process_transcription task to Redis
Worker picks up task → Celery worker loads Whisper model (cached)
Chunked transcription → Long videos split into 10-minute chunks
Results saved → Transcripts and segments stored in PostgreSQL
Frontend updates → AJAX polling shows real-time status
User downloads → Download as TXT or SRT subtitle format
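
The chunking step in this flow can be sketched as follows. This is an illustration under stated assumptions, not the code in utils.py: the 10-minute chunk length and the 15-minute threshold come from this README, but how the two interact, and the names `chunk_bounds` and `merge_segments`, are assumptions.

```python
CHUNK_SECONDS = 10 * 60       # chunk length, from the data flow above
THRESHOLD_SECONDS = 15 * 60   # only media longer than this is chunked


def chunk_bounds(duration: float):
    """Return (start, end) pairs in seconds covering the whole file."""
    if duration <= THRESHOLD_SECONDS:
        return [(0.0, float(duration))]
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + CHUNK_SECONDS, duration)
        bounds.append((float(start), float(end)))
        start = end
    return bounds


def merge_segments(chunks):
    """Merge per-chunk segments into one timeline.

    `chunks` is a list of (offset_seconds, segments) pairs. Segment times
    are relative to their own chunk, so each is shifted by the offset.
    """
    merged = []
    for offset, segments in chunks:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

Shifting by the chunk offset is what keeps SRT timestamps correct for the later parts of a long recording.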

Project Structure

video_transcriber/
├── Dockerfile              # Container image definition
├── docker-compose.yml      # Multi-service orchestration
├── requirements.txt        # Python dependencies
├── CHANGES_DOCUMENTATION.md # Full changelog and rationale
│
├── video_transcriber/      # Django project config
│   ├── settings.py         # Django configuration
│   ├── urls.py             # Main URL routing
│   ├── wsgi.py             # WSGI application
│   ├── celery.py           # Celery configuration
│   └── __init__.py         # Celery import
│
├── transcription/          # Django app (main logic)
│   ├── models.py           # Video model with ownership
│   ├── views.py            # All HTTP view handlers
│   ├── urls.py             # App URL patterns
│   ├── forms.py            # VideoUploadForm with validation
│   ├── tasks.py            # Celery transcription task
│   ├── admin.py            # Django admin configuration
│   ├── utils.py            # Whisper transcription utilities
│   ├── exports.py          # TXT/SRT export functions
│   │
│   ├── management/
│   │   └── commands/
│   │       └── requeue_stale_transcriptions.py  # Stale task recovery
│   │
│   ├── migrations/         # Database schema versions
│   │   ├── 0001_initial.py
│   │   ├── 0002_video_user.py
│   │   ├── 0003_alter_video_file_alter_video_user.py
│   │   ├── 0004_video_status.py
│   │   ├── 0005_video_segments.py
│   │   └── 0006_fix_upload_to_path.py
│   │
│   ├── templates/          # HTML templates
│   │   ├── base.html       # Navigation, Bootstrap layout
│   │   ├── upload.html     # Video upload form
│   │   ├── video_list.html # Paginated video gallery
│   │   ├── video_detail.html # Transcript viewer, download
│   │   └── auth/           # Authentication templates
│   │       ├── login.html
│   │       ├── register.html
│   │       ├── password_reset.html
│   │       └── lockout.html
│   │
│   ├── static/             # CSS, JS, fonts
│   │   └── transcription/
│   │       └── favicon.svg
│   │
│   └── tests.py            # Unit tests
│
├── media/                  # User uploads (not in repo)
│   └── videos/
│
├── .env.example            # Environment template
├── .python-version         # Python 3.12
├── .gitignore              # Excluded files
└── README.md               # This file

Security Features

Authentication & Authorization

✅ User registration with password validation
✅ Secure password reset via email
✅ CSRF tokens on all forms
✅ Permission checks (users can only access their own videos)
✅ IDOR prevention (returns 404 if accessing others' content)
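
The ownership check behind IDOR prevention can be illustrated without Django. In this minimal sketch, `NotFound`, `get_owned_video`, and the `owner_id` key are hypothetical stand-ins; in a Django view the same effect is commonly achieved by filtering on the owner inside `get_object_or_404`, though whether this project does so is an assumption.

```python
class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


def get_owned_video(videos: dict, video_id: int, user_id: int) -> dict:
    """Return the video only if it exists and belongs to `user_id`.

    Missing and foreign records raise the same error, so a caller cannot
    tell whether an ID they do not own exists.
    """
    video = videos.get(video_id)
    if video is None or video["owner_id"] != user_id:
        raise NotFound(f"video {video_id} not found")
    return video
```

Returning 404 rather than 403 for other users' videos avoids confirming that a given ID exists.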

Attack Prevention

✅ Brute-force protection (5 failed attempts → 24-hour lockout)
✅ HTTPS-only cookies in production
✅ Secure cookie flags (HttpOnly, SameSite)
✅ Security headers (X-Frame-Options=DENY, X-Content-Type-Options=nosniff)
✅ SQL injection prevention (parameterized queries via ORM)
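
The lockout rule (5 failed attempts, 24-hour cooldown) can be sketched as below. This is an in-memory illustration only: the `LoginThrottle` class and its method names are hypothetical, and a real deployment would persist the counters (the project may rely on a library for this).

```python
from datetime import datetime, timedelta

MAX_FAILURES = 5
COOLDOWN = timedelta(hours=24)


class LoginThrottle:
    """Tracks failed logins per username in memory (illustration only)."""

    def __init__(self):
        self._failures = {}      # username -> consecutive failure count
        self._locked_until = {}  # username -> datetime when the lock expires

    def is_locked(self, username: str, now: datetime) -> bool:
        until = self._locked_until.get(username)
        if until is not None and now >= until:
            # Cooldown elapsed: clear the lock and the counter.
            del self._locked_until[username]
            self._failures.pop(username, None)
            return False
        return until is not None

    def record_failure(self, username: str, now: datetime) -> None:
        if self.is_locked(username, now):
            return
        count = self._failures.get(username, 0) + 1
        self._failures[username] = count
        if count >= MAX_FAILURES:
            self._locked_until[username] = now + COOLDOWN

    def record_success(self, username: str) -> None:
        # A successful login resets the failure counter.
        self._failures.pop(username, None)
```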

Data Protection

✅ Environment-based secrets (not in code)
✅ File type validation (extension + MIME type)
✅ File size limits (500 MB max)
✅ Input sanitization on all forms
✅ Secure file storage outside web root
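
The upload checks listed above (extension, MIME type, size) could look roughly like this. It is a sketch, not the project's VideoUploadForm: the function name, the error strings, and the choice of binary megabytes for the 500 MB limit are assumptions.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MB; binary megabytes assumed

ALLOWED_EXTENSIONS = {
    ".mp4", ".mpeg", ".mov", ".avi", ".webm", ".ogg", ".mp3", ".wav", ".flac",
}

# OGG files are sometimes reported as application/ogg.
ALLOWED_TYPE_PREFIXES = ("video/", "audio/", "application/ogg")


def validate_upload(filename: str, size: int, content_type: str) -> list:
    """Return a list of error strings; an empty list means the file passes."""
    errors = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("unsupported file extension")
    # The browser supplies the MIME type, so it is a hint rather than proof.
    if not content_type.startswith(ALLOWED_TYPE_PREFIXES):
        errors.append("unsupported content type")
    if size > MAX_UPLOAD_BYTES:
        errors.append("file exceeds 500 MB")
    return errors
```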

Worker Reliability

✅ Graceful error handling (no crashes on deleted records)
✅ Task timeouts (1 hour hard, 55-minute soft)
✅ Auto-requeue on worker loss
✅ Automatic stale task recovery on startup
✅ Non-root worker process (nobody:nogroup)
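
For orientation, the timeout and requeue behaviour above maps onto standard Celery settings. The excerpt below is a hypothetical settings fragment, assuming Celery is configured from Django settings with the `CELERY` namespace; the project's actual settings.py may set these differently or per task.

```python
# Hypothetical excerpt for a Django settings module that Celery reads via
# app.config_from_object("django.conf:settings", namespace="CELERY").

# Hard limit: the task is terminated after 1 hour.
CELERY_TASK_TIME_LIMIT = 60 * 60

# Soft limit: SoftTimeLimitExceeded is raised inside the task after
# 55 minutes, leaving 5 minutes to clean up and mark the video as failed.
CELERY_TASK_SOFT_TIME_LIMIT = 55 * 60

# Acknowledge the message only after the task finishes, so an unfinished
# task is not lost if the worker disappears.
CELERY_TASK_ACKS_LATE = True

# With late acknowledgement, requeue the task if the worker process dies.
CELERY_TASK_REJECT_ON_WORKER_LOST = True
```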

Environment Configuration

Docker Setup (`.env` file)

# Security
SECRET_KEY=your-secret-key-here

# Debug Mode (False in production)
DEBUG=True

# Allowed Hosts
ALLOWED_HOSTS=localhost,127.0.0.1,0.0.0.0

# Database
DB_ENGINE=postgres           # or 'sqlite' for development
DB_NAME=video_transcriber
DB_USER=postgres
DB_PASSWORD=postgres
DB_HOST=db                   # Docker service name
DB_PORT=5432

# Message Broker & Results
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/0

# Stale Task Recovery
STALE_PROCESSING_MINUTES=45   # How old before marking as stale
RECOVERY_MODEL_SIZE=small      # Model to use for requeue
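
As a rough illustration of how these variables could be consumed, the sketch below turns them into Django-style values. The helper names (`env_bool`, `load_config`) and all fallback defaults are assumptions; the project's settings.py is the authority on the real behaviour.

```python
import os


def env_bool(value: str) -> bool:
    """Interpret common truthy strings such as 'True' or '1'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(env=os.environ) -> dict:
    """Translate the variables above into Django-style settings values."""
    if env.get("DB_ENGINE", "sqlite") == "postgres":
        database = {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "video_transcriber"),
            "USER": env.get("DB_USER", "postgres"),
            "PASSWORD": env.get("DB_PASSWORD", ""),
            "HOST": env.get("DB_HOST", "db"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    else:
        # Local development fallback (file name is an assumption).
        database = {"ENGINE": "django.db.backends.sqlite3", "NAME": "db.sqlite3"}
    hosts = env.get("ALLOWED_HOSTS", "")
    return {
        "DEBUG": env_bool(env.get("DEBUG", "False")),
        "ALLOWED_HOSTS": [h.strip() for h in hosts.split(",") if h.strip()],
        "DATABASE": database,
        "STALE_PROCESSING_MINUTES": int(env.get("STALE_PROCESSING_MINUTES", "45")),
    }
```

Defaulting DEBUG to False means a missing variable fails safe in production.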

Email Configuration (Optional)

For password reset emails:

EMAIL_BACKEND=django.core.mail.backends.smtp.EmailBackend
EMAIL_HOST=smtp.gmail.com
EMAIL_PORT=587
EMAIL_USE_TLS=True
EMAIL_HOST_USER=your-email@gmail.com
EMAIL_HOST_PASSWORD=your-app-password
DEFAULT_FROM_EMAIL=noreply@example.com

Performance & Deployment

Performance Tips

GPU Support: Running Whisper on a CUDA-capable GPU significantly improves transcription speed
Model Selection:
- tiny (39M params): 5-10x faster, lower accuracy
- base (74M params): 2-5x faster, decent accuracy
- small (244M params): Default, good balance
- medium (769M params): Slower, better accuracy
- large (1.5B params): Very slow, best accuracy
Long Videos: Automatically chunked (no manual splitting needed)
Batch Processing: Queue multiple uploads for parallel processing

Scaling

Horizontal Scale: Add more worker containers for higher throughput
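Concurrent Limit: The worker currently starts with --pool=solo --concurrency=4. Celery's solo pool runs one task at a time in the worker process, so --concurrency has no effect there; raise throughput by adding worker containers or switching pools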
Database: PostgreSQL handles concurrent access safely
Cache: The model stays in memory while the worker runs; the downloaded weights persist across container restarts via the whisper_cache volume, so they are not downloaded again

Production Checklist

Troubleshooting

Common Issues

Videos stuck in "Processing" status:

# Manually requeue stale videos
docker-compose exec -T web python manage.py requeue_stale_transcriptions --minutes=0
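
The idea behind stale-task detection can be sketched in a few lines. This is not the management command's code: the `find_stale` function and the `status` and `updated_at` keys are hypothetical, and reading `--minutes=0` as "treat every Processing video as stale" is an assumption based on the command above.

```python
from datetime import datetime, timedelta


def find_stale(videos, now: datetime, minutes: int = 45):
    """Return videos stuck in Processing for at least `minutes`.

    Each video is a dict with hypothetical 'status' and 'updated_at' keys.
    With minutes=0, every Processing video is treated as stale.
    """
    cutoff = now - timedelta(minutes=minutes)
    return [
        v for v in videos
        if v["status"] == "Processing" and v["updated_at"] <= cutoff
    ]
```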

Worker not picking up tasks:

# Check Celery worker logs
docker-compose logs -f worker

# Restart worker
docker-compose restart worker

Database connection errors:

# Check PostgreSQL is healthy
docker-compose exec db pg_isready -U postgres

# View database service logs
docker-compose logs db

Redis connection issues:

# Verify Redis is accessible
docker-compose exec redis redis-cli ping
# Should return: PONG

Model download stuck:

First run downloads 461MB model (~70 seconds)
Model is cached in persistent volume whisper_cache:/cache/whisper
Subsequent runs load from cache (~5 seconds)

Testing

Run unit tests:

# With Docker
docker-compose exec -T web python manage.py test

# Locally
python manage.py test transcription

Test coverage includes:

✅ Authorization (IDOR prevention)
✅ AJAX status endpoint
✅ Chunked transcription merging
✅ Task deletion safety
✅ Form validation

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository
Create a feature branch: git checkout -b feat/amazing-feature
Make your changes and add tests if applicable
Commit with conventional messages: git commit -m "feat: description"
Push to your fork: git push origin feat/amazing-feature
Open a Pull Request against the main branch

Development Workflow

# Create local development environment
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up local .env with SQLite
cp video_transcriber/.env.example video_transcriber/.env
# Edit to use: DB_ENGINE=sqlite, remove CELERY_* vars for testing

# Run migrations
python manage.py migrate

# Run tests
python manage.py test

# Start development servers (in separate terminals)
redis-server
celery -A video_transcriber worker --loglevel=info
python manage.py runserver

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGES_DOCUMENTATION.md for detailed documentation of all changes, including:

Infrastructure & containerization improvements
Security hardening details
Task reliability enhancements
Feature additions and rationale
Migration guide from previous version

See IMPROVEMENTS.md for planned future enhancements.

Acknowledgments

OpenAI Whisper — State-of-the-art speech recognition
Django — Web framework
Celery — Async task queue
PostgreSQL — Reliable database
Bootstrap — Responsive CSS framework
All contributors who have helped build and improve this tool

Contact & Support

GitHub: Co-vengers/video_transcriber
Issues: GitHub Issues
Team: Co-vengers

Version

Current Version: 2.0.0 (Production-ready)
Python: 3.12+
Django: 5.1.7
Release Date: March 2026
