JonathanJing/AI-sermon-workflow

Sermon Workflow - Phase 1: Speech-to-Text Service

Automated sermon content workflow system that converts video/audio sermons into high-quality Simplified Chinese subtitles using Google Cloud Speech-to-Text v2 API.

🎯 Features

  • Multi-source ingestion: YouTube URLs and local audio/video files
  • Google Cloud STT v2 API: High-accuracy Simplified Chinese transcription with batch processing
  • Intelligent audio chunking: Automatic splitting of large files for optimal STT performance
  • Phrase management system: Domain-specific religious terms for improved accuracy
  • Subtitle generation: SRT and WebVTT formats with proper line wrapping
  • REST API: Comprehensive RESTful endpoints for job and phrase management
  • Batch processing: CLI tool for processing multiple files with concurrent job support
  • Storage options: Local filesystem or Google Cloud Storage with automatic cleanup
  • Docker support: Containerized deployment with health checks
  • Cost monitoring: Real-time STT cost estimation and limits
  • Comprehensive testing: Validation tools and diagnostic scripts
  • Production-ready: Structured logging, monitoring, and error handling
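As a concrete illustration of the subtitle-generation feature, the sketch below formats transcript segments as SRT cues with line wrapping. It is a minimal stand-in, not the actual builder in `app/services/subtitles/builder.py`; the function names and wrap width are illustrative, and real CJK line wrapping needs more care than `textwrap` provides.

```python
import textwrap

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str, width: int = 20) -> str:
    """Build one numbered SRT cue, wrapping long lines to `width` columns."""
    wrapped = "\n".join(textwrap.wrap(text, width=width) or [text])
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{wrapped}\n"
```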

πŸ—οΈ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   YouTube URL   │    │   Local Files   │    │   File Upload   │
│                 │    │                 │    │                 │
└─────────┬───────┘    └─────────┬───────┘    └─────────┬───────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     FastAPI Service     │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │   Background Workers    │
                    └────────────┬────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
    ┌─────────▼─────────┐ ┌──────▼─────┐ ┌─────────▼─────────┐
    │   YouTube         │ │   Audio    │ │   Google Cloud    │
    │   Downloader      │ │   Processor│ │   Speech-to-Text  │
    │   (yt-dlp)        │ │   (pydub)  │ │   v2 API          │
    └─────────┬─────────┘ └──────┬─────┘ └─────────┬─────────┘
              │                  │                 │
              └──────────────────┼─────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │      Phrase Manager     │
                    │ (Domain-specific terms) │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     Subtitle Builder    │
                    │       (SRT/WebVTT)      │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     Storage Manager     │
                    │  (Local / Google Cloud) │
                    └─────────────────────────┘
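The flow above can be read as a linear chain of stages. The stub below is only a shape sketch of that chain; the stage names and string placeholders are invented, and the real implementations live under app/services/.

```python
# Sketch of the processing pipeline: each stage consumes the
# previous stage's output. All stage bodies here are stubs.
def run_pipeline(source, stages):
    artifact = source
    for stage in stages:
        artifact = stage(artifact)
    return artifact

# Hypothetical stage stubs mirroring the diagram:
download = lambda url: f"audio({url})"                       # YouTube Downloader
chunk = lambda audio: [f"{audio}[0]", f"{audio}[1]"]         # Audio Processor
transcribe = lambda chunks: " ".join(f"text<{c}>" for c in chunks)  # STT
build_srt = lambda text: f"srt:{text}"                       # Subtitle Builder

result = run_pipeline("https://youtu.be/VIDEO_ID",
                      [download, chunk, transcribe, build_srt])
```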

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • FFmpeg
  • Google Cloud credentials (for STT)
  • Docker (optional)
  • Redis (optional, for task queue)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd sermon-workflow
  2. Install dependencies

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure environment

    cp .env.template .env
    # Edit .env with your configuration
  4. Set up Google Cloud credentials

    • Create a service account in Google Cloud Console
    • Download the JSON key file
    • Set GOOGLE_APPLICATION_CREDENTIALS in .env

Running the Service

Local development:

uvicorn app.main:app --reload

Docker:

docker-compose up --build

Production:

uvicorn app.main:app --host 0.0.0.0 --port 8000

📋 API Usage

1. Create Transcription Job

From YouTube URL:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "title": "Sunday Sermon"
  }'

From local file:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "file",
    "file_path": "/path/to/audio.mp3",
    "title": "Wednesday Service"
  }'

File upload:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe/upload" \
  -F "file=@sermon.mp3" \
  -F "title=Sunday Sermon"

2. Check Job Status

curl "http://localhost:8000/api/v1/jobs/{job_id}"
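Since transcription runs in the background, clients typically poll this endpoint until the job finishes. A minimal polling helper is sketched below; the status values `"completed"`/`"failed"` are assumptions about the response schema, not confirmed field names.

```python
import time

def wait_for_job(fetch_status, poll_interval=5.0, timeout=3600):
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status is any callable returning the job dict, e.g. a thin
    wrapper around GET /api/v1/jobs/{job_id}. The "status" field and
    its terminal values are assumed, not taken from the API docs.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")
```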

3. List Jobs

curl "http://localhost:8000/api/v1/jobs/?limit=10&offset=0"

4. Phrase Management

Get all phrases:

curl "http://localhost:8000/api/v1/phrases/"

Get phrases by language:

curl "http://localhost:8000/api/v1/phrases/language/cmn-Hans-CN"

Add new phrase:

curl -X POST "http://localhost:8000/api/v1/phrases/" \
  -H "Content-Type: application/json" \
  -d '{
    "phrase": "恩典尔湾",
    "language": "chinese",
    "category": "church_names"
  }'

Search phrases:

curl -X POST "http://localhost:8000/api/v1/phrases/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "耶稣",
    "language": "chinese"
  }'
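Server-side, a search like this presumably filters the phrase catalog by substring match. A sketch of that matching logic, assuming records shaped like the add-phrase payload above (the "biblical_names" category is invented for the example):

```python
def search_phrases(phrases, query, language=None):
    """Filter phrase records by substring, optionally by language.

    `phrases` mirrors the assumed shape of config/phrases.json entries:
    {"phrase": ..., "language": ..., "category": ...}.
    """
    return [
        p for p in phrases
        if query in p["phrase"]
        and (language is None or p["language"] == language)
    ]

catalog = [
    {"phrase": "耶稣基督", "language": "chinese", "category": "biblical_names"},
    {"phrase": "恩典尔湾", "language": "chinese", "category": "church_names"},
]
matches = search_phrases(catalog, "耶稣", language="chinese")
```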

5. Health Check

curl "http://localhost:8000/health"

🔧 Batch Processing

Use the CLI tool for processing multiple files:

  1. Create CSV input file:

    source_type,source,title
    youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
    file,/path/to/audio1.mp3,Wednesday Service 1
    file,/path/to/audio2.mp3,Friday Prayer
  2. Run batch processing:

    python scripts/batch_transcribe.py input.csv
  3. Options:

    python scripts/batch_transcribe.py input.csv \
      --output results.csv \
      --concurrent-jobs 5 \
      --timeout 7200
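The CSV input from step 1 can be validated before submission with a few lines of Python. This sketch mirrors the documented `source_type,source,title` columns; the helper name is illustrative and is not part of the CLI.

```python
import csv
import io

def read_batch_csv(text):
    """Parse the batch input CSV (source_type,source,title) into job specs."""
    jobs = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["source_type"] not in ("youtube", "file"):
            raise ValueError(f"unknown source_type: {row['source_type']}")
        jobs.append(row)
    return jobs

sample = """source_type,source,title
youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
file,/path/to/audio1.mp3,Wednesday Service 1
"""
jobs = read_batch_csv(sample)
```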

🧪 Testing & Validation

Quick Tests

Test YouTube extraction:

python scripts/quick_youtube_test.py

Test STT conversion:

python scripts/quick_stt_test.py

Test chunking system:

python scripts/quick_chunking_test.py

Comprehensive Validation

Validate chunking system:

python scripts/validate_chunking_system.py audio_file.mp3

Test GCS STT support:

python scripts/test_gcs_stt.py

Diagnose Google STT issues:

python scripts/diagnose_google_stt.py

Test Suites

Run chunked extraction test:

python tests/test_chunked_extraction.py

Run comprehensive YouTube test:

python tests/test_youtube_extraction.py

Run single chunk STT test:

python tests/test_single_chunk_stt.py

⚙️ Configuration

Key configuration options in .env:

# Google Cloud
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id
GCS_BUCKET_NAME=your-bucket-name

# Speech-to-Text
STT_LANGUAGE_CODE=cmn-Hans-CN
STT_MODEL=default
STT_COST_LIMIT_USD=10.0

# Storage
STORAGE_TYPE=local  # or 'gcs'
LOCAL_STORAGE_PATH=./data/processed
MAX_FILE_SIZE_MB=500

# API
API_HOST=0.0.0.0
API_PORT=8000
API_KEY=your-api-key

# Redis (optional)
REDIS_URL=redis://localhost:6379/0

# Development
DEBUG=true
LOG_LEVEL=INFO
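The service presumably loads these values with a dotenv-style parser; the sketch below shows a simplified version of that parsing. Real dotenv libraries handle quoting, escaping, and inline comments more carefully, so treat this as an illustration of the `KEY=VALUE` format only.

```python
def parse_env(text):
    """Parse .env-style KEY=VALUE lines, skipping blanks and # comments.

    Everything after a '#' is dropped, which also strips inline comments
    like 'STORAGE_TYPE=local  # or gcs' (a simplification: this would
    break values that legitimately contain '#').
    """
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env

cfg = parse_env("# storage\nSTORAGE_TYPE=local  # or gcs\nAPI_PORT=8000\n")
```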

📁 Project Structure

sermon-workflow/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration management
│   ├── models.py               # Data models
│   ├── workers.py              # Background job processing
│   ├── routers/
│   │   ├── jobs.py             # Job management routes
│   │   └── phrases.py          # Phrase management routes
│   ├── services/
│   │   ├── ingest/
│   │   │   ├── downloader.py   # YouTube downloader
│   │   │   └── audio_extractor.py  # Audio processing & chunking
│   │   ├── stt/
│   │   │   └── google_stt.py   # Google Cloud STT v2 API
│   │   ├── subtitles/
│   │   │   └── builder.py      # Subtitle generation
│   │   ├── phrase_manager.py   # Phrase management service
│   │   └── storage.py          # Storage management
│   └── config/
│       └── phrases.json        # Domain-specific phrases
├── scripts/
│   ├── batch_transcribe.py     # Batch processing CLI
│   ├── validate_chunking_system.py  # Chunking validation
│   ├── test_gcs_stt.py         # GCS STT testing
│   ├── diagnose_google_stt.py  # STT diagnostics
│   └── quick_*.py              # Quick test scripts
├── tests/
│   ├── test_chunked_extraction.py  # Comprehensive chunking test
│   ├── test_youtube_extraction.py  # YouTube workflow test
│   ├── test_single_chunk_stt.py    # STT conversion test
│   └── test_*.py                   # Other test files
├── data/
│   ├── raw/                    # Raw audio files
│   └── processed/              # Processed outputs
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── .env.template

🔍 Testing

Run the development server:

uvicorn app.main:app --reload

Test with sample YouTube video:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Test Video"
  }'

Check job status:

curl "http://localhost:8000/api/v1/jobs/{job_id}"

Test phrase management:

curl "http://localhost:8000/api/v1/phrases/health"

📊 Monitoring

  • Health endpoint: GET /health
  • Statistics: GET /stats
  • Configuration: GET /config (debug mode only)
  • Structured logging: JSON format with configurable levels
  • Performance metrics: Processing time, cost estimation, file sizes

🐳 Docker Deployment

Development

# Basic development setup
docker-compose up --build

# With Redis for task queue
docker-compose --profile redis up -d

# With admin interface
docker-compose --profile admin up -d

Production

# Production with Redis
docker-compose -f docker-compose.prod.yml --profile production up -d

# Production with monitoring stack
docker-compose -f docker-compose.prod.yml --profile production --profile monitoring up -d

Environment Setup

# Copy and configure environment file
cp .env.template .env
# Edit .env with your production settings

# For production, ensure service account is available
# The service-account.json file will be mounted into the container

🔐 Security

  • API key authentication (optional)
  • File upload validation and size limits
  • Resource limits (file size, processing time)
  • Cost limits for STT usage
  • Non-root container execution
  • CORS configuration for web clients

📈 Performance & Scaling

  • Concurrent processing: Background tasks with configurable limits
  • Intelligent chunking: Automatic audio splitting for optimal STT performance
  • File streaming: Efficient handling of large audio files
  • Storage optimization: Automatic cleanup and lifecycle management
  • Cost monitoring: Real-time STT cost estimation and limits
  • Batch operations: Support for long audio files via Google Cloud Storage
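The chunking behavior can be sketched as a window planner: split a long recording into bounded segments, with a small overlap so words at chunk boundaries are not lost. The 300-second window and 2-second overlap below are illustrative defaults, not the service's actual limits.

```python
def plan_chunks(duration_s, max_chunk_s=300.0, overlap_s=2.0):
    """Split a duration into (start, end) windows no longer than max_chunk_s.

    Each chunk after the first starts slightly before the previous one
    ended, so boundary words appear in both chunks. Parameter values are
    illustrative, not the limits used by the real audio_extractor.
    """
    chunks, start = [], 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s
    return chunks
```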

🚧 Known Limitations

  1. Database: Currently uses in-memory storage (SQLite/PostgreSQL integration planned)
  2. Task queue: Simple background tasks (Redis/RQ integration available)
  3. Authentication: Basic API key auth (OAuth2 planned for production)
  4. Monitoring: Basic health checks (Prometheus metrics available)

🛣️ Roadmap

  • Phase 2: Video clipping and highlight extraction
  • Phase 3: Devotional content generation with LLM
  • Phase 4: Multi-platform content distribution
  • Database: PostgreSQL integration
  • Queue: Redis/RQ for robust job processing
  • Monitoring: Prometheus + Grafana dashboard
  • Multi-language: Support for additional languages
  • Advanced phrase adaptation: Dynamic phrase learning

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For issues and questions:

  • Check the logs: docker-compose logs
  • Health check: curl http://localhost:8000/health
  • Documentation: http://localhost:8000/docs
  • Run diagnostics: python scripts/diagnose_google_stt.py

📋 Environment Variables Reference

Variable                        Description                         Default
GOOGLE_APPLICATION_CREDENTIALS  Path to GCP service account key     Required
GOOGLE_CLOUD_PROJECT            GCP project ID                      Required
GCS_BUCKET_NAME                 GCS bucket for file storage         Optional
STT_LANGUAGE_CODE               Speech-to-Text language             cmn-Hans-CN
STT_MODEL                       STT model type                      default
STT_COST_LIMIT_USD              Maximum STT cost per job            10.0
STORAGE_TYPE                    Storage backend (local or gcs)      local
LOCAL_STORAGE_PATH              Local storage directory             ./data/processed
MAX_FILE_SIZE_MB                Maximum file size for processing    500
API_HOST                        API server host                     0.0.0.0
API_PORT                        API server port                     8000
API_KEY                         API authentication key              Optional
REDIS_URL                       Redis connection URL                redis://localhost:6379/0
DEBUG                           Enable debug mode                   true
LOG_LEVEL                       Logging level                       INFO

🎯 Key Features Summary

Latest Enhancements

  • Google Cloud Speech-to-Text v2 API: Full support for the latest API with improved accuracy
  • Intelligent Audio Chunking: Automatic splitting of large files to stay within Google STT limits
  • Phrase Management System: Domain-specific religious terms for improved transcription accuracy
  • Comprehensive Testing Suite: Validation tools and diagnostic scripts for troubleshooting
  • Batch Processing: CLI tool with concurrent job support for processing multiple files
  • Production-Ready: Structured logging, health checks, and monitoring endpoints

Technical Improvements

  • Audio Processing: Optimized for STT with automatic format conversion and quality preservation
  • Error Handling: Robust error handling with detailed logging and recovery mechanisms
  • Performance: Efficient processing pipeline with configurable concurrency limits
  • Scalability: Support for both local and cloud storage with automatic cleanup
  • Monitoring: Real-time cost tracking and performance metrics
  • Docker Optimization: Multi-stage builds, proper file mounting, and production-ready configurations
