JonathanJing/AI-sermon-workflow

Sermon Workflow - Phase 1: Speech-to-Text Service

Automated sermon content workflow system that converts video/audio sermons into high-quality Simplified Chinese subtitles using Google Cloud Speech-to-Text v2 API.

🎯 Features

  • Multi-source ingestion: YouTube URLs and local audio/video files
  • Google Cloud STT v2 API: High-accuracy Simplified Chinese transcription with batch processing
  • Intelligent audio chunking: Automatic splitting of large files for optimal STT performance
  • Phrase management system: Domain-specific religious terms for improved accuracy
  • Subtitle generation: SRT and WebVTT formats with proper line wrapping
  • REST API: Comprehensive RESTful endpoints for job and phrase management
  • Batch processing: CLI tool for processing multiple files with concurrent job support
  • Storage options: Local filesystem or Google Cloud Storage with automatic cleanup
  • Docker support: Containerized deployment with health checks
  • Cost monitoring: Real-time STT cost estimation and limits
  • Comprehensive testing: Validation tools and diagnostic scripts
  • Production-ready: Structured logging, monitoring, and error handling
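As a concrete illustration of the subtitle-generation feature, the sketch below formats transcript segments as SRT cues with line wrapping. It is a minimal stand-in, not the actual builder in `app/services/subtitles/builder.py`; the function names and wrap width are illustrative, and real CJK line wrapping needs more care than `textwrap` provides.

```python
import textwrap

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str, width: int = 20) -> str:
    """Build one numbered SRT cue, wrapping long lines to `width` columns."""
    wrapped = "\n".join(textwrap.wrap(text, width=width) or [text])
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{wrapped}\n"
```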

πŸ—οΈ Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   YouTube URL   │    │   Local Files   │    │   File Upload   │
│                 │    │                 │    │                 │
└─────────┬───────┘    └─────────┬───────┘    └─────────┬───────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     FastAPI Service     │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │   Background Workers    │
                    └────────────┬────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
    ┌─────────▼─────────┐ ┌──────▼─────┐ ┌─────────▼─────────┐
    │   YouTube         │ │   Audio    │ │   Google Cloud    │
    │   Downloader      │ │   Processor│ │   Speech-to-Text  │
    │   (yt-dlp)        │ │   (pydub)  │ │   v2 API          │
    └─────────┬─────────┘ └──────┬─────┘ └─────────┬─────────┘
              │                  │                 │
              └──────────────────┼─────────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │      Phrase Manager     │
                    │ (Domain-specific terms) │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     Subtitle Builder    │
                    │       (SRT/WebVTT)      │
                    └────────────┬────────────┘
                                 │
                    ┌────────────▼────────────┐
                    │     Storage Manager     │
                    │  (Local / Google Cloud) │
                    └─────────────────────────┘
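The flow above can be read as a linear chain of stages. The stub below is only a shape sketch of that chain; the stage names and string placeholders are invented, and the real implementations live under app/services/.

```python
# Sketch of the processing pipeline: each stage consumes the
# previous stage's output. All stage bodies here are stubs.
def run_pipeline(source, stages):
    artifact = source
    for stage in stages:
        artifact = stage(artifact)
    return artifact

# Hypothetical stage stubs mirroring the diagram:
download = lambda url: f"audio({url})"                       # YouTube Downloader
chunk = lambda audio: [f"{audio}[0]", f"{audio}[1]"]         # Audio Processor
transcribe = lambda chunks: " ".join(f"text<{c}>" for c in chunks)  # STT
build_srt = lambda text: f"srt:{text}"                       # Subtitle Builder

result = run_pipeline("https://youtu.be/VIDEO_ID",
                      [download, chunk, transcribe, build_srt])
```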

🚀 Quick Start

Prerequisites

  • Python 3.11+
  • FFmpeg
  • Google Cloud credentials (for STT)
  • Docker (optional)
  • Redis (optional, for task queue)

Installation

  1. Clone the repository

    git clone <repository-url>
    cd sermon-workflow
  2. Install dependencies

    python -m venv .venv
    source .venv/bin/activate  # On Windows: .venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure environment

    cp .env.template .env
    # Edit .env with your configuration
  4. Set up Google Cloud credentials

    • Create a service account in Google Cloud Console
    • Download the JSON key file
    • Set GOOGLE_APPLICATION_CREDENTIALS in .env

Running the Service

Local development:

uvicorn app.main:app --reload

Docker:

docker-compose up --build

Production:

uvicorn app.main:app --host 0.0.0.0 --port 8000

📋 API Usage

1. Create Transcription Job

From YouTube URL:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=VIDEO_ID",
    "title": "Sunday Sermon"
  }'

From local file:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "file",
    "file_path": "/path/to/audio.mp3",
    "title": "Wednesday Service"
  }'

File upload:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe/upload" \
  -F "file=@sermon.mp3" \
  -F "title=Sunday Sermon"

2. Check Job Status

curl "http://localhost:8000/api/v1/jobs/{job_id}"
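Since transcription runs in the background, clients typically poll this endpoint until the job finishes. A minimal polling helper is sketched below; the status values `"completed"`/`"failed"` are assumptions about the response schema, not confirmed field names.

```python
import time

def wait_for_job(fetch_status, poll_interval=5.0, timeout=3600):
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status is any callable returning the job dict, e.g. a thin
    wrapper around GET /api/v1/jobs/{job_id}. The "status" field and
    its terminal values are assumed, not taken from the API docs.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(poll_interval)
    raise TimeoutError("job did not finish in time")
```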

3. List Jobs

curl "http://localhost:8000/api/v1/jobs/?limit=10&offset=0"

4. Phrase Management

Get all phrases:

curl "http://localhost:8000/api/v1/phrases/"

Get phrases by language:

curl "http://localhost:8000/api/v1/phrases/language/cmn-Hans-CN"

Add new phrase:

curl -X POST "http://localhost:8000/api/v1/phrases/" \
  -H "Content-Type: application/json" \
  -d '{
    "phrase": "恩典尔湾",
    "language": "chinese",
    "category": "church_names"
  }'

Search phrases:

curl -X POST "http://localhost:8000/api/v1/phrases/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "耶稣",
    "language": "chinese"
  }'
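Server-side, a search like this presumably filters the phrase catalog by substring match. A sketch of that matching logic, assuming records shaped like the add-phrase payload above (the "biblical_names" category is invented for the example):

```python
def search_phrases(phrases, query, language=None):
    """Filter phrase records by substring, optionally by language.

    `phrases` mirrors the assumed shape of config/phrases.json entries:
    {"phrase": ..., "language": ..., "category": ...}.
    """
    return [
        p for p in phrases
        if query in p["phrase"]
        and (language is None or p["language"] == language)
    ]

catalog = [
    {"phrase": "耶稣基督", "language": "chinese", "category": "biblical_names"},
    {"phrase": "恩典尔湾", "language": "chinese", "category": "church_names"},
]
matches = search_phrases(catalog, "耶稣", language="chinese")
```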

5. Health Check

curl "http://localhost:8000/health"

🔧 Batch Processing

Use the CLI tool for processing multiple files:

  1. Create CSV input file:

    source_type,source,title
    youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
    file,/path/to/audio1.mp3,Wednesday Service 1
    file,/path/to/audio2.mp3,Friday Prayer
  2. Run batch processing:

    python scripts/batch_transcribe.py input.csv
  3. Options:

    python scripts/batch_transcribe.py input.csv \
      --output results.csv \
      --concurrent-jobs 5 \
      --timeout 7200
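The CSV input from step 1 can be validated before submission with a few lines of Python. This sketch mirrors the documented `source_type,source,title` columns; the helper name is illustrative and is not part of the CLI.

```python
import csv
import io

def read_batch_csv(text):
    """Parse the batch input CSV (source_type,source,title) into job specs."""
    jobs = []
    for row in csv.DictReader(io.StringIO(text)):
        if row["source_type"] not in ("youtube", "file"):
            raise ValueError(f"unknown source_type: {row['source_type']}")
        jobs.append(row)
    return jobs

sample = """source_type,source,title
youtube,https://www.youtube.com/watch?v=VIDEO1,Sunday Sermon 1
file,/path/to/audio1.mp3,Wednesday Service 1
"""
jobs = read_batch_csv(sample)
```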

🧪 Testing & Validation

Quick Tests

Test YouTube extraction:

python scripts/quick_youtube_test.py

Test STT conversion:

python scripts/quick_stt_test.py

Test chunking system:

python scripts/quick_chunking_test.py

Comprehensive Validation

Validate chunking system:

python scripts/validate_chunking_system.py audio_file.mp3

Test GCS STT support:

python scripts/test_gcs_stt.py

Diagnose Google STT issues:

python scripts/diagnose_google_stt.py

Test Suites

Run chunked extraction test:

python tests/test_chunked_extraction.py

Run comprehensive YouTube test:

python tests/test_youtube_extraction.py

Run single chunk STT test:

python tests/test_single_chunk_stt.py

⚙️ Configuration

Key configuration options in .env:

# Google Cloud
GOOGLE_APPLICATION_CREDENTIALS=path/to/service-account.json
GOOGLE_CLOUD_PROJECT=your-project-id
GCS_BUCKET_NAME=your-bucket-name

# Speech-to-Text
STT_LANGUAGE_CODE=cmn-Hans-CN
STT_MODEL=default
STT_COST_LIMIT_USD=10.0

# Storage
STORAGE_TYPE=local  # or 'gcs'
LOCAL_STORAGE_PATH=./data/processed
MAX_FILE_SIZE_MB=500

# API
API_HOST=0.0.0.0
API_PORT=8000
API_KEY=your-api-key

# Redis (optional)
REDIS_URL=redis://localhost:6379/0

# Development
DEBUG=true
LOG_LEVEL=INFO
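The service presumably loads these values with a dotenv-style parser; the sketch below shows a simplified version of that parsing. Real dotenv libraries handle quoting, escaping, and inline comments more carefully, so treat this as an illustration of the `KEY=VALUE` format only.

```python
def parse_env(text):
    """Parse .env-style KEY=VALUE lines, skipping blanks and # comments.

    Everything after a '#' is dropped, which also strips inline comments
    like 'STORAGE_TYPE=local  # or gcs' (a simplification: this would
    break values that legitimately contain '#').
    """
    env = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env

cfg = parse_env("# storage\nSTORAGE_TYPE=local  # or gcs\nAPI_PORT=8000\n")
```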

📁 Project Structure

sermon-workflow/
├── app/
│   ├── __init__.py
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration management
│   ├── models.py               # Data models
│   ├── workers.py              # Background job processing
│   ├── routers/
│   │   ├── jobs.py             # Job management routes
│   │   └── phrases.py          # Phrase management routes
│   ├── services/
│   │   ├── ingest/
│   │   │   ├── downloader.py   # YouTube downloader
│   │   │   └── audio_extractor.py  # Audio processing & chunking
│   │   ├── stt/
│   │   │   └── google_stt.py   # Google Cloud STT v2 API
│   │   ├── subtitles/
│   │   │   └── builder.py      # Subtitle generation
│   │   ├── phrase_manager.py   # Phrase management service
│   │   └── storage.py          # Storage management
│   └── config/
│       └── phrases.json        # Domain-specific phrases
├── scripts/
│   ├── batch_transcribe.py     # Batch processing CLI
│   ├── validate_chunking_system.py  # Chunking validation
│   ├── test_gcs_stt.py         # GCS STT testing
│   ├── diagnose_google_stt.py  # STT diagnostics
│   └── quick_*.py              # Quick test scripts
├── tests/
│   ├── test_chunked_extraction.py  # Comprehensive chunking test
│   ├── test_youtube_extraction.py  # YouTube workflow test
│   ├── test_single_chunk_stt.py    # STT conversion test
│   └── test_*.py                   # Other test files
├── data/
│   ├── raw/                    # Raw audio files
│   └── processed/              # Processed outputs
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── .env.template

🔍 Testing

Run the development server:

uvicorn app.main:app --reload

Test with sample YouTube video:

curl -X POST "http://localhost:8000/api/v1/jobs/transcribe" \
  -H "Content-Type: application/json" \
  -d '{
    "source_type": "youtube",
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "title": "Test Video"
  }'

Check job status:

curl "http://localhost:8000/api/v1/jobs/{job_id}"

Test phrase management:

curl "http://localhost:8000/api/v1/phrases/health"

📊 Monitoring

  • Health endpoint: GET /health
  • Statistics: GET /stats
  • Configuration: GET /config (debug mode only)
  • Structured logging: JSON format with configurable levels
  • Performance metrics: Processing time, cost estimation, file sizes

🐳 Docker Deployment

Development

# Basic development setup
docker-compose up --build

# With Redis for task queue
docker-compose --profile redis up -d

# With admin interface
docker-compose --profile admin up -d

Production

# Production with Redis
docker-compose -f docker-compose.prod.yml --profile production up -d

# Production with monitoring stack
docker-compose -f docker-compose.prod.yml --profile production --profile monitoring up -d

Environment Setup

# Copy and configure environment file
cp .env.template .env
# Edit .env with your production settings

# For production, ensure service account is available
# The service-account.json file will be mounted into the container

🔐 Security

  • API key authentication (optional)
  • File upload validation and size limits
  • Resource limits (file size, processing time)
  • Cost limits for STT usage
  • Non-root container execution
  • CORS configuration for web clients

📈 Performance & Scaling

  • Concurrent processing: Background tasks with configurable limits
  • Intelligent chunking: Automatic audio splitting for optimal STT performance
  • File streaming: Efficient handling of large audio files
  • Storage optimization: Automatic cleanup and lifecycle management
  • Cost monitoring: Real-time STT cost estimation and limits
  • Batch operations: Support for long audio files via Google Cloud Storage
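The chunking behavior can be sketched as a window planner: split a long recording into bounded segments, with a small overlap so words at chunk boundaries are not lost. The 300-second window and 2-second overlap below are illustrative defaults, not the service's actual limits.

```python
def plan_chunks(duration_s, max_chunk_s=300.0, overlap_s=2.0):
    """Split a duration into (start, end) windows no longer than max_chunk_s.

    Each chunk after the first starts slightly before the previous one
    ended, so boundary words appear in both chunks. Parameter values are
    illustrative, not the limits used by the real audio_extractor.
    """
    chunks, start = [], 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s
    return chunks
```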

🚧 Known Limitations

  1. Database: Currently uses in-memory storage (SQLite/PostgreSQL integration planned)
  2. Task queue: Simple background tasks (Redis/RQ integration available)
  3. Authentication: Basic API key auth (OAuth2 planned for production)
  4. Monitoring: Basic health checks (Prometheus metrics available)

🛣️ Roadmap

  • Phase 2: Video clipping and highlight extraction
  • Phase 3: Devotional content generation with LLM
  • Phase 4: Multi-platform content distribution
  • Database: PostgreSQL integration
  • Queue: Redis/RQ for robust job processing
  • Monitoring: Prometheus + Grafana dashboard
  • Multi-language: Support for additional languages
  • Advanced phrase adaptation: Dynamic phrase learning

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

For issues and questions:

  • Check the logs: docker-compose logs
  • Health check: curl http://localhost:8000/health
  • Documentation: http://localhost:8000/docs
  • Run diagnostics: python scripts/diagnose_google_stt.py

📋 Environment Variables Reference

Variable                        Description                         Default
GOOGLE_APPLICATION_CREDENTIALS  Path to GCP service account key     Required
GOOGLE_CLOUD_PROJECT            GCP project ID                      Required
GCS_BUCKET_NAME                 GCS bucket for file storage         Optional
STT_LANGUAGE_CODE               Speech-to-Text language             cmn-Hans-CN
STT_MODEL                       STT model type                      default
STT_COST_LIMIT_USD              Maximum STT cost per job            10.0
STORAGE_TYPE                    Storage backend (local or gcs)      local
LOCAL_STORAGE_PATH              Local storage directory             ./data/processed
MAX_FILE_SIZE_MB                Maximum file size for processing    500
API_HOST                        API server host                     0.0.0.0
API_PORT                        API server port                     8000
API_KEY                         API authentication key              Optional
REDIS_URL                       Redis connection URL                redis://localhost:6379/0
DEBUG                           Enable debug mode                   true
LOG_LEVEL                       Logging level                       INFO

🎯 Key Features Summary

Latest Enhancements

  • Google Cloud Speech-to-Text v2 API: Full support for the latest API with improved accuracy
  • Intelligent Audio Chunking: Automatic splitting of large files to stay within Google STT limits
  • Phrase Management System: Domain-specific religious terms for improved transcription accuracy
  • Comprehensive Testing Suite: Validation tools and diagnostic scripts for troubleshooting
  • Batch Processing: CLI tool with concurrent job support for processing multiple files
  • Production-Ready: Structured logging, health checks, and monitoring endpoints

Technical Improvements

  • Audio Processing: Optimized for STT with automatic format conversion and quality preservation
  • Error Handling: Robust error handling with detailed logging and recovery mechanisms
  • Performance: Efficient processing pipeline with configurable concurrency limits
  • Scalability: Support for both local and cloud storage with automatic cleanup
  • Monitoring: Real-time cost tracking and performance metrics
  • Docker Optimization: Multi-stage builds, proper file mounting, and production-ready configurations
