Multilingual Video Processor

A standalone, open-source service, ready to deploy as a Google Cloud Function, that translates the audio of videos. It takes a video file as input and produces translated versions in one or more target languages by chaining Speech-to-Text (STT), Translation, and Text-to-Speech (TTS) services.

Features

Video Translation: Translate video audio to multiple target languages
Speech-to-Text: Transcribe audio from videos using Google Cloud Speech-to-Text
Multi-language Support: Translate to multiple languages concurrently
Text-to-Speech: Generate natural-sounding speech in target languages
Audio Sync: Automatically sync translated audio with original video
Cloud Function Ready: Deploy as Google Cloud Function
Secure: Input validation, rate limiting, and secure credential handling
Observable: Structured logging and progress tracking

Architecture

STT → Translation → TTS → Audio Sync → Output

Speech-to-Text: Extract audio from video and transcribe to text
Translation: Translate transcribed text to target languages
Text-to-Speech: Generate audio from translated text
Audio Sync: Replace audio track in video with translated audio
Output: Upload translated videos to cloud storage
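The five stages above can be sketched in Go as a chain of calls: the audio is transcribed once, and the per-language Translation and TTS steps are fanned out concurrently, bounded in the way MAX_CONCURRENT_TRANSLATIONS suggests. This is a minimal sketch under stated assumptions: the interfaces `Transcriber`, `Translator`, `Synthesizer` and the functions `processJob` and `runDemo` are hypothetical and not the repository's `internal/` packages, the stub implementations stand in for the Google Cloud clients, and the Audio Sync and Output stages are omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// Hypothetical stage interfaces; the real internal/stt, internal/translation
// and internal/tts packages may be shaped differently.
type Transcriber interface {
	Transcribe(ctx context.Context, videoPath string) (string, error)
}

type Translator interface {
	Translate(ctx context.Context, text, targetLang string) (string, error)
}

type Synthesizer interface {
	Synthesize(ctx context.Context, text, lang string) ([]byte, error)
}

// LangResult holds the outcome of the Translation -> TTS stages for one language.
type LangResult struct {
	Lang           string
	TranslatedText string
	Audio          []byte
	Err            error
}

// processJob transcribes the video once, then runs Translation -> TTS per
// target language with at most maxConcurrent languages in flight.
func processJob(ctx context.Context, videoPath string, targets []string, maxConcurrent int,
	stt Transcriber, tr Translator, tts Synthesizer) ([]LangResult, error) {
	transcript, err := stt.Transcribe(ctx, videoPath)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}

	results := make([]LangResult, len(targets)) // one slot per language, so no mutex
	sem := make(chan struct{}, maxConcurrent)
	var wg sync.WaitGroup
	for i, lang := range targets {
		wg.Add(1)
		go func(i int, lang string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it

			res := LangResult{Lang: lang}
			res.TranslatedText, res.Err = tr.Translate(ctx, transcript, lang)
			if res.Err == nil {
				// Audio sync and upload would follow here; omitted in this sketch.
				res.Audio, res.Err = tts.Synthesize(ctx, res.TranslatedText, lang)
			}
			results[i] = res
		}(i, lang)
	}
	wg.Wait()
	return results, nil
}

// Stub stages standing in for the Google Cloud clients.
type fakeSTT struct{}

func (fakeSTT) Transcribe(context.Context, string) (string, error) { return "bonjour", nil }

type fakeTranslator struct{}

func (fakeTranslator) Translate(_ context.Context, text, lang string) (string, error) {
	return "[" + lang + "] " + text, nil
}

type fakeTTS struct{}

func (fakeTTS) Synthesize(_ context.Context, text, _ string) ([]byte, error) {
	return []byte(text), nil
}

// runDemo wires the stubs into processJob.
func runDemo(targets []string, maxConcurrent int) ([]LangResult, error) {
	return processJob(context.Background(), "video.mp4", targets, maxConcurrent,
		fakeSTT{}, fakeTranslator{}, fakeTTS{})
}

func main() {
	results, err := runDemo([]string{"en", "ar", "de"}, 2)
	if err != nil {
		fmt.Println("job failed:", err)
		return
	}
	for _, r := range results {
		fmt.Printf("%s: %q (%d audio bytes, err=%v)\n", r.Lang, r.TranslatedText, len(r.Audio), r.Err)
	}
}
```

A buffered channel used as a semaphore keeps at most `maxConcurrent` languages in flight, and writing each result into its own slice index means the goroutines never share a write target.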

Quick Start

Prerequisites

Go 1.23 or later
Google Cloud Project with the following APIs enabled (see docs/DEPLOYMENT.md for detailed API enablement instructions):
- Cloud Speech-to-Text API
- Cloud Translation API
- Cloud Text-to-Speech API
- Cloud Storage API
FFmpeg installed (for video processing):
- macOS: brew install ffmpeg
- Linux: apt-get install ffmpeg or yum install ffmpeg
- Windows: Download from ffmpeg.org
Google Cloud credentials: Service account JSON with required permissions:
- roles/storage.objectAdmin for GCS operations
- roles/speech.client for Speech-to-Text API
- roles/cloudtranslate.user for Translation API
- roles/cloudtts.user for Text-to-Speech API

Installation

Clone the repository:

git clone https://github.com/sinouw/multilingual-video-processor.git
cd multilingual-video-processor

Install dependencies:

go mod download

Configure environment variables:

cp .env.example .env
# Edit .env with your configuration

Set up Google Cloud credentials:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json

Configuration

All configuration is done via environment variables. See .env.example for a complete template.

Required Environment Variables

GCS_BUCKET_OUTPUT: Output bucket for translated videos (required)
GOOGLE_TRANSLATE_API_KEY: Google Translation API key (required)

Optional Environment Variables

GOOGLE_APPLICATION_CREDENTIALS: Path to service account JSON (optional, can use default credentials)
GCS_BUCKET_INPUT: Input bucket for GCS URLs (optional)
SUPPORTED_LANGUAGES: Comma-separated list of supported languages (default: "en,ar,de,ru")
SOURCE_LANGUAGE: Default source language (optional, auto-detect if empty)
MAX_VIDEO_DURATION: Maximum video duration in seconds (default: 600)
MAX_VIDEO_SIZE_MB: Maximum video size in MB (default: 500)
MAX_CONCURRENT_JOBS: Maximum concurrent jobs (default: 10)
MAX_CONCURRENT_TRANSLATIONS: Maximum concurrent translations per job (default: 3)
REQUEST_TIMEOUT: Request timeout in seconds (default: 540)
LOG_LEVEL: Logging level - debug, info, warn, error (default: "info")
API_VERSION: API version (default: "v1")
ENABLE_HEALTH_CHECK: Enable health check endpoints (default: "true")
RATE_LIMIT_RPM: Rate limit requests per minute (default: 60)
WEBHOOK_URL: Webhook URL for job completion notifications (optional)
CORS_ORIGINS: Comma-separated CORS origins (default: "*")
JOB_TTL: Job time-to-live duration (default: "24h")
MAX_REQUEST_BODY_SIZE_BYTES: Maximum request body size in bytes (default: 1048576)
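A loader for a subset of these variables might look like the following sketch. `Config`, `loadConfig`, `intOr`, and `splitList` are hypothetical names (the repository's `internal/config` package may differ), and falling back to the default on an unparsable integer is an assumption; a stricter loader could reject the value instead.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// Config mirrors a subset of the documented variables. Field names are
// illustrative and not taken from the repository's internal/config package.
type Config struct {
	BucketOutput       string
	TranslateAPIKey    string
	SupportedLanguages []string
	MaxVideoDuration   time.Duration
	MaxVideoSizeMB     int
	RateLimitRPM       int
	RequestTimeout     time.Duration
	JobTTL             time.Duration
}

// loadConfig reads values through getenv (os.Getenv in production) so it can
// be exercised without modifying the process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		BucketOutput:       getenv("GCS_BUCKET_OUTPUT"),
		TranslateAPIKey:    getenv("GOOGLE_TRANSLATE_API_KEY"),
		SupportedLanguages: splitList(getenv("SUPPORTED_LANGUAGES"), "en,ar,de,ru"),
		MaxVideoDuration:   time.Duration(intOr(getenv("MAX_VIDEO_DURATION"), 600)) * time.Second,
		MaxVideoSizeMB:     intOr(getenv("MAX_VIDEO_SIZE_MB"), 500),
		RateLimitRPM:       intOr(getenv("RATE_LIMIT_RPM"), 60),
		RequestTimeout:     time.Duration(intOr(getenv("REQUEST_TIMEOUT"), 540)) * time.Second,
		JobTTL:             24 * time.Hour,
	}
	if v := getenv("JOB_TTL"); v != "" {
		d, err := time.ParseDuration(v) // accepts values such as "24h" or "90m"
		if err != nil {
			return Config{}, fmt.Errorf("invalid JOB_TTL: %w", err)
		}
		cfg.JobTTL = d
	}
	if cfg.BucketOutput == "" || cfg.TranslateAPIKey == "" {
		return Config{}, errors.New("GCS_BUCKET_OUTPUT and GOOGLE_TRANSLATE_API_KEY are required")
	}
	return cfg, nil
}

// intOr parses s as an integer, returning def when s is empty or invalid.
func intOr(s string, def int) int {
	if n, err := strconv.Atoi(strings.TrimSpace(s)); err == nil {
		return n
	}
	return def
}

// splitList splits a comma-separated value, trimming spaces; empty input uses def.
func splitList(s, def string) []string {
	if strings.TrimSpace(s) == "" {
		s = def
	}
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```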

API Usage

Submit Translation Job

curl -X POST https://your-function-url/v1/translate \
  -H "Content-Type: application/json" \
  -d '{
    "videoUrl": "gs://bucket/path/to/video.mp4",
    "targetLanguages": ["en", "ar", "de"],
    "sourceLanguage": "fr"
  }'
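The same request can be sent from Go with `net/http`. In this sketch `TranslateRequest`, `submitJob`, and `runStubDemo` are hypothetical, and the assumption that the submit response carries a `jobId` field is inferred from the status response format rather than documented here. A local `httptest` stub stands in for `https://your-function-url` so the program runs offline.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// TranslateRequest matches the JSON body in the curl example.
type TranslateRequest struct {
	VideoURL        string   `json:"videoUrl"`
	TargetLanguages []string `json:"targetLanguages"`
	SourceLanguage  string   `json:"sourceLanguage,omitempty"` // omitted => auto-detect
}

func encodeTranslateRequest(r TranslateRequest) ([]byte, error) {
	return json.Marshal(r)
}

// submitJob POSTs to {baseURL}/v1/translate and returns the job ID.
// Assumption: the response body is JSON containing a "jobId" field.
func submitJob(ctx context.Context, client *http.Client, baseURL string, r TranslateRequest) (string, error) {
	body, err := encodeTranslateRequest(r)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/v1/translate", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return "", fmt.Errorf("submit failed: %s", resp.Status)
	}
	var out struct {
		JobID string `json:"jobId"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.JobID, nil
}

// runStubDemo submits a job to a local stub server instead of the real function URL.
func runStubDemo() (string, error) {
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/v1/translate" {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"jobId":"stub-job-1"}`)
	}))
	defer stub.Close()

	return submitJob(context.Background(), stub.Client(), stub.URL, TranslateRequest{
		VideoURL:        "gs://bucket/path/to/video.mp4",
		TargetLanguages: []string{"en", "ar", "de"},
		SourceLanguage:  "fr",
	})
}

func main() {
	id, err := runStubDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("submitted job:", id)
}
```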

Check Job Status

curl https://your-function-url/v1/status/{jobId}

Example Response:

{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "results": {
    "en": {
      "status": "completed",
      "videoUrl": "gs://bucket/translations/job-id/en.mp4",
      "translatedText": "Hello, this is the translated text.",
      "progress": 100,
      "processedAt": "2026-01-19T12:00:00Z"
    },
    "ar": {
      "status": "completed",
      "videoUrl": "gs://bucket/translations/job-id/ar.mp4",
      "translatedText": "مرحبا، هذا هو النص المترجم.",
      "progress": 100,
      "processedAt": "2026-01-19T12:00:00Z"
    }
  }
}
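A client can decode this response into Go types that mirror the JSON fields. The type names and the `parseStatus` helper are hypothetical (the repository's `pkg/models` may define its own), and treating `failed` as the other terminal status is an assumption based on the `job.failed` webhook event.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// LanguageResult mirrors one entry of "results".
type LanguageResult struct {
	Status         string    `json:"status"`
	VideoURL       string    `json:"videoUrl"`
	TranslatedText string    `json:"translatedText"`
	Progress       int       `json:"progress"`
	ProcessedAt    time.Time `json:"processedAt"` // RFC 3339 decodes directly into time.Time
}

// JobStatus mirrors the top-level status response.
type JobStatus struct {
	JobID   string                    `json:"jobId"`
	Status  string                    `json:"status"`
	Results map[string]LanguageResult `json:"results"`
}

// isTerminal reports whether polling can stop (assumed terminal values).
func (s JobStatus) isTerminal() bool {
	return s.Status == "completed" || s.Status == "failed"
}

func parseStatus(data []byte) (JobStatus, error) {
	var s JobStatus
	if err := json.Unmarshal(data, &s); err != nil {
		return JobStatus{}, err
	}
	return s, nil
}

// exampleResponse is the sample response from this README.
const exampleResponse = `{
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "results": {
    "en": {"status": "completed", "videoUrl": "gs://bucket/translations/job-id/en.mp4",
           "translatedText": "Hello, this is the translated text.", "progress": 100,
           "processedAt": "2026-01-19T12:00:00Z"},
    "ar": {"status": "completed", "videoUrl": "gs://bucket/translations/job-id/ar.mp4",
           "translatedText": "مرحبا، هذا هو النص المترجم.", "progress": 100,
           "processedAt": "2026-01-19T12:00:00Z"}
  }
}`

func main() {
	s, err := parseStatus([]byte(exampleResponse))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("job %s: %s (terminal=%v)\n", s.JobID, s.Status, s.isTerminal())
	langs := make([]string, 0, len(s.Results))
	for lang := range s.Results {
		langs = append(langs, lang)
	}
	sort.Strings(langs) // map order is random; sort for stable output
	for _, lang := range langs {
		r := s.Results[lang]
		fmt.Printf("  %s: %s %d%% -> %s (at %s)\n", lang, r.Status, r.Progress, r.VideoURL, r.ProcessedAt.Format(time.RFC3339))
	}
}
```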

Health Check

curl https://your-function-url/health

Webhook Configuration

Configure webhooks to receive notifications when translation jobs complete or fail. Set the WEBHOOK_URL environment variable to your webhook endpoint.

Webhook Payload:

{
  "event": "job.completed",
  "jobId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "results": {
    "en": {
      "status": "completed",
      "videoUrl": "gs://bucket/translations/job-id/en.mp4",
      "translatedText": "Hello, this is the translated text.",
      "progress": 100,
      "processedAt": "2026-01-19T12:00:00Z"
    }
  },
  "timestamp": "2026-01-19T12:00:00Z"
}

Event Types:

job.processing: Job started processing
job.completed: Job completed successfully
job.failed: Job failed (includes error message in payload)

Webhooks are triggered asynchronously and include retry logic for failed deliveries.
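A receiver for these payloads can be sketched with `net/http`. The `WebhookPayload` fields mirror the example payload except `Error`, whose JSON name is an assumption because the README only says that failure payloads include an error message. `describeEvent`, `webhookHandler`, and `deliver` are hypothetical helpers; `deliver` exercises the handler in-process through `httptest` instead of listening on a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// WebhookPayload mirrors the example payload; the "error" field name is assumed.
type WebhookPayload struct {
	Event     string                     `json:"event"`
	JobID     string                     `json:"jobId"`
	Status    string                     `json:"status"`
	Results   map[string]json.RawMessage `json:"results,omitempty"`
	Error     string                     `json:"error,omitempty"`
	Timestamp string                     `json:"timestamp"`
}

// describeEvent turns a payload into a log line; unknown events are reported, not rejected.
func describeEvent(p WebhookPayload) string {
	switch p.Event {
	case "job.processing":
		return fmt.Sprintf("job %s started", p.JobID)
	case "job.completed":
		return fmt.Sprintf("job %s completed with %d language(s)", p.JobID, len(p.Results))
	case "job.failed":
		return fmt.Sprintf("job %s failed: %s", p.JobID, p.Error)
	default:
		return fmt.Sprintf("job %s: unhandled event %q", p.JobID, p.Event)
	}
}

// webhookHandler decodes the payload, logs it, and acknowledges with 204.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var p WebhookPayload
	// Cap the body at 1 MiB (an arbitrary limit for this sketch).
	if err := json.NewDecoder(http.MaxBytesReader(w, r.Body, 1<<20)).Decode(&p); err != nil {
		http.Error(w, "invalid payload", http.StatusBadRequest)
		return
	}
	log.Println(describeEvent(p))
	w.WriteHeader(http.StatusNoContent)
}

// deliver runs the handler in-process and returns the HTTP status code.
func deliver(body string) int {
	rec := httptest.NewRecorder()
	webhookHandler(rec, httptest.NewRequest(http.MethodPost, "/webhook", strings.NewReader(body)))
	return rec.Code
}

func main() {
	code := deliver(`{"event":"job.completed","jobId":"550e8400-e29b-41d4-a716-446655440000","status":"completed","results":{"en":{}},"timestamp":"2026-01-19T12:00:00Z"}`)
	fmt.Println("handler returned", code)
	// A real receiver would serve the handler instead, e.g.:
	// http.HandleFunc("/webhook", webhookHandler)
	// log.Fatal(http.ListenAndServe(":8081", nil))
}
```

Because failed deliveries are retried, it is safer for a receiver to answer quickly with a 2xx status and to tolerate receiving the same jobId and event more than once.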

Deployment

Deploy to Google Cloud Functions

Build and deploy:

./deploy.sh

Or use gcloud directly:

gcloud functions deploy multilingual-video-processor \
  --gen2 \
  --runtime go123 \
  --region us-central1 \
  --source . \
  --entry-point TranslateVideo \
  --trigger-http \
  --memory 4096MB \
  --timeout 540s \
  --allow-unauthenticated

See docs/DEPLOYMENT.md for detailed deployment instructions.

Supported Languages

Currently supported target languages:

English (en)
Arabic (ar)
German (de)
Russian (ru)

More languages can be added via configuration by updating the SUPPORTED_LANGUAGES environment variable.
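Requested `targetLanguages` can be checked against that list before a job starts. A minimal sketch, assuming codes are compared case-insensitively and duplicates are dropped; `validateTargets` is a hypothetical helper, not the repository's `internal/validator` API.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateTargets checks requested codes against the comma-separated
// SUPPORTED_LANGUAGES value, lower-casing them and dropping duplicates.
func validateTargets(supportedCSV string, requested []string) ([]string, error) {
	if len(requested) == 0 {
		return nil, errors.New("at least one target language is required")
	}
	supported := make(map[string]bool)
	for _, code := range strings.Split(supportedCSV, ",") {
		if c := strings.ToLower(strings.TrimSpace(code)); c != "" {
			supported[c] = true
		}
	}
	seen := make(map[string]bool)
	var out, unsupported []string
	for _, code := range requested {
		c := strings.ToLower(strings.TrimSpace(code))
		if !supported[c] {
			unsupported = append(unsupported, code)
			continue
		}
		if !seen[c] {
			seen[c] = true
			out = append(out, c)
		}
	}
	if len(unsupported) > 0 {
		return nil, fmt.Errorf("unsupported target language(s): %s", strings.Join(unsupported, ", "))
	}
	return out, nil
}

func main() {
	langs, err := validateTargets("en,ar,de,ru", []string{"en", "AR", "en"})
	fmt.Println(langs, err)
	_, err = validateTargets("en,ar,de,ru", []string{"en", "fr"})
	fmt.Println(err)
}
```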

Video Format Requirements

Supported video formats:

MP4
AVI
MOV
MKV

Video size and duration limits are configurable via environment variables:

MAX_VIDEO_SIZE_MB: Maximum video size in MB (default: 500)
MAX_VIDEO_DURATION: Maximum video duration in seconds (default: 600)

See docs/API.md for more details on video format requirements.
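A pre-flight check for these limits might look like the following sketch. It validates by file extension only (a real implementation could probe the container with FFmpeg instead), assumes the duration in seconds is already known, and treats 1 MB as 1024*1024 bytes; `Limits` and `validateVideo` are hypothetical names.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// allowedExt lists the container formats named in this README.
var allowedExt = map[string]bool{".mp4": true, ".avi": true, ".mov": true, ".mkv": true}

// Limits carries MAX_VIDEO_SIZE_MB and MAX_VIDEO_DURATION.
type Limits struct {
	MaxSizeMB      int64
	MaxDurationSec int
}

// validateVideo checks extension, size in bytes, and an already-known duration.
// path.Ext works on gs:// URLs as well as local paths because both use '/'.
func validateVideo(name string, sizeBytes int64, durationSec int, lim Limits) error {
	ext := strings.ToLower(path.Ext(name))
	if !allowedExt[ext] {
		return fmt.Errorf("unsupported format %q (supported: MP4, AVI, MOV, MKV)", ext)
	}
	if limit := lim.MaxSizeMB * 1024 * 1024; sizeBytes > limit {
		return fmt.Errorf("video is %d bytes, limit is %d MB", sizeBytes, lim.MaxSizeMB)
	}
	if durationSec > lim.MaxDurationSec {
		return fmt.Errorf("video is %d s long, limit is %d s", durationSec, lim.MaxDurationSec)
	}
	return nil
}

func main() {
	lim := Limits{MaxSizeMB: 500, MaxDurationSec: 600}
	fmt.Println(validateVideo("gs://bucket/path/to/video.mp4", 120<<20, 300, lim))
	fmt.Println(validateVideo("gs://bucket/path/to/video.webm", 120<<20, 300, lim))
}
```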

Development

Local Development

Install dependencies:

go mod download

Install Functions Framework:

go install github.com/GoogleCloudPlatform/functions-framework-go/cmd/functions-framework@latest

Run tests:

go test ./...

Run locally:

functions-framework --target=TranslateVideo --port=8080

The function will be available at http://localhost:8080. You can change the port by setting the PORT environment variable (defaults to 8080).

Test locally using the example clients:
- See examples/simple/main.go for basic usage
- See examples/advanced/main.go for advanced usage with retries and polling

Using Makefile:

make test          # Run tests
make test-coverage # Run tests with coverage
make lint          # Run linter
make build         # Build binary
make run-local     # Run locally with Functions Framework

See docs/DEVELOPMENT.md for more development details.

Project Structure

multilingual-video-processor/
├── cmd/cloudfunction/     # Cloud Function entry point
├── internal/              # Internal packages
│   ├── stt/              # Speech-to-Text module
│   ├── translation/      # Translation module
│   ├── tts/              # Text-to-Speech module
│   ├── video/            # Video processing
│   ├── storage/          # Storage abstraction
│   ├── config/           # Configuration
│   ├── validator/        # Input validation
│   ├── api/              # API handlers
│   └── utils/            # Utilities
├── pkg/models/           # Public models
├── test/                 # Tests
├── examples/             # Usage examples
└── docs/                 # Documentation

Examples

The repository includes example clients demonstrating how to use the API:

examples/simple/main.go: Basic usage example showing how to submit a translation job and poll for status
examples/advanced/main.go: Advanced usage with error handling, retry logic, exponential backoff, and webhook integration

Both examples demonstrate the complete workflow from job submission to completion.
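The polling part of that workflow can be sketched as a loop with exponential backoff. This is not the code in `examples/advanced/main.go`: `pollUntilDone` and `simulate` are hypothetical, only `completed` and `failed` are treated as terminal statuses (an assumption), and the `processing` value used in the simulation is likewise assumed.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// pollUntilDone calls fetch until it returns a terminal status, waiting
// initial, 2*initial, 4*initial, ... (capped at maxDelay) between attempts.
func pollUntilDone(ctx context.Context, fetch func(context.Context) (string, error),
	initial, maxDelay time.Duration, maxAttempts int) (string, error) {
	delay := initial
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err := fetch(ctx)
		if err == nil && (status == "completed" || status == "failed") {
			return status, nil
		}
		if attempt == maxAttempts {
			break
		}
		// Fetch errors and non-terminal statuses are both retried after a delay.
		select {
		case <-ctx.Done():
			return "", ctx.Err()
		case <-time.After(delay):
		}
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
	return "", errors.New("job did not reach a terminal status within maxAttempts")
}

// simulate replays a fixed status sequence in place of GET /v1/status/{jobId},
// using millisecond delays so it finishes quickly. It returns the number of calls.
func simulate(statuses []string, maxAttempts int) (string, int, error) {
	calls := 0
	fetch := func(context.Context) (string, error) {
		i := calls
		if i >= len(statuses) {
			i = len(statuses) - 1
		}
		calls++
		return statuses[i], nil
	}
	status, err := pollUntilDone(context.Background(), fetch, time.Millisecond, 4*time.Millisecond, maxAttempts)
	return status, calls, err
}

func main() {
	status, calls, err := simulate([]string{"processing", "processing", "completed"}, 5)
	fmt.Println(status, calls, err)
}
```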

Troubleshooting

Common Issues

Function fails to deploy:

Verify all required Google Cloud APIs are enabled
Check that the service account has necessary permissions
Review deployment logs: gcloud functions logs read multilingual-video-processor --gen2 --region=us-central1

API authentication errors:

Verify GOOGLE_APPLICATION_CREDENTIALS is set correctly or default credentials are configured
Check that the service account has the required IAM roles
Ensure GOOGLE_TRANSLATE_API_KEY is valid

FFmpeg not found:

Install FFmpeg using the platform-specific commands in Prerequisites
Verify installation: ffmpeg -version
Ensure FFmpeg is in your system PATH
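The same PATH lookup can be done from Go with `exec.LookPath` before shelling out. The sketch also builds an FFmpeg command for the Audio Sync stage from standard FFmpeg options (`-map`, `-c:v copy`, `-shortest`); whether the service invokes FFmpeg exactly this way is an assumption, and `ffmpegPath` and `replaceAudioArgs` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"os/exec"
)

// ffmpegPath resolves a binary on PATH or returns a descriptive error.
func ffmpegPath(name string) (string, error) {
	p, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found in PATH: %w", name, err)
	}
	return p, nil
}

// replaceAudioArgs keeps the original video stream and swaps in the
// translated audio track.
func replaceAudioArgs(videoIn, audioIn, out string) []string {
	return []string{
		"-y",            // overwrite the output file if it exists
		"-i", videoIn,   // input 0: original video
		"-i", audioIn,   // input 1: translated audio
		"-map", "0:v:0", // first video stream from input 0
		"-map", "1:a:0", // first audio stream from input 1
		"-c:v", "copy", // copy video without re-encoding
		"-c:a", "aac",
		"-shortest", // end the output when the shortest stream ends
		out,
	}
}

func main() {
	bin, err := ffmpegPath("ffmpeg")
	if err != nil {
		fmt.Println(err)
		return
	}
	cmd := exec.Command(bin, replaceAudioArgs("input.mp4", "translated_en.mp3", "en.mp4")...)
	fmt.Println("would run:", cmd.String())
	// cmd.Run() would perform the remux; it is not executed in this sketch.
}
```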

Video processing failures:

Check video format is supported (MP4, AVI, MOV, MKV)
Verify video size is within limits (MAX_VIDEO_SIZE_MB)
Check video duration is within limits (MAX_VIDEO_DURATION)
Review function logs for detailed error messages

Timeout issues:

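Increase the REQUEST_TIMEOUT environment variable (default: 540 seconds), keeping it at or below the Cloud Function timeout, which caps every request
Raise the Cloud Function timeout when redeploying, e.g. --timeout=900s (2nd gen HTTP functions allow up to 3600s)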
Monitor Cloud Function metrics in Cloud Console
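One way to apply REQUEST_TIMEOUT is as a context deadline clamped a little below the function timeout, so the handler can record the failure before the platform ends the request. The 10-second margin and the `effectiveTimeout` helper are assumptions for illustration, not the service's documented behaviour.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// effectiveTimeout returns REQUEST_TIMEOUT, clamped to end marginSec before
// the function timeout (the --timeout deploy flag). Inputs are in seconds;
// a non-positive requestTimeoutSec means "use the clamped function timeout".
func effectiveTimeout(requestTimeoutSec, functionTimeoutSec, marginSec int) time.Duration {
	limit := functionTimeoutSec - marginSec
	if requestTimeoutSec > 0 && requestTimeoutSec < limit {
		limit = requestTimeoutSec
	}
	if limit < 1 {
		limit = 1
	}
	return time.Duration(limit) * time.Second
}

func main() {
	timeout := effectiveTimeout(540, 540, 10)
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	deadline, _ := ctx.Deadline()
	// Each pipeline stage would receive ctx and stop when it is cancelled.
	fmt.Printf("processing budget %s (deadline %s)\n", timeout, deadline.Format(time.RFC3339))
}
```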

For more detailed troubleshooting, see docs/DEPLOYMENT.md.

Cost Considerations

This service uses several Google Cloud APIs that incur costs:

Cloud Speech-to-Text API: Charges per minute of audio processed
Cloud Translation API: Charges per character translated
Cloud Text-to-Speech API: Charges per character synthesized
Cloud Storage API: Charges for storage and network egress

Monitor your usage and set up billing alerts. See Google Cloud Pricing for current rates.
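Because STT is billed per audio minute while Translation and TTS are billed per character, each extra target language multiplies only the character-based charges. The sketch takes unit prices as inputs instead of hard-coding any; it assumes the audio is transcribed once per job and that each translation is about as long as the transcript, and it ignores storage and egress. `Rates` and `estimateJobCost` are hypothetical.

```go
package main

import "fmt"

// Rates holds user-supplied unit prices; take them from the current
// Google Cloud pricing pages. No real prices are hard-coded here.
type Rates struct {
	STTPerMinute     float64
	TranslatePerChar float64
	TTSPerChar       float64
}

// estimateJobCost = audioMinutes*STT + targetLangs*chars*(Translate + TTS),
// assuming one transcription per job and translations roughly as long as the transcript.
func estimateJobCost(audioMinutes float64, transcriptChars, targetLangs int, r Rates) float64 {
	stt := audioMinutes * r.STTPerMinute
	perLang := float64(transcriptChars) * (r.TranslatePerChar + r.TTSPerChar)
	return stt + perLang*float64(targetLangs)
}

func main() {
	// Placeholder unit prices, not real rates.
	r := Rates{STTPerMinute: 1, TranslatePerChar: 0.001, TTSPerChar: 0.002}
	for _, n := range []int{1, 3} {
		fmt.Printf("%d language(s): %.2f units\n", n, estimateJobCost(10, 9000, n, r))
	}
}
```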

Rate Limits: The service includes rate limiting (configurable via RATE_LIMIT_RPM) to help manage costs and prevent API quota exhaustion.
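The README does not say which algorithm backs RATE_LIMIT_RPM. As an illustration only, a fixed one-minute-window limiter could look like the following; `minuteLimiter` and `fakeClock` are hypothetical, and the service may instead use a token bucket or per-client limits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// minuteLimiter admits at most rpm requests per fixed one-minute window.
type minuteLimiter struct {
	mu          sync.Mutex
	rpm         int
	now         func() time.Time // injectable clock
	windowStart time.Time
	count       int
}

func newMinuteLimiter(rpm int, now func() time.Time) *minuteLimiter {
	return &minuteLimiter{rpm: rpm, now: now}
}

// Allow reports whether one more request fits in the current window.
func (l *minuteLimiter) Allow() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	t := l.now()
	if t.Sub(l.windowStart) >= time.Minute {
		l.windowStart = t // start a new window
		l.count = 0
	}
	if l.count >= l.rpm {
		return false
	}
	l.count++
	return true
}

// fakeClock is a manually advanced clock for demonstrations and tests.
type fakeClock struct{ t time.Time }

func (c *fakeClock) Now() time.Time      { return c.t }
func (c *fakeClock) Advance(seconds int) { c.t = c.t.Add(time.Duration(seconds) * time.Second) }

func main() {
	clk := &fakeClock{}
	lim := newMinuteLimiter(60, clk.Now) // production code would pass time.Now
	allowed := 0
	for i := 0; i < 75; i++ {
		if lim.Allow() {
			allowed++
		}
	}
	fmt.Println("allowed in first minute:", allowed)
	clk.Advance(60)
	fmt.Println("allowed after one minute:", lim.Allow())
}
```

A fixed window is the simplest option, but it can admit up to twice the limit across a window boundary; a token bucket smooths that out.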

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Security

For security concerns, please see SECURITY.md.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Author

Yassine El Ouni (@sinouw)

Acknowledgments

Built with:

Google Cloud Speech-to-Text API
Google Cloud Translation API
Google Cloud Text-to-Speech API
FFmpeg
