A standalone, open-source, cloud-function-ready service that provides video translation functionality. The service takes video files as input and generates translated versions in multiple target languages using Speech-to-Text (STT), Translation, and Text-to-Speech (TTS) services.
- Video Translation: Translate video audio to multiple target languages
- Speech-to-Text: Transcribe audio from videos using Google Cloud Speech-to-Text
- Multi-language Support: Translate to multiple languages concurrently
- Text-to-Speech: Generate natural-sounding speech in target languages
- Audio Sync: Automatically sync translated audio with original video
- Cloud Function Ready: Deploy as Google Cloud Function
- Secure: Input validation, rate limiting, and secure credential handling
- Observable: Structured logging and progress tracking
STT → Translation → TTS → Audio Sync → Output
- Speech-to-Text: Extract audio from video and transcribe to text
- Translation: Translate transcribed text to target languages
- Text-to-Speech: Generate audio from translated text
- Audio Sync: Replace audio track in video with translated audio
- Output: Upload translated videos to cloud storage
- Go 1.23 or later
- Google Cloud Project with the following APIs enabled:
- Cloud Speech-to-Text API
- Cloud Translation API
- Cloud Text-to-Speech API
- Cloud Storage API
- See docs/DEPLOYMENT.md for detailed API enablement instructions
- FFmpeg installed (for video processing):
- macOS:
brew install ffmpeg - Linux:
apt-get install ffmpegoryum install ffmpeg - Windows: Download from ffmpeg.org
- macOS:
- Google Cloud credentials: Service account JSON with required permissions:
roles/storage.objectAdminfor GCS operationsroles/speech.clientfor Speech-to-Text APIroles/cloudtranslate.userfor Translation APIroles/cloudtts.userfor Text-to-Speech API
- Clone the repository:
git clone https://github.com/sinouw/multilingual-video-processor.git
cd multilingual-video-processor- Install dependencies:
go mod download- Configure environment variables:
cp .env.example .env
# Edit .env with your configuration- Set up Google Cloud credentials:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.jsonAll configuration is done via environment variables. See .env.example for a complete template.
GCS_BUCKET_OUTPUT: Output bucket for translated videos (required)GOOGLE_TRANSLATE_API_KEY: Google Translation API key (required)
GOOGLE_APPLICATION_CREDENTIALS: Path to service account JSON (optional, can use default credentials)GCS_BUCKET_INPUT: Input bucket for GCS URLs (optional)SUPPORTED_LANGUAGES: Comma-separated list of supported languages (default: "en,ar,de,ru")SOURCE_LANGUAGE: Default source language (optional, auto-detect if empty)MAX_VIDEO_DURATION: Maximum video duration in seconds (default: 600)MAX_VIDEO_SIZE_MB: Maximum video size in MB (default: 500)MAX_CONCURRENT_JOBS: Maximum concurrent jobs (default: 10)MAX_CONCURRENT_TRANSLATIONS: Maximum concurrent translations per job (default: 3)REQUEST_TIMEOUT: Request timeout in seconds (default: 540)LOG_LEVEL: Logging level - debug, info, warn, error (default: "info")API_VERSION: API version (default: "v1")ENABLE_HEALTH_CHECK: Enable health check endpoints (default: "true")RATE_LIMIT_RPM: Rate limit requests per minute (default: 60)WEBHOOK_URL: Webhook URL for job completion notifications (optional)CORS_ORIGINS: Comma-separated CORS origins (default: "*")JOB_TTL: Job time-to-live duration (default: "24h")MAX_REQUEST_BODY_SIZE_BYTES: Maximum request body size in bytes (default: 1048576)
curl -X POST https://your-function-url/v1/translate \
-H "Content-Type: application/json" \
-d '{
"videoUrl": "gs://bucket/path/to/video.mp4",
"targetLanguages": ["en", "ar", "de"],
"sourceLanguage": "fr"
}'curl https://your-function-url/v1/status/{jobId}Example Response:
{
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"results": {
"en": {
"status": "completed",
"videoUrl": "gs://bucket/translations/job-id/en.mp4",
"translatedText": "Hello, this is the translated text.",
"progress": 100,
"processedAt": "2026-01-19T12:00:00Z"
},
"ar": {
"status": "completed",
"videoUrl": "gs://bucket/translations/job-id/ar.mp4",
"translatedText": "مرحبا، هذا هو النص المترجم.",
"progress": 100,
"processedAt": "2026-01-19T12:00:00Z"
}
}
}curl https://your-function-url/healthConfigure webhooks to receive notifications when translation jobs complete or fail. Set the WEBHOOK_URL environment variable to your webhook endpoint.
Webhook Payload:
{
"event": "job.completed",
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"status": "completed",
"results": {
"en": {
"status": "completed",
"videoUrl": "gs://bucket/translations/job-id/en.mp4",
"translatedText": "Hello, this is the translated text.",
"progress": 100,
"processedAt": "2026-01-19T12:00:00Z"
}
},
"timestamp": "2026-01-19T12:00:00Z"
}Event Types:
job.processing: Job started processingjob.completed: Job completed successfullyjob.failed: Job failed (includes error message in payload)
Webhooks are triggered asynchronously and include retry logic for failed deliveries.
- Build and deploy:
./deploy.sh- Or use gcloud directly:
gcloud functions deploy multilingual-video-processor \
--gen2 \
--runtime go123 \
--trigger-http \
--memory 4096MB \
--timeout 540s \
--allow-unauthenticatedSee docs/DEPLOYMENT.md for detailed deployment instructions.
Currently supported target languages:
- English (en)
- Arabic (ar)
- German (de)
- Russian (ru)
More languages can be added via configuration by updating the SUPPORTED_LANGUAGES environment variable.
Supported video formats:
- MP4
- AVI
- MOV
- MKV
Video size and duration limits are configurable via environment variables:
MAX_VIDEO_SIZE_MB: Maximum video size in MB (default: 500)MAX_VIDEO_DURATION: Maximum video duration in seconds (default: 600)
See docs/API.md for more details on video format requirements.
- Install dependencies:
go mod download- Install Functions Framework:
go install github.com/GoogleCloudPlatform/functions-framework-go/cmd/functions-framework@latest- Run tests:
go test ./...- Run locally:
functions-framework --target=TranslateVideo --port=8080The function will be available at http://localhost:8080. You can change the port by setting the PORT environment variable (defaults to 8080).
- Test locally using the example clients:
- See examples/simple/main.go for basic usage
- See examples/advanced/main.go for advanced usage with retries and polling
Using Makefile:
make test # Run tests
make test-coverage # Run tests with coverage
make lint # Run linter
make build # Build binary
make run-local # Run locally with Functions FrameworkSee docs/DEVELOPMENT.md for more development details.
multilingual-video-processor/
├── cmd/cloudfunction/ # Cloud Function entry point
├── internal/ # Internal packages
│ ├── stt/ # Speech-to-Text module
│ ├── translation/ # Translation module
│ ├── tts/ # Text-to-Speech module
│ ├── video/ # Video processing
│ ├── storage/ # Storage abstraction
│ ├── config/ # Configuration
│ ├── validator/ # Input validation
│ ├── api/ # API handlers
│ └── utils/ # Utilities
├── pkg/models/ # Public models
├── test/ # Tests
├── examples/ # Usage examples
└── docs/ # Documentation
The repository includes example clients demonstrating how to use the API:
- examples/simple/main.go: Basic usage example showing how to submit a translation job and poll for status
- examples/advanced/main.go: Advanced usage with error handling, retry logic, exponential backoff, and webhook integration
Both examples demonstrate the complete workflow from job submission to completion.
Function fails to deploy:
- Verify all required Google Cloud APIs are enabled
- Check that the service account has necessary permissions
- Review deployment logs:
gcloud functions logs read multilingual-video-processor --gen2 --region=us-central1
API authentication errors:
- Verify
GOOGLE_APPLICATION_CREDENTIALSis set correctly or default credentials are configured - Check that the service account has the required IAM roles
- Ensure
GOOGLE_TRANSLATE_API_KEYis valid
FFmpeg not found:
- Install FFmpeg using the platform-specific commands in Prerequisites
- Verify installation:
ffmpeg -version - Ensure FFmpeg is in your system PATH
Video processing failures:
- Check video format is supported (MP4, AVI, MOV, MKV)
- Verify video size is within limits (
MAX_VIDEO_SIZE_MB) - Check video duration is within limits (
MAX_VIDEO_DURATION) - Review function logs for detailed error messages
Timeout issues:
- Increase
REQUEST_TIMEOUTenvironment variable (default: 540 seconds) - Consider increasing Cloud Function timeout:
--timeout=900s - Monitor Cloud Function metrics in Cloud Console
For more detailed troubleshooting, see docs/DEPLOYMENT.md.
This service uses several Google Cloud APIs that incur costs:
- Cloud Speech-to-Text API: Charges per minute of audio processed
- Cloud Translation API: Charges per character translated
- Cloud Text-to-Speech API: Charges per character synthesized
- Cloud Storage API: Charges for storage and network egress
Monitor your usage and set up billing alerts. See Google Cloud Pricing for current rates.
Rate Limits: The service includes rate limiting (configurable via RATE_LIMIT_RPM) to help manage costs and prevent API quota exhaustion.
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
For security concerns, please see SECURITY.md.
This project is licensed under the MIT License - see the LICENSE file for details.
Yassine El Ouni (@sinouw)
Built with:
- Google Cloud Speech-to-Text API
- Google Cloud Translation API
- Google Cloud Text-to-Speech API
- FFmpeg