Professional Multimedia Processing Platform
Features • Installation • Quick Start • Configuration • Building • Troubleshooting
A professional multimedia processing platform for subtitle generation and video-related tasks.
- High-Quality Audio Extraction - 48kHz stereo with voice enhancement filters
- Speech-to-Text - Faster Whisper (4-6x faster than OpenAI Whisper)
- Translation - Local models (NLLB, M2M) or LLM APIs (OpenAI, GLM)
- Subtitle Generation - Complete pipeline with automatic translation fallback
- Batch Processing - Efficient multi-file processing
- 30+ Languages supported for transcription and translation
- Unified Progress Tracking - Stage-aware progress updates with GUI bridge
- Self-Contained Deployment - All data in installation directory for clean uninstall
- Setup Wizard - First-run configuration with hardware detection
| Feature | MediaFactory | pyVideoTrans | VideoCaptioner | SubtitleEdit |
|---|---|---|---|---|
| Core Focus | Multimedia Platform | Video Translation | LLM Subtitle Assistant | Subtitle Editor |
| License | MIT | GPL-3.0 | GPL-3.0 | GPL/LGPL |
| Platform | Cross-platform | Cross-platform | Cross-platform | Windows |
| Speech Recognition | ✅ Faster Whisper | ✅ Multiple | ✅ Multiple | ✅ Whisper |
| VAD Filtering | ✅ Silero VAD | ✅ | ✅ | ❌ |
| Translation | ✅ Local + LLM | ✅ Multiple | ✅ LLM-focused | ✅ Google/DeepL |
| Batch Translation | ✅ Recursive Validation | ✅ | ✅ | ❌ |
| SRT Format | ✅ | ✅ | ✅ | ✅ |
| ASS Format | ✅ 5 Style Templates | ✅ | ✅ Multiple | ✅ Full Editor |
| Bilingual Subtitles | ✅ 4 Layouts | ✅ | ✅ | ❌ |
| Soft Subtitle Embed | ✅ mov_text | ✅ | ✅ | ✅ |
| Hard Subtitle Burn | ✅ | ✅ | ✅ | ✅ |
| TTS Dubbing | ❌ | ✅ Multiple TTS | ❌ | ✅ Azure/ElevenLabs |
| Voice Cloning | ❌ | ✅ F5-TTS | ❌ | ❌ |
| Subtitle Editing | ❌ | ❌ | ❌ | ✅ Full Editor |
| Waveform Editor | ❌ | ❌ | ❌ | ✅ Visual Sync |
| 300+ Subtitle Formats | ❌ | ❌ | ❌ | ✅ |
| GUI | ✅ customtkinter | ✅ PySide6 | ✅ Qt | ✅ WinForms |
| CLI Mode | ✅ | ❌ | ❌ | ✅ |
| Plugin Architecture | ✅ Core Feature | ❌ | ❌ | ❌ |
| Pipeline Orchestration | ✅ Core Feature | ✅ | ❌ | ❌ |
| Event System | ✅ EventBus | ❌ | ❌ | ❌ |
| Config System | ✅ TOML + Pydantic | ❌ | JSON | XML |
| Batch Processing | ✅ | ✅ | ✅ | ✅ |
Architecture Advantages:
- Plugin System - Tool + PluginRegistry for extensibility
- Pipeline Architecture - ProcessingStage pattern for composable workflows
- Event System - EventBus for decoupled components
- Type-Safe Config - TOML + Pydantic v2 with hot reload
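To make the event-driven design concrete, here is a minimal, illustrative sketch of a publish/subscribe bus in the spirit of the EventBus described above. The class and method names are assumptions for illustration, not MediaFactory's actual API.

```python
# Illustrative only: a decoupled publish/subscribe event bus in the spirit of
# the EventBus listed above. Names and signatures are assumptions, not
# MediaFactory's actual API.
from collections import defaultdict
from typing import Any, Callable

class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: Any = None) -> None:
        for handler in self._handlers[event]:
            handler(payload)

# A GUI progress widget and a logger could both listen without knowing the sender.
bus = EventBus()
bus.subscribe("stage.completed", lambda p: print(f"Stage finished: {p}"))
bus.publish("stage.completed", {"stage": "transcribe", "progress": 1.0})
```

With this pattern, pipeline stages and GUI components communicate through events rather than direct references.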
Key Features:
- 🚀 Faster Whisper - 4-6x faster than OpenAI Whisper
- 🎯 VAD Filtering - Built-in Silero VAD reduces hallucinations
- 🌐 Unified LLM Backend - OpenAI-compatible API supports all major services
- 📝 Batch Translation - Recursive validation with auto-repair
- 🎨 ASS Styling - 5 preset templates + custom style files
- 🔀 Bilingual Support - 4 layout options for dual-language subtitles
- 📦 Self-Contained - All data in installation directory
What MediaFactory is NOT:
- Not a full subtitle editor (use SubtitleEdit for manual editing)
- Not a video dubbing tool (use pyVideoTrans for TTS/voice cloning)
- Not an online service (fully local processing)
MediaFactory uses uv for fast, modern Python dependency and environment management.
- Python: 3.10+ (3.12 recommended)
- FFmpeg: Included via imageio-ffmpeg (no manual installation needed)
- GPU (optional): NVIDIA GPU with CUDA support for acceleration
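Because FFmpeg ships via imageio-ffmpeg, a quick way to confirm the bundled binary is reachable from Python (assuming imageio-ffmpeg is installed in the project environment) is:

```python
# Confirms that the FFmpeg binary bundled by imageio-ffmpeg is available.
import imageio_ffmpeg

print(imageio_ffmpeg.get_ffmpeg_exe())      # path to the bundled ffmpeg executable
print(imageio_ffmpeg.get_ffmpeg_version())  # FFmpeg version string
```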
Install uv:

macOS:

```bash
brew install uv
```

Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Windows:

```powershell
winget install astral-sh.uv
```

Clone the repository:

```bash
git clone https://github.com/Dragon/MediaFactory.git
cd MediaFactory
```

```bash
# Detect GPU and get recommended configuration
uv run python scripts/utils/check_gpu.py
```

CPU Version (All platforms):

```bash
uv pip install torch --index-url https://download.pytorch.org/whl/cpu
```

CUDA Versions (NVIDIA GPU only):
| CUDA Version | Command |
|---|---|
| CUDA 11.8 | uv pip install torch --index-url https://download.pytorch.org/whl/cu118 |
| CUDA 12.1 | uv pip install torch --index-url https://download.pytorch.org/whl/cu121 |
| CUDA 12.4 | uv pip install torch --index-url https://download.pytorch.org/whl/cu124 |
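After installing a wheel, a short sanity check (assuming the install above succeeded) confirms which PyTorch build you actually received:

```python
# Prints the installed PyTorch build and whether CUDA is usable.
import torch

print(torch.__version__)          # e.g. ends in +cpu or +cu124 depending on the wheel
print(torch.version.cuda)         # CUDA runtime the wheel was built against, or None
print(torch.cuda.is_available())  # True only with a working NVIDIA driver
```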
Install the project dependencies:

```bash
# Install dependencies
uv sync

# Install development dependencies
uv sync --group dev
```

If you plan to contribute code, install pre-commit hooks to automatically check code quality before commits:

```bash
# Install Git hooks for pre-commit
pre-commit install

# Run pre-commit manually on all files
pre-commit run --all-files
```

The project uses pre-commit for code quality checks:
- Black - Code formatting
- Flake8 - Code linting
- Bandit - Security checks
Minimum requirements:

| Component | Requirement |
|---|---|
| Memory | 4GB RAM |
| Storage | 2GB available space |
| Compatibility | All platforms (Windows/macOS/Linux) |
Recommended for GPU acceleration:

| Component | Requirement |
|---|---|
| GPU | NVIDIA GPU with CUDA support |
| VRAM | 4GB minimum (8GB+ recommended) |
| Driver | NVIDIA driver ≥ 510.0 |
| Storage | 5GB available space |
Note: RTX 50-series (Blackwell architecture) requires PyTorch nightly builds with CUDA 12.8+ support.
Install with the appropriate extra for your hardware:

```bash
# CPU version
uv sync --extra cpu

# GPU version (default CUDA 12.4)
uv sync --extra gpu

# Specific CUDA version
uv sync --extra cuda121

# Complete installation (GPU + dev tools)
uv sync --extra all
```

MediaFactory requires translation models (not bundled in the installation package). Download them separately:
```bash
# Download NLLB-600M (recommended, ~2.3GB)
uv run python scripts/utils/download_model.py nllb-600m

# Or NLLB-3.3B (higher quality, ~13GB)
uv run python scripts/utils/download_model.py nllb-3.3b

# List available models
uv run python scripts/utils/download_model.py --list
```

For users in China (use a mirror):

```bash
uv run python scripts/utils/download_model.py nllb-600m --source=https://hf-mirror.com
```

Alternatively, run the GUI and use the Setup Wizard to download models automatically.
Verify the installation:

```bash
# Check PyTorch installation
uv run python -c "import torch; print(f'PyTorch: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"

# Check Faster Whisper
uv run python -c "import faster_whisper; print('Faster Whisper installed successfully')"
```

Launch the GUI:

```bash
# Method 1: Using console script
uv run mediafactory-gui

# Method 2: Using Python module
uv run python -m mediafactory.gui.main

# Method 3: Using Python API
uv run python -c "from mediafactory import launch_gui; launch_gui()"
```

Generate subtitles from the Python API:

```python
from mediafactory import get_plugin_registry, register_builtin_tools
registry = get_plugin_registry()
register_builtin_tools()
tool = registry.get("subtitle_generator")

# Generate subtitles
result = tool.execute_simple(
    input_path="video.mp4",
    parameters={
        "source_language": "auto",
        "target_language": "zh",
        "use_llm": False,
    }
)
print(f"Output: {result.output_path}")
```

With LLM-based translation:

```python
result = tool.execute_simple(
    input_path="video.mp4",
    parameters={
        "source_language": "auto",
        "target_language": "zh",
        "use_llm": True,
        "llm_backend": "openai",
    }
)
```

Edit `config.toml`:
```toml
[whisper]
beam_size = 5

[model]
local_model_path = "models"

[api_translation]
backend = "openai"

[openai]
api_key = "YOUR_API_KEY"
model = "gpt-4o-mini"

[llm_translation]
enable_batch = true
batch_size = 100
```

Supported LLM backends: OpenAI, GLM (智谱AI).
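The enable_batch and batch_size options control LLM batch translation. As a rough illustration of the "recursive validation with auto-repair" idea mentioned in the feature list (a simplified sketch, not MediaFactory's actual implementation):

```python
# Simplified sketch of recursive batch validation with auto-repair.
# translate_fn stands in for a single LLM call that translates a list of lines.
from typing import Callable

def translate_batch(lines: list[str],
                    translate_fn: Callable[[list[str]], list[str]],
                    depth: int = 3) -> list[str]:
    out = translate_fn(lines)
    # Validate: the model must return exactly one non-empty line per input line.
    if len(out) == len(lines) and all(t.strip() for t in out):
        return out
    if depth == 0 or len(lines) == 1:
        return list(lines)  # repair exhausted: fall back to the source text
    # Auto-repair: split the batch in half and re-translate each half recursively.
    mid = len(lines) // 2
    return (translate_batch(lines[:mid], translate_fn, depth - 1)
            + translate_batch(lines[mid:], translate_fn, depth - 1))
```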
MediaFactory supports multiple packaging methods, from standalone executables to full installers.
Create standalone executables with all dependencies:
```bash
# Build for current platform
python scripts/build/build_all.py

# Build for specific platform
python scripts/build/build_all.py --platform macos
python scripts/build/build_all.py --platform windows

# Clean build artifacts
python scripts/build/build_all.py --clean
```

Output: dist/MediaFactory.exe (Windows) or dist/MediaFactory.app (macOS)
Size: ~200-500MB (without ML dependencies)
macOS (.dmg disk image):

```bash
python scripts/build/macos/create_dmg.py
```

Output: dist/MediaFactory-3.2.0.dmg

Windows (.exe installer):

- Requires: Inno Setup 6.0+

```bash
python scripts/build/windows/package_windows.py

# Or use Inno Setup directly
iscc scripts/build/windows/installer_windows.iss
```

Output: dist/MediaFactory-Setup-3.2.0.exe
When users run the GUI for the first time, the Setup Wizard starts automatically:
Features:
- Fast installation (uv downloads dependencies 10-100x faster than pip)
- Auto-detection of GPU and recommended PyTorch version
- Mirror selection for users in China
- User-friendly graphical wizard
Setup Steps:
- Welcome page - Introduction
- Hardware detection - Auto-detect NVIDIA GPU and CUDA version
- Mirror selection - Choose download source (China mirror / official)
- Config generation - Create user config from config.toml.example
- Dependency installation - Install PyTorch and other dependencies
- Model download - Download Whisper and translation models
Estimated time: 5-15 minutes (depending on network speed)
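The hardware-detection step boils down to probing for an NVIDIA GPU before choosing a PyTorch wheel. A minimal sketch of such a probe (illustrative only, not MediaFactory's actual code):

```python
# Illustrative sketch of an NVIDIA GPU probe, not MediaFactory's actual code.
import shutil
import subprocess

def detect_nvidia_gpu() -> str | None:
    """Return the GPU name reported by nvidia-smi, or None if no GPU is found."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        )
        names = result.stdout.strip().splitlines()
        return names[0] if names else None
    except (subprocess.SubprocessError, OSError):
        return None

print(detect_nvidia_gpu() or "No NVIDIA GPU detected (CPU mode)")
```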
Edit build scripts to customize:
scripts/pyinstaller/build_installer.py:
```python
PRODUCT_NAME = "MediaFactory"
PRODUCT_VERSION = "3.2.0"
ENCRYPT_BYTECODE = True
COMPRESS_OUTPUT = True
```

scripts/build/windows/installer_windows.iss:

```
#define AppVersion "3.2.0"
#define AppPublisher "Your Name"
```

Before release, ensure:
- Update version numbers in all build scripts
- Test installer on clean systems
- Verify first-run wizard works correctly
- Test GPU/CPU auto-detection
- Verify model download functionality
- Check config file generation
Note: Translation models (2GB+) are NOT bundled. Users must download separately via the setup wizard or manually.
| Problem | Solution |
|---|---|
| FFmpeg not found | MediaFactory uses built-in imageio-ffmpeg, no manual installation needed |
| Out of memory (OOM) | Use smaller Whisper model (small instead of large-v3), or force CPU mode |
| Poor accuracy | Try large-v3 model; ensure audio source is clear; check source language setting |
| Missing translation model | Run uv run python scripts/utils/download_model.py nllb-600m |
| macOS GPU not used | Faster Whisper doesn't support MPS; CPU is used automatically |
| API translation fails | Check API key in config.toml; verify network connection and quota |
| Progress stuck at 0% | Ensure v3.0+; check GUI callbacks; see log file for errors |
All logs are written to mediafactory_YYYYMMDD_HHMMSS.log in the application root directory.
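When troubleshooting, it can help to grab the most recent log programmatically. A small convenience snippet (assuming the timestamped naming pattern above and that you run it from the application root):

```python
# Finds the newest MediaFactory log and prints its tail.
from pathlib import Path

logs = sorted(Path(".").glob("mediafactory_*.log"))  # timestamps sort chronologically
if logs:
    latest = logs[-1]
    print(f"Latest log: {latest}")
    print(latest.read_text(encoding="utf-8", errors="replace")[-2000:])  # last ~2 KB
else:
    print("No MediaFactory log files found in the current directory.")
```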
MIT License