CaptionFoundry

AI-Powered Dataset Management for LORA Training - A desktop application for preparing high-quality image datasets with AI-assisted captioning, designed specifically for LORA and fine-tuning workflows.

Caption editing interface with AI-generated descriptions, quality assessment, and batch processing capabilities

What is CaptionFoundry?

CaptionFoundry streamlines the tedious process of preparing image datasets for AI model training. Instead of manually captioning hundreds of images, you can:

Organize your images into datasets with drag-and-drop simplicity
Auto-caption entire datasets using local vision AI models
Review and edit captions with quality scoring and suggestions
Export perfectly formatted datasets ready for training

The entire workflow runs locally on your machine - no cloud services, no API costs, complete privacy.

Key Features

Folder Tracking - Track local image folders with drag-and-drop support
Thumbnail Browser - Fast thumbnail grid with WebP compression and lazy loading
Dataset Management - Organize images into named datasets with descriptions
Caption Sets - Multiple caption styles per dataset (booru tags, natural language, etc.)
AI Auto-Captioning - Generate captions using local Ollama or LM Studio vision models
Bulk Caption Editing - Apply prepend/append, find/replace (with regex), trim, case conversion, and pattern removal to all captions at once
Version History - Full version tracking with rollback for all caption changes (manual edits, bulk operations, AI generation)
Quality Scoring - Automatic quality assessment with detailed flags
Manual Editing - Click any image to edit its caption with real-time preview
Smart Export - Export with sequential numbering, format conversion, metadata stripping
Desktop App - Native file dialogs and true drag-and-drop via Electron
100% Non-Destructive - Your original images and captions are never modified, moved, or deleted

Requirements

Python 3.10+ - Download
Node.js 18+ - Download
Vision Model Backend (at least one):
- Ollama with a vision model
- LM Studio with a vision model loaded

New to vision models? See QUICKSTART.md for detailed setup instructions.

Installation

Windows

install.bat

Linux/macOS

chmod +x install.sh
./install.sh

The installer will:

Create a Python virtual environment (venv/)
Install Python dependencies from requirements.txt
Install Node.js dependencies (npm install)

Starting the Application

Windows

start.bat

Linux/macOS

./start.sh

This launches:

The FastAPI backend server (port 8675)
The Electron desktop application (connects to backend)

The app window will open automatically once the backend is ready.

Non-Destructive Design

CaptionFoundry never modifies your original files. All operations are 100% safe and non-destructive:

✅ Folder tracking only reads file metadata - never writes to your images or captions
✅ Thumbnails are generated in data/thumbnails/ - originals untouched
✅ Captions are stored in SQLite database - paired caption files (.txt) are only read, never modified
✅ Exports copy files to the destination - source files remain pristine
✅ Removing images from datasets only affects the database - files stay in place
✅ Deleting folders/datasets removes tracking records - your actual files are safe

Feel confident experimenting - your source files are always protected.

Basic Workflow

Add Folders - Drag folders onto the app or use the folder picker
Browse Images - Click a folder to see thumbnails
Create Dataset - Select images and click "Create Dataset"
Add Caption Set - Choose a captioning style (booru, natural, descriptive)
Auto-Caption - Click "Auto-Caption All" to generate captions with AI
Bulk Edit (Optional) - Use bulk operations to refine all captions at once (prepend/append text, find/replace with regex, case conversion, etc.)
Review & Edit - Click any image to review/edit its caption (full version history available with rollback)
Export - Export the completed dataset with sequential naming

Bulk Caption Editing

Apply transformations to all captions in a caption set simultaneously:

Prepend/Append - Add text to the beginning or end of all captions
Find & Replace - Replace text across all captions with optional regex support
Trim Whitespace - Clean up extra spaces and newlines
Case Conversion - Change to uppercase, lowercase, title case, or sentence case
Remove Patterns - Delete matching text patterns (supports regex)

All operations provide a preview before applying, showing how many captions will be affected and sample before/after text.

Version History & Rollback

Every caption change is tracked with complete version history:

Automatic Versioning - All edits (manual, bulk, AI-generated) create version snapshots
Version Viewer - Click the history button to see all previous versions of any caption
Individual Rollback - Restore any caption to a previous version
Bulk Rollback - Undo an entire bulk edit operation across all affected captions

Version history includes the operation type, timestamp, and description for easy identification.

Configuration

Settings are stored in config/settings.yaml:

vision:
  backend: ollama              # or "lmstudio"
  ollama_url: http://localhost:11434
  lmstudio_url: http://localhost:1234
  default_model: qwen/qwen3-vl-4b
  max_tokens: 8192             # Important for thinking models
  timeout_seconds: 120

server:
  host: 127.0.0.1
  port: 8675

thumbnails:
  max_size: 256
  quality: 85
  format: webp

export:
  default_format: jpeg
  default_quality: 95
  default_padding: 6           # File numbering padding (000001.jpg)

Architecture

CaptionFoundry uses a hybrid architecture optimized for desktop use:

Frontend: HTML/CSS/JavaScript with Bootstrap 5
Desktop Shell: Electron (provides native file dialogs, drag-drop paths)
Backend: Python FastAPI (manages data, proxies vision AI requests)
Database: SQLite with SQLAlchemy 2.x ORM
Vision AI: Ollama or LM Studio (local, no cloud)

The Electron shell spawns the Python backend as a child process and loads the frontend from the backend server.

Project Structure

CaptionFoundry/
 electron/           # Electron main process
    main.js         # App entry, spawns Python backend
    preload.js      # IPC bridge for native features
 backend/            # FastAPI backend
    api/            # REST API routers
    services/       # Business logic
    models.py       # SQLAlchemy ORM models
    main.py         # FastAPI application
 frontend/           # Web frontend
    css/styles.css  # Custom styles
    js/             # JavaScript modules
    index.html      # Main HTML
 config/             # Configuration files
    settings.yaml   # App settings
 data/               # Data storage
    database.db     # SQLite database
    thumbnails/     # Thumbnail cache
    exports/        # Export staging area
    logs/           # Application logs
 install.bat/.sh     # Installation scripts
 start.bat/.sh       # Startup scripts
 package.json        # Node.js dependencies
 requirements.txt    # Python dependencies

Development

Run with DevTools

npm run dev

Run Backend Only

# Activate venv first
# Windows: venv\Scripts\activate
# Linux/Mac: source venv/bin/activate

python -m uvicorn backend.main:app --reload --port 8675

API Documentation

Once running, visit http://localhost:8675/docs for interactive Swagger documentation.

Troubleshooting

"No vision models available"

Make sure Ollama or LM Studio is running with a vision model loaded. See QUICKSTART.md for setup instructions.

Captions are cut off or incomplete

Increase max_tokens in config/settings.yaml. Thinking models (like Qwen3-VL) need higher token limits (8192+) to complete their reasoning.

Thumbnails not loading

Check that the data/thumbnails/ directory exists and is writable. Try restarting the app.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
alembic		alembic
backend		backend
config		config
documentation		documentation
electron		electron
frontend		frontend
.gitignore		.gitignore
LICENSE.md		LICENSE.md
QUICKSTART.md		QUICKSTART.md
README.md		README.md
alembic.ini		alembic.ini
install.bat		install.bat
install.sh		install.sh
package.json		package.json
requirements.txt		requirements.txt
start.bat		start.bat
start.sh		start.sh
update.bat		update.bat
update.sh		update.sh

Folders and files

Latest commit

History

Repository files navigation

CaptionFoundry

What is CaptionFoundry?

Key Features

Requirements

Installation

Windows

Linux/macOS

Starting the Application

Windows

Linux/macOS

Non-Destructive Design

Basic Workflow

Bulk Caption Editing

Version History & Rollback

Configuration

Architecture

Project Structure

Development

Run with DevTools

Run Backend Only

API Documentation

Troubleshooting

"No vision models available"

Captions are cut off or incomplete

Thumbnails not loading

Export fails

License

See Also

Wanna be nice?

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages