AI-Powered Dataset Management for LoRA Training - A desktop application for preparing high-quality image datasets with AI-assisted captioning, designed specifically for LoRA and fine-tuning workflows.
*Screenshot: caption editing interface with AI-generated descriptions, quality assessment, and batch processing capabilities.*
CaptionFoundry streamlines the tedious process of preparing image datasets for AI model training. Instead of manually captioning hundreds of images, you can:
- Organize your images into datasets with drag-and-drop simplicity
- Auto-caption entire datasets using local vision AI models
- Review and edit captions with quality scoring and suggestions
- Export perfectly formatted datasets ready for training
The entire workflow runs locally on your machine - no cloud services, no API costs, complete privacy.
- Folder Tracking - Track local image folders with drag-and-drop support
- Thumbnail Browser - Fast thumbnail grid with WebP compression and lazy loading
- Dataset Management - Organize images into named datasets with descriptions
- Caption Sets - Multiple caption styles per dataset (booru tags, natural language, etc.)
- AI Auto-Captioning - Generate captions using local Ollama or LM Studio vision models
- Bulk Caption Editing - Apply prepend/append, find/replace (with regex), trim, case conversion, and pattern removal to all captions at once
- Version History - Full version tracking with rollback for all caption changes (manual edits, bulk operations, AI generation)
- Quality Scoring - Automatic quality assessment with detailed flags
- Manual Editing - Click any image to edit its caption with real-time preview
- Smart Export - Export with sequential numbering, format conversion, metadata stripping
- Desktop App - Native file dialogs and true drag-and-drop via Electron
- 100% Non-Destructive - Your original images and captions are never modified, moved, or deleted
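The quality-scoring feature can be pictured with a small heuristic like the following sketch. The flag names and thresholds here are illustrative assumptions, not CaptionFoundry's actual rules:

```python
import re

def score_caption(caption: str) -> dict:
    """Score a caption 0-100 and collect quality flags.

    Thresholds and flag names are illustrative only.
    """
    flags = []
    words = caption.split()

    if len(words) < 3:
        flags.append("too_short")          # barely any description
    if len(caption) > 1000:
        flags.append("too_long")           # likely rambling model output
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    if len(tags) != len(set(tags)):
        flags.append("duplicate_tags")     # repeated comma-separated tags
    if re.search(r"\b(I think|maybe|possibly)\b", caption, re.I):
        flags.append("hedging_language")   # model was unsure of the content

    score = max(0, 100 - 25 * len(flags))  # each flag costs 25 points
    return {"score": score, "flags": flags}
```

Each flag maps to a concrete, fixable problem, which is what makes the detailed flags more useful during review than a bare number.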
New to vision models? See QUICKSTART.md for detailed setup instructions.
**Windows:**

```bat
install.bat
```

**Linux/macOS:**

```shell
chmod +x install.sh
./install.sh
```

The installer will:
- Create a Python virtual environment (venv/)
- Install Python dependencies from requirements.txt
- Install Node.js dependencies (npm install)
**Windows:**

```bat
start.bat
```

**Linux/macOS:**

```shell
./start.sh
```

This launches:
- The FastAPI backend server (port 8675)
- The Electron desktop application (connects to backend)
The app window will open automatically once the backend is ready.
CaptionFoundry never modifies your original files. All operations are 100% safe and non-destructive:
- ✅ Folder tracking only reads file metadata - never writes to your images or captions
- ✅ Thumbnails are generated in `data/thumbnails/` - originals untouched
- ✅ Captions are stored in the SQLite database - paired caption files (`.txt`) are only read, never modified
- ✅ Exports copy files to the destination - source files remain pristine
- ✅ Removing images from datasets only affects the database - files stay in place
- ✅ Deleting folders/datasets removes tracking records - your actual files are safe
Feel confident experimenting - your source files are always protected.
- Add Folders - Drag folders onto the app or use the folder picker
- Browse Images - Click a folder to see thumbnails
- Create Dataset - Select images and click "Create Dataset"
- Add Caption Set - Choose a captioning style (booru, natural, descriptive)
- Auto-Caption - Click "Auto-Caption All" to generate captions with AI
- Bulk Edit (Optional) - Use bulk operations to refine all captions at once (prepend/append text, find/replace with regex, case conversion, etc.)
- Review & Edit - Click any image to review/edit its caption (full version history available with rollback)
- Export - Export the completed dataset with sequential naming
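Under the hood, auto-captioning against a local Ollama vision model boils down to one HTTP call per image via Ollama's `/api/generate` endpoint. A minimal sketch - the prompt text, function names, and model name are assumptions, not CaptionFoundry's internal API:

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_caption_request(model: str, image_bytes: bytes,
                          prompt: str = "Describe this image for a training caption.") -> dict:
    """Build the JSON payload Ollama expects for a vision model."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one complete response per image
    }

def caption_image(model: str, image_bytes: bytes) -> str:
    """Send one image to Ollama and return the generated caption."""
    payload = json.dumps(build_caption_request(model, image_bytes)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```

LM Studio exposes an OpenAI-compatible API instead, so the same idea applies with a different payload shape.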
Apply transformations to all captions in a caption set simultaneously:
- Prepend/Append - Add text to the beginning or end of all captions
- Find & Replace - Replace text across all captions with optional regex support
- Trim Whitespace - Clean up extra spaces and newlines
- Case Conversion - Change to uppercase, lowercase, title case, or sentence case
- Remove Patterns - Delete matching text patterns (supports regex)
All operations provide a preview before applying, showing how many captions will be affected and sample before/after text.
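The preview-then-apply pattern can be sketched in a few lines; the function and field names here are illustrative, not CaptionFoundry's actual code:

```python
import re

def preview_replace(captions: dict, pattern: str, replacement: str,
                    use_regex: bool = False) -> dict:
    """Dry-run a find/replace over all captions (image_id -> text).

    Returns the affected count plus a before/after sample,
    without modifying anything.
    """
    pat = re.compile(pattern if use_regex else re.escape(pattern))
    changed = {}
    for image_id, text in captions.items():
        new_text = pat.sub(replacement, text)
        if new_text != text:
            changed[image_id] = new_text
    sample = next(iter(changed), None)
    return {
        "affected": len(changed),
        "sample_before": captions.get(sample),
        "sample_after": changed.get(sample),
        "changes": changed,  # applied only after user confirmation
    }

def apply_replace(captions: dict, preview: dict) -> dict:
    """Commit a previewed operation, leaving untouched captions as-is."""
    return {**captions, **preview["changes"]}
```

Separating the preview from the commit is what makes bulk edits safe: nothing is written until the user has seen the affected count and a sample.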
Every caption change is tracked with complete version history:
- Automatic Versioning - All edits (manual, bulk, AI-generated) create version snapshots
- Version Viewer - Click the history button to see all previous versions of any caption
- Individual Rollback - Restore any caption to a previous version
- Bulk Rollback - Undo an entire bulk edit operation across all affected captions
Version history includes the operation type, timestamp, and description for easy identification.
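Conceptually the history is append-only: every edit writes a new snapshot row, and rollback simply re-activates an older one. A stripped-down `sqlite3` sketch of that idea - the schema is an assumption, not the app's actual tables:

```python
import sqlite3

def init_db(conn):
    conn.execute("""CREATE TABLE caption_versions (
        id INTEGER PRIMARY KEY,
        image_id TEXT NOT NULL,
        text TEXT NOT NULL,
        op_type TEXT NOT NULL,         -- e.g. manual / bulk / ai / rollback
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def save_version(conn, image_id: str, text: str, op_type: str) -> int:
    cur = conn.execute(
        "INSERT INTO caption_versions (image_id, text, op_type) VALUES (?, ?, ?)",
        (image_id, text, op_type))
    return cur.lastrowid

def current_caption(conn, image_id: str) -> str:
    row = conn.execute(
        "SELECT text FROM caption_versions WHERE image_id = ? ORDER BY id DESC LIMIT 1",
        (image_id,)).fetchone()
    return row[0] if row else ""

def rollback(conn, image_id: str, version_id: int) -> None:
    """Restore an old version by appending it as the newest snapshot."""
    row = conn.execute("SELECT text FROM caption_versions WHERE id = ?",
                       (version_id,)).fetchone()
    save_version(conn, image_id, row[0], "rollback")
```

Because rollback appends rather than deletes, even a rollback can itself be rolled back.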
Settings are stored in `config/settings.yaml`:

```yaml
vision:
  backend: ollama              # or "lmstudio"
  ollama_url: http://localhost:11434
  lmstudio_url: http://localhost:1234
  default_model: qwen/qwen3-vl-4b
  max_tokens: 8192             # Important for thinking models
  timeout_seconds: 120

server:
  host: 127.0.0.1
  port: 8675

thumbnails:
  max_size: 256
  quality: 85
  format: webp

export:
  default_format: jpeg
  default_quality: 95
  default_padding: 6           # File numbering padding (000001.jpg)
```

CaptionFoundry uses a hybrid architecture optimized for desktop use:
- Frontend: HTML/CSS/JavaScript with Bootstrap 5
- Desktop Shell: Electron (provides native file dialogs, drag-drop paths)
- Backend: Python FastAPI (manages data, proxies vision AI requests)
- Database: SQLite with SQLAlchemy 2.x ORM
- Vision AI: Ollama or LM Studio (local, no cloud)
The Electron shell spawns the Python backend as a child process and loads the frontend from the backend server.
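The startup handshake - the Electron shell waiting for the spawned backend to come up before loading the UI - amounts to polling the server URL until it answers. The real check lives in `electron/main.js`; this is an illustrative Python equivalent:

```python
import time
import urllib.error
import urllib.request

def wait_for_backend(url: str, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll `url` until it responds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status < 500:
                    return True        # backend is serving requests
        except (urllib.error.URLError, OSError):
            time.sleep(interval)       # not up yet; retry shortly
    return False
```

Polling with a deadline avoids a race between the child process starting and the window loading, without hard-coding a fixed sleep.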
```
CaptionFoundry/
  electron/            # Electron main process
    main.js            # App entry, spawns Python backend
    preload.js         # IPC bridge for native features
  backend/             # FastAPI backend
    api/               # REST API routers
    services/          # Business logic
    models.py          # SQLAlchemy ORM models
    main.py            # FastAPI application
  frontend/            # Web frontend
    css/styles.css     # Custom styles
    js/                # JavaScript modules
    index.html         # Main HTML
  config/              # Configuration files
    settings.yaml      # App settings
  data/                # Data storage
    database.db        # SQLite database
    thumbnails/        # Thumbnail cache
    exports/           # Export staging area
  logs/                # Application logs
  install.bat/.sh      # Installation scripts
  start.bat/.sh        # Startup scripts
  package.json         # Node.js dependencies
  requirements.txt     # Python dependencies
```
Run the full stack in development mode:

```shell
npm run dev
```

Or run the backend by itself:

```shell
# Activate venv first
# Windows: venv\Scripts\activate
# Linux/Mac: source venv/bin/activate
python -m uvicorn backend.main:app --reload --port 8675
```

Once running, visit http://localhost:8675/docs for interactive Swagger documentation.
**Auto-captioning fails** - Make sure Ollama or LM Studio is running with a vision model loaded. See QUICKSTART.md for setup instructions.
**Captions are cut off mid-sentence** - Increase `max_tokens` in `config/settings.yaml`. Thinking models (like Qwen3-VL) need higher token limits (8192+) to complete their reasoning.
**Thumbnails don't appear** - Check that the `data/thumbnails/` directory exists and is writable. Try restarting the app.
**Export fails** - Ensure the export destination folder exists and you have write permissions.
Apache 2.0 - See LICENSE file for details.
- QUICKSTART.md - Detailed setup guide with Ollama/LM Studio instructions
- API Documentation - Interactive API reference (when running)