SnapSort helps you organize and extract personal photos from large, mixed-content drives with intelligent filtering and robust processing. It automatically sorts images by date while avoiding system files and application images, making it perfect for recovering memories from old hard drives or organizing large photo collections.
SnapSort is available as both a Python CLI tool and a full-stack web GUI: run it locally, in Docker, or on Unraid.
SnapSort automatically organizes your images into a clean folder structure based on when they were taken. It reads EXIF data from your photos to determine the actual capture date, falling back to file modification dates when EXIF data isn't available. Your photos are sorted into year/month/day folders, making it easy to find specific memories.
The filtering system ensures you only get actual photos, not system icons, application images, or thumbnails. It automatically skips system folders and filters out small images that are likely thumbnails or icons. SnapSort defaults to filtering out images smaller than 600×600 pixels or 50 KB, but lets you adjust both thresholds. If enabled, SnapSort will save a CSV file that lets you review its decisions on a file-by-file basis and force-process images that were misclassified.
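The size heuristic above can be sketched as a small predicate. This is an illustrative rewrite, not SnapSort's actual code; the function name is ours, and the thresholds simply mirror the documented 600×600 / 50 KB defaults:

```python
# Defaults mirroring SnapSort's documented filter thresholds (illustrative).
MIN_DIMENSION = 600          # pixels, applied to both width and height
MIN_SIZE_BYTES = 50 * 1024   # 50 KB

def passes_filter(size_bytes: int, width: int, height: int,
                  min_dim: int = MIN_DIMENSION,
                  min_bytes: int = MIN_SIZE_BYTES) -> bool:
    """Return True if an image is large enough to be a real photo
    rather than an icon or thumbnail."""
    if size_bytes < min_bytes:
        return False
    return width >= min_dim and height >= min_dim
```

In practice the dimensions would come from Pillow (`Image.open(path).size`) and the byte count from `os.path.getsize(path)`.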
Supported image formats: SnapSort supports common photo and RAW formats out of the box. You can add additional extensions, but note that dimension-based filtering and enhanced deduplication (resolution, EXIF date matching) require Pillow support for the format. Formats that Pillow cannot open will still be processed using hash-based and metadata-based (file size, filename, mtime) duplicate comparison. Default formats:
.jpg, .jpeg, .png, .cr2, .nef, .arw, .tif, .tiff, .rw2, .orf, .dng, .heic, .heif
For EXIF extraction, it uses piexif/Pillow for JPEG/TIFF files and falls back to exiftool for other formats.
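The date-resolution order can be sketched as follows. This is a simplified stand-in using only Pillow's `getexif()`, not the project's full piexif/exiftool chain, and the function name is hypothetical:

```python
import os
from datetime import datetime

def capture_date(path: str) -> datetime:
    """EXIF DateTimeOriginal if present, else file modification time."""
    try:
        from PIL import Image  # optional dependency in this sketch
        with Image.open(path) as img:
            exif = img.getexif()
            # 0x8769 is the Exif sub-IFD; tag 36867 = DateTimeOriginal,
            # tag 306 = DateTime (fallback tag in IFD0)
            raw = exif.get_ifd(0x8769).get(36867) or exif.get(306)
        if raw:
            return datetime.strptime(str(raw), "%Y:%m:%d %H:%M:%S")
    except Exception:
        pass  # unreadable or absent EXIF -> fall back to mtime
    return datetime.fromtimestamp(os.path.getmtime(path))
```

The resulting datetime then determines the year/month/day destination folder.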
SnapSort includes a multi-strategy deduplication system:
- Hash-based: SHA-256 (full, or fast partial-hash sampling of beginning + middle + end) to detect exact duplicates; a single implementation shared across seeding, dedup scoring, and file-exists checks
- Metadata-based: Compares dimensions, date taken, and file size for near-duplicate detection
- Configurable thresholds: Strict threshold (auto-skip) and log threshold (flag for review)
- Destination seeding: Pre-indexes existing files in the destination to avoid re-copying
- Actionable resolutions (Web GUI): Duplicate pairs can be resolved directly from the UI: Skip leaves the destination as-is, Overwrite copies the source over the matched destination file, and Keep Both copies the source alongside the match with a unique filename. All operations write to the destination only; source files are never modified
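The fast partial hash might look like this sketch: three SHA-256 samples (beginning, middle, end) with the file size mixed in so equal samples of different-length files still differ. The chunk size and the size-mixing detail are our assumptions, not SnapSort's exact parameters:

```python
import hashlib
import os

CHUNK = 4096  # bytes sampled at each position (hypothetical size)

def fast_partial_hash(path: str, chunk: int = CHUNK) -> str:
    """SHA-256 over three samples of the file: beginning, middle, end.
    Files small enough to fit in three chunks are hashed in full."""
    size = os.path.getsize(path)
    h = hashlib.sha256()
    h.update(str(size).encode())  # mix in file size as a discriminator
    with open(path, "rb") as f:
        if size <= 3 * chunk:
            h.update(f.read())
        else:
            for offset in (0, size // 2, size - chunk):
                f.seek(offset)
                h.update(f.read(chunk))
    return h.hexdigest()
```

Because only three fixed-size reads are needed, this scales well on large RAW files while still catching exact duplicates with high probability.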
Stay informed about your jobs and drives with push notifications:
- ntfy.sh: Send notifications to your phone or desktop via any ntfy-compatible app; supports the default ntfy.sh server or self-hosted servers, with optional token or basic authentication
- Browser notifications: Native browser push notifications via the Notification API, powered by SSE from the backend
- Configurable events: Choose which events trigger notifications – job start, completion, errors, progress updates (on a configurable interval), drive attach, and drive lost
- Drive monitoring: Automatic polling detects when drives are attached, safely ejected, or unexpectedly lost, and fires corresponding notifications
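For the ntfy.sh path, the HTTP request shape follows ntfy's documented API (plain POST body, `Title` header, optional `Authorization: Bearer` token); the helper name and topic below are illustrative, not SnapSort's actual code:

```python
import urllib.request

def build_ntfy_request(topic, message, title="SnapSort",
                       server="https://ntfy.sh", token=None):
    """Build a POST request for an ntfy server; send it with urlopen()."""
    req = urllib.request.Request(
        f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": title},
        method="POST",
    )
    if token:  # ntfy also supports basic auth; tokens use the Bearer scheme
        req.add_header("Authorization", f"Bearer {token}")
    return req

# Sending (commented out to avoid network I/O in this sketch):
# urllib.request.urlopen(build_ntfy_request("snapsort-jobs", "Job complete"))
```

Self-hosted servers work the same way: point `server` at your own instance.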
Real-time feedback during operation:
- CLI: Animated spinner during initialization (destination indexing and source scanning), inline progress with files processed, copied, skipped, errors, and ETA, plus a comprehensive log file recording every action
- Web GUI: Live progress bars per job, real-time file counter updates (polled every 500 ms), and an always-visible sidebar indicator showing the currently processing file
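An ETA like the CLI's can be estimated from running average throughput, roughly like this sketch (the class name and exact formula are ours, not taken from the project):

```python
import time

class ProgressTracker:
    """Rolling ETA from files processed so far (simplified sketch)."""

    def __init__(self, total: int):
        self.total = total
        self.start = time.monotonic()
        self.done = 0

    def update(self, n: int = 1) -> float:
        """Record n finished files; return estimated seconds remaining."""
        self.done += n
        remaining = self.total - self.done
        if remaining <= 0:
            return 0.0
        elapsed = time.monotonic() - self.start
        rate = self.done / elapsed if elapsed > 0 else 0.0
        return remaining / rate if rate > 0 else float("inf")
```

A mean over the whole run smooths out per-file variance; a sliding window would react faster to changing drive speeds.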
A full-stack web interface for managing photo organization visually:
- Dashboard – overview stats across all jobs with configurable date formatting
- Jobs – create, start, monitor, and delete organization runs with live progress bars; name your jobs for easy identification; choose a performance profile per job with a settings summary preview
SnapSort can benchmark the actual drives you'll use and recommend the best performance profile:
- Select your source and destination folders using the file picker (with drive detection)
- Automated testing measures sequential read, sequential write, and copy throughput on both volumes, plus single-thread and multi-thread hash speed using `ThreadPoolExecutor`
- Bottleneck analysis identifies whether the source volume, destination volume, or CPU hashing is the limiting factor, shown with a visual bar chart
- Profile recommendation suggests the best built-in profile based on the slowest storage in the chain, because the bottleneck sets the pace
- One-click apply writes the recommended profile's settings as your global defaults
Using the same path for both source and destination is blocked at both the frontend and backend.
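The hash-speed portion of the benchmark can be approximated with `ThreadPoolExecutor`, comparing single- and multi-threaded SHA-256 throughput. This sketch hashes in-memory buffers only; the real benchmark also measures disk read/write/copy throughput:

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def hash_throughput(buffers, workers: int = 1) -> float:
    """Return SHA-256 hashing speed in MB/s over the given buffers."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Hash every buffer; list() forces completion of all futures
        list(pool.map(lambda b: hashlib.sha256(b).hexdigest(), buffers))
    elapsed = time.monotonic() - start
    total_mb = sum(len(b) for b in buffers) / 1_048_576
    return total_mb / elapsed if elapsed > 0 else float("inf")
```

Comparing `hash_throughput(bufs, 1)` against `hash_throughput(bufs, 8)` indicates whether CPU hashing benefits from parallelism on a given machine, which feeds into the bottleneck analysis.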
SnapSort ships with 7 built-in performance profiles tuned for different storage types:
| Profile | Workers | Batch | Hash KB | Copies | Threading | I/O Mode |
|---|---|---|---|---|---|---|
| NVMe Gen4 SSD | 16 | 100 | 16384 | 8 | Multi | Parallel |
| NVMe Gen3 SSD | 12 | 75 | 8192 | 6 | Multi | Parallel |
| SATA SSD | 8 | 50 | 4096 | 4 | Multi | Parallel |
| 7200 RPM HDD | 1 | 10 | 4096 | 1 | Single | Sequential |
| 5400 RPM HDD | 1 | 5 | 2048 | 1 | Single | Sequential |
| USB External | 2 | 15 | 2048 | 1 | Single | Sequential |
| Default | 4 | 25 | 4096 | 2 | Multi | Parallel |
Profiles can be applied globally from Settings, per-job during job creation, or automatically from benchmark results. Custom profiles can be created, edited, and deleted directly from the Settings page.
Note: Performance profiles, storage benchmarks, and drive detection are Web GUI features. The CLI uses direct configuration constants. Adding `--profile` and `--benchmark` CLI flags is on the roadmap.
SnapSort will never write to, modify, rename, move, or delete any file or directory in your source locations. Source drives and directories are treated as strictly read-only at every layer of the application:
- Python engine: Every copy operation verifies the destination is not inside the source directory before writing. A `RuntimeError` is raised if violated.
- Node.js backend: A dedicated `sourceGuard` module checks every destructive file operation against all known source directories. Job creation is rejected if the source and destination directories overlap in any direction.
- API layer: No endpoint exists that can modify or delete source files. The only file operations SnapSort performs on disk are writing to the destination directory and cleaning up its own output. Duplicate resolutions (Overwrite, Keep Both) are guarded by `assertNotInSource()` before any write.
- Overlap protection: Job creation is rejected if source and destination paths overlap in any direction (same directory, destination inside source, or source inside destination). Enforced at the Python engine, Node.js backend, and React frontend.
This is SnapSort's #1 invariant, enforced by defense-in-depth across the full stack.
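The two guard checks described above can be written as a minimal Python sketch. The real project implements them in both the Python engine and the Node.js `sourceGuard` module; these function names are illustrative:

```python
from pathlib import Path

def assert_not_in_source(target, sources):
    """Raise if a write target falls inside any known source directory."""
    t = Path(target).resolve()
    for src in sources:
        s = Path(src).resolve()
        if t == s or s in t.parents:
            raise RuntimeError(f"refusing to write inside source: {t}")

def paths_overlap(a, b) -> bool:
    """True if two directories are the same or nested in either direction."""
    pa, pb = Path(a).resolve(), Path(b).resolve()
    return pa == pb or pa in pb.parents or pb in pa.parents
```

Resolving both paths first defeats `..` tricks and symlink-free relative paths; a production guard would also consider symlinks and case-insensitive filesystems.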
SnapSort ships as a unified single container:
- Multi-stage Dockerfile: Frontend build → backend dependencies → runtime with Node.js + Python
- Single service on port 8080 (configurable)
- docker-compose.yml for easy deployment
- Unraid XML template for native Docker tab integration with configurable ports, photo share path, and appdata path
```
SnapSort/
├── photo_organizer.py        # Python engine (CLI + JSON mode)
├── photo_utils.py            # Image processing, EXIF, copy logic
├── dedup_utils.py            # Deduplication index & matching
├── path_utils.py             # Destination path construction
├── logging_utils.py          # CSV/log utilities (CLI)
├── VERSION                   # Single source of truth for version (read by Python, Node.js, and frontend)
├── backend/                  # Node.js Express API
│   └── src/
│       ├── index.js          # Express server (serves API + SPA, SSE log stream)
│       ├── sourceGuard.js    # Read-only enforcement for source paths
│       ├── db/               # SQLite schema + DAO (incl. performance_profiles table)
│       ├── routes/           # REST endpoints (jobs, photos, duplicates, benchmarks, profiles, settings, etc.)
│       └── services/         # Python bridge, ntfy.sh service, browser notify service, CPU monitor, drive monitor, log buffer
├── frontend/                 # React 18 + Vite SPA
│   └── src/
│       ├── pages/            # Dashboard, Jobs, Photos (incl. Duplicates tab), Benchmarks, Settings, Diagnostics
│       ├── components/       # Modal, DataTable, FilePicker, Badge, StatCard, SparklineCard, PhotoDetailModal, Sidebar, etc.
│       ├── hooks/            # useNotifications (browser push via SSE)
│       ├── SettingsContext.jsx  # Global theme, date/time format provider
│       ├── dateFormat.js     # Shared date/time formatting utilities
│       └── index.css         # Custom CSS with dark & light themes
├── scripts/                  # Dev utilities (DB seeding, cleanup)
├── Dockerfile                # Unified multi-stage build
├── docker-compose.yml        # Single-service deployment
├── unraid/                   # Unraid Docker template
├── generate_test_data.py     # Test dataset generator
└── package.json              # Root dev script (concurrently)
```
Tech Stack:
- Backend: Node.js, Express 4, better-sqlite3 (WAL mode), uuid, cors, exifr
- Frontend: React 18, Vite 6, React Router 6, Lucide React icons, custom CSS with dark & light themes
- Engine: Python 3.9+, Pillow, piexif
- Deployment: Docker (Alpine-based), Unraid XML template
The CLI generates detailed CSV logs that serve multiple purposes beyond simple record-keeping. These files contain complete configuration information embedded within them, making each log self-contained and portable. Config and filtering heuristics are saved in the CSV as a single cell in the second row, making it robust and spreadsheet-friendly.
When running in manual or resume mode, the script automatically reads all config values from the CSV and applies them, ensuring consistency across sessions.
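A minimal illustration of embedding config as a single JSON cell in row 2; the column names and the `#config` marker are hypothetical, not SnapSort's actual CSV schema:

```python
import csv
import json

def write_log(path, rows, config):
    """Write a CSV whose second row embeds the full config as one JSON
    cell, keeping the log self-contained and spreadsheet-friendly."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["source", "destination", "status", "copy_anyway"])
        w.writerow(["#config", json.dumps(config)])  # single-cell config
        w.writerows(rows)

def read_config(path):
    """Recover the embedded config from row 2 for resume/manual modes."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    assert rows[1][0] == "#config", "missing embedded config row"
    return json.loads(rows[1][1])
```

Keeping the config in one quoted cell means spreadsheet apps leave it intact, and later runs can round-trip it losslessly.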
The Web GUI does not use CSV logging: all photo metadata, skip reasons, and duplicate information are stored in the SQLite database and accessible through the Photos page.
The CLI writes a detailed photo_organizer.log file recording every action with timestamps. The log file name can be customised at startup. In JSON mode (used by the Web GUI), file logging is suppressed and all events are streamed via the JSON protocol to the Node.js backend instead.
SnapSort offers three distinct operation modes:
- Normal Copy: Scans and processes all files according to current heuristics
- Manual Copy (CLI only): Only copies files explicitly marked in the CSV (`copy_anyway == yes`), with destination paths reconstructed automatically
- Resume Copy (CLI only): Continues a previous operation by skipping files already listed in the CSV, reading and applying all config and heuristics from the existing file
The Web GUI currently operates in Normal mode only. The equivalent of Manual Copy is available via the "Copy Anyway" override feature on the Photos page, where you can select skipped photos and force-copy them. Resume and Manual CSV-based modes are CLI-specific workflows.
| Capability | CLI | Web GUI |
|---|---|---|
| Auto-skip strict duplicates | ✅ | ✅ |
| Log potential duplicates | ✅ (log file + CSV) | ✅ (database) |
| Interactive duplicate resolution | ❌ | ✅ (Skip / Overwrite / Keep Both per pair) |
| Resolutions apply to files | ❌ | ✅ (copies to destination, never modifies source) |
| Bulk duplicate resolution | ❌ | ❌ |
| Override skipped files | ✅ (CSV `copy_anyway` column) | ✅ (Photos page override action) |
| Per-photo file hash stored | ❌ | ✅ (SHA-256 partial or full, stored in DB) |
Run this single command to install and start SnapSort. It handles everything automatically:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Rediwed/SnapSort/main/scripts/install.sh)"
```

The installer will automatically:
- Install Git, Docker, and Docker Compose if they're not already present (macOS & Linux)
- Clone SnapSort to `~/SnapSort` (or pull updates if it already exists)
- Ask for your photo folder path and preferred port
- Build and start the container
No prerequisites required – just a terminal and an internet connection.
Docker packages everything SnapSort needs into a single container – no manual dependency management required.
- Install Docker Desktop (includes Docker Compose):
  - macOS / Windows: Download from docker.com/products/docker-desktop
  - Linux: Follow the official install guide, then install Docker Compose
- Clone the repository:

```bash
git clone https://github.com/Rediwed/SnapSort.git
cd SnapSort
```
Before starting, edit docker-compose.yml to mount the drives/folders you want SnapSort to access. The default looks like this:
```yaml
volumes:
  - db-data:/app/backend/data      # persist SQLite database
  - /mnt/user/photos:/mnt/photos   # ← change this to your photo folder
```

Replace /mnt/user/photos with the actual path on your system, e.g.:

- macOS: `/Users/you/Pictures:/mnt/photos`
- Windows (WSL): `/mnt/c/Users/you/Pictures:/mnt/photos`
- Linux: `/home/you/Pictures:/mnt/photos`
You can add multiple volume mounts if you have photos on different drives:
```yaml
volumes:
  - db-data:/app/backend/data
  - /Volumes/ExternalHDD:/mnt/external:ro  # read-only source
  - /Users/you/Pictures:/mnt/photos        # destination
```

Then start the container:

```bash
docker compose up -d
```

This builds the container on first run (takes a few minutes) and starts it in the background. Open http://localhost:8080 in your browser.
| Command | What it does |
|---|---|
| `docker compose up -d` | Start SnapSort in the background |
| `docker compose logs -f` | Follow live logs |
| `docker compose down` | Stop SnapSort |
| `docker compose up -d --build` | Rebuild after pulling updates |
Run the backend and frontend directly on your machine – useful if you want to contribute or customize.
| Dependency | Version | Install |
|---|---|---|
| Node.js | 18+ | nodejs.org or brew install node |
| npm | (bundled with Node.js) | – |
| Python | 3.9+ | python.org or brew install python |
| pip | (bundled with Python) | – |
| exiftool | latest (optional) | exiftool.org or brew install exiftool |
```bash
# Clone the repo
git clone https://github.com/Rediwed/SnapSort.git
cd SnapSort

# Install all dependencies (root + backend + frontend)
npm install && npm install --prefix backend && npm install --prefix frontend
pip install -r requirements.txt

# Start both backend and frontend in dev mode
npm run dev
```

Open http://localhost:5173 – the backend runs on port 4000, and the frontend dev server on 5173 with an API proxy.
If you just want the Python photo organizer without the web GUI:
```bash
# Clone and install Python dependencies
git clone https://github.com/Rediwed/SnapSort.git
cd SnapSort
pip install -r requirements.txt

# Run the organizer
python3 photo_organizer.py
```

Choose your operation mode, set source/destination directories, and SnapSort will organize your photos from the terminal.
SnapSort includes a native Unraid Docker template:
- Copy `unraid/snapsort.xml` to `/boot/config/plugins/dockerMan/templates-user/`
- Go to Docker → Add Container → select the SnapSort template
- Configure your photo share path (e.g. `/mnt/user/photos`) and port (default: 8080)
- Click Apply – Unraid will pull/build the container and start it
Generate realistic test datasets to validate SnapSort's behavior without using your real photos:
```bash
python3 generate_test_data.py
```

This creates 5 source datasets simulating real-world scenarios (camera SD card, downloads folder, phone backup, old desktop, external HDD) plus edge cases (corrupt files, zero-byte, wrong extensions, borderline dimensions). Then use the 🧪 Load Test Data button in the web GUI to run them all.
- Filtering heuristics: Adjust minimum size, resolution, or system folders via the GUI Settings page or by editing constants in `photo_organizer.py`
- Supported formats: Add or remove extensions via the GUI Settings page or in the `SUPPORTED_EXTENSIONS` tuple in `photo_organizer.py`
- Deduplication: Configure strict/log thresholds and partial hash size
- Job management: The GUI supports creating multiple jobs with different source/destination pairs, each with independent filter settings
- System/application folder filtering: A unified set of auto-skipped system folders (e.g. `windows`, `appdata`, `cache`, `$recycle.bin`, `system volume information`, `temp`) is used for both directory-tree pruning during scanning and per-file path filtering during copying. The set is defined in `photo_organizer.py` and is currently non-configurable. A configurable filtering system for both CLI and GUI is on the roadmap
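Directory-tree pruning with a folder blocklist typically relies on editing `os.walk`'s directory list in place. A sketch using the documented folder set (the function name is ours, not SnapSort's):

```python
import os

# Folder set copied from the documented defaults (lowercase comparison).
SYSTEM_FOLDERS = {"windows", "appdata", "cache", "$recycle.bin",
                  "system volume information", "temp"}

def iter_candidate_files(root):
    """Walk a tree, pruning system folders so os.walk never descends
    into them, and yield every remaining file path."""
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place slice assignment is what tells os.walk to skip dirs
        dirnames[:] = [d for d in dirnames
                       if d.lower() not in SYSTEM_FOLDERS]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Pruning at the directory level is much cheaper than filtering each file's path afterwards, since whole subtrees (e.g. a Windows install) are never scanned at all.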
You can run multiple instances of SnapSort simultaneously to process different folders or drives in parallel. The web GUI supports running multiple jobs concurrently with independent progress tracking.
Tips:
- Use different destination folders for each concurrent source to avoid duplicate naming conflicts
- The GUI's job system handles concurrent runs with separate progress bars and status tracking
- Directory creation and file copying are safe for concurrent use
- Manual & Resume modes in Web GUI: Bring CSV-based Manual Copy and Resume Copy workflows to the web interface
- CLI `--profile` flag: Apply named performance profiles from the command line
- CLI `--benchmark` flag: Run storage I/O benchmarks directly from the terminal
- Post-run duplicate review in CLI: Interactive review of flagged duplicates after processing
- Configurable system folder filtering: Editable blocklist for system/application folders in both CLI and GUI
- Improved folder-name awareness: Retain event/memory grouping when photos span nested folders
- Project management: Support multi-drive projects with cross-drive analysis, manual evaluation, and unified reporting
- Auto-start on drive connection: Automatically trigger jobs when a configured drive is attached
- Analyze-only mode: Build CSV without copying files for manual review
- Customizable storage template: User-defined destination folder structure using template variables (inspired by Immich), e.g. `{{y}}/{{y}}-{{MM}}/{(unknown)}.{{ext}}`. Support variables for date/time (`{{y}}`, `{{MM}}`, `{{dd}}`), camera metadata (`{{make}}`, `{{model}}`), and file info (`{(unknown)}`, `{{ext}}`, `{{filetype}}`)
- Dedicated Duplicates page: Evaluate whether duplicate management warrants its own top-level page given its critical nature
| ID | Description | Priority |
|---|---|---|
| TB-001 | Migrate backend from CommonJS to ES Modules: `backend/src/index.js` and all route files currently use `require()`/`module.exports`. Should be migrated to `import`/`export` to match the ESM convention used across all other projects in this workspace. Requires updating `backend/package.json` to add `"type": "module"` and converting all `require()` calls. | Low |
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License. You're free to use and adapt the project for non-commercial purposes with proper attribution to @Rediwed. See the LICENSE file for complete details.



