Skip to content

doc-bricks/UniversalDocsGrabber

UniversalDocsGrabber

Desktop tool for automatically downloading, converting, and organizing documents from IMAP mailboxes.

Deutsche Dokumentation: README-DE.md

Status Python Platform

UniversalDocsGrabber Screenshot

Features

  • Multi-account IMAP support
  • Search profiles with sender, subject, and date filters
  • Downloads PDF, DOCX, DOC, JPG, PNG, and other document types
  • Automatic PDF conversion for documents, images, and text bodies
  • OCR support for scanned PDFs (via Tesseract)
  • SHA-256 hash-based duplicate detection
  • Built-in scheduler for recurring scans (15 min to 24 h)
  • Rule-based auto-categorization (invoices, shipping, contracts, taxes, insurance, etc.)

Installation

Requirements

  • Python 3.8+
  • Windows (for Word conversion via win32com)
  • Optional: Tesseract OCR
  • Optional: Poppler

Setup

pip install -r requirements.txt

Optional: Poppler

  1. Download from https://github.com/oschwartz10612/poppler-windows/releases
  2. Extract to C:\Program Files\poppler\
  3. Adjust POPPLER_PATH in UniversalDocsGrabberV1.py if needed

Optional: Tesseract

  1. Download from https://github.com/UB-Mannheim/tesseract/wiki
  2. Install to C:\Program Files\Tesseract-OCR\
  3. Add to PATH

Run

python UniversalDocsGrabberV1.py

or double-click START.bat

Typical Workflow

  1. Add an IMAP account in the πŸ”‘ Accounts tab
  2. Create a search profile with group, filters, and target folder
  3. Set a date range
  4. Start the scan with πŸš€ START
  5. Browse results in the πŸ“‚ Documents tab

Features in Detail

Search Profiles

  • Group-based organization for thematic sorting
  • Profile-specific override settings
  • Per-run date filters

Conversion

  • Word β†’ PDF via win32com or docx2pdf
  • TXT β†’ PDF via reportlab
  • Images β†’ PDF via Pillow
  • OCR for PDFs without a text layer

Scheduler & Auto-Categorization

  • Recurring scans from 15 minutes to 24 hours
  • Runs skipped if another scan is already active
  • Rule-based auto-categorization for invoices, shipping, contracts, cancellations, taxes, insurance, applications, and banking

Deduplication

  • SHA-256 hash check
  • Configurable per profile

Local Data

  • %USERPROFILE%\.univ_docs_grabber\config_v1.json
  • %USERPROFILE%\.univ_docs_grabber\documents.json
  • %USERPROFILE%\Downloads\UnivDocs\

Known Limitations

  • OCR requires Tesseract and Poppler
  • Word conversion requires Windows components
  • Search is intentionally conservative β€” limited mail count per profile

Related Tools

Part of the doc-bricks mail suite:

Tool Description
MailProcessor System tray launcher for all Universal Mail Tools
UniversalMailCleaner Rule-based IMAP mailbox cleaner with safe mode
UniversalInvoiceMail Extract invoices and receipts from IMAP mail

License

MIT β€” Lukas Geiger

About

IMAP & Gmail document downloader with OCR, drag&drop profiles and query builder

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors