FileFolio

FileFolio helps privacy-conscious professionals keep large PDF collections searchable and organized using local AI. No cloud, no telemetry, all on your machine.

Status: Actively maintained, used on my own 1,000+ PDF collection. Expect breaking changes before v1.0, but I'm responsive to issues and feedback.

Why FileFolio?

You have hundreds of PDF bills, reports, or research papers scattered in folders.
You care about privacy and do not want to upload them to cloud AI services.
You still want smart search, auto-tagging, and reasonable file names.

FileFolio watches a folder, uses a local LLM via Ollama to analyze each PDF, and keeps everything searchable in one interface.
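Here is a minimal sketch of the analysis step: it asks a local Ollama server for a category, tags, and a filename. It assumes Ollama's default address (`http://127.0.0.1:11434`), its `/api/generate` endpoint with `"format": "json"`, and a hypothetical model name (`llama3.2`). The prompt and the helper names (`build_request`, `parse_analysis`, `analyze`) are illustrative and are not FileFolio's actual code.

```python
import json
import re
import urllib.request

# Assumptions: Ollama's default local address and a hypothetical model name.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"
MODEL = "llama3.2"

PROMPT_TEMPLATE = (
    "Classify the following document. Reply only with a JSON object with the keys "
    '"category" (string), "tags" (list of strings) and "filename" (string).\n\n{text}'
)


def build_request(text: str, max_chars: int = 4000) -> dict:
    """Build a non-streaming /api/generate payload that asks for JSON output."""
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(text=text[:max_chars]),  # truncate long documents
        "stream": False,
        "format": "json",
    }


def parse_analysis(raw: str) -> dict:
    """Normalize the model's JSON reply into category, tags and a safe filename."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from the model")
    category = str(data.get("category", "")).strip() or "Uncategorized"
    raw_tags = data.get("tags") or []
    if isinstance(raw_tags, str):
        raw_tags = [raw_tags]
    # Deduplicate case-insensitively and drop empty tags.
    tags = sorted({str(t).strip().lower() for t in raw_tags if str(t).strip()})
    # Replace anything outside a conservative character set with underscores.
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", str(data.get("filename", ""))).strip("._")
    if not name.lower().endswith(".pdf"):
        name = (name or "document") + ".pdf"
    return {"category": category, "tags": tags, "filename": name}


def analyze(text: str, timeout: float = 120.0) -> dict:
    """Send extracted text to a local Ollama server and return parsed metadata."""
    body = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    # Non-streaming replies carry the generated text in the "response" field.
    return parse_analysis(reply["response"])
```

Calling `analyze(text)` requires `ollama serve` to be running and the chosen model to be pulled. The parsing is kept separate from the HTTP call so the normalization (deduplicated lowercase tags and a filesystem-safe `.pdf` name) can be tested without a model.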

Features

Automatic organization – watches a folder, imports new PDFs, extracts their text (falling back to OCR for scanned documents), and generates categories and tags
Privacy-first – all processing happens locally with Ollama, no cloud services, no telemetry or analytics
Fast retrieval – full-text search across content and metadata, plus thumbnail previews
Disaster-proof – back up and restore your entire library as a ZIP archive
Multi-language support – UI available in multiple languages
Dark mode – toggle between light and dark themes
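The folder watching behind automatic organization could work roughly like the polling loop below. This is a standard-library sketch based on assumptions: the function names are hypothetical, and FileFolio's `sync_service.py` may use a different mechanism entirely.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def scan_new_pdfs(folder: Path, seen: Set[Path]) -> List[Path]:
    """Return PDFs under `folder` (recursively) not yet in `seen`, and record them."""
    new_files = []
    for path in sorted(folder.rglob("*")):
        # Match .pdf case-insensitively so "SCAN.PDF" is picked up too.
        if path.is_file() and path.suffix.lower() == ".pdf" and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files


def watch(folder: Path, handle: Callable[[Path], None],
          interval: float = 5.0, max_cycles: Optional[int] = None) -> None:
    """Poll `folder` every `interval` seconds and pass each new PDF to `handle`.

    `max_cycles` bounds the loop (useful for tests); None polls forever.
    """
    seen: Set[Path] = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for pdf in scan_new_pdfs(folder, seen):
            handle(pdf)
        cycles += 1
        time.sleep(interval)
```

This sketch uses polling because it needs only the standard library. An event-based watcher, such as the third-party watchdog package, reacts faster. A real importer should also wait until a file stops growing before processing it, because a PDF that is still being copied can be picked up half-written.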

Prerequisites

Python 3.10+
Ollama installed locally
Poppler (used by pdf2image to render PDF pages)
- macOS: brew install poppler
- Ubuntu/Debian: sudo apt-get install poppler-utils
- Windows: Download from poppler releases
Tesseract (for OCR on scanned documents)
- macOS: brew install tesseract
- Ubuntu/Debian: sudo apt-get install tesseract-ocr
- Windows: Download from Tesseract releases
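A small check like the one below can confirm these prerequisites before the first run. It is a hypothetical helper and is not part of the repository. It assumes Poppler is detected through its `pdftoppm` and `pdfinfo` executables (the ones pdf2image calls), and Tesseract and Ollama through their CLI binaries on `PATH`.

```python
import shutil
import sys

# Executables each prerequisite normally puts on PATH.
REQUIRED_TOOLS = {
    "Poppler": ["pdftoppm", "pdfinfo"],
    "Tesseract": ["tesseract"],
    "Ollama": ["ollama"],
}


def check_prerequisites(which=shutil.which, version=sys.version_info) -> list:
    """Return human-readable problems; an empty list means everything was found."""
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    for package, binaries in REQUIRED_TOOLS.items():
        missing = [b for b in binaries if which(b) is None]
        if missing:
            problems.append(f"{package}: not on PATH ({', '.join(missing)})")
    return problems
```

Call `check_prerequisites()` from a Python shell. An empty list means every tool was found. The `which` and `version` parameters exist only so the check can be exercised without changing the machine.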

Quick start

Clone the repository

git clone https://github.com/imkrishsub/filefolio.git
cd filefolio

Create and activate virtual environment

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Start Ollama (in a separate terminal)

ollama serve

Run the application

python backend/main.py

Open your browser and navigate to http://127.0.0.1:8000

Configuration

Custom port

Set a custom port using the PORT environment variable:

PORT=8080 python backend/main.py
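The override probably works along these lines: read `PORT`, fall back to 8000, and pass the result to the server. The sketch below is an assumption about `backend/main.py`, not its actual contents. `resolve_port` is a hypothetical name, and the `uvicorn.run` call is left as a comment because it needs the FastAPI app object.

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 8000  # matches the URL given in Quick start


def resolve_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Read PORT from the environment, falling back to 8000 when unset or empty."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "").strip()
    if not raw:
        return DEFAULT_PORT
    port = int(raw)  # raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT must be between 1 and 65535, got {port}")
    return port


# Hypothetical server start in main.py:
# uvicorn.run(app, host="127.0.0.1", port=resolve_port())
```

In Windows PowerShell, the equivalent of the command above is `$env:PORT = "8080"; python backend/main.py`.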

Testing

pytest

The test suite covers the full API and core functionality with unit, integration, and frontend tests.

Project structure

filefolio/
├── backend/
│   ├── main.py          # FastAPI server
│   └── sync_service.py  # Folder sync service
├── frontend/
│   ├── static/
│   │   ├── app.js       # Frontend JavaScript
│   │   ├── style.css    # Styles
│   │   └── i18n.json    # Translations
│   └── templates/
│       └── index.html   # Main interface
├── tests/               # Test suite
├── uploads/             # PDF storage (created on first run)
├── thumbnails/          # Document thumbnails (created on first run)
├── data/                # Database (created on first run)
├── setup.cfg            # Linting and tool configuration
├── pytest.ini           # Test configuration
└── requirements.txt
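All library state lives in `uploads/`, `thumbnails/`, and `data/`, so a ZIP backup in the spirit of the disaster-proof feature can be sketched with the standard library alone. The directory names come from the layout above. The functions are hypothetical, and FileFolio's own backup format may differ.

```python
import zipfile
from pathlib import Path

# Directories from the project layout that hold all library state.
DATA_DIRS = ("uploads", "thumbnails", "data")


def backup(root: Path, archive: Path) -> int:
    """Write every file under the data directories into a ZIP; return the file count."""
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in DATA_DIRS:
            base = root / name
            if not base.is_dir():
                continue  # directories are only created on first run
            for path in sorted(base.rglob("*")):
                if path.is_file():
                    zf.write(path, arcname=path.relative_to(root).as_posix())
                    count += 1
    return count


def restore(archive: Path, root: Path) -> None:
    """Extract a backup into `root`, refusing entries outside the data directories."""
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            parts = Path(member).parts
            if not parts or parts[0] not in DATA_DIRS or ".." in parts:
                raise ValueError(f"unexpected entry in backup: {member}")
        zf.extractall(root)
```

Copying the SQLite file while the server is writing to it can capture an inconsistent state. Stopping the app first, or taking a snapshot of the database with `sqlite3.Connection.backup`, avoids this. The restore check rejects archive entries outside the three directories, so a tampered ZIP cannot write files elsewhere.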

How it works

Upload - Drag and drop a PDF file into the web interface, or sync a local folder to automatically import new files
Extract - Text is extracted from the PDF (with OCR fallback for scanned documents)
Analyze - A local LLM analyzes the content to determine category, tags, and suggest a filename
Organize - The document is saved with metadata in a local SQLite database
Search - Find documents by content, category, tags, or filename
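The Organize and Search steps can be approximated with Python's built-in `sqlite3` module. The schema below is hypothetical, since FileFolio's real tables are not documented here. For simplicity, tags are stored as a JSON string, and search uses `LIKE` rather than SQLite's full-text extension.

```python
import json
import sqlite3
from typing import List, Sequence


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a minimal documents table (a hypothetical schema, not FileFolio's)."""
    conn = sqlite3.connect(path)
    # tags holds a JSON-encoded list; content holds the extracted or OCR text.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id       INTEGER PRIMARY KEY,
               filename TEXT NOT NULL,
               category TEXT,
               tags     TEXT,
               content  TEXT
           )"""
    )
    return conn


def save_document(conn: sqlite3.Connection, filename: str, category: str,
                  tags: Sequence[str], content: str) -> int:
    """Insert one analyzed document and return its row id."""
    cur = conn.execute(
        "INSERT INTO documents (filename, category, tags, content) VALUES (?, ?, ?, ?)",
        (filename, category, json.dumps(list(tags)), content),
    )
    conn.commit()
    return cur.lastrowid


def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Substring match over content, category, tags and filename (ASCII case-insensitive)."""
    # Escape LIKE wildcards so a query such as "100%" is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    rows = conn.execute(
        """SELECT filename FROM documents
           WHERE content LIKE ? ESCAPE '\\' OR category LIKE ? ESCAPE '\\'
              OR tags LIKE ? ESCAPE '\\' OR filename LIKE ? ESCAPE '\\'
           ORDER BY filename""",
        (pattern, pattern, pattern, pattern),
    )
    return [row[0] for row in rows]
```

`LIKE` is case-insensitive only for ASCII characters, and it scans every row. For a library of thousands of documents, SQLite's FTS5 extension (where compiled in) provides indexed, ranked full-text search.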

Tech stack

Backend: FastAPI (Python)
Frontend: Vanilla JavaScript
Database: SQLite
AI/LLM: Ollama
PDF Processing: PyPDF, pdf2image, pytesseract
Styling: Custom CSS
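The PDF processing libraries above fit together for the Extract step. pypdf reads the embedded text layer, and pdf2image plus pytesseract handle scanned pages. This sketch assumes a simple heuristic: a document that averages fewer than 25 characters per page is treated as scanned. The threshold and function names are illustrative, not FileFolio's.

```python
from typing import List

MIN_CHARS_PER_PAGE = 25  # hypothetical threshold for "this PDF has no usable text layer"


def needs_ocr(page_texts: List[str], min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Treat a PDF as scanned if, on average, its pages yield almost no text."""
    if not page_texts:
        return True
    total = sum(len(t.strip()) for t in page_texts)
    return total / len(page_texts) < min_chars


def extract_text(pdf_path: str, dpi: int = 300) -> str:
    """Use the embedded text layer when present, otherwise OCR rendered pages."""
    from pypdf import PdfReader  # imported lazily so needs_ocr has no dependencies

    reader = PdfReader(pdf_path)
    page_texts = [page.extract_text() or "" for page in reader.pages]
    if not needs_ocr(page_texts):
        return "\n".join(page_texts)

    # Scanned document: render pages with Poppler (via pdf2image), then run Tesseract.
    from pdf2image import convert_from_path
    import pytesseract

    images = convert_from_path(pdf_path, dpi=dpi)
    return "\n".join(pytesseract.image_to_string(img) for img in images)
```

Checking the text layer first avoids the cost of rendering and OCR for born-digital PDFs. A per-page fallback, which OCRs only the empty pages, would suit mixed documents better, at the cost of more bookkeeping.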

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

MIT License - see LICENSE file for details.
