Nepali-first document reading: upload PDFs and scanned images, run multi-engine OCR tuned for Devanagari / Nepali, then optionally clean the text with a local LLM (e.g. Gemma via Ollama) or Anthropic Claude.
- Nepali in the wild is mostly Devanagari scan noise: similar glyphs, matras, and skewed pages break generic OCR.
- This stack combines PaddleOCR (PP-OCRv5, lang=ne), EasyOCR (ne+en), and Tesseract where useful, then merges the results intelligently.
- Privacy-friendly: run OCR and post-correction on your machine; cloud APIs are optional.
| Area | What you get |
|---|---|
| OCR | Ensemble: PaddleOCR 3.x (PP-OCRv5, lang=ne) + EasyOCR + Tesseract; digital PDF text when pages are not scanned |
| Languages | Nepali (ne) and mixed Nepali/English |
| LLM cleanup | Configurable: Ollama (local), Anthropic, or disabled |
| API | FastAPI — upload, job status, full OCR/NLP payload, plain text |
| UI | React + Vite + Tailwind; proxies /api to the backend in dev |
| Jobs | Celery + Redis for async processing; inline processing mode for simple local setups |
```
Upload → FastAPI → SQLite (metadata) → Celery worker OR inline thread
        ↓
PDF / image → preprocess → OCR ensemble → post-process
        ↓
Optional: LLM Devanagari correction (Ollama / Claude)
```
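The "merge results intelligently" step of the ensemble can be pictured as a confidence vote: each engine returns per-line readings with confidence scores, and the highest-confidence reading wins per line. A minimal sketch — the function name, tuple shape, and engine outputs below are illustrative assumptions, not the repo's actual API:

```python
from typing import Dict, List, Tuple

# Hypothetical shape: each engine yields (line_text, confidence) pairs.
EngineOutput = List[Tuple[str, float]]

def merge_by_confidence(results: Dict[str, EngineOutput]) -> List[str]:
    """For each line index, keep the engine reading with the highest confidence."""
    n_lines = max(len(lines) for lines in results.values())
    merged = []
    for i in range(n_lines):
        candidates = [lines[i] for lines in results.values() if i < len(lines)]
        text, _conf = max(candidates, key=lambda c: c[1])
        merged.append(text)
    return merged

# Example: PaddleOCR is more confident on line 0, EasyOCR on line 1.
merged = merge_by_confidence({
    "paddle": [("नेपाली", 0.95), ("dacument", 0.40)],
    "easyocr": [("नपाली", 0.70), ("document", 0.85)],
})
print(merged)  # ['नेपाली', 'document']
```

Real merging would also need line alignment across engines (boxes rarely line up index-for-index); a per-line vote is only the simplest version of the idea.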
- Python 3.11+ (3.11 recommended; match the Docker image)
- Node.js 20+ (for the frontend)
- System packages (local runs):
  - Poppler (`pdfinfo`, etc.), required for `pdf2image` / PDF rasterization
    - macOS: `brew install poppler`
    - Debian/Ubuntu: `apt install poppler-utils`
  - Tesseract with Nepali data (optional but used by the ensemble)
    - macOS: `brew install tesseract tesseract-lang` (ensure the `nep` / Devanagari scripts are available)

Optional:

- Redis, if you use Celery workers (otherwise rely on `PROCESS_INLINE=true`).
- Ollama, for local correction; e.g. `ollama pull gemma3:12b` (or any capable model you prefer).
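A quick sanity check that the system binaries from the list above are on `PATH` before a local run — the helper below is just an illustration, not part of the repo:

```python
import shutil

def missing_binaries(required=("pdfinfo", "tesseract")) -> list:
    """Return the names of required CLI tools not found on PATH."""
    return [name for name in required if shutil.which(name) is None]

missing = missing_binaries()
if missing:
    print("Install before running OCR locally:", ", ".join(missing))
```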
```bash
git clone git@github.com:rayraycodes/LocalNepaliDocumentChat.git
cd LocalNepaliDocumentChat
cp .env.example .env
# Edit .env: OLLAMA_MODEL, CORRECTION_LLM_BACKEND, optional ANTHROPIC_API_KEY
```

```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -U pip
pip install -e ".[dev]"
```

Start the API (from repo root):

```bash
uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
```

- Set `PROCESS_INLINE=true` in `.env` if you are not running Redis + Celery (processing runs inside the API after upload).
- With `PROCESS_INLINE=false`, start Redis and a worker:

  ```bash
  celery -A backend.celery_app worker --loglevel=info
  ```

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:5173. The Vite dev server proxies `/api` and `/health` to port 8000.
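The effect of `PROCESS_INLINE` on job dispatch can be sketched roughly like this (the function and task names are illustrative, not the repo's real code):

```python
import os
import threading

def dispatch(job_id: str, process_fn) -> str:
    """Run a job inline in the API process, or hand it to Celery."""
    if os.getenv("PROCESS_INLINE", "false").lower() == "true":
        # Inline mode: no Redis/Celery needed; a background thread
        # does the OCR work inside the API process.
        threading.Thread(target=process_fn, args=(job_id,)).start()
        return "inline"
    # Otherwise the task is queued on Redis for a Celery worker,
    # e.g. process_document.delay(job_id)  # hypothetical task name
    return "celery"

os.environ["PROCESS_INLINE"] = "true"
mode = dispatch("job-1", lambda job_id: None)
print(mode)  # inline
```

Inline mode keeps a laptop setup to one process; the Celery path is what scales to multiple workers.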
```bash
ollama serve
ollama pull gemma3:12b  # or another model; set OLLAMA_MODEL in .env to match
```

Redis, API, worker, and frontend services are defined in `docker-compose.yml`:

```bash
cp .env.example .env
docker compose up --build
```

- API: http://localhost:8000
- Frontend: http://localhost:5173
- Ensure `.env` inside Compose matches Docker networking (e.g. `REDIS_URL=redis://redis:6379/0` when talking to the `redis` service instead of `localhost`).
| Variable | Purpose |
|---|---|
| `CORRECTION_LLM_BACKEND` | `ollama`, `anthropic`, or `none` |
| `OLLAMA_BASE_URL` | Default `http://127.0.0.1:11434` |
| `OLLAMA_MODEL` | Tag pulled in Ollama (e.g. `gemma3:12b`) |
| `ANTHROPIC_API_KEY` | If using Claude for correction |
| `PROCESS_INLINE` | Process jobs in the API process (no Celery) |
| `REDIS_URL` | Celery broker / result backend |
| `PADDLE_USE_DOC_ORIENTATION` / `PADDLE_USE_DOC_UNWARPING` | Trade accuracy vs. speed in Paddle preprocessing |
See `.env.example` for the full list.
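These are plain environment variables, so the table maps onto runtime settings roughly as below. The defaults are assumptions except `OLLAMA_BASE_URL`, whose default the table states; check `.env.example` for the real ones:

```python
import os

def load_settings() -> dict:
    """Sketch of how the configuration table maps to runtime settings."""
    return {
        "correction_backend": os.getenv("CORRECTION_LLM_BACKEND", "none"),
        "ollama_base_url": os.getenv("OLLAMA_BASE_URL", "http://127.0.0.1:11434"),
        "ollama_model": os.getenv("OLLAMA_MODEL", "gemma3:12b"),
        "process_inline": os.getenv("PROCESS_INLINE", "false").lower() == "true",
        "redis_url": os.getenv("REDIS_URL", "redis://localhost:6379/0"),
    }

settings = load_settings()
```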
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness |
| POST | `/api/documents/upload` | Multipart file upload |
| GET | `/api/documents/{job_id}/status` | Progress / stage |
| GET | `/api/documents/{job_id}/result` | OCR + NLP JSON |
| GET | `/api/documents/{job_id}/text` | Plain text (completed jobs) |
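Putting the endpoints together, a client uploads, polls the status endpoint, then fetches the plain text. To keep this sketch self-contained the HTTP calls are injected as callables; with `requests` you would pass real GET functions. The response shape (a JSON `status` field whose terminal value is `completed`) is an assumption about the payload:

```python
import time

def wait_for_text(get_status, get_text, job_id: str,
                  poll_s: float = 0.0, max_polls: int = 100) -> str:
    """Poll GET /api/documents/{job_id}/status until the job completes,
    then return the body of GET /api/documents/{job_id}/text."""
    for _ in range(max_polls):
        status = get_status(job_id)  # e.g. requests.get(f"{base}/api/documents/{job_id}/status").json()
        if status.get("status") == "completed":
            return get_text(job_id)  # e.g. requests.get(f"{base}/api/documents/{job_id}/text").text
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} did not complete")

# Stubbed demo: the job completes on the second poll.
states = iter([{"status": "processing"}, {"status": "completed"}])
text = wait_for_text(lambda j: next(states), lambda j: "नमस्ते", "job-1")
print(text)  # नमस्ते
```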
```bash
pytest
ruff check backend tests
```

Pre-download heavy models (optional):

```bash
python -m scripts.download_models
```

Issues and pull requests are welcome. Please:
- Open an issue for larger changes or design questions.
- Keep PRs focused; match existing style and run `pytest` / `ruff` when you touch Python.
This project is released under the MIT License.
Built with FastAPI, PaddleOCR, EasyOCR, Tesseract, and the broader open-source ML ecosystem.