DeepSeek-OCR PDF Tool

An OpenCode tool for processing PDF files using DeepSeek-OCR. Converts PDFs to high-quality images, performs OCR on each page, and returns markdown or plain text output.

Installation

This tool should be installed globally at ~/.config/opencode/tool/.

Automated Installation/Update (Recommended)

Use the provided deployment script for one-command installation and updates:

# Initial installation
./deploy-tool.sh

# Update after making changes to the repository
./deploy-tool.sh

# Force reinstallation (even if already installed)
./deploy-tool.sh --force

# Specify custom repository path
./deploy-tool.sh --repo /path/to/opencode-ocr

The script automatically detects if the tool is already installed and performs an update instead.

Manual Installation

# Create tool directory
mkdir -p ~/.config/opencode/tool/

# Copy files
cp pdf-ocr.ts ~/.config/opencode/tool/
cp pdf_ocr_backend.py ~/.config/opencode/tool/
cp pyproject.toml ~/.config/opencode/tool/

# Install Python dependencies
cd ~/.config/opencode/tool && uv sync

Usage

Important: Python scripts must be run using uv run to ensure proper dependency management:

# Direct backend execution (with .env file)
uv run --directory ~/.config/opencode/tool --env-file .env pdf_ocr_backend.py <pdf_path> <output_format>

# Via OpenCode agent
Agent will use the pdf-ocr tool automatically

Parameters

pdf_path: Absolute path to PDF file
output_format: Output format - "markdown" or "text" (defaults to "markdown")

Dependencies

openai>=1.0.0
PyMuPDF>=1.23.0
Pillow>=10.0.0

Configuration

The tool connects to an OpenAI-compatible endpoint running DeepSeek-OCR. The endpoint can be configured in three ways:

.env file (recommended for persistent configuration): Copy .env.example to .env and edit it:
```
cp .env.example .env
# Edit .env with your endpoint URL
```
Then run with uv run --env-file .env.

Environment variable:

export DEEPSEEK_OCR_BASE_URL="http://your-endpoint:8080/v1"

Command-line argument (overrides both above):

uv run --directory ~/.config/opencode/tool pdf_ocr_backend.py <pdf_path> <output_format> --base-url http://your-endpoint:8080/v1

If none of these are set, the tool will throw an error.

The tool uses the deepseek-ocr model name when making requests to the endpoint.

Technical Details

PDF-to-image conversion at 144 DPI (high quality for OCR)
PNG format with RGB color space
Sequential page processing for memory management
OCR parameters: temperature=0.0, max_tokens=8192, ngram_size=30, window_size=90

Future Enhancements

Partial page range support (e.g., process pages 5-10 only)
Progress reporting during OCR processing
Batch processing of multiple PDFs
Additional output formats (e.g., JSON, HTML)
Image quality settings configuration

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
pdf-ocr		pdf-ocr
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
deploy-tool.sh		deploy-tool.sh
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DeepSeek-OCR PDF Tool

Installation

Automated Installation/Update (Recommended)

Manual Installation

Usage

Parameters

Dependencies

Configuration

Technical Details

Future Enhancements

About

Uh oh!

Releases

Packages

Languages

Skelectric/opencode-ocr

Folders and files

Latest commit

History

Repository files navigation

DeepSeek-OCR PDF Tool

Installation

Automated Installation/Update (Recommended)

Manual Installation

Usage

Parameters

Dependencies

Configuration

Technical Details

Future Enhancements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages