An OpenCode tool for processing PDF files using DeepSeek-OCR. Converts PDFs to high-quality images, performs OCR on each page, and returns markdown or plain text output.
This tool should be installed globally at ~/.config/opencode/tool/.
Use the provided deployment script for one-command installation and updates:
# Initial installation
./deploy-tool.sh
# Update after making changes to the repository
./deploy-tool.sh
# Force reinstallation (even if already installed)
./deploy-tool.sh --force
# Specify custom repository path
./deploy-tool.sh --repo /path/to/opencode-ocrThe script automatically detects if the tool is already installed and performs an update instead.
# Create tool directory
mkdir -p ~/.config/opencode/tool/
# Copy files
cp pdf-ocr.ts ~/.config/opencode/tool/
cp pdf_ocr_backend.py ~/.config/opencode/tool/
cp pyproject.toml ~/.config/opencode/tool/
# Install Python dependencies
cd ~/.config/opencode/tool && uv syncImportant: Python scripts must be run using uv run to ensure proper dependency management:
# Direct backend execution (with .env file)
uv run --directory ~/.config/opencode/tool --env-file .env pdf_ocr_backend.py <pdf_path> <output_format>
# Via OpenCode agent
Agent will use the pdf-ocr tool automaticallypdf_path: Absolute path to PDF fileoutput_format: Output format - "markdown" or "text" (defaults to "markdown")
- openai>=1.0.0
- PyMuPDF>=1.23.0
- Pillow>=10.0.0
The tool connects to an OpenAI-compatible endpoint running DeepSeek-OCR. The endpoint can be configured in three ways:
-
.env file (recommended for persistent configuration): Copy
.env.exampleto.envand edit it:cp .env.example .env # Edit .env with your endpoint URLThen run with
uv run --env-file .env. -
Environment variable:
export DEEPSEEK_OCR_BASE_URL="http://your-endpoint:8080/v1"
-
Command-line argument (overrides both above):
uv run --directory ~/.config/opencode/tool pdf_ocr_backend.py <pdf_path> <output_format> --base-url http://your-endpoint:8080/v1
If none of these are set, the tool will throw an error.
The tool uses the deepseek-ocr model name when making requests to the endpoint.
- PDF-to-image conversion at 144 DPI (high quality for OCR)
- PNG format with RGB color space
- Sequential page processing for memory management
- OCR parameters: temperature=0.0, max_tokens=8192, ngram_size=30, window_size=90
- Partial page range support (e.g., process pages 5-10 only)
- Progress reporting during OCR processing
- Batch processing of multiple PDFs
- Additional output formats (e.g., JSON, HTML)
- Image quality settings configuration