1 change: 1 addition & 0 deletions README.md
@@ -23,6 +23,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool
| `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Covers creating new xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation, validation, and professional financial formatting. | Official |
| `minimax-docx` | Professional DOCX document creation, editing, and formatting using OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill/edit content in existing documents, or apply template formatting with XSD validation gate-check. | Official |
| `vision-analysis` | Analyze, describe, and extract information from images using vision AI models. Supports describe, OCR, UI mockup review, chart data extraction, and object detection. Powered by MiniMax VL API with OpenAI GPT-4V fallback. | Community |
| `pdf-reader` | Automatically detect when an agent cannot read PDFs and provide text extraction fallback using command-line tools (pdftotext/poppler-utils) with automatic detection, optional installation with user confirmation, and multi-platform support (macOS, Linux, Windows). | Community |
| `minimax-multimodal-toolkit` | Generate voice, music, video, and image content via MiniMax APIs — the unified entry for MiniMax multimodal use cases. Covers TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract) via FFmpeg. | Official |

## Installation
1 change: 1 addition & 0 deletions README_zh.md
@@ -23,6 +23,7 @@
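| `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Supports creating xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation and validation, and professional financial formatting. | Official |
| `minimax-docx` | Professional DOCX document creation, editing, and typesetting based on OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill in or edit content in existing documents, and apply template formatting with an XSD validation gate check. | Official |
| `vision-analysis` | Analyze, describe, and extract image information using vision AI models. Supports description, OCR text recognition, UI review, chart data extraction, and object detection. Based on the MiniMax VL API, with OpenAI GPT-4V as a fallback. | Community |
| `pdf-reader` | Automatically detect when an agent cannot read PDFs and provide a text extraction fallback using command-line tools (pdftotext/poppler-utils). Supports automatic detection, installation after user confirmation, and multiple platforms (macOS, Linux, Windows). | Community |
| `minimax-multimodal-toolkit` | Generate voice, music, video, and image content via MiniMax APIs: the unified entry point for MiniMax multimodal use cases. Covers TTS (text-to-speech, voice cloning, voice design, multi-segment synthesis), music (songs with lyrics, instrumentals), video (text-to-video, image-to-video, first and last frame, subject reference, templates, long-form multi-scene), images (text-to-image, image-to-image with character reference), and FFmpeg-based media processing (format conversion, concatenation, trimming, extraction). | Official |

## Installation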
206 changes: 206 additions & 0 deletions skills/pdf-reader/SKILL.md
@@ -0,0 +1,206 @@
---
name: pdf-reader
description: >
  Automatically handle PDF text extraction when an agent lacks native PDF reading capability.
  Use when: an agent responds with phrases like "I cannot read PDFs", "I don't have the ability to
  read PDFs", "I can't access PDF content", "PDF reading is not supported", or similar statements
  indicating the agent cannot process PDF files directly.
  This skill intercepts that situation and provides a fallback workflow to extract PDF text using
  command-line tools (pdftotext/poppler-utils), with automatic detection and optional installation.
  Triggers: any message where the agent states it cannot read PDFs or lacks PDF capability.
license: MIT
metadata:
  version: "1.0"
  category: document-processing
  sources:
    - poppler-utils (pdftotext)
    - pdfplumber (Python alternative)
submitted_by: https://github.com/divitkashyap
---

# PDF Reader Skill

Automatically detect when an agent cannot read PDFs and provide a text extraction fallback using command-line tools, asking the user for confirmation before installing anything.

## Workflow

### Step 1: Detect PDF Reading Limitation

When the agent states it cannot read PDFs (phrases like "I cannot read PDFs", "I don't have the ability to read PDFs", etc.), activate this skill automatically.
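
As a rough illustration of this trigger, the check could be sketched as a phrase match. The phrases come from this skill's description; the regular expression and the function name `mentions_pdf_limitation` are assumptions made for the example, not part of the skill.

```python
import re

# Covers the phrases quoted in the skill description plus close variants.
# The exact pattern is an illustrative assumption.
LIMITATION_PATTERN = re.compile(
    r"(cannot|can't|can not|don't have the ability to|unable to)\s+"
    r"(read|access|process|open)\s+(the\s+)?pdf"
    r"|pdf reading is not supported",
    re.IGNORECASE,
)


def mentions_pdf_limitation(message: str) -> bool:
    """Return True if the message looks like a 'cannot read PDFs' statement."""
    # Normalize curly apostrophes so that "can’t" matches "can't".
    normalized = message.replace("\u2019", "'")
    return LIMITATION_PATTERN.search(normalized) is not None
```

A phrase match like this will miss unusual wordings, so it is best treated as a hint rather than a strict gate.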

### Step 2: Identify the Target PDF

Extract the PDF file path from the user's original request. Confirm the file exists:

```bash
ls -la "/path/to/document.pdf"
```
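
The same check can be sketched in Python, which also allows a quick sanity test that the file starts with the `%PDF-` header. The helper name `check_pdf_path` is hypothetical, and the header test is an extra step assumed for this example.

```python
from pathlib import Path
from typing import Tuple


def check_pdf_path(path: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a candidate PDF path."""
    p = Path(path).expanduser()
    if not p.is_file():
        return False, f"not a file: {p}"
    # A well-formed PDF begins with the "%PDF-" header.
    with p.open("rb") as f:
        head = f.read(5)
    if head != b"%PDF-":
        return False, f"missing %PDF- header: {p}"
    return True, "ok"
```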

### Step 3: Check for Available PDF Tools

Check which PDF text extraction tools are available on the system:

```bash
# Check for pdftotext (poppler-utils)
which pdftotext || echo "NOT_FOUND"

# Check for pdfplumber (Python)
python3 -c "import pdfplumber; print('FOUND')" 2>/dev/null || echo "NOT_FOUND"

# Check for pymupdf
python3 -c "import fitz; print('FOUND')" 2>/dev/null || echo "NOT_FOUND"
```
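
The three shell checks above can also be gathered into one call. This is a minimal sketch using only the standard library; the function name `detect_pdf_tools` is an assumption. `importlib.util.find_spec` checks whether a top-level module can be found without importing it.

```python
import importlib.util
import shutil
from typing import Dict


def detect_pdf_tools() -> Dict[str, bool]:
    """Report which of the three extraction tools are available."""
    return {
        # pdftotext is a binary on PATH (poppler-utils).
        "pdftotext": shutil.which("pdftotext") is not None,
        # The Python libraries are detected without importing them.
        "pdfplumber": importlib.util.find_spec("pdfplumber") is not None,
        # PyMuPDF is imported as "fitz", as in the shell check above.
        "pymupdf": importlib.util.find_spec("fitz") is not None,
    }
```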

### Step 4: Tool Selection Priority

Select the best available tool in this order:

1. **`pdftotext`** (poppler-utils) - Preferred, fastest, system-level tool
2. **`pdfplumber`** (Python) - Fallback if poppler not available
3. **`pymupdf`** (Python) - Alternative Python fallback
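
The priority order above amounts to a first-match lookup. In this sketch, `available` is assumed to be a mapping from tool name to a boolean, and `select_tool` is a hypothetical helper name.

```python
from typing import Mapping, Optional

# Order taken from the list above: pdftotext, then pdfplumber, then pymupdf.
TOOL_PRIORITY = ("pdftotext", "pdfplumber", "pymupdf")


def select_tool(available: Mapping[str, bool]) -> Optional[str]:
    """Return the highest-priority available tool, or None if none is installed."""
    for tool in TOOL_PRIORITY:
        if available.get(tool, False):
            return tool
    return None
```

A `None` result corresponds to the case handled in Step 5, where the user is asked before anything is installed.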

### Step 5: Installation (If Needed)

If no tool is found, ask the user for permission to install:

```
I need to install a PDF text extraction tool to read this PDF.

Available options:
1. pdftotext (poppler-utils) - Fast, system-level tool [Recommended]
2. pdfplumber - Python library alternative

Shall I proceed with installation? (y/n)
```

**Installation commands by platform:**

**macOS:**
```bash
brew install poppler # Installs pdftotext
# OR
pip3 install pdfplumber
```

**Linux (Ubuntu/Debian):**
```bash
sudo apt-get install poppler-utils
# OR
pip3 install pdfplumber
```

**Linux (Fedora/RHEL):**
```bash
sudo dnf install poppler-utils
# OR
pip3 install pdfplumber
```

**Windows:**
```powershell
# Find the poppler package ID first; "pdftotext" is the binary name and may not be a winget package ID
winget search poppler
# OR
pip install pdfplumber
```
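
One way to pick among the commands above is to branch on the platform and on which package manager is on `PATH`. This sketch only builds the command; it should be run only after the user has confirmed. The function name `install_command`, the `has` parameter, and the choice to fall back to `pdfplumber` via `python -m pip` are assumptions for the example.

```python
import platform
import shutil
import sys
from typing import Callable, List, Optional


def _on_path(name: str) -> bool:
    """Return True if an executable with this name is on PATH."""
    return shutil.which(name) is not None


def install_command(system: Optional[str] = None,
                    has: Callable[[str], bool] = _on_path) -> List[str]:
    """Build (but do not run) an install command for the current platform."""
    system = system or platform.system()
    if system == "Darwin" and has("brew"):
        return ["brew", "install", "poppler"]
    if system == "Linux" and has("apt-get"):
        return ["sudo", "apt-get", "install", "poppler-utils"]
    if system == "Linux" and has("dnf"):
        return ["sudo", "dnf", "install", "poppler-utils"]
    # Fallback for Windows or any system without a known package manager:
    # install pdfplumber with the interpreter that is currently running.
    return [sys.executable, "-m", "pip", "install", "pdfplumber"]
```

Passing `has` explicitly makes the selection logic testable without touching the real system.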

### Step 6: Extract PDF Text

Once a tool is available, extract text from the PDF:

**Using pdftotext:**
```bash
pdftotext -layout "/path/to/document.pdf" /tmp/pdf_extracted.txt
```
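
When the call is made from a script rather than a shell, the same command can be assembled as an argument list, which avoids quoting problems with paths that contain spaces. A minimal sketch, assuming `pdftotext` is on `PATH`; the helper names `build_pdftotext_command` and `run_pdftotext` are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_pdftotext_command(pdf_path: str, out_path: str, layout: bool = True,
                            first_page: Optional[int] = None,
                            last_page: Optional[int] = None) -> List[str]:
    """Assemble the pdftotext argument list without running it."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, out_path]
    return cmd


def run_pdftotext(pdf_path: str, out_path: str, **options) -> None:
    """Run pdftotext; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(build_pdftotext_command(pdf_path, out_path, **options),
                   check=True, capture_output=True, text=True)
```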

**Using pdfplumber (Python):**
```python
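import pdfplumber

# Collect text page by page; extract_text() may return None or an empty
# string for pages without a text layer, so those pages are skipped.
with pdfplumber.open("/path/to/document.pdf") as pdf:
    text = ""
    for page in pdf.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text + "\n\n"

# Write UTF-8 explicitly so the output does not depend on the platform default
with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f:
    f.write(text)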
```

**Using pymupdf (Python):**
```python
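import fitz  # PyMuPDF

# Iterate over the pages and concatenate their plain text
doc = fitz.open("/path/to/document.pdf")
text = ""
for page in doc:
    text += page.get_text() + "\n\n"
doc.close()

# Write UTF-8 explicitly so the output does not depend on the platform default
with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f:
    f.write(text)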
```

### Step 7: Read Extracted Text

Read the extracted text file and present it to the user:

```bash
cat /tmp/pdf_extracted.txt
```
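
For long documents it may help to cap how much text is presented at once. The sketch below reads the file as UTF-8 and truncates it; the `max_chars` limit and the helper name `read_extracted_text` are assumptions for illustration and are not specified by this skill.

```python
def read_extracted_text(path: str, max_chars: int = 20000) -> str:
    """Read extracted text, truncating with a marker if it exceeds max_chars."""
    # errors="replace" keeps the read from failing on stray undecodable bytes.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n[... truncated, {omitted} more characters ...]"
```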

### Step 8: Continue Original Task

After extracting and presenting the PDF content, proceed with the user's original request using the extracted text as context.

## Platform-Specific Notes

### macOS

- poppler (which provides `pdftotext`) can be installed via Homebrew: `brew install poppler`
- Python libraries work with system Python3 or pyenv

### Linux

- Most distributions have poppler-utils in their package managers
- pdfplumber/pymupdf require pip installation

### Windows

- poppler binaries are available from third-party Windows builds or via package managers such as winget or Chocolatey (package names vary)
- Python libraries recommended for Windows: `pip install pdfplumber`

## Common Errors and Solutions

| Error | Cause | Solution |
|-------|-------|----------|
| `pdftotext: command not found` | poppler-utils not installed | Install via package manager or use Python alternative |
| `Permission denied` | Output directory not writable | Use `/tmp/` for output |
| `File not found` | Wrong PDF path | Verify path with `ls -la` |
| `PDF extraction failed` | Encrypted/protected PDF | Inform user and suggest manual extraction |
| `pdftotext: syntax error` | Malformed PDF | Try with `-raw` flag instead of `-layout` |
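
The table can be turned into a small lookup that maps an error message to its suggested action. This is only a sketch: it matches the message fragments listed in the table, and the actual wording a tool prints may differ. The name `suggest_fix` is hypothetical.

```python
from typing import Optional

# (fragment to look for, suggestion) pairs mirroring the table above.
ERROR_HINTS = (
    ("command not found", "Install poppler-utils via the package manager or use a Python alternative."),
    ("permission denied", "Write the output under /tmp/ instead."),
    ("file not found", "Verify the PDF path with ls -la."),
    ("extraction failed", "The PDF may be encrypted or protected; suggest manual extraction."),
    ("syntax error", "The PDF may be malformed; retry with -raw instead of -layout."),
)


def suggest_fix(error_text: str) -> Optional[str]:
    """Return the suggestion for the first matching fragment, or None."""
    lowered = error_text.lower()
    for fragment, hint in ERROR_HINTS:
        if fragment in lowered:
            return hint
    return None
```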

## Alternative Flags for pdftotext

```bash
# Basic extraction (reading order, the default)
pdftotext input.pdf output.txt

# Preserve the original physical layout
pdftotext -layout input.pdf output.txt

# Keep text in content stream order
pdftotext -raw input.pdf output.txt

# Extract specific pages (first page 1, last page 5)
pdftotext -f 1 -l 5 input.pdf output.txt

# Extract to stdout (use "-" as the output file)
pdftotext input.pdf -
```

## File Size Limits

- For PDFs larger than 50MB, extract page ranges instead of the entire document
- Use `-f` and `-l` flags to process in chunks if needed
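
Chunked processing can be sketched as a size check followed by a generator of `(first, last)` page pairs for the `-f` and `-l` flags. The total page count and chunk size are inputs the caller must supply, "50MB" is interpreted here as 50 MiB, and the helper names are hypothetical.

```python
import os
from typing import Iterator, Tuple

# Assumption: "50MB" is treated as 50 * 1024 * 1024 bytes.
SIZE_LIMIT_BYTES = 50 * 1024 * 1024


def needs_chunking(pdf_path: str) -> bool:
    """Return True if the file is larger than the size limit."""
    return os.path.getsize(pdf_path) > SIZE_LIMIT_BYTES


def page_ranges(total_pages: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based, inclusive (first, last) page pairs covering the document."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)
```

For example, `list(page_ranges(12, 5))` gives `[(1, 5), (6, 10), (11, 12)]`, and each pair can be passed as `pdftotext -f FIRST -l LAST`.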