From b6f6f10ad6fe6dd0931b2c3e62c9689e5d7de55d Mon Sep 17 00:00:00 2001 From: Divit Kashyap <162712154+divitkashyap@users.noreply.github.com> Date: Sat, 4 Apr 2026 19:21:10 +0100 Subject: [PATCH 1/2] feat(pdf-reader): add skill for PDF text extraction fallback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Submitted by: https://github.com/divitkashyap ## What Added — a skill that automatically detects when an agent cannot read PDFs and provides text extraction using command-line tools with optional installation and user confirmation. ## Why Many AI agents lack native PDF reading capability. When they encounter a PDF, they either: - Fail to help the user - Give generic responses about not being able to access PDF content This skill intercepts that situation and provides a complete fallback workflow using standard tools (pdftotext, pdfplumber, pymupdf). ## How It Works 1. **Detection**: Monitors for agent statements like 'I cannot read PDFs', 'I don't have the ability to read PDFs', etc. 2. **Tool Detection**: Checks for available tools in priority order: pdftotext → pdfplumber → pymupdf 3. **Installation**: If no tool found, asks user permission with platform-specific install commands 4. **Extraction**: Extracts PDF text to /tmp/pdf_extracted.txt 5. **Continuation**: Reads extracted text and proceeds with original user task ## Tool Priority 1. **pdftotext** (poppler-utils) — Preferred, fastest, system-level tool 2. **pdfplumber** (Python) — Fallback if poppler not available 3. 
**pymupdf** (Python) — Alternative Python fallback ## Platform Support - **macOS**: Homebrew (brew install poppler) or pip - **Linux (Ubuntu/Debian)**: apt-get install poppler-utils or pip - **Linux (Fedora/RHEL)**: dnf install poppler-utils or pip - **Windows**: winget/chocolatey or pip ## Key Features - Automatic detection of agent PDF limitation - Multi-tool fallback strategy - User confirmation before installation - Platform-specific installation commands - Layout preservation option (-layout flag) - Page range extraction support (-f, -l flags) - Error handling for encrypted/protected PDFs ## Example Triggers - 'I cannot read PDFs' - 'I don't have the ability to read PDFs' - 'I can't access PDF content' - 'PDF reading is not supported' ## Files - skills/pdf-reader/SKILL.md — Complete skill with workflow - README.md — Updated with new skill entry ## Validation All 15 skills pass: python .claude/skills/pr-review/scripts/validate_skills.py ✅ --- README.md | 1 + skills/pdf-reader/SKILL.md | 206 +++++++++++++++++++++++++++++++++++++ 2 files changed, 207 insertions(+) create mode 100644 skills/pdf-reader/SKILL.md diff --git a/README.md b/README.md index 0f54709..5d96cfa 100644 --- a/README.md +++ b/README.md @@ -23,6 +23,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool | `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Covers creating new xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation, validation, and professional financial formatting. | Official | | `minimax-docx` | Professional DOCX document creation, editing, and formatting using OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill/edit content in existing documents, or apply template formatting with XSD validation gate-check. 
| Official | | `vision-analysis` | Analyze, describe, and extract information from images using vision AI models. Supports describe, OCR, UI mockup review, chart data extraction, and object detection. Powered by MiniMax VL API with OpenAI GPT-4V fallback. | Community | +| `pdf-reader` | Automatically detect when an agent cannot read PDFs and provide text extraction fallback using command-line tools (pdftotext/poppler-utils) with automatic detection, optional installation with user confirmation, and multi-platform support (macOS, Linux, Windows). | Community | | `minimax-multimodal-toolkit` | Generate voice, music, video, and image content via MiniMax APIs — the unified entry for MiniMax multimodal use cases. Covers TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract) via FFmpeg. | Official | ## Installation diff --git a/skills/pdf-reader/SKILL.md b/skills/pdf-reader/SKILL.md new file mode 100644 index 0000000..9b16b18 --- /dev/null +++ b/skills/pdf-reader/SKILL.md @@ -0,0 +1,206 @@ +--- +name: pdf-reader +description: > + Automatically handle PDF text extraction when an agent lacks native PDF reading capability. + Use when: an agent responds with phrases like "I cannot read PDFs", "I don't have the ability to + read PDFs", "I can't access PDF content", "PDF reading is not supported", or similar statements + indicating the agent cannot process PDF files directly. + This skill intercepts that situation and provides a fallback workflow to extract PDF text using + command-line tools (pdftotext/poppler-utils), with automatic detection and optional installation. + Triggers: any message where the agent states it cannot read PDFs or lacks PDF capability. 
+license: MIT +metadata: + version: "1.0" + category: document-processing + sources: + - poppler-utils (pdftotext) + - pdfplumber (Python alternative) + submitted_by: https://github.com/divitkashyap +--- + +# PDF Reader Skill + +Automatically detect when an agent cannot read PDFs and provide text extraction fallback using command-line tools with user confirmation for installation. + +## Workflow + +### Step 1: Detect PDF Reading Limitation + +When the agent states it cannot read PDFs (phrases like "I cannot read PDFs", "I don't have the ability to read PDFs", etc.), activate this skill automatically. + +### Step 2: Identify the Target PDF + +Extract the PDF file path from the user's original request. Confirm the file exists: + +```bash +ls -la "/path/to/document.pdf" +``` + +### Step 3: Check for Available PDF Tools + +Check which PDF text extraction tools are available on the system: + +```bash +# Check for pdftotext (poppler-utils) +which pdftotext || echo "NOT_FOUND" + +# Check for pdfplumber (Python) +python3 -c "import pdfplumber; print('FOUND')" 2>/dev/null || echo "NOT_FOUND" + +# Check for pymupdf +python3 -c "import fitz; print('FOUND')" 2>/dev/null || echo "NOT_FOUND" +``` + +### Step 4: Tool Selection Priority + +Select the best available tool in this order: + +1. **`pdftotext`** (poppler-utils) - Preferred, fastest, system-level tool +2. **`pdfplumber`** (Python) - Fallback if poppler not available +3. **`pymupdf`** (Python) - Alternative Python fallback + +### Step 5: Installation (If Needed) + +If no tool is found, ask the user for permission to install: + +``` +I need to install a PDF text extraction tool to read this PDF. + +Available options: +1. pdftotext (poppler-utils) - Fast, system-level tool [Recommended] +2. pdfplumber - Python library alternative + +Shall I proceed with installation? 
(y/n) +``` + +**Installation commands by platform:** + +**macOS:** +```bash +brew install poppler # Installs pdftotext +# OR +pip3 install pdfplumber +``` + +**Linux (Ubuntu/Debian):** +```bash +sudo apt-get install poppler-utils +# OR +pip3 install pdfplumber +``` + +**Linux (Fedora/RHEL):** +```bash +sudo dnf install poppler-utils +# OR +pip3 install pdfplumber +``` + +**Windows:** +```powershell +# Use winget; pdftotext ships with Poppler (package ID below is an assumption, confirm with `winget search poppler`) +winget install oschwartz10612.Poppler +# OR +pip install pdfplumber +``` + +### Step 6: Extract PDF Text + +Once a tool is available, extract text from the PDF: + +**Using pdftotext:** +```bash +pdftotext -layout "/path/to/document.pdf" /tmp/pdf_extracted.txt +``` + +**Using pdfplumber (Python):** +```python +import pdfplumber + +with pdfplumber.open("/path/to/document.pdf") as pdf: + text = "" + for page in pdf.pages: + page_text = page.extract_text() + if page_text: + text += page_text + "\n\n" + +with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f: + f.write(text) +``` + +**Using pymupdf (Python):** +```python +import fitz + +doc = fitz.open("/path/to/document.pdf") +text = "" +for page in doc: + text += page.get_text() + "\n\n" +doc.close() + +with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f: + f.write(text) +``` + +### Step 7: Read Extracted Text + +Read the extracted text file and present it to the user: + +```bash +cat /tmp/pdf_extracted.txt +``` + +### Step 8: Continue Original Task + +After extracting and presenting the PDF content, proceed with the user's original request using the extracted text as context. 
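The tool check and priority order described in Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not part of the submitted skill: the names `CANDIDATES`, `is_available`, and `select_tool` are hypothetical, and `fitz` is used as the pymupdf import name only because the skill's own Step 3 check does so.

```python
import importlib.util
import shutil

# Priority order from Step 4: (label, kind, name to probe).
# These names are illustrative assumptions, not part of the skill itself.
CANDIDATES = [
    ("pdftotext", "binary", "pdftotext"),
    ("pdfplumber", "module", "pdfplumber"),
    ("pymupdf", "module", "fitz"),
]


def is_available(kind, name):
    """Return True if a CLI binary is on PATH or a Python module can be found."""
    if kind == "binary":
        return shutil.which(name) is not None
    return importlib.util.find_spec(name) is not None


def select_tool(probe=is_available):
    """Return the label of the first available tool in priority order, or None."""
    for label, kind, name in CANDIDATES:
        if probe(kind, name):
            return label
    return None


if __name__ == "__main__":
    tool = select_tool()
    print(tool if tool else "NOT_FOUND")
```

Passing the probe as a parameter keeps the priority logic testable without installing any of the three tools; a `None` result corresponds to the Step 5 case where the user is asked for permission to install.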
+ +## Platform-Specific Notes + +### macOS + +- poppler-utils can be installed via Homebrew: `brew install poppler` +- Python libraries work with system Python3 or pyenv + +### Linux + +- Most distributions have poppler-utils in their package managers +- pdfplumber/pymupdf require pip installation + +### Windows + +- poppler binaries available from official poppler releases or via winget/chocolatey +- Python libraries recommended for Windows: `pip install pdfplumber` + +## Common Errors and Solutions + +| Error | Cause | Solution | +|-------|-------|----------| +| `pdftotext: command not found` | poppler-utils not installed | Install via package manager or use Python alternative | +| `Permission denied` | Output directory not writable | Use `/tmp/` for output | +| `File not found` | Wrong PDF path | Verify path with `ls -la` | +| `PDF extraction failed` | Encrypted/protected PDF | Inform user and suggest manual extraction | +| `pdftotext: syntax error` | Malformed PDF | Try with `-raw` flag instead of `-layout` | + +## Alternative Flags for pdftotext + +```bash +# Basic extraction (default: text in reading order) +pdftotext input.pdf output.txt + +# Preserve the original physical layout (not the default) +pdftotext -layout input.pdf output.txt + +# Keep text in content stream order +pdftotext -raw input.pdf output.txt + +# Extract specific pages +pdftotext -f 1 -l 5 input.pdf output.txt + +# Extract to stdout +pdftotext input.pdf - # "-" as the output file writes to stdout +``` + +## File Size Limits + +- For PDFs larger than 50MB, extract page ranges instead of the entire document +- Use `-f` and `-l` flags to process in chunks if needed From bc1b3386c8e995562e6fcaa3999d4832644313b0 Mon Sep 17 00:00:00 2001 From: Divit Kashyap <162712154+divitkashyap@users.noreply.github.com> Date: Sat, 4 Apr 2026 19:30:33 +0100 Subject: [PATCH 2/2] feat(pdf-reader): add PDF text extraction fallback skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Submitted by: https://github.com/divitkashyap ## What Added — a skill that provides 
automatic PDF text extraction fallback using command-line tools (pdftotext/poppler-utils) with optional installation and user confirmation. ## Why When a user shares a PDF or asks to read/extract text from it, and the agent lacks native PDF capability, this skill provides a complete fallback workflow: 1. Detect the PDF file in the user's message 2. Check for available tools (pdftotext → pdfplumber → pymupdf) 3. If no tool is found, ask the user for permission to install one 4. Extract the PDF text to a temp file 5. Continue with the original user task ## Complementary to minimax-pdf-read (PR #51) This skill differs from : - minimax-pdf-read: User explicitly asks to extract text from a PDF (active) - pdf-reader: Fallback when agent needs to process PDF but lacks capability Both can coexist — they serve different use cases. ## Tool Priority 1. pdftotext (poppler-utils) — Preferred, fastest, system-level 2. pdfplumber (Python) — Fallback if poppler not available 3. pymupdf (Python) — Alternative Python fallback ## Platform Support - macOS: Homebrew (brew install poppler) or pip - Linux: apt-get/dnf install poppler-utils or pip - Windows: winget/chocolatey or pip ## Validation All 15 skills pass: python .claude/skills/pr-review/scripts/validate_skills.py ✅ --- README_zh.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README_zh.md b/README_zh.md index 01374ce..cb15877 100644 --- a/README_zh.md +++ b/README_zh.md @@ -23,6 +23,7 @@ | `minimax-xlsx` | 打开、创建、读取、分析、编辑或验证 Excel/电子表格文件(.xlsx、.xlsm、.csv、.tsv)。支持通过 XML 模板从零创建 xlsx、使用 pandas 读取分析、零格式损失编辑现有文件、公式重算与验证、专业财务格式化。 | Official | | `minimax-docx` | 基于 OpenXML SDK(.NET)的专业 DOCX 文档创建、编辑与排版。三条流水线:从零创建新文档、填写/编辑现有文档内容、应用模板格式并通过 XSD 验证门控检查。 | Official | | `vision-analysis` | 使用视觉 AI 模型分析、描述和提取图像信息。支持描述、OCR 文字识别、UI 界面审查、图表数据提取和物体检测。基于 MiniMax VL API,OpenAI GPT-4V 作为备选。 | Community | +| `pdf-reader` | 自动检测 Agent 无法读取 PDF 的情况,并使用命令行工具(pdftotext/poppler-utils)提供文本提取后备方案。支持自动检测、用户确认后安装、多平台支持(macOS、Linux、Windows)。 | Community | | `minimax-multimodal-toolkit` | 通过 
MiniMax API 生成语音、音乐、视频和图片内容 — MiniMax 多模态使用场景的统一入口。涵盖 TTS(文字转语音、声音克隆、声音设计、多段合成)、音乐(带词歌曲、纯音乐)、视频(文生视频、图生视频、首尾帧、主体参考、模板、长视频多场景)、图片(文生图、图生图含角色参考),以及基于 FFmpeg 的媒体处理(格式转换、拼接、裁剪、提取)。 | Official | ## 安装