diff --git a/README.md b/README.md
index 0f54709..5d96cfa 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool
 | `minimax-xlsx` | Open, create, read, analyze, edit, or validate Excel/spreadsheet files (.xlsx, .xlsm, .csv, .tsv). Covers creating new xlsx from scratch via XML templates, reading and analyzing with pandas, editing existing files with zero format loss, formula recalculation, validation, and professional financial formatting. | Official |
 | `minimax-docx` | Professional DOCX document creation, editing, and formatting using OpenXML SDK (.NET). Three pipelines: create new documents from scratch, fill/edit content in existing documents, or apply template formatting with XSD validation gate-check. | Official |
 | `vision-analysis` | Analyze, describe, and extract information from images using vision AI models. Supports describe, OCR, UI mockup review, chart data extraction, and object detection. Powered by MiniMax VL API with OpenAI GPT-4V fallback. | Community |
+| `pdf-reader` | Automatically detect when an agent cannot read PDFs and provide a text-extraction fallback using command-line tools (pdftotext/poppler-utils), with optional installation after user confirmation and multi-platform support (macOS, Linux, Windows). | Community |
 | `minimax-multimodal-toolkit` | Generate voice, music, video, and image content via MiniMax APIs — the unified entry for MiniMax multimodal use cases. Covers TTS (text-to-speech, voice cloning, voice design, multi-segment), music (songs, instrumentals), video (text-to-video, image-to-video, start-end frame, subject reference, templates, long-form multi-scene), image (text-to-image, image-to-image with character reference), and media processing (convert, concat, trim, extract) via FFmpeg. | Official |
 
 ## Installation
diff --git a/README_zh.md b/README_zh.md
index 01374ce..cb15877 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -23,6 +23,7 @@
 | `minimax-xlsx` | 打开、创建、读取、分析、编辑或验证 Excel/电子表格文件(.xlsx、.xlsm、.csv、.tsv)。支持通过 XML 模板从零创建 xlsx、使用 pandas 读取分析、零格式损失编辑现有文件、公式重算与验证、专业财务格式化。 | Official |
 | `minimax-docx` | 基于 OpenXML SDK(.NET)的专业 DOCX 文档创建、编辑与排版。三条流水线:从零创建新文档、填写/编辑现有文档内容、应用模板格式并通过 XSD 验证门控检查。 | Official |
 | `vision-analysis` | 使用视觉 AI 模型分析、描述和提取图像信息。支持描述、OCR 文字识别、UI 界面审查、图表数据提取和物体检测。基于 MiniMax VL API,OpenAI GPT-4V 作为备选。 | Community |
+| `pdf-reader` | 自动检测 Agent 无法读取 PDF 的情况,并使用命令行工具(pdftotext/poppler-utils)提供文本提取后备方案。支持自动检测、用户确认后安装、多平台支持(macOS、Linux、Windows)。 | Community |
 | `minimax-multimodal-toolkit` | 通过 MiniMax API 生成语音、音乐、视频和图片内容 — MiniMax 多模态使用场景的统一入口。涵盖 TTS(文字转语音、声音克隆、声音设计、多段合成)、音乐(带词歌曲、纯音乐)、视频(文生视频、图生视频、首尾帧、主体参考、模板、长视频多场景)、图片(文生图、图生图含角色参考),以及基于 FFmpeg 的媒体处理(格式转换、拼接、裁剪、提取)。 | Official |
 
 ## 安装
diff --git a/skills/pdf-reader/SKILL.md b/skills/pdf-reader/SKILL.md
new file mode 100644
index 0000000..9b16b18
--- /dev/null
+++ b/skills/pdf-reader/SKILL.md
@@ -0,0 +1,206 @@
+---
+name: pdf-reader
+description: >
+  Automatically handle PDF text extraction when an agent lacks native PDF reading capability.
+  Use when: an agent responds with phrases like "I cannot read PDFs", "I don't have the ability to
+  read PDFs", "I can't access PDF content", "PDF reading is not supported", or similar statements
+  indicating the agent cannot process PDF files directly.
+  This skill intercepts that situation and provides a fallback workflow to extract PDF text using
+  command-line tools (pdftotext/poppler-utils), with automatic detection and optional installation.
+  Triggers: any message where the agent states it cannot read PDFs or lacks PDF capability.
+license: MIT
+metadata:
+  version: "1.0"
+  category: document-processing
+  sources:
+    - poppler-utils (pdftotext)
+    - pdfplumber (Python alternative)
+  submitted_by: https://github.com/divitkashyap
+---
+
+# PDF Reader Skill
+
+Automatically detect when an agent cannot read PDFs and provide a text-extraction fallback using command-line tools, asking the user for confirmation before installing anything.
+
+## Workflow
+
+### Step 1: Detect PDF Reading Limitation
+
+When the agent states it cannot read PDFs (phrases like "I cannot read PDFs", "I don't have the ability to read PDFs", etc.), activate this skill automatically.
+
+### Step 2: Identify the Target PDF
+
+Extract the PDF file path from the user's original request. Confirm the file exists:
+
+```bash
+ls -la "/path/to/document.pdf"
+```
+
+### Step 3: Check for Available PDF Tools
+
+Check which PDF text extraction tools are available on the system:
+
+```bash
+# Check for pdftotext (poppler-utils); `command -v` is the portable form of `which`
+command -v pdftotext || echo "NOT_FOUND"
+
+# Check for pdfplumber (Python)
+python3 -c "import pdfplumber; print('FOUND')" 2>/dev/null || echo "NOT_FOUND"
+
+# Check for pymupdf (imported as `fitz`)
+python3 -c "import fitz; print('FOUND')" 2>/dev/null || echo "NOT_FOUND"
+```
+
+### Step 4: Tool Selection Priority
+
+Select the best available tool in this order:
+
+1. **`pdftotext`** (poppler-utils) - Preferred, fastest, system-level tool
+2. **`pdfplumber`** (Python) - Fallback if poppler is not available
+3. **`pymupdf`** (Python) - Alternative Python fallback
+
+### Step 5: Installation (If Needed)
+
+If no tool is found, ask the user for permission to install:
+
+```
+I need to install a PDF text extraction tool to read this PDF.
+
+Available options:
+1. pdftotext (poppler-utils) - Fast, system-level tool [Recommended]
+2. pdfplumber - Python library alternative
+
+Shall I proceed with installation? (y/n)
+```
+
+**Installation commands by platform:**
+
+**macOS:**
+```bash
+brew install poppler # Installs pdftotext
+# OR
+pip3 install pdfplumber
+```
+
+**Linux (Ubuntu/Debian):**
+```bash
+sudo apt-get install poppler-utils
+# OR
+pip3 install pdfplumber
+```
+
+**Linux (Fedora/RHEL):**
+```bash
+sudo dnf install poppler-utils
+# OR
+pip3 install pdfplumber
+```
+
+**Windows:**
+```powershell
+# Use winget: search for the Poppler package ID, then run `winget install --id <ID>`
+winget search poppler
+# OR
+pip install pdfplumber
+```
+
+### Step 6: Extract PDF Text
+
+Once a tool is available, extract text from the PDF:
+
+**Using pdftotext:**
+```bash
+pdftotext -layout "/path/to/document.pdf" /tmp/pdf_extracted.txt
+```
+
+**Using pdfplumber (Python):**
+```python
+import pdfplumber
+
+with pdfplumber.open("/path/to/document.pdf") as pdf:
+    text = ""
+    for page in pdf.pages:
+        page_text = page.extract_text()
+        if page_text:  # skip pages with no extractable text (e.g. scanned images)
+            text += page_text + "\n\n"
+
+with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f:
+    f.write(text)
+```
+
+**Using pymupdf (Python):**
+```python
+import fitz  # PyMuPDF is imported as `fitz`
+
+doc = fitz.open("/path/to/document.pdf")
+text = ""
+for page in doc:
+    text += page.get_text() + "\n\n"
+doc.close()
+
+with open("/tmp/pdf_extracted.txt", "w", encoding="utf-8") as f:
+    f.write(text)
+```
+
+### Step 7: Read Extracted Text
+
+Read the extracted text file and present it to the user:
+
+```bash
+cat /tmp/pdf_extracted.txt
+```
+
+### Step 8: Continue Original Task
+
+After extracting and presenting the PDF content, proceed with the user's original request using the extracted text as context.
+
+## Platform-Specific Notes
+
+### macOS
+
+- Poppler (which provides `pdftotext`) can be installed via Homebrew: `brew install poppler`
+- Python libraries work with the system Python 3 or with pyenv
+
+### Linux
+
+- Most distributions have poppler-utils in their package managers
+- pdfplumber/pymupdf require pip installation
+
+### Windows
+
+- Prebuilt Poppler binaries are available as downloads or through package managers such as winget or Chocolatey; package names vary, so search before installing
+- Python libraries are the recommended route on Windows: `pip install pdfplumber`
+
+## Common Errors and Solutions
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `pdftotext: command not found` | poppler-utils not installed | Install via package manager or use Python alternative |
+| `Permission denied` | Output directory not writable | Use `/tmp/` for output |
+| `File not found` | Wrong PDF path | Verify path with `ls -la` |
+| `PDF extraction failed` | Encrypted/protected PDF | Inform the user; if they supply a password, pass it with `-upw` (user password) or `-opw` (owner password), otherwise suggest manual extraction |
+| `pdftotext: syntax error` | Malformed PDF | Try with `-raw` flag instead of `-layout` |
+
+## Alternative Flags for pdftotext
+
+```bash
+# Basic extraction (default reading-order mode)
+pdftotext input.pdf output.txt
+
+# Preserve the physical page layout (not the default; requires -layout)
+pdftotext -layout input.pdf output.txt
+
+# Keep text in content-stream order (no layout reconstruction)
+pdftotext -raw input.pdf output.txt
+
+# Extract pages 1 through 5
+pdftotext -f 1 -l 5 input.pdf output.txt
+
+# Extract to stdout
+pdftotext input.pdf - # "-" as the output file name writes to stdout
+```
+
+## File Size Limits
+
+- For PDFs larger than 50 MB, extract page ranges instead of the entire document
+- Use the `-f` and `-l` flags to process in chunks if needed
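
As a rough illustration of the tool priority from Step 4 and the page-range advice under File Size Limits, the logic might look like the Python sketch below. It is not part of the submitted skill. The helper names (`detect_available`, `choose_tool`, `page_ranges`, `pdftotext_command`, `extract_in_chunks`), the chunk size of 50 pages, and the output file naming are all assumptions made for this example. The page count is supplied by the caller; one way to obtain it would be poppler's `pdfinfo`.

```python
"""Sketch: pick an extraction tool by priority and split a large PDF into page ranges."""
import importlib.util
import shutil
import subprocess
from typing import Iterator, List, Optional, Set, Tuple

# Priority order from Step 4; "fitz" is the import name used for PyMuPDF in the skill.
TOOL_PRIORITY = ["pdftotext", "pdfplumber", "fitz"]


def detect_available() -> Set[str]:
    """Return the subset of TOOL_PRIORITY that is usable on this machine."""
    found = set()
    if shutil.which("pdftotext"):
        found.add("pdftotext")
    for module in ("pdfplumber", "fitz"):
        if importlib.util.find_spec(module) is not None:
            found.add(module)
    return found


def choose_tool(available: Set[str]) -> Optional[str]:
    """Return the highest-priority available tool, or None if nothing is installed."""
    for tool in TOOL_PRIORITY:
        if tool in available:
            return tool
    return None


def page_ranges(total_pages: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive, 1-based (first, last) page ranges covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def pdftotext_command(pdf_path: str, out_path: str, first: int, last: int) -> List[str]:
    """Build the pdftotext argument list for one page range (-f first page, -l last page)."""
    return ["pdftotext", "-f", str(first), "-l", str(last), "-layout", pdf_path, out_path]


def extract_in_chunks(pdf_path: str, total_pages: int, chunk_size: int = 50,
                      out_prefix: str = "/tmp/pdf_extracted") -> List[str]:
    """Run pdftotext once per page range and return the output file paths."""
    outputs = []
    for first, last in page_ranges(total_pages, chunk_size):
        out_path = f"{out_prefix}_{first}-{last}.txt"  # hypothetical naming scheme
        subprocess.run(pdftotext_command(pdf_path, out_path, first, last), check=True)
        outputs.append(out_path)
    return outputs
```

For a 12-page document and a chunk size of 5, `list(page_ranges(12, 5))` gives `[(1, 5), (6, 10), (11, 12)]`. Building the command separately from running it keeps the range logic checkable on machines where poppler is not installed.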