MCP server for editing text in existing PDFs through content-stream surgery. Targets fidelity preservation (original font, exact position, in-place operators) and reports — honestly — when fidelity has to break.
Most PDF editors use a redact-and-replace approach — they white out the original text and stamp new text on top, usually with a substitute font. The result looks different from the original.
pdf-edit-mcp takes a different approach. It modifies the original PDF content stream operators directly, preserving the exact font, size, color, and position of the text being edited — when the embedded font already contains the glyphs you need.
| | Traditional approach | pdf-edit-mcp |
|---|---|---|
| Method | Redact old text, stamp new text | Modify content stream operators in place |
| Font | Substituted (often Helvetica) | Original font when possible; metric-equivalent fallback (e.g. Carlito for Calibri) when not |
| Position | Re-calculated | Exact original coordinates |
| Quality feedback | None | FidelityReport on every edit (font_substituted, glyphs_missing, overflow_detected, warnings) |
Powered by pdf-edit-engine — a Python library for PDF content stream surgery with two-tier font subset extension.
This matters more than the headline claim. The engine has three fidelity tiers, and every edit's FidelityReport tells you which one fired:
- Tier 1, exact (`font_preserved=true`, `font_substituted=null`): the embedded font already had every glyph the replacement needs. Output is byte-identical at the operator layer.
- Tier 1.5, in-place injection (`font_preserved=true`, glyph appended to embedded font): the glyph wasn't in the embedded font but was in your system font with the same `unitsPerEm`. The original CIDs are preserved; only new glyphs are appended at fresh GIDs. Visual: indistinguishable from Tier 1.
- Metric-equivalent fallback (`font_preserved=false`, `font_substituted="Carlito-Regular"` or similar): the original font isn't installed system-wide, so an open-source font with matching metrics substitutes for the new glyphs. Visual: very close but not pixel-perfect; spacing is right because metrics match.

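As a rough illustration of reading these tiers off a report, the sketch below classifies a FidelityReport-like dict. It assumes the report arrives as a plain dict keyed by the field names mentioned in this README; the function name `describe_fidelity` and the returned labels are hypothetical and not part of the pdf-edit-mcp API. The fields shown here do not by themselves distinguish Tier 1 from Tier 1.5, so the sketch reports them together.

```python
from typing import Any, Mapping


def describe_fidelity(report: Mapping[str, Any]) -> str:
    """Map a FidelityReport-like dict to a coarse label (hypothetical helper)."""
    # Assumption: glyphs_missing is empty/falsy when every glyph was available.
    if report.get("glyphs_missing"):
        return "degraded: glyphs missing"
    # A substituted font name signals the metric-equivalent fallback.
    if not report.get("font_preserved") or report.get("font_substituted"):
        return "metric-equivalent fallback"
    # font_preserved is true and nothing was substituted: Tier 1 or Tier 1.5.
    return "font preserved (Tier 1 or 1.5)"


exact = {"font_preserved": True, "font_substituted": None, "glyphs_missing": []}
fallback = {"font_preserved": False, "font_substituted": "Carlito-Regular"}
print(describe_fidelity(exact))
print(describe_fidelity(fallback))
```

A caller could use a label like this to decide whether an edit is acceptable for a given document before keeping the output.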
What straight-up fails (the engine raises, the MCP returns a structured error):
- The font is CFF / Type 1 / Type 3 (`FontNotFoundError`; TrueType only for Tier 1.5 today).
- The `unitsPerEm` of the system font differs from the embedded font (rescaling out of scope).
- The replacement is wider than the available bbox AND there's no room to reflow downward (`OverflowError` surfaced via `EditResult.warnings`).
- Multi-codepoint emoji or scripts the system fonts don't carry.

If you need fidelity guarantees for a specific PDF, run pdf_analyze_subset first to see what tier you'll land in.
- 38 tools across 7 categories (reading, text editing, block ops, section ops, annotations, document manipulation, metadata & security)
- 3 built-in MCP prompts that guide the editing workflow step by step
- Fidelity reporting on every edit: `font_preserved`, `font_substituted`, `overflow_detected`, `reflow_applied`, `glyphs_missing`, plus a `warnings` list (auto-includes overflow notices)
- `dry_run` preview on `pdf_replace_text`, `pdf_replace_single`, `pdf_batch_replace`: returns the FidelityReport without writing the output PDF, so you can verify font/glyph coverage before committing
- Per-page filtering on `pdf_find_text`, `pdf_get_text`, `pdf_get_fonts`: restrict reads to a single 0-indexed page on multi-page PDFs
- Layout overrides on `pdf_replace_block` and `pdf_batch_replace_block`: explicit `line_height` and `section_gap` for uniform spacing across sibling sections
- Batch operations: up to 500 find-and-replace edits per call, up to 50 block replacements per page, with auto-verification on the output
- Section intelligence — detects document structure by font hierarchy, swaps sections by fuzzy title match (raises on ambiguous match rather than silently picking)
- Atomic write — section-swap operations write to a temp file and rename only on full success; failures leave your output path untouched
- Engine-version pin enforced at startup: the bridge hard-fails if `pdf-edit-engine < 0.1.2` is installed, so missing fidelity fields can't masquerade as `null`
- Structured error codes: engine errors map to specific JSON-RPC codes (`-32001` stale match, `-32002` encoding, `-32003` reflow, `-32004` font-not-found) with embedded recovery hints
- Runs entirely local: no external APIs, no network calls, no API keys

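The atomic-write behaviour listed above follows a common temp-file-then-rename pattern. The sketch below shows that general pattern with the standard library only. It is not the bridge's actual code, and `atomic_write` and its `produce` callback are hypothetical names chosen for illustration.

```python
import os
import tempfile
from typing import Callable


def atomic_write(output_path: str, produce: Callable[[str], None]) -> None:
    """Run produce(tmp_path); move the result onto output_path only on success."""
    directory = os.path.dirname(os.path.abspath(output_path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf.tmp", dir=directory)
    os.close(fd)
    try:
        produce(tmp_path)
        # os.replace overwrites the target; the rename is atomic on POSIX.
        os.replace(tmp_path, output_path)
    except BaseException:
        # Any failure leaves output_path untouched; discard the partial file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the temp file is only renamed after `produce` returns, a failed operation never leaves a half-written PDF at the output path.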
- Node.js 20+
- Python 3.12+
- pdf-edit-engine ≥ 0.1.2: `pip install "pdf-edit-engine>=0.1.2"`

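To check the engine version floor yourself before wiring up the server, something like the following could work. It is a simplified sketch: `parse_version` and `meets_minimum` are hypothetical helpers, pre-release suffixes are ignored, and the installed version string is assumed to come from somewhere such as `importlib.metadata.version("pdf-edit-engine")`.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Keep only the leading numeric release segment, e.g. '0.1.2rc1' -> (0, 1, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = "0.1.2") -> bool:
    """True when the installed release is at least the minimum (suffixes ignored)."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.2' compares as '0.2.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b
```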
Add to your claude_desktop_config.json:

    {
      "mcpServers": {
        "pdf-edit-mcp": {
          "command": "npx",
          "args": ["-y", "@aryanbv/pdf-edit-mcp"]
        }
      }
    }

For Claude Code:

    claude mcp add pdf-edit-mcp -- npx -y @aryanbv/pdf-edit-mcp

To run the server standalone:

    npx -y @aryanbv/pdf-edit-mcp

If `python` isn't in your PATH or you need a specific version:

    {
      "mcpServers": {
        "pdf-edit-mcp": {
          "command": "npx",
          "args": ["-y", "@aryanbv/pdf-edit-mcp"],
          "env": {
            "PDF_EDIT_PYTHON": "/path/to/python3.12"
          }
        }
      }
    }

Reading:

| Tool | Description |
|---|---|
| `pdf_inspect` | Complete document overview: text, fonts, paragraphs, annotations in one call. Start here before editing. |
| `pdf_get_text` | Extract all text from a PDF |
| `pdf_find_text` | Find all occurrences of a string with page numbers and bounding box positions |
| `pdf_get_fonts` | List fonts with encoding type, glyph count, PostScript name, subset status |
| `pdf_get_text_layout` | Get every text block with exact position, font, and size |
| `pdf_extract_bbox_text` | Extract text from a bounding box region with gap-aware joining |
| `pdf_detect_paragraphs` | Detect paragraph boundaries with bounding boxes on a page |
| `pdf_detect_sections` | Analyze document structure: section tree with titles, bounding boxes, and text |
| `pdf_analyze_subset` | Check if an embedded font can render specific characters before editing |


Text editing:

| Tool | Description |
|---|---|
| `pdf_replace_text` | Replace all occurrences of a string (names, dates, typos, labels) |
| `pdf_replace_single` | Replace one specific occurrence by match index |
| `pdf_batch_replace` | Multiple find-and-replace edits in one atomic operation (up to 500 edits) |

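If a job has more edits than the 500-per-call limit, a client has to split them across several `pdf_batch_replace` calls. The helper below is a minimal sketch of that splitting. The name `chunk_edits` and the `{"find": ..., "replace": ...}` edit shape used in the usage note are assumptions for illustration, not the tool's documented schema.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
MAX_EDITS_PER_CALL = 500  # limit stated for pdf_batch_replace


def chunk_edits(edits: Sequence[T], limit: int = MAX_EDITS_PER_CALL) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `limit` edits, preserving order."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(edits), limit):
        yield edits[start:start + limit]


# Hypothetical edit shape; the real argument names may differ.
edits = [{"find": f"ID-{i}", "replace": f"REF-{i}"} for i in range(1201)]
print([len(chunk) for chunk in chunk_edits(edits)])
```

Note that each call is atomic on its own, so splitting a job this way gives up atomicity across the whole set, and each later chunk would need to run against the previous call's output file.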

Block ops:

| Tool | Description |
|---|---|
| `pdf_replace_block` | Replace all content within a bounding box with new text |
| `pdf_batch_replace_block` | Replace content in multiple bounding boxes atomically with cumulative shift tracking |
| `pdf_insert_text_block` | Insert text at a position, shift existing content down to make room |
| `pdf_delete_block` | Delete content in a bounding box, optionally close the gap |


Section ops:

| Tool | Description |
|---|---|
| `pdf_swap_sections` | Swap two sections by fuzzy title match; re-renders all siblings for uniform spacing |
| `pdf_replace_section` | Replace a section's entire content by fuzzy title match |

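To make the "raise on ambiguous match" idea concrete, here is a small sketch of fuzzy title matching that refuses to pick between near-equal candidates. It uses `difflib.SequenceMatcher` from the standard library as a stand-in. The engine's actual matching algorithm, thresholds, and exception types are not documented here, so `match_section`, `AmbiguousSectionError`, `threshold`, and `margin` are all hypothetical.

```python
from difflib import SequenceMatcher


class AmbiguousSectionError(LookupError):
    """Raised when two titles match about equally well (hypothetical class)."""


def match_section(query: str, titles: list[str],
                  threshold: float = 0.6, margin: float = 0.05) -> str:
    """Return the best-matching title, or raise if none or several qualify."""
    scored = sorted(
        ((SequenceMatcher(None, query.lower(), title.lower()).ratio(), title)
         for title in titles),
        reverse=True,
    )
    candidates = [(score, title) for score, title in scored if score >= threshold]
    if not candidates:
        raise LookupError(f"no section title resembles {query!r}")
    # Refuse to guess when the runner-up is within `margin` of the best score.
    if len(candidates) > 1 and candidates[0][0] - candidates[1][0] < margin:
        raise AmbiguousSectionError(
            f"{query!r} matches both {candidates[0][1]!r} and {candidates[1][1]!r}"
        )
    return candidates[0][1]
```

Raising instead of silently choosing keeps a swap from touching the wrong section when two titles are nearly identical.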

Annotations:

| Tool | Description |
|---|---|
| `pdf_get_annotations` | List all annotations with positions, types, and URLs |
| `pdf_add_annotation` | Add a link annotation at a position on a page |
| `pdf_update_annotation` | Update a link annotation's target URL |
| `pdf_delete_annotation_v2` | Delete an annotation by page and index |
| `pdf_move_annotation` | Move an annotation to a new position |
| `pdf_add_hyperlink` | Add a clickable hyperlink to a page region |
| `pdf_add_highlight` | Add a highlight annotation with QuadPoints |
| `pdf_flatten_annotations` | Flatten all annotations into page content (non-editable) |


Document manipulation:

| Tool | Description |
|---|---|
| `pdf_merge` | Merge multiple PDFs into one document |
| `pdf_split` | Split a PDF into individual page files |
| `pdf_reorder_pages` | Reorder pages by 0-indexed page number array |
| `pdf_rotate_pages` | Rotate pages by 90, 180, or 270 degrees |
| `pdf_delete_pages` | Delete specific pages (0-indexed) |
| `pdf_crop_pages` | Crop all pages to a bounding box |
| `pdf_add_watermark` | Overlay a watermark PDF on all pages |


Metadata & security:

| Tool | Description |
|---|---|
| `pdf_edit_metadata` | Edit title, author, subject, creator, producer |
| `pdf_add_bookmark` | Add a navigation bookmark pointing to a page |
| `pdf_encrypt` | Encrypt with owner and user passwords |
| `pdf_decrypt` | Decrypt a password-protected PDF |
| `pdf_fill_form` | Fill form fields by name-value pairs |

Three built-in MCP prompts guide the editing process.
For structural changes — section swaps, rewrites, multi-field updates:
- Inspect: call `pdf_inspect` to get the full document overview
- Understand structure: use `pdf_detect_sections` for the section tree, `pdf_find_text` for simple text matches, or `pdf_get_text_layout` for raw block positions
- Pre-check: call `pdf_analyze_subset` if the replacement text has unusual characters (bullets, em-dashes, non-Latin scripts)
- Execute: use `pdf_batch_replace` for text changes, `pdf_swap_sections` or `pdf_replace_section` for structural changes, then `pdf_update_annotation` if link URLs changed
- Verify: call `pdf_get_text` on the output. Check for duplicates, missing content, and spurious spaces

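The verify step can be partly automated on the text returned by `pdf_get_text`. The function below is one possible sketch of such a check. `verify_output` is a hypothetical helper, and treating two or more consecutive spaces between words as "spurious" is an assumption about what that symptom looks like in extracted text.

```python
import re


def verify_output(text: str, expected: list[str], removed: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for phrase in expected:
        count = text.count(phrase)
        if count == 0:
            problems.append(f"missing: {phrase!r}")
        elif count > 1:
            problems.append(f"duplicated ({count}x): {phrase!r}")
    for phrase in removed:
        # Old text that should have been replaced but is still in the output.
        if phrase in text:
            problems.append(f"still present: {phrase!r}")
    # Assumption: runs of 2+ spaces between words indicate a spacing artifact.
    if re.search(r"\S {2,}\S", text):
        problems.append("spurious spaces")
    return problems
```

An agent could pass the replacement strings as `expected` and the replaced strings as `removed`, then retry or report when the list is non-empty.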
For swapping two sections by name:
- Call `pdf_detect_sections` to get the section tree
- Identify both sections by title match
- Call `pdf_batch_replace_block` with all sibling sections (not just the two being swapped); unchanged siblings get their original text for uniform spacing
- Verify with `pdf_get_text`

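The third step is the easy one to get wrong, so here is a sketch of building the block list for a swap. It assumes each detected section can be reduced to a title, a bounding box, and its text, and that each block edit is a `bbox` plus replacement `text`. Those shapes, the `Section` type, and `build_swap_edits` are illustrative assumptions rather than the tool's actual schema.

```python
from typing import TypedDict


class Section(TypedDict):
    """Hypothetical, simplified view of one entry from the section tree."""
    title: str
    bbox: list[float]
    text: str


def build_swap_edits(siblings: list[Section], first: str, second: str) -> list[dict]:
    """Return one block edit per sibling, exchanging the text of two of them."""
    by_title = {section["title"]: section for section in siblings}
    for title in (first, second):
        if title not in by_title:
            raise KeyError(f"no sibling section titled {title!r}")
    swapped = {
        first: by_title[second]["text"],
        second: by_title[first]["text"],
    }
    # Unchanged siblings are included with their original text so that every
    # block is re-rendered and spacing stays uniform.
    return [
        {"bbox": section["bbox"], "text": swapped.get(section["title"], section["text"])}
        for section in siblings
    ]
```

The resulting list would then be passed to `pdf_batch_replace_block` in a single call, under whatever argument name the tool expects.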
For simple text changes — typos, dates, names:
- Call `pdf_find_text` to locate the text
- Call `pdf_replace_text` or `pdf_replace_single`
- Check `font_preserved` in the fidelity report
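A client could chain these three steps with a dry run in the middle, so that nothing is written unless the font would be preserved. The sketch below assumes a generic `call_tool(name, arguments)` function supplied by your MCP client. The argument names `text`, `find`, and `replace`, and the `matches` key in the response, are guesses for illustration; only `pdf_path`, `output_path`, `dry_run`, and the fidelity field names come from this README.

```python
from typing import Any, Callable

# Hypothetical signature for whatever your MCP client uses to invoke a tool.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def replace_with_precheck(call_tool: ToolCaller, pdf_path: str, output_path: str,
                          old: str, new: str) -> dict[str, Any]:
    """Find, dry-run, and only then commit a simple text replacement."""
    found = call_tool("pdf_find_text", {"pdf_path": pdf_path, "text": old})
    if not found.get("matches"):  # assumed response key
        raise LookupError(f"{old!r} not found in {pdf_path}")

    args = {"pdf_path": pdf_path, "output_path": output_path,
            "find": old, "replace": new}  # assumed argument names
    # Dry run: returns the fidelity report without writing the output PDF.
    preview = call_tool("pdf_replace_text", {**args, "dry_run": True})
    if not preview.get("font_preserved"):
        raise RuntimeError(
            f"font would be substituted with {preview.get('font_substituted')!r}"
        )
    return call_tool("pdf_replace_text", args)
```

Whether a substituted font should abort the edit or merely be logged depends on how strict the document's fidelity requirements are.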
AI Agent (Claude, GPT, etc.)
↓ MCP protocol (stdio)
index.ts — TypeScript MCP server
↓ JSON-RPC 2.0 over stdin/stdout
bridge.py — long-running Python subprocess
↓ direct import
pdf-edit-engine — Python library (pikepdf + fonttools + pdfminer)
- The TypeScript server spawns `bridge.py` once at startup and keeps it alive for all tool calls, avoiding Python startup overhead on every request.
- All inputs are validated by Zod schemas before reaching the Python layer.
- `stdout` is the IPC channel; all logging goes to `stderr`.

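For readers unfamiliar with JSON-RPC 2.0 over a pipe, the sketch below shows what one request and one response might look like on the wire. It assumes newline-delimited framing (one JSON object per line) and that bridge method names mirror tool names. Both are assumptions, since the README does not specify the framing, and `encode_request` and `decode_response` are hypothetical helpers.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def encode_request(method: str, params: dict) -> bytes:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}
    # json.dumps escapes newlines inside strings, so the frame stays on one line.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def decode_response(line: bytes) -> dict:
    """Parse one response line, raising if it carries a JSON-RPC error object."""
    message = json.loads(line)
    if message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"bridge error {error['code']}: {error['message']}")
    return message["result"]
```

Keeping logs on `stderr` matters for exactly this reason: any stray text on `stdout` would corrupt a frame and break parsing on the other side.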
| Generator | Encoding | Character agreement | Notes |
|---|---|---|---|
| Chrome (Print to PDF) | Identity-H | 100% | Narrow font subsets exercise Tier 1.5 in-place glyph injection |
| Google Docs export | Identity-H | 100% | |
| Microsoft Word | Identity-H (Calibri) | 100% with Carlito metric-equivalent installed | font_substituted set when fallback fires |
| reportlab (Python) | WinAnsi | 100% | Synthetic test fixture |
What v0.1.1 does not support:
- Cross-page reflow: text expanding past a page boundary is not redistributed; you'll see `overflow_detected: true` and a warning
- CFF / Type 1 / Type 3 fonts: Tier 1.5 in-place glyph injection is TrueType only (`FontFile2` ↔ `glyf` table). Edits that need new glyphs in a CFF font return `FontNotFoundError` with code `-32004`
- `unitsPerEm` mismatch: if the embedded font and your installed system font use different `unitsPerEm`, glyph rescaling is out of scope; the engine raises rather than ship distorted output
- Image editing or generation: text-only
- Table structure detection: text and bbox extraction work, but no table semantics
- Encodings beyond Identity-H and WinAnsi: `MacRoman` and custom `/Differences` are decoded for reading but not exercised by the test fixtures
- Right-to-left text: bidi reordering is not handled
- Multi-codepoint emoji / complex script glyphs that aren't in your system fonts: recorded as `glyphs_missing` in the FidelityReport

JSON-RPC error codes the bridge can return (in addition to standard -32600/-32601/-32602):

| Code | Class | Hint |
|---|---|---|
| `-32000` | `PDFEditError` (generic) | Inspect the message for context |
| `-32001` | `OperatorError` | `TextMatch` is stale; re-run `pdf_find_text` and retry |
| `-32002` | `EncodingError` | Run `pdf_analyze_subset` to see which characters can't encode |
| `-32003` | `ReflowError` | Replacement may be too wide for the bbox; try shorter text |
| `-32004` | `FontNotFoundError` | Install the original font system-wide, or accept metric-equivalent fallback |
| `-32603` | Internal error | Bug; please report at the issue tracker |

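On the client side, the table above can be turned into a small lookup so that an agent reacts to a code rather than parsing messages. The mapping below is transcribed from the table. The helper names `explain_error` and `needs_fresh_match` are hypothetical, and the input is assumed to be the standard JSON-RPC `error` object with `code` and `message` members.

```python
from typing import Any, Mapping

# Transcribed from the error table: code -> (class, recovery hint).
BRIDGE_ERRORS = {
    -32000: ("PDFEditError", "Inspect the message for context"),
    -32001: ("OperatorError", "TextMatch is stale; re-run pdf_find_text and retry"),
    -32002: ("EncodingError", "Run pdf_analyze_subset to see which characters can't encode"),
    -32003: ("ReflowError", "Replacement may be too wide for the bbox; try shorter text"),
    -32004: ("FontNotFoundError", "Install the original font system-wide, or accept metric-equivalent fallback"),
    -32603: ("Internal error", "Bug; please report at the issue tracker"),
}


def explain_error(error: Mapping[str, Any]) -> str:
    """Format a JSON-RPC error object together with its recovery hint."""
    code = error.get("code")
    name, hint = BRIDGE_ERRORS.get(code, ("Unknown error", "No recovery hint for this code"))
    return f"[{code}] {name}: {hint}"


def needs_fresh_match(error: Mapping[str, Any]) -> bool:
    """True for the stale-match code, where re-running pdf_find_text is the fix."""
    return error.get("code") == -32001
```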
"Python not found" — Set PDF_EDIT_PYTHON to your Python 3.12+ path (see Custom Python path).
"No module named pdf_edit_engine" — Install the engine: pip install pdf-edit-engine
Bridge process crashes on startup — Verify Python >=3.12 (python --version) and check stderr for import errors.
Characters not rendering after replacement — Call pdf_analyze_subset before editing to check if the embedded font supports the new characters.
"Path must be absolute" — All pdf_path and output_path values must be absolute paths ending in .pdf.

    git clone https://github.com/AryanBV/pdf-edit-mcp.git
    cd pdf-edit-mcp
    npm install && npm run build

    npm test           # validation + security + integration tests
    npm run inspect    # launch MCP Inspector for manual testing
    npm run audit      # security audit

Integration tests require Python 3.12+, pdf-edit-engine, and reportlab (`pip install pdf-edit-engine reportlab`).
CI runs in two stages: unit tests (TypeScript validation and security) → integration tests (Python bridge with generated fixtures).