pdf2gep

Convert PDF documents into GEP (General Evolution Protocol) assets for AI Agents.

This tool extracts text from PDFs (technical papers, books, manuals), splits it into semantic chunks, and packages each chunk into a Gene (metadata/strategy) + Capsule (implementation/knowledge) bundle ready for ingestion by the EvoMap network.

🌟 Inspiration & Acknowledgements

This project is heavily inspired by pdf2skills by kitchen-engineer42. We adapted the core concept of "Book-to-Skill" conversion for the OpenClaw GEP ecosystem, shifting from Python/MinerU to a lightweight Node.js architecture for agent-native execution.

Key differences:

Target: OpenClaw GEP (EvoMap) instead of Claude Code.
Stack: Pure Node.js (vs Python/MinerU).
Protocol: Outputs GEP v1.5.0 JSON bundles.

✨ Features

Universal Extraction: Supports local PDFs and remote URLs (ArXiv, etc.) via pdf-parse-fork.
Semantic Chunking: Context-aware splitting to preserve logical continuity (default chunk size: 4k characters).
GEP Compliance: Automatically generates:
- Gene: High-level summary and signal matching tags.
- Capsule: Detailed content, confidence scores, and blast radius metrics.
- SHA256: Deterministic content-addressable IDs for EvoMap verification.
Batch Processing: Outputs ready-to-upload JSON batches.
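
To illustrate what a deterministic, content-addressable ID means in practice, here is a minimal sketch. It is not taken from index.js: the helper names canonicalize and contentId, the "sha256:" prefix, and the asset fields are assumptions made for the example, and the exact canonical form EvoMap expects may differ. The idea is to hash a key-sorted serialization so that the same content always yields the same ID.

```javascript
const { createHash } = require('node:crypto');

// Serialize JSON-compatible values with object keys sorted,
// so that key order never changes the resulting hash.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    const pairs = keys.map((k) => JSON.stringify(k) + ':' + canonicalize(value[k]));
    return '{' + pairs.join(',') + '}';
  }
  return JSON.stringify(value);
}

// Hypothetical helper: derive an ID from the asset content alone.
// The "sha256:" prefix is an assumption, not a confirmed GEP convention.
function contentId(asset) {
  const digest = createHash('sha256')
    .update(canonicalize(asset), 'utf8')
    .digest('hex');
  return 'sha256:' + digest;
}

// The same fields in a different order produce the same ID.
const a = contentId({ type: 'Capsule', content: 'hello' });
const b = contentId({ content: 'hello', type: 'Capsule' });
console.log(a === b); // true
```

Because the ID depends only on the content, re-running the conversion on an unchanged PDF should reproduce the same IDs, which is what makes verification on the receiving side possible.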

🚀 Usage

1. Installation

git clone https://github.com/autogame-17/pdf2gep.git
cd pdf2gep
npm install

2. Convert a PDF

# From URL (e.g., ArXiv paper)
node index.js "https://arxiv.org/pdf/2603.05500.pdf"

# From Local File
node index.js "./manual.pdf"

3. Upload to EvoMap

Generated batches are saved to temp/evomap_assets/. Use the evomap skill to publish:

node ../evomap/upload.js temp/evomap_assets/batch_1772887751750.json

🛠️ Architecture

Fetcher: Handles HTTP/HTTPS streams and sends browser-like headers to avoid basic bot blocking.
Extractor: Uses pdf-parse-fork for robust text extraction.
Chunker: Splits text into manageably sized blocks for LLM consumption.
Generator: Wraps chunks in the GEP v1.5.0 envelope structure.
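
As a rough illustration of the Chunker stage, the sketch below packs paragraphs into chunks of at most 4000 characters and hard-splits any paragraph that exceeds the limit on its own. This is an assumed strategy, not the code in index.js: the function name chunkText, the paragraph-boundary heuristic, and the maxChars parameter are hypothetical.

```javascript
// Hypothetical chunker: pack whole paragraphs into chunks of at most
// maxChars characters, so a chunk never ends in the middle of a paragraph
// unless that paragraph alone is longer than the limit.
function chunkText(text, maxChars = 4000) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks = [];
  let current = '';

  for (const para of paragraphs) {
    // A single oversized paragraph is flushed and hard-split by length.
    if (para.length > maxChars) {
      if (current) {
        chunks.push(current);
        current = '';
      }
      for (let i = 0; i < para.length; i += maxChars) {
        chunks.push(para.slice(i, i + maxChars));
      }
      continue;
    }

    const candidate = current ? current + '\n\n' + para : para;
    if (candidate.length > maxChars) {
      // Adding this paragraph would overflow: close the current chunk.
      chunks.push(current);
      current = para;
    } else {
      current = candidate;
    }
  }

  if (current) chunks.push(current);
  return chunks;
}

const sample = ['First paragraph.', 'Second paragraph.', 'Third paragraph.'].join('\n\n');
console.log(chunkText(sample, 40));
// [ 'First paragraph.\n\nSecond paragraph.', 'Third paragraph.' ]
```

Splitting on blank lines is only a proxy for semantic boundaries. Text extracted from PDFs often loses paragraph breaks, so a real implementation may need sentence-level or heading-aware fallbacks.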

📜 License

MIT
