GitHub - yifanfeng97/Hyper-Extract: Transform unstructured text into structured knowledge with LLMs. Graphs, hypergraphs, and spatio-temporal extractions — with one command.

Smart Knowledge Extraction CLI

Transform documents into structured knowledge with one command.

"Stop reading. Start understanding."
"Say goodbye to document anxiety and see information at a glance."

Hyper-Extract is an LLM-powered knowledge extraction and evolution framework. It transforms highly unstructured text into persistent, predictable, and strongly typed Knowledge Abstracts. Output formats range from simple Collections (Lists/Sets) and Pydantic Models to Knowledge Graphs, Hypergraphs, and Spatio-Temporal Graphs.

✨ Core Features

🔷 8 Auto-Types: From basic AutoModel/AutoList to advanced AutoGraph, AutoHypergraph, and AutoSpatioTemporalGraph.
🧠 10+ Extraction Engines: Out-of-the-box support for cutting-edge retrieval paradigms like GraphRAG, LightRAG, Hyper-RAG, and KG-Gen.
📝 Declarative YAML Templates: Zero-code extraction definition. Includes 80+ presets across 6 domains.
🔄 Incremental Evolution: Feed new documents on the fly to continuously map out and expand the extracted knowledge.

⚡ Quick Start

1. Installation

For CLI Users (install the `he` command globally):

uv tool install hyperextract

For Python Developers (use as a library):

uv pip install hyperextract

2. The Command Line Way

Extract, search, and manage knowledge directly from the CLI.

By default, the CLI uses gpt-4o-mini and text-embedding-3-small.

# Configure OpenAI API Key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query the knowledge abstract
he search ./output/ "What are Tesla's major achievements?"

# Visualize the knowledge graph
he show ./output/

# Incrementally supplement knowledge
he feed ./output/ examples/en/tesla_question.md

# Show the updated knowledge graph
he show ./output/
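The same commands can also be driven from a script. The following is a minimal Python sketch, assuming the `he` executable is on your PATH and accepts exactly the arguments shown in the quick start; the helper names `build_commands` and `run_commands` are hypothetical and not part of Hyper-Extract. It defaults to a dry run, so nothing is executed unless you opt in.

```python
import shlex
import subprocess


def build_commands(doc, template, out_dir, question, lang="en"):
    """Return the quick-start `he` invocations as argument lists."""
    return [
        ["he", "parse", doc, "-t", template, "-o", out_dir, "-l", lang],
        ["he", "search", out_dir, question],
    ]


def run_commands(commands, dry_run=True):
    """Render each command as a shell line; execute only if dry_run is False."""
    rendered = []
    for cmd in commands:
        rendered.append(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    cmds = build_commands(
        "examples/en/tesla.md",
        "general/biography_graph",
        "./output/",
        "What are Tesla's major achievements?",
    )
    for line in run_commands(cmds, dry_run=True):
        print(line)
```

Passing argument lists to `subprocess.run` avoids shell quoting problems with the question string; `shlex.join` is used only to print a copy-pasteable line.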

🐍 The Python API Way

Installation

# Clone the repository
git clone https://github.com/yifanfeng97/hyper-extract.git
cd hyper-extract

# Install dependencies
uv sync

Configuration

# Copy the example env file
cp .env.example .env

# Edit .env with your API key and base URL
# OPENAI_API_KEY=your-api-key
# OPENAI_BASE_URL=https://api.openai.com/v1

Usage

from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

from hyperextract import Template

# Create a template
ka = Template.create("general/biography_graph")

# Parse a document
with open("examples/en/tesla.md", "r", encoding="utf-8") as f:
    text = f.read()
result = ka.parse(text)

# Visualize the knowledge graph
ka.show(result)

# Incrementally supplement knowledge
with open("examples/en/tesla_question.md", "r", encoding="utf-8") as f:
    new_text = f.read()
ka.feed(result, new_text)

# Show the updated knowledge graph
ka.show(result)

🔗 For complete examples, see examples/en
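When several documents belong to one knowledge abstract, the parse-then-feed pattern above can be wrapped in a small loop. This is a sketch only: `build_from_files` is a hypothetical helper, and it assumes the template object exposes `parse(text)` and `feed(result, text)` as in the usage example, with `feed` updating `result` in place. If `feed` returns a new object in your version, assign its return value instead.

```python
from pathlib import Path
from typing import Any, Iterable


def build_from_files(ka: Any, paths: Iterable[str]) -> Any:
    """Parse the first file, then feed every remaining file into the result."""
    result = None
    started = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        if not started:
            result = ka.parse(text)
            started = True
        else:
            # Assumption: feed() mutates `result`, as the usage example suggests
            ka.feed(result, text)
    if not started:
        raise ValueError("no input files given")
    return result
```

With the objects from the usage example, this would be called as `build_from_files(ka, ["examples/en/tesla.md", "examples/en/tesla_question.md"])`.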

Installation Comparison:

| Use Case | Command | Purpose |
| --- | --- | --- |
| CLI Tool | `uv tool install hyperextract` | Install `he` command globally |
| Python Library | `uv pip install hyperextract` | Use in Python code |

🧩 Deep Dive: The 8 Auto-Types

The framework handles complex knowledge structures without requiring you to write boilerplate code.

Example: AutoGraph Visualization

[Figure: knowledge graph visualization produced by AutoGraph extraction]

🛠️ Architecture Overview

Hyper-Extract follows a three-layer architecture:

Auto-Types define the data structures for knowledge extraction. The 8 strongly typed structures (AutoModel, AutoList, AutoSet, AutoGraph, AutoHypergraph, AutoTemporalGraph, AutoSpatialGraph, AutoSpatioTemporalGraph) serve as the output format for all extractions.
Methods provide extraction algorithms built on Auto-Types. This includes Typical methods (KG-Gen, iText2KG, iText2KG*) and RAG-based methods (GraphRAG, LightRAG, Hyper-RAG, HypergraphRAG, Cog-RAG).
Templates offer domain-specific configurations with ready-to-use prompts and data structures. Covering 6 domains (Finance, Legal, Medical, TCM, Industry, General) with 80+ preset templates, users can extract knowledge without dealing with Auto-Types or Methods directly.

Use Hyper-Extract through the CLI (`he parse`, `he search`, `he show`...) or the Python API (`Template.create()`).
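To make the difference between graph and hypergraph outputs in the Auto-Types layer concrete, here is a small illustrative sketch using plain dataclasses. The class names `Relation` and `HyperRelation` and the function `to_pairwise` are hypothetical and are not Hyper-Extract's own classes; the point is only that a hyperedge records one relation over many entities, while a binary graph needs one edge per pair.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Relation:
    """Binary edge: exactly one source and one target."""
    source: str
    target: str
    type: str


@dataclass(frozen=True)
class HyperRelation:
    """Hyperedge: a single relation spanning any number of entities."""
    members: frozenset[str]
    type: str


def to_pairwise(hyper: HyperRelation) -> list[Relation]:
    """Approximate a hyperedge with one binary edge per pair of members."""
    pairs = combinations(sorted(hyper.members), 2)
    return [Relation(a, b, hyper.type) for a, b in pairs]


if __name__ == "__main__":
    joint = HyperRelation(frozenset({"Alice", "Bob", "Carol"}), "collaboration")
    for edge in to_pairwise(joint):
        print(edge)
```

A three-way collaboration is one `HyperRelation` but three pairwise `Relation` records, and the pairwise form no longer says that all three took part in the same event.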

📚 Related Documentation

Preset Templates: Browse 80+ ready-to-use templates across 6 domains
Design Guide: Learn how to create custom templates

📋 Template Structure Example (Graph Type)

Here's a complete YAML template example for Graph type extraction (entity-relationship extraction):

language: en

name: Knowledge Graph
type: graph
tags: [general]

description: 'Extract entities and their relationships to construct a knowledge graph.'

output:
  entities:
    fields:
    - name: name
      type: str
      description: 'Entity name'
    - name: type
      type: str
      description: 'Entity type: e.g., person, organization, event'
    - name: description
      type: str
      description: 'Entity description'
  relations:
    fields:
    - name: source
      type: str
      description: 'Source entity'
    - name: target
      type: str
      description: 'Target entity'
    - name: type
      type: str
      description: 'Relation type: e.g., invention, collaboration, competition'
    - name: description
      type: str
      description: 'Relation description'

guideline:
  target: 'Extract entities and their relationships from the text.'
  rules_for_entities:
    - 'Extract meaningful entities'
    - 'Maintain consistent naming'
  rules_for_relations:
    - 'Create relations only when explicitly expressed in the text'

identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
  relation_members:
    source: source
    target: target

display:
  entity_label: '{name} ({type})'
  relation_label: '{type}'
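The `identifiers` and `display` patterns in the template can be read as format strings over the extracted fields. The sketch below shows one plausible interpretation using Python's `str.format`; how Hyper-Extract actually resolves these patterns is an assumption here, and `render` and `merge_relations` are hypothetical helpers. It also illustrates why a stable `relation_id` matters: records with the same identifier can be deduplicated when new text is fed in.

```python
ENTITY_LABEL = "{name} ({type})"
RELATION_ID = "{source}|{type}|{target}"


def render(pattern: str, record: dict) -> str:
    """Fill a template pattern from a record; extra keys are ignored."""
    return pattern.format(**record)


def merge_relations(store: dict, incoming: list) -> dict:
    """Add relations to `store`, keeping the first record seen for each id."""
    for rel in incoming:
        store.setdefault(render(RELATION_ID, rel), rel)
    return store


if __name__ == "__main__":
    entity = {"name": "A", "type": "person", "description": "example entity"}
    rel = {"source": "A", "target": "B", "type": "collaboration",
           "description": "example relation"}
    print(render(ENTITY_LABEL, entity))
    print(render(RELATION_ID, rel))
    print(len(merge_relations({}, [rel, dict(rel, description="duplicate")])))
```

Keeping the first record on an identifier clash is only one possible merge policy; merging descriptions or preferring the newest record are equally reasonable choices.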

📈 Comparison with Other Libraries

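| Feature | GraphRAG | LightRAG | KG-Gen | ATOM | Hyper-Extract |
| --- | --- | --- | --- | --- | --- |
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ❌ | ✅ |
| CLI Tool | ✅ | ❌ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ❌ | ✅ |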

📚 Related Documentation

Documentation - Complete documentation site
CLI Guide - Command-line interface
Template Gallery - Available templates
Example Code - Working examples

🤝 Contributing & License

Contributions are welcome! Please submit Issues and PRs. Licensed under Apache-2.0.
