Knowledge graph visualization and AI-powered analysis for Datacore.
- Workspace UI: Browser-based editor with file tree, Claude Code terminal, and knowledge graph
- Graph Visualization: Interactive D3.js force-directed graph with node labels
- Temporal Pulses: Snapshot graph state over time
- Multi-Space: Configure which Datacore spaces to include
- Metrics: Degree centrality, PageRank, Louvain clustering
- AI Extensions: Semantic search, link suggestions, gap detection, Q&A
cd ~/Data/1-datafund/2-projects/datacortex
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# 1. Compute embeddings (first time, ~5-10 min)
datacortex embed
# 2. Start server and open workspace
datacortex serve
# Open http://localhost:8765/workspace.html
# 3. Get AI-powered suggestions
datacortex digest # Link suggestions
datacortex gaps # Knowledge gaps
datacortex insights # Cluster analysis
datacortex search "query" # Question answering
datacortex opportunities # Low-hanging fruit for research

The workspace provides a browser-based interface for working with your knowledge base alongside Claude Code.
┌──────────────────────────────────────────────────────────────────────────┐
│ [Search: Ctrl+P] [Links ▼] [Graph] │
├───────────────┬──────────────────────────────────┬───────────────────────┤
│ File Tree │ Markdown Editor (CodeMirror 6) │ Knowledge Graph (D3) │
│ ├── 0-personal│ [Ask Claude] [Save] [Discard] │ - Click node to open │
│ └── 1-datafund│ │ - Synced with editor │
├───────────────┴──────────────────────────────────┴───────────────────────┤
│ Claude Code Terminal (xterm.js) [Clear] [Reconnect] │
└──────────────────────────────────────────────────────────────────────────┘
Features:
- File Tree: Browse and open files from your Datacore spaces
- Markdown Editor: CodeMirror 6 with syntax highlighting, save/discard
- Claude Code Terminal: WebSocket PTY bridge to Claude Code (new session per connection)
- Knowledge Graph: D3.js force graph with labels, docked in the right panel (toggle with the Graph button)
- Synced Views: Click a file → highlights in tree + zooms in graph; click graph node → opens file
- Search: Ctrl+P to search by filename or content
- Links: Dropdown showing outgoing wiki-links and backlinks
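The Links dropdown relies on two views of the same data: outgoing wiki-links parsed from a note, and backlinks obtained by inverting that map across all notes. A minimal Python sketch of the idea, assuming the common [[Target]] / [[Target|alias]] wiki-link syntax; the function names are hypothetical and not Datacortex's actual API (the README says the graph itself is built from zettel_db).

```python
import re
from collections import defaultdict

# Assumption: wiki-links use [[Target]], [[Target|alias]] or [[Target#heading]]
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")

def outgoing_links(text: str) -> list[str]:
    """Return link targets in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in WIKI_LINK.finditer(text):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)

def backlinks(docs: dict[str, str]) -> dict[str, list[str]]:
    """Invert the outgoing-link map: target title -> titles that link to it."""
    incoming: dict[str, list[str]] = defaultdict(list)
    for title, text in docs.items():
        for target in outgoing_links(text):
            incoming[target].append(title)
    return dict(incoming)

docs = {
    "Fair Data Economy": "Builds on [[Data Tokenization]] and [[Datafund|the fund]].",
    "Datafund": "See [[Fair Data Economy]].",
}
print(outgoing_links(docs["Fair Data Economy"]))  # ['Data Tokenization', 'Datafund']
print(backlinks(docs)["Fair Data Economy"])       # ['Datafund']
```

Backlinks are never stored in the note itself; they are recomputed from everyone else's outgoing links, which is why clicking through either list stays consistent.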
Access:
datacortex serve --port 8765
open http://localhost:8765/workspace.html

Or use /datacortex workspace from Claude Code.
# Graph generation
datacortex generate --spaces personal,datafund
datacortex stats
# Pulse snapshots
datacortex pulse generate
datacortex pulse list
datacortex pulse diff 2025-01-01 2025-01-15
# AI Extensions
datacortex embed [--space NAME] [--force]
datacortex digest [--threshold 0.8] [--top-n 20]
datacortex gaps [--min-score 0.3]
datacortex insights [--cluster N] [--top 5]
datacortex search "query" [--top 10] [--no-expand]
datacortex opportunities [--top 15]
# Server
datacortex serve [--port 8765]

Use the following commands from Claude Code for AI-synthesized insights. They run the CLI tools and have Claude synthesize natural-language recommendations from the results.
| Command | Model | Purpose |
|---|---|---|
| /datacortex | - | Launch visualization server and open browser |
| /datacortex-digest | haiku | Link suggestions based on semantic similarity |
| /datacortex-gaps | haiku | Bridge suggestions between knowledge clusters |
| /datacortex-insights | sonnet | Deep cluster analysis with themes and patterns |
| /datacortex-ask [question] | haiku | Answer questions from your knowledge base (RAG) |
| /datacortex-opportunities | haiku | Find low-hanging fruit for research |
Model assignments: each command includes a "## Model" hint telling Claude Code which model to use. Haiku is fast and cheap for suggestions; Sonnet provides deeper analysis for insights.
┌─────────────────────────────────────────────────────────────┐
│ DATACORTEX SERVER │
│ - Embeddings (sentence-transformers, cached in SQLite) │
│ - Vector similarity (cosine, matrix computation) │
│ - Graph metrics (NetworkX, Louvain clustering) │
│ - Compact output (TSV/markdown, ~60% smaller than JSON) │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ CLAUDE CODE SESSION │
│ - Natural language synthesis │
│ - Link suggestions with reasoning │
│ - Bridge recommendations │
│ - Question answering with citations │
└─────────────────────────────────────────────────────────────┘
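To illustrate why compact output is smaller, this sketch serializes the same rows as JSON and as TSV: JSON repeats every key in every record, while TSV names the columns once. The rows and function names are made up for illustration; the real savings depend on the data, and the ~60% figure above is the project's own estimate, not something this snippet reproduces.

```python
import json

# Hypothetical digest rows: (source, target, similarity)
rows = [
    ("Fair Data Economy", "Data Tokenization", 0.83),
    ("Investment Thesis", "Bootstrap Liquidity Fund", 0.79),
]

def to_json(rows):
    # JSON repeats every key name in every record
    return json.dumps([{"source": s, "target": t, "similarity": sim} for s, t, sim in rows])

def to_tsv(rows):
    # TSV states the column names once, then one tab-separated line per row
    lines = ["source\ttarget\tsimilarity"]
    lines += [f"{s}\t{t}\t{sim:.2f}" for s, t, sim in rows]
    return "\n".join(lines)

j, t = to_json(rows), to_tsv(rows)
print(f"JSON {len(j)} chars, TSV {len(t)} chars, {1 - len(t) / len(j):.0%} smaller")
```

Fewer characters means fewer tokens for Claude Code to read, which is the point of the compact format.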
datacortex/
├── src/datacortex/
│ ├── core/ # Models, config, database
│ ├── indexer/ # Graph building from zettel_db
│ ├── metrics/ # Centrality, clustering
│ ├── pulse/ # Temporal snapshots
│ ├── ai/ # Embeddings, similarity, cache
│ ├── digest/ # Daily link suggestions
│ ├── gaps/ # Knowledge gap detection
│ ├── insights/ # Cluster analysis
│ ├── qa/ # Question answering (RAG)
│ ├── api/ # FastAPI backend
│ │ └── routes/ # API endpoints (graph, files, terminal)
│ └── cli/ # Click commands
├── frontend/
│ ├── index.html # Graph visualization
│ └── workspace.html # Workspace UI (editor, terminal, graph)
├── config/ # YAML configuration
└── docs/ # Documentation
Datacortex includes 5 AI-powered features that work together. The server computes embeddings, similarity, and metrics; Claude Code synthesizes natural language insights from the results.
Compute semantic embeddings for all documents using local sentence-transformers (no API keys needed).
- Model: sentence-transformers/all-mpnet-base-v2 (768 dimensions, high quality)
- Content: Title + first 500 characters (balanced quality/speed)
- Cache: SQLite with content hash invalidation (only recomputes changed docs)
- Speed: ~25 docs/sec on M1 Mac
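The content-hash invalidation can be sketched as follows. This is a minimal illustration assuming a hypothetical embeddings table (doc_id, content_hash, vector) and a stand-in embed_stub function in place of the real sentence-transformers model; Datacortex's actual schema and storage format may differ.

```python
import hashlib
import sqlite3

def embed_stub(text: str) -> list[float]:
    # Hypothetical placeholder for a sentence-transformers call (model.encode);
    # returns a tiny deterministic vector so the sketch runs without the model.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

def embedding_input(title: str, body: str, content_length: int = 500) -> str:
    # Title + first N characters of the body, as described above
    return f"{title}\n{body[:content_length]}"

def get_embedding(conn: sqlite3.Connection, doc_id: str, title: str,
                  body: str) -> tuple[list[float], bool]:
    """Return (vector, recomputed); recompute only when the content hash changed."""
    text = embedding_input(title, body)
    content_hash = hashlib.sha256(text.encode()).hexdigest()
    row = conn.execute(
        "SELECT content_hash, vector FROM embeddings WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row and row[0] == content_hash:
        return [float(x) for x in row[1].split(",")], False  # cache hit
    vector = embed_stub(text)
    conn.execute(
        "INSERT OR REPLACE INTO embeddings (doc_id, content_hash, vector) VALUES (?, ?, ?)",
        (doc_id, content_hash, ",".join(map(str, vector))),
    )
    conn.commit()
    return vector, True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (doc_id TEXT PRIMARY KEY, content_hash TEXT, vector TEXT)")
print(get_embedding(conn, "a", "Note A", "first version")[1])   # True (computed)
print(get_embedding(conn, "a", "Note A", "first version")[1])   # False (cache hit)
print(get_embedding(conn, "a", "Note A", "edited version")[1])  # True (hash changed)
```

Because the hash covers only the title and the truncated body, edits beyond the first 500 characters do not trigger a recompute, which is consistent with what gets embedded.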
datacortex embed # Incremental (only new/changed)
datacortex embed --force # Recompute all
datacortex embed --space personal # Single space

Find documents that should be linked based on semantic similarity but aren't yet connected.
- Similar pairs: Documents with cosine similarity > 0.75 that have no existing link
- Scoring: similarity * 0.5 + recency * 0.3 + centrality * 0.2
- Orphans: Documents with no incoming links (candidates for integration)
- Output: Compact TSV format for Claude Code to synthesize recommendations
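A minimal sketch of the digest selection and scoring in pure Python. It assumes recency and centrality are already normalized to [0, 1] per document and combines a pair's values by taking the max of the two documents; both choices, and the function names, are hypothetical. threshold and top_n mirror the --threshold and --top-n flags.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def digest(embeddings, links, recency, centrality, threshold=0.75, top_n=20):
    """Score unlinked, semantically similar pairs and list orphans.

    embeddings: doc -> vector; links: set of (src, dst) edges;
    recency / centrality: doc -> value in [0, 1] (normalization is an assumption).
    """
    docs = sorted(embeddings)
    suggestions = []
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            if (a, b) in links or (b, a) in links:
                continue  # already connected in either direction
            sim = cosine(embeddings[a], embeddings[b])
            if sim <= threshold:
                continue
            # Pair-level recency/centrality: max of the two docs (assumption)
            rec = max(recency[a], recency[b])
            cen = max(centrality[a], centrality[b])
            score = sim * 0.5 + rec * 0.3 + cen * 0.2
            suggestions.append((round(score, 3), a, b))
    suggestions.sort(reverse=True)
    # Orphans: documents that no link points to
    targets = {dst for _, dst in links}
    orphans = [d for d in docs if d not in targets]
    return suggestions[:top_n], orphans

embeddings = {"A": [1, 0], "B": [0.9, 0.1], "C": [0, 1]}
links = {("A", "C")}
recency = {"A": 1.0, "B": 0.5, "C": 0.2}
centrality = {"A": 0.4, "B": 0.1, "C": 0.9}
print(digest(embeddings, links, recency, centrality))
# ([(0.877, 'A', 'B')], ['A', 'B'])
```

The all-pairs loop is fine for a sketch; the README's "matrix computation" suggests the server computes the full similarity matrix at once instead.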
datacortex digest --threshold 0.8 --top-n 20

Detect sparse areas between knowledge clusters that need bridge notes.
- Cluster centroids: Mean embedding of all documents in each Louvain cluster
- Gap score: semantic_similarity - link_density (high similarity but few links = gap)
- Boundary nodes: Documents that link to both clusters (potential bridges)
- Bridge suggestions: Topics that could connect the clusters
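One way to compute the gap score and boundary nodes, as a hedged sketch: the README doesn't define link density, so it is assumed here to be cross-cluster edges divided by the number of possible cross-cluster pairs, and a boundary node is taken to be any document with neighbours in both clusters. All names are hypothetical.

```python
import math

def mean_vector(vectors):
    # Cluster centroid: component-wise mean of member embeddings
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def gap(cluster_a, cluster_b, embeddings, edges):
    """Return (gap_score, boundary_nodes) for two clusters given as sets of doc ids."""
    sim = cosine(mean_vector([embeddings[d] for d in cluster_a]),
                 mean_vector([embeddings[d] for d in cluster_b]))
    # Assumed link density: cross-cluster edges / possible cross-cluster pairs
    cross = sum(1 for s, t in edges
                if (s in cluster_a and t in cluster_b) or (s in cluster_b and t in cluster_a))
    density = cross / (len(cluster_a) * len(cluster_b))
    # Boundary nodes: documents whose (undirected) neighbours include both clusters
    neighbours = {}
    for s, t in edges:
        neighbours.setdefault(s, set()).add(t)
        neighbours.setdefault(t, set()).add(s)
    boundary = sorted(d for d, ns in neighbours.items() if ns & cluster_a and ns & cluster_b)
    return sim - density, boundary

embeddings = {"a1": [1, 0], "a2": [0.9, 0.2], "b1": [0.8, 0.4], "b2": [0.7, 0.5], "hub": [0.5, 0.5]}
A, B = {"a1", "a2"}, {"b1", "b2"}
edges = [("a1", "a2"), ("b1", "b2"), ("hub", "a1"), ("hub", "b1")]
score, boundary = gap(A, B, embeddings, edges)
print(round(score, 3), boundary)  # 0.907 ['hub']
```

Pairs scoring above --min-score would then be handed to Claude Code, together with their boundary nodes, to propose bridge topics.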
datacortex gaps --min-score 0.3

Analyze knowledge clusters to identify themes, hubs, and patterns.
- Cluster stats: Size, density, average centrality
- Hub documents: Top 10 by PageRank centrality (most connected/influential)
- Tag frequency: Top 10 tags revealing cluster themes
- Content samples: Excerpts from top docs for context
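The hub and tag statistics can be illustrated with a hand-rolled PageRank (power iteration) and a Counter. Datacortex uses NetworkX, which provides nx.pagerank; the version below is only a self-contained stand-in, and the density and average-centrality definitions are assumptions.

```python
from collections import Counter

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    if not nodes:
        return {}
    n = len(nodes)
    out = {node: [] for node in nodes}
    for s, t in edges:
        out[s].append(t)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread evenly over all nodes
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank

def cluster_summary(members, edges, tags, top=10):
    ranks = pagerank(edges)  # centrality computed over the whole graph
    size = len(members)
    inner = sum(1 for s, t in edges if s in members and t in members)
    # Assumed density: directed edges inside the cluster / possible directed edges
    density = inner / (size * (size - 1)) if size > 1 else 0.0
    avg_centrality = sum(ranks.get(d, 0.0) for d in members) / size
    hubs = sorted(members, key=lambda d: (-ranks.get(d, 0.0), d))[:top]
    tag_counts = Counter(tag for d in members for tag in tags.get(d, []))
    return {"size": size, "density": density, "avg_centrality": avg_centrality,
            "hubs": hubs, "tags": tag_counts.most_common(top)}

edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
tags = {"a": ["data", "ai"], "b": ["data"], "hub": ["data", "market"]}
print(cluster_summary({"a", "b", "c", "hub"}, edges, tags, top=2))
```

With top=2 the hubs come out as ['hub', 'a'] and the leading tag as ('data', 3); the README's defaults correspond to top=10.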
datacortex insights --cluster 3 # Single cluster detail
datacortex insights --top 5 # Top 5 clusters by size

RAG (Retrieval-Augmented Generation) pipeline for "What do I know about X?" queries.
- Pipeline: Embed query → vector search top 10 → graph expansion (1-hop neighbors) → re-rank
- Re-ranking: vec_score * 0.6 + recency * 0.2 + centrality * 0.2
- Direct match boost: 1.2x for original vector search hits
- Full content: Complete document text included for Claude to synthesize answers
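The retrieval half of the pipeline (everything before Claude writes the answer) might look like this. It is a sketch under assumptions: expanded neighbours are scored by their own cosine similarity to the query, and the 1.2x boost is applied to the final re-ranked score. top and expand stand in for --top and --no-expand; all names are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, embeddings, neighbours, recency, centrality, top=10, expand=True):
    # 1. Vector search: top-k documents by cosine similarity to the query
    vec = {d: cosine(query_vec, v) for d, v in embeddings.items()}
    direct = sorted(vec, key=lambda d: (-vec[d], d))[:top]
    # 2. Graph expansion: add 1-hop neighbours of the direct hits
    candidates = set(direct)
    if expand:
        for d in direct:
            candidates.update(neighbours.get(d, ()))
    # 3. Re-rank with the weighted formula; direct hits get the 1.2x boost
    def score(d):
        s = vec[d] * 0.6 + recency[d] * 0.2 + centrality[d] * 0.2
        return s * 1.2 if d in direct else s
    return sorted(candidates, key=lambda d: (-score(d), d))

embeddings = {"tokenization": [1, 0], "pilot": [0.6, 0.8], "liquidity": [0, 1]}
neighbours = {"tokenization": ["liquidity"]}
recency = {"tokenization": 0.5, "pilot": 1.0, "liquidity": 0.9}
centrality = {"tokenization": 0.3, "pilot": 0.1, "liquidity": 0.8}
print(retrieve([1, 0], embeddings, neighbours, recency, centrality, top=1))
# ['tokenization', 'liquidity']
```

Graph expansion lets a linked note with low lexical overlap ("liquidity" above) reach the context window purely because a strong hit links to it.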
datacortex search "data tokenization" --top 10
datacortex search "Data pilot" --no-expand # Skip graph expansion

Find "low-hanging fruit": stubs to fill, orphans to integrate, and underlinked content to connect.
- High-value stubs: Stub notes with many references but no content (concepts your KB expects but hasn't defined)
- Integration candidates: Orphan documents with real content (100+ words) but no links
- Underlinked content: Substantial documents (300+ words) with only 1-2 connections
- Stub-heavy clusters: Topic areas where most notes are stubs (need research)
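The four categories reduce to simple filters over per-document stats. This sketch assumes a hypothetical Doc record; the 100- and 300-word cutoffs come from the list above, while min_refs (what counts as "many references") and stub_ratio (what counts as "most notes") are guesses exposed as parameters.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    # Hypothetical per-document stats; field names are illustrative only
    title: str
    words: int
    is_stub: bool
    links: int     # total connections (incoming + outgoing)
    incoming: int  # references from other notes
    cluster: int

def opportunities(docs, min_refs=3, stub_ratio=0.5, top=15):
    # Stubs that many notes point to: concepts the KB expects but hasn't defined
    stubs = sorted((d for d in docs if d.is_stub and d.incoming >= min_refs),
                   key=lambda d: -d.incoming)
    # Orphans with real content (100+ words) and no links at all
    integrate = sorted((d for d in docs if not d.is_stub and d.links == 0 and d.words >= 100),
                       key=lambda d: -d.words)
    # Substantial documents (300+ words) with only 1-2 connections
    underlinked = sorted((d for d in docs if d.words >= 300 and 1 <= d.links <= 2),
                         key=lambda d: -d.words)
    # Clusters where more than stub_ratio of the notes are stubs
    by_cluster = {}
    for d in docs:
        by_cluster.setdefault(d.cluster, []).append(d)
    heavy = []
    for cid, members in by_cluster.items():
        share = sum(d.is_stub for d in members) / len(members)
        if share > stub_ratio:
            heavy.append((cid, len(members), round(share, 2)))
    heavy.sort(key=lambda c: -c[1])
    return {
        "HIGH_VALUE_STUBS": [d.title for d in stubs[:top]],
        "INTEGRATION_CANDIDATES": [d.title for d in integrate[:top]],
        "UNDERLINKED_CONTENT": [d.title for d in underlinked[:top]],
        "STUB_HEAVY_CLUSTERS": heavy[:top],
    }

docs = [
    Doc("Concept A", 0, True, 16, 16, 1),
    Doc("Concept B", 0, True, 2, 2, 1),
    Doc("Long Essay", 1200, False, 0, 0, 2),
    Doc("Short Draft", 50, False, 0, 0, 2),
    Doc("Deep Dive", 3500, False, 1, 1, 2),
    Doc("Hub Page", 800, False, 12, 5, 1),
]
for section, items in opportunities(docs).items():
    print(section, items)
```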
datacortex opportunities # Find research opportunities
datacortex opportunities --top 20 # More results per category
datacortex opportunities --space datafund # Single space

Example output:
## HIGH_VALUE_STUBS
Fair Data Economy | 16 refs | 0.402 | stub, needs-content
Bootstrap Liquidity Fund | 11 refs | 0.237 | stub, needs-content
## INTEGRATION_CANDIDATES
The fund in Datafund | 9166w | unknown | research/The fund in Datafund.md
SemantiCord - Technical Overview | 3700w | unknown | research/SemantiCord.md
## UNDERLINKED_CONTENT
ChainLink | 3507w | 1 links | page
Investment Thesis | 3337w | 2 links | page
## STUB_HEAVY_CLUSTERS
Cluster 89 | 129 nodes | 127 stubs | 98% | Datahaven; Roam Network; Swarmy.cloud
Cluster 1 | 41 nodes | 30 stubs | 73% | Triple-Sided Marketplace; AI Agents
Use it with /datacortex-opportunities in Claude Code, which presents the list and offers to research or fill selected items.
Create config/datacortex.local.yaml to override defaults:
spaces:
  - personal
  - datafund
server:
  port: 8765
graph:
  include_stubs: false
  compute_clusters: true
ai:
  embedding_model: sentence-transformers/all-mpnet-base-v2
  content_length: 500
  qa_model: claude-3-haiku-20240307

pip install -e ".[dev]"
pytest

- v0.2.0 - Workspace UI with file browser and terminal integration
- v0.1.0 - Initial release: Graph visualization, embeddings, digest, gaps, insights, Q&A, opportunities, multi-hop search
MIT