datafund/datacortex

Datacortex

Knowledge graph visualization and AI-powered analysis for Datacore.

Features

  • Workspace UI: Browser-based editor with file tree, Claude Code terminal, and knowledge graph
  • Graph Visualization: Interactive D3.js force-directed graph with node labels
  • Temporal Pulses: Snapshot graph state over time
  • Multi-Space: Configure which Datacore spaces to include
  • Metrics: Degree centrality, PageRank, Louvain clustering
  • AI Extensions: Semantic search, link suggestions, gap detection, Q&A

Installation

cd ~/Data/1-datafund/2-projects/datacortex
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

Quick Start

# 1. Compute embeddings (first time, ~5-10 min)
datacortex embed

# 2. Start server and open workspace
datacortex serve
# Open http://localhost:8765/workspace.html

# 3. Get AI-powered suggestions
datacortex digest              # Link suggestions
datacortex gaps                # Knowledge gaps
datacortex insights            # Cluster analysis
datacortex search "query"      # Question answering
datacortex opportunities       # Low-hanging fruit for research

Workspace UI

The workspace provides a browser-based interface for working with your knowledge base alongside Claude Code.

┌──────────────────────────────────────────────────────────────────────────┐
│ [Search: Ctrl+P]                                  [Links ▼] [Graph]      │
├───────────────┬──────────────────────────────────┬───────────────────────┤
│ File Tree     │  Markdown Editor (CodeMirror 6)  │ Knowledge Graph (D3)  │
│ ├── 0-personal│  [Ask Claude] [Save] [Discard]   │ - Click node to open  │
│ └── 1-datafund│                                  │ - Synced with editor  │
├───────────────┴──────────────────────────────────┴───────────────────────┤
│ Claude Code Terminal (xterm.js) [Clear] [Reconnect]                      │
└──────────────────────────────────────────────────────────────────────────┘

Features:

  • File Tree: Browse and open files from your Datacore spaces
  • Markdown Editor: CodeMirror 6 with syntax highlighting, save/discard
  • Claude Code Terminal: WebSocket PTY bridge to Claude Code (new session per connection)
  • Knowledge Graph: D3.js force graph with labels, docked on right panel (toggle with Graph button)
  • Synced Views: Click a file → highlights in tree + zooms in graph; click graph node → opens file
  • Search: Ctrl+P to search by filename or content
  • Links: Dropdown showing outgoing wiki-links and backlinks

Access:

datacortex serve --port 8765
open http://localhost:8765/workspace.html

Or use /datacortex workspace from Claude Code.

CLI Commands

# Graph generation
datacortex generate --spaces personal,datafund
datacortex stats

# Pulse snapshots
datacortex pulse generate
datacortex pulse list
datacortex pulse diff 2025-01-01 2025-01-15

# AI Extensions
datacortex embed [--space NAME] [--force]
datacortex digest [--threshold 0.8] [--top-n 20]
datacortex gaps [--min-score 0.3]
datacortex insights [--cluster N] [--top 5]
datacortex search "query" [--top 10] [--no-expand]
datacortex opportunities [--top 15]

# Server
datacortex serve [--port 8765]

Datacore Commands

Run these slash commands from Claude Code for AI-synthesized insights. Each command runs the corresponding CLI tool and has Claude synthesize natural-language recommendations from the results.

Command                        Model    Purpose
/datacortex                    -        Launch visualization server and open browser
/datacortex-digest             haiku    Link suggestions based on semantic similarity
/datacortex-gaps               haiku    Bridge suggestions between knowledge clusters
/datacortex-insights           sonnet   Deep cluster analysis with themes and patterns
/datacortex-ask [question]     haiku    Answer questions from your knowledge base (RAG)
/datacortex-opportunities      haiku    Find low-hanging fruit for research

Model assignments: Commands use ## Model hints to tell Claude Code which model to use. Haiku is fast/cheap for suggestions; Sonnet provides deeper analysis for insights.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    DATACORTEX SERVER                         │
│  - Embeddings (sentence-transformers, cached in SQLite)     │
│  - Vector similarity (cosine, matrix computation)           │
│  - Graph metrics (NetworkX, Louvain clustering)             │
│  - Compact output (TSV/markdown, ~60% smaller than JSON)    │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│                    CLAUDE CODE SESSION                       │
│  - Natural language synthesis                                │
│  - Link suggestions with reasoning                          │
│  - Bridge recommendations                                    │
│  - Question answering with citations                        │
└─────────────────────────────────────────────────────────────┘

Project Structure

datacortex/
├── src/datacortex/
│   ├── core/           # Models, config, database
│   ├── indexer/        # Graph building from zettel_db
│   ├── metrics/        # Centrality, clustering
│   ├── pulse/          # Temporal snapshots
│   ├── ai/             # Embeddings, similarity, cache
│   ├── digest/         # Daily link suggestions
│   ├── gaps/           # Knowledge gap detection
│   ├── insights/       # Cluster analysis
│   ├── qa/             # Question answering (RAG)
│   ├── api/            # FastAPI backend
│   │   └── routes/     # API endpoints (graph, files, terminal)
│   └── cli/            # Click commands
├── frontend/
│   ├── index.html      # Graph visualization
│   └── workspace.html  # Workspace UI (editor, terminal, graph)
├── config/             # YAML configuration
└── docs/               # Documentation

AI Extensions

Datacortex includes five AI-powered features built on a shared embedding step, presented below as six phases. The server computes embeddings, similarity, and metrics; Claude Code synthesizes natural-language insights from the results.

Phase 1: Embeddings (datacortex embed)

Compute semantic embeddings for all documents using local sentence-transformers (no API keys needed).

  • Model: sentence-transformers/all-mpnet-base-v2 (768 dimensions, high quality)
  • Content: Title + first 500 characters (balanced quality/speed)
  • Cache: SQLite with content hash invalidation (only recomputes changed docs)
  • Speed: ~25 docs/sec on M1 Mac
datacortex embed              # Incremental (only new/changed)
datacortex embed --force      # Recompute all
datacortex embed --space personal  # Single space
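
The incremental behaviour described above (hash the embedded text, skip documents whose hash is unchanged) can be sketched as follows. This is a minimal illustration, not the project's actual code: the table schema, the function names, and the pluggable `embed_fn` are all assumptions made for the example.

```python
import hashlib
import json
import sqlite3
from typing import Callable, Dict, List, Tuple

CONTENT_LENGTH = 500  # "Title + first 500 characters"


def embedding_input(title: str, body: str) -> str:
    """Text that gets embedded: the title plus the first 500 characters."""
    return f"{title}\n{body[:CONTENT_LENGTH]}"


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema; the real cache layout may differ.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embeddings ("
        "doc_id TEXT PRIMARY KEY, hash TEXT NOT NULL, vector TEXT NOT NULL)"
    )
    return conn


def embed_incremental(
    conn: sqlite3.Connection,
    docs: Dict[str, Tuple[str, str]],          # doc_id -> (title, body)
    embed_fn: Callable[[str], List[float]],    # any text -> vector function
    force: bool = False,
) -> int:
    """Embed only new or changed documents; return how many were recomputed."""
    recomputed = 0
    for doc_id, (title, body) in docs.items():
        text = embedding_input(title, body)
        digest = content_hash(text)
        row = conn.execute(
            "SELECT hash FROM embeddings WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if not force and row is not None and row[0] == digest:
            continue  # cached vector is still valid
        conn.execute(
            "INSERT OR REPLACE INTO embeddings (doc_id, hash, vector) "
            "VALUES (?, ?, ?)",
            (doc_id, digest, json.dumps(embed_fn(text))),
        )
        recomputed += 1
    conn.commit()
    return recomputed
```

With sentence-transformers installed, `embed_fn` could be something like `lambda t: SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode(t).tolist()`, with the model loaded once outside the lambda. Because only the title and first 500 characters are hashed in this sketch, edits further down a document would not trigger a recompute.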

Phase 2: Daily Digest (datacortex digest)

Find documents that should be linked based on semantic similarity but aren't yet connected.

  • Similar pairs: Documents with cosine similarity > 0.75 that have no existing link
  • Scoring: similarity * 0.5 + recency * 0.3 + centrality * 0.2
  • Orphans: Documents with no incoming links (candidates for integration)
  • Output: Compact TSV format for Claude Code to synthesize recommendations
datacortex digest --threshold 0.8 --top-n 20
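
As a rough sketch of the scoring above, the following pure-Python example filters unlinked pairs by cosine similarity and ranks them with the stated weights. The function names are hypothetical, and two details are assumptions: recency and centrality are taken to be normalized to [0, 1], and the pair-level value is the mean of the two documents' values.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def digest_score(similarity: float, recency: float, centrality: float) -> float:
    # Weights taken from the "Scoring" bullet above.
    return similarity * 0.5 + recency * 0.3 + centrality * 0.2


def suggest_links(
    vectors: Dict[str, List[float]],
    links: Set[FrozenSet[str]],       # existing links as unordered pairs
    recency: Dict[str, float],        # assumed normalized to [0, 1]
    centrality: Dict[str, float],     # assumed normalized to [0, 1]
    threshold: float = 0.75,
    top_n: int = 20,
) -> List[Tuple[float, str, str]]:
    suggestions = []
    for a, b in combinations(sorted(vectors), 2):
        if frozenset((a, b)) in links:
            continue  # already connected
        sim = cosine(vectors[a], vectors[b])
        if sim <= threshold:
            continue
        # Assumption: combine per-document values by averaging the pair.
        rec = (recency.get(a, 0.0) + recency.get(b, 0.0)) / 2
        cen = (centrality.get(a, 0.0) + centrality.get(b, 0.0)) / 2
        suggestions.append((digest_score(sim, rec, cen), a, b))
    suggestions.sort(reverse=True)
    return suggestions[:top_n]
```

The all-pairs loop is quadratic in the number of documents; the architecture section mentions matrix computation for similarity, which would be the faster route on a large knowledge base.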

Phase 3: Knowledge Gaps (datacortex gaps)

Detect sparse areas between knowledge clusters that need bridge notes.

  • Cluster centroids: Mean embedding of all documents in each Louvain cluster
  • Gap score: semantic_similarity - link_density (high similarity but few links = gap)
  • Boundary nodes: Documents that link to both clusters (potential bridges)
  • Bridge suggestions: Topics that could connect the clusters
datacortex gaps --min-score 0.3
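
The gap score can be illustrated with a small sketch. The source gives only the formula `semantic_similarity - link_density`; here semantic similarity is assumed to be the cosine between the two cluster centroids, and link density is assumed to be the fraction of possible cross-cluster pairs that are actually linked. Function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Set


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[List[float]]) -> List[float]:
    """Mean embedding of a cluster (component-wise average)."""
    count = len(vectors)
    return [sum(component) / count for component in zip(*vectors)]


def link_density(
    cluster_a: List[str], cluster_b: List[str], links: Set[FrozenSet[str]]
) -> float:
    # Assumption: linked cross-cluster pairs / all possible cross-cluster pairs.
    possible = len(cluster_a) * len(cluster_b)
    if possible == 0:
        return 0.0
    actual = sum(
        1 for a in cluster_a for b in cluster_b if frozenset((a, b)) in links
    )
    return actual / possible


def gap_score(
    cluster_a: List[str],
    cluster_b: List[str],
    vectors: Dict[str, List[float]],
    links: Set[FrozenSet[str]],
) -> float:
    """High when clusters are semantically close but sparsely linked."""
    similarity = cosine(
        centroid([vectors[d] for d in cluster_a]),
        centroid([vectors[d] for d in cluster_b]),
    )
    return similarity - link_density(cluster_a, cluster_b, links)
```

Under these assumptions, a pair of clusters would pass `--min-score 0.3` when their centroid similarity exceeds their cross-link density by at least 0.3.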

Phase 4: Insight Extraction (datacortex insights)

Analyze knowledge clusters to identify themes, hubs, and patterns.

  • Cluster stats: Size, density, average centrality
  • Hub documents: Top 10 by PageRank centrality (most connected/influential)
  • Tag frequency: Top 10 tags revealing cluster themes
  • Content samples: Excerpts from top docs for context
datacortex insights --cluster 3    # Single cluster detail
datacortex insights --top 5        # Top 5 clusters by size
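
To make the per-cluster statistics concrete, here is a hedged sketch that summarizes one cluster from precomputed inputs. It assumes PageRank scores have already been computed elsewhere (the architecture section mentions NetworkX for graph metrics) and treats density as internal undirected edges divided by possible pairs; both the function name and that density definition are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize_cluster(
    members: Iterable[str],
    edges: Iterable[Tuple[str, str]],
    pagerank: Dict[str, float],        # precomputed centrality per document
    tags: Dict[str, List[str]],
    top: int = 10,
) -> dict:
    member_set = set(members)
    # Keep only edges with both endpoints inside the cluster, as unordered pairs.
    internal = {
        frozenset((a, b))
        for a, b in edges
        if a in member_set and b in member_set and a != b
    }
    size = len(member_set)
    possible = size * (size - 1) / 2
    density = len(internal) / possible if possible else 0.0
    avg_centrality = (
        sum(pagerank.get(m, 0.0) for m in member_set) / size if size else 0.0
    )
    # Hubs: highest PageRank first, ties broken alphabetically.
    hubs = sorted(member_set, key=lambda m: (-pagerank.get(m, 0.0), m))[:top]
    tag_counts = Counter(t for m in member_set for t in tags.get(m, []))
    return {
        "size": size,
        "density": density,
        "avg_centrality": avg_centrality,
        "hubs": hubs,
        "top_tags": tag_counts.most_common(top),
    }
```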

Phase 5: Question Answering (datacortex search)

RAG (Retrieval-Augmented Generation) pipeline for "What do I know about X?" queries.

  • Pipeline: Embed query → vector search top 10 → graph expansion (1-hop neighbors) → re-rank
  • Re-ranking: vec_score * 0.6 + recency * 0.2 + centrality * 0.2
  • Direct match boost: 1.2x for original vector search hits
  • Full content: Complete document text included for Claude to synthesize answers
datacortex search "data tokenization" --top 10
datacortex search "Data pilot" --no-expand  # Skip graph expansion
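
The retrieval pipeline above can be sketched in a few functions. This is an illustration under stated assumptions, not the shipped implementation: the 1.2x boost is applied multiplicatively to the combined score, expanded neighbors are scored against the query like any other document, and all re-ranked candidates are returned. The function name `search` and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query_vec: List[float],
    vectors: Dict[str, List[float]],
    neighbors: Dict[str, Set[str]],    # doc -> directly linked docs
    recency: Dict[str, float],
    centrality: Dict[str, float],
    top: int = 10,
    expand: bool = True,
) -> List[Tuple[str, float]]:
    # 1. Vector search: the `top` most similar documents are the direct hits.
    sims = {doc: cosine(query_vec, vec) for doc, vec in vectors.items()}
    direct = set(sorted(sims, key=lambda d: (-sims[d], d))[:top])

    # 2. Graph expansion: add 1-hop neighbors of every direct hit.
    candidates = set(direct)
    if expand:
        for doc in direct:
            candidates |= neighbors.get(doc, set())

    # 3. Re-rank with the documented weights and the direct-match boost.
    ranked = []
    for doc in candidates:
        if doc not in sims:
            continue  # assumption: neighbors without an embedding are skipped
        score = (
            sims[doc] * 0.6
            + recency.get(doc, 0.0) * 0.2
            + centrality.get(doc, 0.0) * 0.2
        )
        if doc in direct:
            score *= 1.2
        ranked.append((doc, score))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Passing `expand=False` mirrors the `--no-expand` flag: only the direct vector hits are re-ranked.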

Phase 6: Research Opportunities (datacortex opportunities)

Find "low-hanging fruit" - stubs to fill, orphans to integrate, underlinked content to connect.

  • High-value stubs: Stub notes with many references but no content (concepts your KB expects but hasn't defined)
  • Integration candidates: Orphan documents with real content (100+ words) but no links
  • Underlinked content: Substantial documents (300+ words) with only 1-2 connections
  • Stub-heavy clusters: Topic areas where most notes are stubs (need research)
datacortex opportunities              # Find research opportunities
datacortex opportunities --top 20     # More results per category
datacortex opportunities --space datafund  # Single space

Example output:

## HIGH_VALUE_STUBS
Fair Data Economy | 16 refs | 0.402 | stub, needs-content
Bootstrap Liquidity Fund | 11 refs | 0.237 | stub, needs-content

## INTEGRATION_CANDIDATES
The fund in Datafund | 9166w | unknown | research/The fund in Datafund.md
SemantiCord - Technical Overview | 3700w | unknown | research/SemantiCord.md

## UNDERLINKED_CONTENT
ChainLink | 3507w | 1 links | page
Investment Thesis | 3337w | 2 links | page

## STUB_HEAVY_CLUSTERS
Cluster 89 | 129 nodes | 127 stubs | 98% | Datahaven; Roam Network; Swarmy.cloud
Cluster 1 | 41 nodes | 30 stubs | 73% | Triple-Sided Marketplace; AI Agents

Use with /datacortex-opportunities in Claude Code - it presents the list and offers to research/fill selected items.
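
The category rules for Phase 6 can be expressed as a small classifier. The word and link thresholds come from the bullets above; the `Note` fields, the function names, and the `min_refs` cutoff for "many references" are assumptions, since the source does not say how many references make a stub high-value or how "most notes are stubs" is measured (a 50% cutoff is assumed here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Note:
    title: str
    words: int
    links: int       # total connections to other notes
    refs: int        # how many notes reference this one
    is_stub: bool


def classify(note: Note, min_refs: int = 5) -> Optional[str]:
    """Return the opportunity category for a note, or None if it needs no action."""
    if note.is_stub:
        # Hypothetical cutoff for "many references".
        return "HIGH_VALUE_STUBS" if note.refs >= min_refs else None
    if note.links == 0 and note.words >= 100:
        return "INTEGRATION_CANDIDATES"
    if note.words >= 300 and 1 <= note.links <= 2:
        return "UNDERLINKED_CONTENT"
    return None


def stub_ratio(stubs: int, nodes: int) -> float:
    return stubs / nodes if nodes else 0.0


def is_stub_heavy(stubs: int, nodes: int, cutoff: float = 0.5) -> bool:
    # Assumption: "most notes are stubs" means more than half.
    return stub_ratio(stubs, nodes) > cutoff
```

Applied to the example output, `stub_ratio(127, 129)` rounds to 98% and `stub_ratio(30, 41)` to 73%, matching the percentages shown for Cluster 89 and Cluster 1.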

Configuration

Create config/datacortex.local.yaml to override defaults:

spaces:
  - personal
  - datafund

server:
  port: 8765

graph:
  include_stubs: false
  compute_clusters: true

ai:
  embedding_model: sentence-transformers/all-mpnet-base-v2
  content_length: 500
  qa_model: claude-3-haiku-20240307
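
How the local file is combined with the defaults is not documented here. One common approach is a recursive merge in which nested sections are merged key by key and everything else (scalars, lists) is replaced outright. The sketch below shows that approach on plain dictionaries, as they would look after loading both YAML files; the `deep_merge` helper and the `host` key are hypothetical.

```python
from copy import deepcopy
from typing import Any, Dict


def deep_merge(defaults: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return defaults updated with override; nested dicts merge, other values replace."""
    merged = deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults, for illustration only.
defaults = {
    "spaces": ["personal"],
    "server": {"port": 8765, "host": "127.0.0.1"},
    "graph": {"include_stubs": False, "compute_clusters": True},
}
local = {
    "spaces": ["personal", "datafund"],
    "server": {"port": 9000},
}
config = deep_merge(defaults, local)
```

In this sketch `config["server"]` keeps the default `host` while taking the overridden `port`, and the `spaces` list is replaced rather than concatenated.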

Development

pip install -e ".[dev]"
pytest

Version History

  • v0.2.0 - Workspace UI with file browser and terminal integration
  • v0.1.0 - Initial release: Graph visualization, embeddings, digest, gaps, insights, Q&A, opportunities, multi-hop search

License

MIT
