
Skill Evolution

Varun Pratap Bhardwaj edited this page Apr 14, 2026 · 2 revisions


Track, analyze, and evolve your AI agent skills automatically.
v3.4.10 (tracking) + v3.4.11 (evolution engine)


Overview

Skill Evolution turns SuperLocalMemory from a passive memory system into an active learning engine. It tracks how your skills perform and helps them improve over time.

The problem: AI agent skills are static. A skill installed today runs the same way six months from now — even if it failed 50 times, even if a better approach was discovered.

The solution: SLM observes every skill invocation, builds execution traces, computes performance metrics, and surfaces insights so skills can evolve based on real data.


How It Works

Your session
  │
  │ SLM hook captures every tool call
  │ (enriched: input, output, session, project, secret-scrubbed)
  ▼
tool_events table (rich execution data)
  │
  │ SkillPerformanceMiner runs during consolidation (Step 10)
  ▼
Per-skill metrics + behavioral assertions + skill entities
  │
  │ Next session's soft prompts include skill routing
  ▼
Smarter skill selection, session by session
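The `tool_events` store at the center of this pipeline can be sketched with a hypothetical schema. Column names below mirror the `/api/v3/tool-event` payload; the real table layout is internal to SLM and may differ:

```python
# Hypothetical sketch of the tool_events table the hook writes to.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE tool_events (
        id             INTEGER PRIMARY KEY,
        tool_name      TEXT NOT NULL,
        event_type     TEXT,
        input_summary  TEXT,   -- truncated to ~500 chars, secrets scrubbed
        output_summary TEXT,
        session_id     TEXT,
        project_path   TEXT,
        created_at     TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
# The hook records one row per tool call, e.g. a Skill invocation:
db.execute(
    "INSERT INTO tool_events (tool_name, event_type, input_summary, session_id)"
    " VALUES (?, ?, ?, ?)",
    ("Skill", "complete", '{"skill": "my-skill-name"}', "session-1"),
)
# The miner later aggregates rows like this per skill:
count = db.execute(
    "SELECT COUNT(*) FROM tool_events WHERE tool_name = 'Skill'"
).fetchone()[0]
print(count)  # 1
```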

Data Sources

| Source | What It Captures | Cost |
| --- | --- | --- |
| SLM Hook (primary) | Every tool call: input/output (500 chars), session ID, project path. Secret scrubbing built in. | Zero |
| ECC Integration (optional) | Rich observations via `slm ingest --source ecc` | Zero |
| Consolidation Pipeline | Mines `tool_events` for patterns and creates assertions | Zero |

Per-Skill Metrics

| Metric | Description |
| --- | --- |
| Invocation count | Total uses across sessions |
| Effective score | Approximate success rate from execution trace analysis |
| Session count | Sessions that used this skill |
| Skill correlations | Skills frequently used together |
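Skill correlations can be derived from session co-occurrence. A minimal sketch (the input shape and counting scheme here are illustrative assumptions, not the miner's documented algorithm):

```python
# Sketch: count how often two skills appear in the same session.
# `sessions` is a hypothetical shape; the real miner reads tool_events rows.
from collections import Counter
from itertools import combinations

sessions = {
    "s1": ["fix-tests", "git-commit"],
    "s2": ["fix-tests", "git-commit"],
    "s3": ["fix-tests", "deploy"],
}

pair_counts = Counter()
for skills in sessions.values():
    # Sort so each unordered pair is counted under one canonical key.
    for a, b in combinations(sorted(set(skills)), 2):
        pair_counts[(a, b)] += 1

top = pair_counts.most_common(1)[0]
print(top)  # (('fix-tests', 'git-commit'), 2)
```

Pairs that co-occur across many sessions are the ones surfaced as "skills frequently used together."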

Outcome Heuristic

SLM uses conservative, approximate signals to determine if a skill invocation was effective:

| Signal | Type | Meaning |
| --- | --- | --- |
| Productive tools follow (Edit, Write, successful Bash) | Positive | Skill likely helped |
| Same skill re-invoked within 5 minutes | Negative | Likely retry, i.e. failure |
| Bash errors in next 3 tool events | Negative | Something went wrong |
| Session continues 10+ events | Weak positive | User stayed engaged |

These heuristics are labeled as approximate everywhere they appear. They inform soft prompt routing but do not trigger automatic changes without review.
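The heuristic can be sketched as a simple scoring function. The signal weights below are illustrative assumptions, not SLM's actual values:

```python
# Sketch of the approximate outcome heuristic described above.
# Weights are illustrative; SLM's real scoring is internal.
PRODUCTIVE = {"Edit", "Write", "Bash"}

def score_invocation(trace, retried_within_5min=False):
    """trace: list of (tool_name, ok) events following the Skill call."""
    score = 0.0
    if any(tool in PRODUCTIVE and ok for tool, ok in trace[:10]):
        score += 1.0   # productive tools follow -> skill likely helped
    if retried_within_5min:
        score -= 1.0   # re-invocation -> likely retry after failure
    if any(tool == "Bash" and not ok for tool, ok in trace[:3]):
        score -= 1.0   # Bash error shortly after -> something went wrong
    if len(trace) >= 10:
        score += 0.25  # session continues -> weak positive
    return score

trace = [("Edit", True), ("Bash", True), ("Write", True)]
print(score_invocation(trace))  # 1.0
```

Because each signal is weak on its own, only the aggregate score across many invocations feeds the effective-score metric.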


Dashboard

The dedicated Skill Evolution tab shows:

  • Overview cards — Total skill events, unique skills, performance assertions, skill correlations
  • Evolution Engine — Status, backend detection, enable/disable, manual run trigger
  • Skill Lineage DAG — Visual graph of skill evolution history (parent → child)
  • Lineage table — Click any row to highlight in the DAG
  • Skill performance cards — Per-skill effective score, invocation count, confidence
  • Skill correlations — Which skills work well together

Access: http://localhost:8765 → Skill Evolution tab in sidebar.


IDE Compatibility

| IDE | Status | How |
| --- | --- | --- |
| Claude Code | Supported | Hook auto-registered via `slm init` |
| Any IDE | API available | POST to `/api/v3/tool-event` |
| Cursor | Planned | Adapter in development |
| Windsurf | Planned | Adapter in development |
| VS Code / JetBrains | Planned | Extension adapter |

The backend (API, miner, database, dashboard) is fully IDE-agnostic. The shipped hook is optimized for Claude Code.

API Endpoint

POST http://localhost:8765/api/v3/tool-event
Content-Type: application/json

{
  "tool_name": "Skill",
  "event_type": "complete",
  "input_summary": "{\"skill\": \"my-skill-name\"}",
  "output_summary": "{\"success\": true}",
  "session_id": "your-session-id",
  "project_path": "/path/to/project"
}

All fields except `tool_name` are optional for backward compatibility.
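From any IDE or script, the payload above can be built and sent with the standard library. This is a sketch: the values are placeholders, and the final `urlopen` call is commented out because it requires a running SLM server on port 8765:

```python
# Building a tool-event payload for POST /api/v3/tool-event.
# Only tool_name is required; all other fields are optional.
import json
from urllib import request

payload = {
    "tool_name": "Skill",
    "event_type": "complete",
    "input_summary": json.dumps({"skill": "my-skill-name"}),
    "output_summary": json.dumps({"success": True}),
    "session_id": "your-session-id",
    "project_path": "/path/to/project",
}
body = json.dumps(payload).encode()
req = request.Request(
    "http://localhost:8765/api/v3/tool-event",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# request.urlopen(req)  # uncomment with a local SLM server running
print(json.loads(body)["tool_name"])  # Skill
```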


ECC Integration

Everything Claude Code (ECC) provides continuous learning and deep observation for Claude Code sessions. SLM integrates directly:

slm ingest --source ecc           # Import ECC observations
slm ingest --source ecc --dry-run # Preview without writing

Reads from ~/.claude/homunculus/projects/*/observations.jsonl and preserves full input/output data.

ECC is optional. SLM is fully self-sufficient — its own hook captures all needed data.
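The observations files are JSONL, one record per line. A minimal parsing sketch; the field names in the sample record are assumptions for illustration, not ECC's documented schema:

```python
# Sketch: parse ECC-style JSONL observations.
# In practice the lines come from
# ~/.claude/homunculus/projects/*/observations.jsonl;
# a literal sample is used here so the snippet is self-contained.
import json

sample_jsonl = '{"tool": "Skill", "input": {"skill": "fix-tests"}}\n'

observations = [
    json.loads(line) for line in sample_jsonl.splitlines() if line.strip()
]
print(observations[0]["tool"])  # Skill
```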


Configuration

Skill Tracking (always on)

Skill tracking is enabled by default when the SLM hook is registered. Zero-cost, zero-LLM.

slm status          # Check hook registration
slm consolidate --cognitive  # Trigger manual consolidation (includes skill mining)

Evolution Engine (opt-in)

The Evolution Engine generates improved skill versions using LLM calls. Off by default.

slm config set evolution.enabled true   # Enable
slm config set evolution.backend auto   # Auto-detect LLM backend
slm setup                               # Interactive wizard includes evolution opt-in

| Setting | Default | Description |
| --- | --- | --- |
| `evolution.enabled` | `false` | Master switch |
| `evolution.backend` | `auto` | `auto`, `claude`, `ollama`, `anthropic`, `openai` |
| `evolution.max_evolutions_per_cycle` | `3` | Budget cap per cycle |

MCP Tools

| Tool | Description |
| --- | --- |
| `evolve_skill` | Manually trigger evolution for a skill |
| `skill_health` | Get health metrics for skills |
| `skill_lineage` | Get the evolution lineage tree |

Thresholds

| Parameter | Default | Description |
| --- | --- | --- |
| `MIN_INVOCATIONS` | 5 | Minimum uses before creating assertions |
| `MIN_CONFIDENCE` | 0.5 | Minimum confidence for soft prompt injection |
| `TRACE_WINDOW` | 10 | Tool events analyzed after each Skill call |
| `RETRY_WINDOW` | 300 s | Re-invocation within this window counts as a potential retry |
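The RETRY_WINDOW check reduces to a timestamp comparison; a minimal sketch, assuming Unix-epoch timestamps on the tool events:

```python
# Sketch of RETRY_WINDOW detection: a second invocation of the same
# skill within 300 seconds is flagged as a potential retry (a negative
# signal in the outcome heuristic).
RETRY_WINDOW = 300  # seconds

def is_potential_retry(prev_ts, next_ts):
    """Timestamps in seconds since the epoch; prev_ts is the earlier call."""
    return 0 < next_ts - prev_ts <= RETRY_WINDOW

print(is_potential_retry(1000, 1120))  # True  (120 s apart)
print(is_potential_retry(1000, 1400))  # False (400 s apart)
```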

Entity Explorer Integration

Each tracked skill becomes a browsable entity of type skill in the Entity Explorer:

  • Lightning icon + purple accent border
  • Knowledge summary with performance facts
  • Recompile button to refresh compiled truth
  • Search/filter by type

Research Foundations

| Paper | Key Finding |
| --- | --- |
| EvoSkills (HKUDS, 2026) | Co-evolutionary verification: +30 pp from information isolation |
| OpenSpace (HKUDS, MIT) | 3-trigger evolution system, anti-loop guards, version DAG |
| SkillsBench (2026) | Self-generated skills yield zero benefit without verification |
| SoK: Agent Skills (2026) | Skills and MCP are orthogonal layers |

Roadmap

  • IDE Adapters — Cursor, Windsurf, VS Code Copilot, JetBrains skill tracking support
  • Lineage visualization — Richer DAG with performance history overlay
