feat: EXP-25 faithfulness probe, encoding fixes, MCP HTTP transport#389

Merged
CalebisGross merged 11 commits into main from feat/exp25-faithfulness-probe
Apr 9, 2026
Conversation

@CalebisGross
Collaborator

Summary

This branch bundles the EXP-25 faithfulness probe research with critical infrastructure fixes that came out of it.

EXP-25: Faithfulness Probe

  • Confirmed Qwen 3.5 2B + spoke architecture can learn faithful encoding on diverse inputs (25 categories, 100% entity preservation rate, 100% number preservation, 0% template echo)
  • Added eval_faithfulness.py (7-metric evaluation), prepare_faithfulness_data.py, training_constants.py with build_production_prompt() matching daemon format
  • Implemented chunked_cross_entropy() in the training script to handle Qwen's 248K vocab at seq_len 2375 (see the sketch after this list)
  • Pre-registered EXP-26 (v7 dataset) and EXP-27 (Qwen 3.5 4B)
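
The PR only names the helper, so here is a minimal PyTorch sketch of the chunking idea. The signature, shapes, and default chunk size are assumptions (the commit notes it processes 256 positions at a time), not the actual train_qwen_spokes.py code, and any label shifting for next-token prediction is assumed to happen before the call.

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(logits, labels, chunk_size=256, ignore_index=-100):
    """Cross-entropy over a long sequence, chunk_size positions at a time.

    Avoids materializing the full (seq_len, vocab) softmax buffer at once,
    which is what runs out of memory with a ~248K vocab at seq_len > 2048.
    logits: (seq_len, vocab_size), labels: (seq_len,), already shifted.
    """
    total_loss = logits.new_zeros(())
    total_count = 0
    for start in range(0, labels.size(0), chunk_size):
        chunk_logits = logits[start:start + chunk_size]
        chunk_labels = labels[start:start + chunk_size]
        valid = (chunk_labels != ignore_index).sum().item()
        if valid == 0:
            continue
        # Sum per chunk so the final mean weights every token equally.
        total_loss = total_loss + F.cross_entropy(
            chunk_logits.float(), chunk_labels,
            ignore_index=ignore_index, reduction="sum",
        )
        total_count += valid
    return total_loss / max(total_count, 1)
```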

Bug Fixes (#382, #383)

  • Removed getRelatedContext() from the encoding pipeline: FTS5 keyword matching was injecting unrelated memory summaries into the LLM prompt, causing cross-contamination (#383)
  • amend now accepts raw_id in addition to memory_id, falling back to GetMemoryByRawID when the memory_id lookup fails (#382)

MCP HTTP Transport (#384)

  • Daemon serves MCP over POST /mcp, eliminating per-session subprocess spawning (a request-level sketch follows this list)
  • SessionManager creates/caches MCPServer instances per session ID with 30-minute idle expiry
  • Claude Code config: {"type": "http", "url": "http://127.0.0.1:9999/mcp"}
  • Result: N sessions x ~3GB VRAM each → one daemon, one model, ~3GB total
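
For reference, a rough client-side sketch of the session flow (method names follow the MCP JSON-RPC protocol and match the test plan below; the exact initialize params the daemon expects are not spelled out here):

```python
import requests

BASE = "http://127.0.0.1:9999/mcp"  # from the Claude Code config above

# First request carries no session header; the daemon generates a session ID
# and returns it via the Mcp-Session-Id response header.
init = requests.post(BASE, json={
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {},  # real clients send protocolVersion/capabilities/clientInfo
})
session_id = init.headers.get("Mcp-Session-Id")

# Subsequent requests reuse the session, so the daemon routes them to the
# cached MCPServer instance instead of creating a new one.
tools = requests.post(
    BASE,
    json={"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
    headers={"Mcp-Session-Id": session_id},
)
print(tools.json())
```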

Dashboard Fixes

  • Timeline "Today" header no longer overlaps first entry
  • Time formatting fixed (zero-padded minutes)

Test plan

  • make check passes
  • go test passes for all changed packages
  • ROCM=1 make build-embedded compiles
  • Daemon healthy with LLM loaded
  • MCP HTTP transport tested end-to-end (initialize, tools/list, tools/call, DELETE)
  • amend with raw_id verified against live daemon
  • Single GPU process after HTTP transport switch

Closes #382, closes #383, closes #384

🤖 Generated with Claude Code

CalebisGross and others added 11 commits April 9, 2026 09:35
Retrained EXP-25 at seq_len 2375 (up from 1280) after identifying that
the daemon's llama-server was holding ~3.4GB VRAM during the initial run.
All 25 diverse training examples now train untruncated.

Results: 25/25 valid JSON (was 3/25), 100% entity preservation, 100%
number preservation, 100% schema compliance, zero template echoing,
clean adversarial twin discrimination. The architecture can learn
faithful encoding on diverse inputs — failures were a data problem.

Key changes:
- Added chunked_cross_entropy() to train_qwen_spokes.py to handle
  Qwen's 248K vocab at long sequences (OOMs with standard cross_entropy
  at seq_len > 2048). Processes 256 positions at a time.
- Removed redundant HF internal loss computation (was passing labels to
  model AND computing loss manually).
- New scripts: eval_faithfulness.py (7-metric eval), prepare_faithfulness_data.py,
  run_exp25.sh, training_constants.py (build_production_prompt).

Tracking: #381

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The fabrication rate metric was counting semantic expansion in concepts
and structured_concepts as fabrication (e.g., "WAL mode on." -> concepts:
["database"] counted as 100% FR). Now only measures content-bearing
fields (gist, summary, content, narrative, outcome) where fabrication
is a real concern.

FR dropped from 25.8% to 3.0% — all 7 faithfulness metrics now pass.
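
The commit does not spell out how fabrication is detected, so the sketch below only illustrates the field restriction the fix introduces, with naive word overlap standing in for whatever check eval_faithfulness.py actually performs:

```python
CONTENT_FIELDS = ("gist", "summary", "content", "narrative", "outcome")

def fabrication_rate(encoded: dict, raw_input: str) -> float:
    """Fraction of words in content-bearing fields that never appear in the input.

    Crude word-overlap stand-in for the real metric. Keyword-style fields
    (concepts, structured_concepts) are deliberately excluded so legitimate
    semantic expansion is not counted as fabrication.
    """
    source_words = set(raw_input.lower().split())
    fabricated = total = 0
    for field in CONTENT_FIELDS:
        value = encoded.get(field)
        if not isinstance(value, str):
            continue
        for word in value.lower().split():
            total += 1
            if word not in source_words:
                fabricated += 1
    return fabricated / total if total else 0.0
```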

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Generates raw inputs across 5 categories for v7 faithfulness training (target counts sketched after this list):
- Production captures (600): extracted from daemon capture files with
  quality filtering (removes document ingestion, garbage, duplicates)
- Out-of-domain (300): 30 non-tech domains via Gemini 3.1 Pro
- Adversarial twins (100 pairs): matched decision pairs via Gemini
- Minimal inputs (100): 1-10 word script-generated inputs
- Dense numbers (100): metric-heavy inputs via Gemini
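
The per-category targets, roughly as a config sketch (how prepare_faithfulness_data.py actually organizes this is not shown in the diff):

```python
# Target raw-input counts per v7 category (from the commit message above).
V7_CATEGORY_TARGETS = {
    "production_captures": 600,     # filtered daemon capture files
    "out_of_domain": 300,           # 30 non-tech domains via Gemini 3.1 Pro
    "adversarial_twin_pairs": 100,  # matched decision pairs via Gemini
    "minimal_inputs": 100,          # 1-10 word script-generated inputs
    "dense_numbers": 100,           # metric-heavy inputs via Gemini
}
```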

Phase 1 of #381 v7 dataset pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Combined v6 (4,255 encoding) + v7 (1,200 diverse new examples) dataset.
V7 categories: production captures, out-of-domain (30 domains),
adversarial twins (50 pairs), minimal inputs, dense numbers.

Hypothesis: diverse data eliminates faithfulness failures while
maintaining 100% schema and 7/7 stress test.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove getRelatedContext() from encoding pipeline — FTS5 keyword
  matching injected unrelated memory summaries into the LLM prompt,
  causing cross-contamination (#383). Also removes extractKeywords
  and joinConcepts (dead code after removal).

- amend tool now accepts raw_id in addition to memory_id, resolving
  via GetMemoryByRawID when memory_id lookup fails (#382). Mirrors
  the check_memory pattern.

- Dashboard: fix sticky "Today" header overlapping first timeline
  entry (top: 30px → 0, solid background). Fix time formatting
  producing single-digit minutes (manual zero-padding replaces
  locale-dependent toLocaleString).

- Sync Python training_constants.py with Go buildCompressionPrompt
  (remove related_ctx parameter). Remove RELATED_MEMORY_STUB from
  prepare_faithfulness_data.py.

Closes #382, closes #383

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fix: encoding faithfulness, amend raw_id, dashboard timeline
…uning

Extract mnemonic's own 1.5-2B model from Gemma 4 31B (30.7B dense,
60 layers) via Sheared-LLaMA-style targeted structural pruning.

Phases: full fine-tune baseline → learned pruning masks → continued
pretraining → standalone GGUF export. Progressive targets 8B→4B→2B→1.5B
to find the quality cliff.

Target: >200 tok/s, <1.5GB VRAM, match EXP-26 faithfulness metrics.
Hardware: MI300X for pruning, local 7800 XT for deployment.

Tracking: #386

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add POST /mcp endpoint to the daemon API, eliminating the need for
per-session stdio subprocesses. Claude Code connects via HTTP transport
to the already-running daemon, sharing its LLM, store, and agents.

- New SessionManager (internal/mcp/session.go) creates and caches
  MCPServer instances per session ID with 30-minute idle expiry
- HTTP handler (internal/api/routes/mcp.go) accepts JSON-RPC requests,
  generates session IDs on first request (returned via Mcp-Session-Id
  header), routes subsequent requests to existing sessions
- Export JSONRPCRequest/Response types and HandleSingleRequest for
  the HTTP transport layer
- Wire session manager into daemon serve pipeline

Claude Code config changes from stdio to HTTP transport:
  {"type": "http", "url": "http://127.0.0.1:9999/mcp"}

Result: N sessions x ~3GB VRAM each → one daemon, one model, ~3GB total.
The mcp subcommand remains as fallback for offline/no-daemon usage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: serve MCP over HTTP transport from daemon
Model-agnostic script that measures per-layer contribution to the
residual stream via forward-pass hooks. Metrics: residual contribution,
cosine drift, composite importance score.
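
The script itself is not in this diff; below is a minimal PyTorch sketch of the two raw signals (residual contribution and cosine drift) gathered via forward hooks. The layer access path is an assumption for HF-style causal LMs, and the composite score and CPU-offload handling are omitted:

```python
import torch
import torch.nn.functional as F

def layer_importance(model, layers, input_ids):
    """Record per-layer residual contribution and cosine drift in one forward pass.

    `layers` is the list of decoder blocks (e.g. model.model.layers for most
    HF causal LMs). How the script combines these into the composite
    importance score is not reproduced here.
    """
    stats = []

    def make_hook(idx):
        def hook(module, inputs, output):
            h_in = inputs[0]                       # hidden states entering the block
            h_out = output[0] if isinstance(output, tuple) else output
            delta = h_out - h_in                   # what the block adds to the residual stream
            contribution = (delta.norm() / h_in.norm()).item()
            drift = 1.0 - F.cosine_similarity(
                h_in.flatten(1), h_out.flatten(1)
            ).mean().item()
            stats.append({"layer": idx,
                          "residual_contribution": contribution,
                          "cosine_drift": drift})
        return hook

    handles = [layer.register_forward_hook(make_hook(i))
               for i, layer in enumerate(layers)]
    try:
        with torch.no_grad():
            model(input_ids)
    finally:
        for h in handles:
            h.remove()
    return stats
```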

Validated on Gemma 4 E2B (35 layers): clear signal — layers 30-32
nearly dead (importance 0.08-0.14), full attention layers avg 0.68 vs
sliding 0.57, classic U-shaped importance curve.

Supports CPU offload for large models. Next: run on Gemma 4 31B
(60 layers) on MI300X.

Tracking: #387, #386

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CalebisGross merged commit 45a7cd5 into main on Apr 9, 2026
CalebisGross deleted the feat/exp25-faithfulness-probe branch on April 9, 2026 at 20:04