v0.6
Oli edited this page Mar 5, 2026
Anchor issue: #78
Status: ⏳ Not started — runs in parallel with v0.5
A consciousness engine without the ability to assimilate new knowledge efficiently is, to borrow a biological analogy, a very sophisticated brain that cannot eat. The knowledge ingestion pipeline is the system's alimentary canal — unglamorous, perhaps, but rather important if the system is to grow.
The full specification is in Issue #33 and docs/ADAPTIVE_INGESTION_README.md. What follows is the essential summary.
- Adaptive ingestion workers with CPU autotuner — designed for 8-core, 16GB hardware; self-regulating under load
- Layout-aware chunking at three user-selectable levels:
| Level | Chunk Tokens | Model | Typical Use |
|---|---|---|---|
| Fast | 650–800 | MiniLM-L6-v2 (ONNX/Int8) | Quick loads |
| Balanced | 750–900 | MiniLM-L6-v2 or MPNet | Most documents |
| Deep | 500–700 | all-mpnet-base-v2 | High-recall research |
- Tightened vector DB contract — ANN and search logic stay inside the database; no client-side duplication
- Knowledge graph builder from vector kNN neighbours: Document → Chunk → Concept, with `CONTAINS`, `SIMILAR_TO`, `TAGGED_AS` edges
- Persistent Jobs UI — preflight ETAs, per-stage progress bars, job management; responsive across all viewports
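The graph-builder bullet above can be sketched roughly as follows. This is a minimal illustration, not the project's actual API: the `Chunk` type, the `knn_neighbours` mapping, and the similarity threshold are all assumed names, and the kNN results are presumed to come from the vector DB per the tightened contract.

```python
# Hypothetical sketch: derive CONTAINS edges from document structure and
# SIMILAR_TO edges between chunks from vector kNN results.
# All names (Chunk, knn_neighbours, sim_threshold) are illustrative.
from dataclasses import dataclass

@dataclass
class Chunk:
    chunk_id: str
    doc_id: str

def build_edges(chunks, knn_neighbours, sim_threshold=0.75):
    """knn_neighbours: {chunk_id: [(other_id, cosine_sim), ...]} from the vector DB."""
    edges = []
    for c in chunks:
        # Every chunk is contained by its source document.
        edges.append(("Document:" + c.doc_id, "CONTAINS", "Chunk:" + c.chunk_id))
        # Link chunks whose embeddings are close enough.
        for other_id, sim in knn_neighbours.get(c.chunk_id, []):
            if other_id != c.chunk_id and sim >= sim_threshold:
                edges.append(("Chunk:" + c.chunk_id, "SIMILAR_TO", "Chunk:" + other_id))
    return edges

chunks = [Chunk("c1", "d1"), Chunk("c2", "d1")]
knn = {"c1": [("c2", 0.9)], "c2": [("c1", 0.9)]}
print(build_edges(chunks, knn))
```

Keeping the kNN search itself inside the database and only materialising edges client-side is what keeps this consistent with the "no client-side duplication" contract.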
- Import a PDF of ≥300MB on 8-core/16GB hardware without running out of memory
- Preflight ETA accurate to ±25% after two minutes of micro-benchmarking
- Self-query MRR@10 ≥ 0.6 — the system can find what it has ingested
- Jobs UI persists across page reloads
- Vector DB is the single and exclusive source of embeddings and search — no exceptions
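The self-query MRR@10 criterion can be checked with a few lines. This is a sketch under assumptions: `search` stands in for whatever ranked-search call the vector DB exposes, and queries are (query text, expected chunk id) pairs built from the ingested chunks themselves.

```python
# Minimal sketch of the self-query MRR@10 check: query the index with each
# chunk's own text and look for that chunk in the top-10 results.
# `search` is a stand-in for the vector DB's search call, not a real API.
def mrr_at_10(queries, search):
    """queries: [(query_text, expected_chunk_id)]; search returns ranked chunk ids."""
    total = 0.0
    for text, expected in queries:
        results = search(text)[:10]
        if expected in results:
            # Reciprocal rank: 1 for rank 1, 1/2 for rank 2, and so on.
            total += 1.0 / (results.index(expected) + 1)
    return total / len(queries)

# Toy example: one hit at rank 1, one at rank 2.
queries = [("alpha", "c1"), ("beta", "c2")]
search = lambda q: {"alpha": ["c1", "c2"], "beta": ["c3", "c2"]}[q]
print(mrr_at_10(queries, search))  # (1.0 + 0.5) / 2 = 0.75
```

A run over all ingested chunks passing at ≥ 0.6 would satisfy the criterion above.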