Oli edited this page Mar 5, 2026 · 1 revision

v0.6 — Adaptive Knowledge Ingestion Pipeline

Anchor issue: #78
Status: ⏳ Not started — runs in parallel with v0.5


A consciousness engine without the ability to assimilate new knowledge efficiently is, to borrow a biological analogy, a very sophisticated brain that cannot eat. The knowledge ingestion pipeline is the system's alimentary canal — unglamorous, perhaps, but rather important if the system is to grow.

The full specification is in Issue #33 and docs/ADAPTIVE_INGESTION_README.md. What follows is the essential summary.


Core Deliverables

  • Adaptive ingestion workers with CPU autotuner — designed for 8-core, 16GB hardware; self-regulating under load
  • Layout-aware chunking at three user-selectable levels:

    | Level    | Chunk Tokens | Model                    | Typical Use          |
    |----------|--------------|--------------------------|----------------------|
    | Fast     | 650–800      | MiniLM-L6-v2 (ONNX/Int8) | Quick loads          |
    | Balanced | 750–900      | MiniLM-L6-v2 or MPNet    | Most documents       |
    | Deep     | 500–700      | all-mpnet-base-v2        | High-recall research |
  • Tightened vector DB contract — ANN and search logic stay inside the database; no client-side duplication
  • Knowledge graph builder from vector kNN neighbours: Document → Chunk → Concept, with CONTAINS, SIMILAR_TO, TAGGED_AS edges
  • Persistent Jobs UI — preflight ETAs, per-stage progress bars, job management; responsive across all viewports
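The chunking levels above map naturally onto a small configuration table. The sketch below is illustrative only — the names `ChunkProfile`, `PROFILES`, and `chunk_tokens` are assumptions, not part of the spec, and the greedy splitter ignores the layout-awareness the real pipeline would add:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChunkProfile:
    min_tokens: int
    max_tokens: int
    model: str

# Illustrative mapping of the three levels from the table above.
PROFILES = {
    "fast":     ChunkProfile(650, 800, "MiniLM-L6-v2"),
    "balanced": ChunkProfile(750, 900, "MiniLM-L6-v2"),
    "deep":     ChunkProfile(500, 700, "all-mpnet-base-v2"),
}

def chunk_tokens(tokens: list[str], profile: ChunkProfile) -> list[list[str]]:
    """Greedy split into chunks of at most max_tokens.

    A real layout-aware chunker would split on section and paragraph
    boundaries first; this sketch only shows the token-budget contract.
    """
    return [tokens[i:i + profile.max_tokens]
            for i in range(0, len(tokens), profile.max_tokens)]
```

Keeping the level definitions in one table makes the Fast/Balanced/Deep choice a pure configuration switch rather than three code paths.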
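The "self-regulating under load" behaviour of the ingestion workers could be as simple as a feedback loop on the load average. A minimal sketch, assuming a 1-minute load average is sampled periodically — `tune_worker_count` and its thresholds are hypothetical, not the actual autotuner:

```python
def tune_worker_count(load_avg: float, current: int,
                      cores: int = 8, max_workers: int = 8,
                      target_load: float = 0.75) -> int:
    """Adjust the worker count from a sampled load average.

    Scales down when load exceeds the target share of available cores,
    scales up when utilisation drops below half the target, and holds
    steady in between. Defaults reflect the 8-core target hardware.
    """
    utilisation = load_avg / cores
    if utilisation > target_load:
        return max(1, current - 1)            # back off under pressure
    if utilisation < target_load * 0.5:
        return min(max_workers, current + 1)  # spare capacity: add a worker
    return current                            # within band: hold steady
```

On POSIX systems the sample could come from `os.getloadavg()`; the dead band between the two thresholds prevents the pool from oscillating on noisy load readings.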
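The knowledge graph builder's edge types can be illustrated with plain triples. This is a sketch under assumed input shapes (chunk-to-document map, per-chunk kNN results from the vector DB, per-chunk concept tags); `build_edges` and the threshold are hypothetical:

```python
def build_edges(chunks: dict[str, str],
                knn: dict[str, list[tuple[str, float]]],
                tags: dict[str, list[str]],
                sim_threshold: float = 0.8) -> list[tuple[str, str, str]]:
    """Emit (src, relation, dst) triples for the Document → Chunk → Concept graph.

    - CONTAINS:   document owns chunk
    - SIMILAR_TO: chunk pairs whose vector-kNN score clears the threshold
                  (emitted once per unordered pair)
    - TAGGED_AS:  chunk links to a concept node
    """
    edges = []
    for chunk_id, doc_id in chunks.items():
        edges.append((doc_id, "CONTAINS", chunk_id))
    for chunk_id, neighbours in knn.items():
        for other_id, score in neighbours:
            if score >= sim_threshold and chunk_id < other_id:
                edges.append((chunk_id, "SIMILAR_TO", other_id))
    for chunk_id, concepts in tags.items():
        for concept in concepts:
            edges.append((chunk_id, "TAGGED_AS", concept))
    return edges
```

Note the kNN scores come from the vector DB per the tightened contract above; the graph builder only consumes neighbour lists, it never recomputes similarity client-side.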

Acceptance Criteria

  • Import a PDF of ≥300MB on 8-core/16GB hardware without running out of memory
  • Preflight ETA accurate to ±25% after two minutes of micro-benchmarking
  • Self-query MRR@10 ≥ 0.6 — the system can find what it has ingested
  • Jobs UI persists across page reloads
  • Vector DB is the single and exclusive source of embeddings and search — no exceptions
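The preflight ETA criterion amounts to extrapolating total runtime from a short micro-benchmark. A minimal sketch — `preflight_eta` is an illustrative name, and real accuracy within ±25% depends on the sample being representative of the whole document:

```python
def preflight_eta(sample_bytes: int, sample_seconds: float,
                  total_bytes: int) -> float:
    """Extrapolate total job runtime (seconds) from a micro-benchmark.

    Measures throughput over a small sample, then scales linearly to the
    full input. Assumes roughly uniform per-byte cost across the document.
    """
    throughput = sample_bytes / sample_seconds  # bytes per second
    return total_bytes / throughput
```

For example, if two minutes of benchmarking process 10 MB in 5 seconds of active work, a 300 MB PDF would be estimated at 150 seconds; the ±25% tolerance then allows an actual runtime between roughly 112 and 188 seconds.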
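The self-query MRR@10 criterion has a standard definition: the mean, over queries, of the reciprocal rank of the first relevant result within the top 10 (zero if it does not appear). A small sketch with an assumed input shape of (ranked result IDs, relevant ID) pairs:

```python
def mrr_at_10(results: list[tuple[list[str], str]]) -> float:
    """Mean Reciprocal Rank at cutoff 10.

    Each element pairs a ranked list of result IDs with the single
    relevant ID. Reciprocal rank is 1/position of the first hit within
    the top 10, or 0 if the relevant ID is absent from the top 10.
    """
    total = 0.0
    for ranked, relevant in results:
        for pos, doc_id in enumerate(ranked[:10], start=1):
            if doc_id == relevant:
                total += 1.0 / pos
                break
    return total / len(results)
```

In the self-query setting, each ingested chunk (or a question generated from it) is issued as a query and the chunk itself is the relevant ID, so MRR@10 ≥ 0.6 means the system typically ranks its own content near the top.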
