Mnemonic is a semantic search middleware that transforms raw web data into a vectorized knowledge stream. It leverages local AI to understand intent, recalibrate results based on feedback, and synthesize findings into actionable intelligence.
*Dense, multi-panel search workstation with Synthesis Workspace.*
- Semantic Memory: Uses LanceDB and Sentence-Transformers (`all-MiniLM-L6-v2`) to store and retrieve search results based on 384-dimensional query embeddings.
- Synthesis Workspace: Pin search results to a side canvas and use a local LLM (via Ollama) to generate summarized insights and drafts.
- Bento Box UI: A gapless, length-driven grid system that scales card sizes based on content volume, built with Tailwind CSS and HTMX.
- Vector Recalibration: A self-correcting feedback loop. Rejecting a result applies a negative penalty to the query vector, shifting the search focus away from irrelevant clusters.
- Live Telemetry: A terminal-style console providing real-time system logs and engine metrics via Server-Sent Events (SSE).
- Admin Dashboard: Secure management portal (`/admin`) to monitor cache performance, view stored queries, and perform factory resets.
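The vector recalibration loop described above can be sketched as a Rocchio-style negative-feedback update: subtract a scaled copy of the rejected result's embedding from the query vector, then re-normalize. The `penalty` weight and helper names below are illustrative assumptions, not Mnemonic's actual implementation.

```python
import math
import random

def recalibrate(query_vec, rejected_vec, penalty=0.25):
    """Shift the query embedding away from a rejected result's embedding.

    Rocchio-style negative feedback; the penalty weight of 0.25 is an
    illustrative assumption, not Mnemonic's exact value.
    """
    shifted = [q - penalty * r for q, r in zip(query_vec, rejected_vec)]
    norm = math.sqrt(sum(x * x for x in shifted))
    return [x / norm for x in shifted] if norm > 0 else list(query_vec)

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    # Inputs are unit vectors, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Toy 384-dim vectors standing in for all-MiniLM-L6-v2 embeddings.
random.seed(42)
query = unit([random.gauss(0, 1) for _ in range(384)])
rejected = unit([random.gauss(0, 1) for _ in range(384)])

new_query = recalibrate(query, rejected)
# The recalibrated query is less similar to the rejected embedding.
assert cosine(new_query, rejected) < cosine(query, rejected)
```

Because the update re-normalizes, subsequent distance thresholds stay comparable across feedback rounds.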
- Aggregator: Parallel multi-engine fetching (currently via DuckDuckGo, extensible to Google/Bing).
- Refinement: URL normalization, de-duplication, and semantic re-ranking using BM25 + Cosine Similarity.
- Memory: Vector database with configurable TTL, distance thresholds, and rejection-based conflict resolution.
- Synthesis: Local LLM integration (Llama 3/Mistral) for fully local context summarization with no external API round-trips.
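The refinement stage's hybrid re-ranking can be illustrated with a compact scorer that blends normalized BM25 with cosine similarity. The blend weight `alpha`, the normalization scheme, and the toy vectors are assumptions for this sketch; Mnemonic's actual weighting may differ.

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal BM25 over tokenized documents (illustrative only)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = [0.0] * n
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            denom = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / denom
    return scores

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query, results, query_vec, result_vecs, alpha=0.5):
    """Blend normalized BM25 with cosine similarity; alpha is assumed."""
    docs = [r.lower().split() for r in results]
    bm25 = bm25_scores(query.lower().split(), docs)
    top = max(bm25) or 1.0  # avoid division by zero when no terms match
    hybrid = [alpha * (s / top) + (1 - alpha) * cosine(query_vec, v)
              for s, v in zip(bm25, result_vecs)]
    order = sorted(range(len(results)), key=lambda i: hybrid[i], reverse=True)
    return [results[i] for i in order]

results = [
    "how to bake sourdough bread at home",
    "vector databases for semantic search",
    "semantic search with sentence embeddings",
]
# Toy 3-dim embeddings standing in for the real 384-dim ones.
vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.3], [0.1, 0.9, 0.4]]
ranked = rerank("semantic search", results, [0.1, 0.95, 0.3], vecs)
```

The lexical signal (BM25) rewards exact term overlap while the vector signal catches paraphrases, so the off-topic sourdough result drops to the bottom even though all three documents are short.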
- Python 3.10+
- Ollama (Optional, for synthesis features)
```bash
pip install -r requirements.txt
```
Mnemonic is highly configurable. You can use either a `.env` file or a `config.json` file in the root directory.
```bash
cp .env.example .env
# OR
cp config.json.example config.json
```

Key configs:

- `MNEMONIC_ADMIN_TOKEN`: Secure token for the admin dashboard.
- `OLLAMA_MODEL`: The model name to use for synthesis (e.g., `llama3`).
- `MAX_RESULTS_PER_ENGINE`: Number of results to pull per engine per search pass.
- `CACHE_TTL_DAYS`: How long results remain in semantic memory.
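For example, a minimal `.env` might look like this (all values below are placeholders, not the project's defaults):

```bash
# .env — illustrative placeholder values
MNEMONIC_ADMIN_TOKEN=replace-with-a-long-random-string
OLLAMA_MODEL=llama3
MAX_RESULTS_PER_ENGINE=10
CACHE_TTL_DAYS=7
```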
Mnemonic is fully containerized. To build and start the environment:
```bash
docker compose up -d
```

Visit http://localhost:8000 to start querying.
Note on Ollama & Docker: If you are running Ollama on your host machine, the default `OLLAMA_BASE_URL` in `docker-compose.yml` is set to `http://host.docker.internal:11434`. This ensures the container can reach your local AI models for synthesis.
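On Linux, `host.docker.internal` does not resolve out of the box; mapping it to the Docker host gateway is the usual fix. A sketch of the relevant compose entry (the service name here is a hypothetical, not necessarily what the project's `docker-compose.yml` uses):

```yaml
services:
  mnemonic:  # hypothetical service name
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      # Resolves host.docker.internal to the host on Linux engines
      - "host.docker.internal:host-gateway"
```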
```bash
# Start the FastAPI server
export PYTHONPATH=$PYTHONPATH:.
python3 src/mnemonic/api/main.py
```

Visit http://localhost:8000 to start searching.
- Admin Access: Protected by token-based `HttpOnly` cookie authentication.
- Privacy First: Mnemonic acts as a pass-through processor; no external AI APIs are used. All synthesis happens locally on your hardware.
The following ideas represent a steady evolution of the core search and synthesis experience.
- Engine Expansion: Integration of additional search providers (Brave Search, Bing, Google API) for higher result diversity.
- Export to Markdown: One-click download of your synthesized findings and pinned references into a clean document.
- Advanced Filtering: UI controls to filter results by domain, date, or content category (e.g., Code, News, Discussion).
- Custom Synthesis Styles: Choice between different summarization modes (e.g., Deep Research, Quick Summary, Bullet Points).
- Admin Analytics: Improved dashboard to visualize semantic memory trends and vector cluster patterns.
- UI/UX Polish: Continued aesthetic refinements, smoother transitions, and deeper mobile optimization.
MIT License - See LICENSE for details.
