Local Memory MCP Server (Project Memory Bank)

🇬🇧 English Description

🌟 Design Philosophy & Motivation

"Your memory belongs to you, not the cloud."

I built this project with a simple yet powerful goal: Total Data Sovereignty. In an era of subscription-based AI services and cloud dependencies, I wanted a solution that is:

100% Local & Private: No data ever leaves your machine. No API fees, no privacy risks.
Permanent: As long as your hard drive exists, your AI's memory exists. No fear of service shutdowns.
Unlimited Capacity: Storage is limited only by your local disk space.
High Performance: Utilizing local GPU acceleration (TensorRT/CUDA) for lightning-fast embedding and retrieval.

This is a Model Context Protocol (MCP) server that gives your AI tools (such as Gemini CLI or Claude Desktop) a persistent, searchable, and evolving long-term memory.
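
Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages, and a client invokes a tool with the `tools/call` method. The sketch below only builds such a request so you can see its shape. The argument name `content` is a hypothetical guess; check the tool signatures in `server.py` for the real parameter names.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool via `tools/call`."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


# "content" is a hypothetical argument name, not taken from server.py.
print(build_tool_call(1, "save_memory", {"content": "My project uses Python 3.10."}))
```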

✨ Key Features

Hybrid Search Architecture: Combines LanceDB (vector search for semantic understanding) and SQLite FTS5 (full-text search for exact keyword matching), then ranks the merged results with Reciprocal Rank Fusion (RRF) for high-precision recall.
Hardware Acceleration: Powered by ONNX Runtime with TensorRT/CUDA execution providers for millisecond-level embedding generation.
Standard MCP Tools:
- save_memory: Store snippets, code, docs, or personal facts (with automatic duplicate detection).
- search_memory: Semantic & keyword retrieval, with support for similarity-threshold filtering.
- list_memories: View recent entries.
- delete_memory: Manage and clean up data.
- update_memory: Update existing memory by ID.
Lazy Loading: Heavy resources such as the embedding model are initialized on demand, keeping startup fast.
Zero Cost: Runs entirely on your existing hardware.
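
The hybrid idea can be illustrated with the standard library alone. This is a toy sketch under stated assumptions and is not the project's search code. An in-memory SQLite FTS5 table plays the keyword index. Hand-written three-dimensional vectors stand in for bge-m3 embeddings, which the real server stores in LanceDB. The two result lists are merged by a plain de-duplicated union instead of a rank-fusion step. The sketch needs a Python build whose bundled SQLite includes FTS5, which most builds do.

```python
import math
import sqlite3

# Toy corpus: id -> (text, fake embedding). Real embeddings would come from bge-m3.
MEMORIES = {
    1: ("My project uses Python 3.10", [0.9, 0.1, 0.0]),
    2: ("The server listens on port 8000", [0.1, 0.9, 0.1]),
    3: ("Conda environment named Local-memory-mcp", [0.7, 0.2, 0.3]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_search(conn, query, limit=5):
    """Exact keyword matches from the FTS5 index, best BM25 rank first."""
    rows = conn.execute(
        "SELECT rowid FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


def vector_search(query_vec, limit=5):
    """Semantic neighbours by cosine similarity (LanceDB's job in the real server)."""
    ranked = sorted(MEMORIES, key=lambda i: cosine(query_vec, MEMORIES[i][1]), reverse=True)
    return ranked[:limit]


def hybrid_search(conn, query, query_vec, limit=5):
    """Keyword hits first, then vector hits, with duplicates removed."""
    merged = []
    for mem_id in keyword_search(conn, query, limit) + vector_search(query_vec, limit):
        if mem_id not in merged:
            merged.append(mem_id)
    return merged[:limit]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories USING fts5(content)")
conn.executemany(
    "INSERT INTO memories(rowid, content) VALUES (?, ?)",
    [(i, text) for i, (text, _) in MEMORIES.items()],
)
print(hybrid_search(conn, "port", [0.8, 0.1, 0.2]))
```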

🛠️ Prerequisites

OS: Windows (tested)
Python: 3.10 or higher.
Hardware: NVIDIA GPU recommended (for TensorRT/CUDA acceleration), but works on CPU.
MCP Client: Gemini CLI, Claude Desktop, or any IDE that supports MCP configuration.
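
The "GPU recommended, but works on CPU" behaviour can be expressed as a provider preference list for ONNX Runtime. This is a minimal sketch. It assumes the server tries TensorRT first, then CUDA, then CPU. `pick_providers` is a hypothetical helper, and in practice its input would come from `onnxruntime.get_available_providers()`.

```python
# Preferred order: fastest first, CPU last as the universal fallback.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available):
    """Keep only the preferred providers that this ONNX Runtime build offers."""
    chosen = [p for p in PREFERRED if p in available]
    return chosen or ["CPUExecutionProvider"]


# Example: a CPU-only build exposes the CPU provider only.
print(pick_providers(["CPUExecutionProvider"]))  # prints ['CPUExecutionProvider']
```

With `onnxruntime` installed, you could pass the result as `onnxruntime.InferenceSession("bge-m3-onnx/model.onnx", providers=pick_providers(onnxruntime.get_available_providers()))`. The choice of `model.onnx` as the file to load is an assumption.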

🚀 Installation & Setup

1. Clone the Repository

```
git clone https://github.com/YanZiBin/Local-memory-mcp.git
cd Local-memory-mcp
```

2. Create a Python Environment (Conda Recommended)

To ensure GPU libraries work correctly, Conda is highly recommended.

```
conda create -n Local-memory-mcp python=3.10
conda activate Local-memory-mcp
```

3. Install Dependencies

```
pip install fastmcp lancedb onnxruntime-gpu transformers numpy uvicorn
```

(Note: If you don't have a GPU, install onnxruntime instead of onnxruntime-gpu)
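
Before you start the server, you can confirm the install with a quick check based on `importlib.util.find_spec`, which reports any modules that cannot be found. This is a sketch. Both `onnxruntime-gpu` and `onnxruntime` are imported under the module name `onnxruntime`, so one name covers either install.

```python
import importlib.util

# Import names of the packages installed above.
REQUIRED = ["fastmcp", "lancedb", "onnxruntime", "transformers", "numpy", "uvicorn"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```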

4. Download the Embedding Model

This project uses an ONNX conversion of BAAI/bge-m3. Download the model files into a bge-m3-onnx directory in the project root.

You can use huggingface-cli or download these files manually:

- config.json
- model.onnx
- model.onnx_data
- vocab.txt
- sentence_transformers.onnx
- sentence_transformers.onnx_data
- tokenizer_config.json
- tokenizer.json

Place them inside a folder named bge-m3-onnx in the project root.
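
Before the first start, you can check that no model file is missing. This is a minimal sketch that assumes the folder layout described above. The file list is copied from this README, and `check_model_dir` is a hypothetical helper that is not part of the project.

```python
from pathlib import Path

# Files this README lists as required in bge-m3-onnx.
REQUIRED_FILES = [
    "config.json",
    "model.onnx",
    "model.onnx_data",
    "vocab.txt",
    "sentence_transformers.onnx",
    "sentence_transformers.onnx_data",
    "tokenizer_config.json",
    "tokenizer.json",
]


def check_model_dir(model_dir="bge-m3-onnx"):
    """Return the required model files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = check_model_dir()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files present.")
```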

🏃‍♂️ Running the Server

Because the server loads heavy local models, we recommend starting it manually in SSE mode for stability.
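
Heavy models are also the reason the server loads them lazily, only when a tool first needs them. Below is a minimal sketch of that lazy-loading pattern using `functools.cache`. `load_embedding_model` is a hypothetical stand-in for the real, expensive ONNX Runtime setup, and the sleep only simulates its cost.

```python
import functools
import time

LOAD_COUNT = 0  # how many times the expensive load actually ran


@functools.cache
def load_embedding_model():
    """Hypothetical loader: runs once on first use, then returns the cached object."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.1)  # simulates reading large ONNX weights from disk
    return {"name": "bge-m3", "loaded_at": time.time()}


def embed(text):
    """First call pays the loading cost; later calls reuse the cached model."""
    model = load_embedding_model()
    return f"{model['name']} embedding of {len(text)} chars"


# Startup does not touch the model; only the first tool call does.
embed("first call loads the model")
embed("second call reuses it")
print(LOAD_COUNT)  # prints 1
```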

Start the Server: Open a terminal and run:
```
python server.py
```
Wait until you see: INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
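
If you script the startup instead of watching the console, you can poll the port rather than wait for the Uvicorn log line. This sketch uses only the standard library. The host, port and timeout defaults are assumptions that match the log line above. The check only confirms that the port accepts connections; it does not confirm that any models have finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    print("server is up" if wait_for_port() else "server did not start in time")
```
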
Connect your Client (e.g., Gemini CLI):

Edit your Gemini CLI configuration file (usually at ~/.gemini/settings.json or %USERPROFILE%\.gemini\settings.json on Windows):
```
{
  "mcpServers": {
    "local-memory": {
      "url": "http://localhost:8000/sse",
      "type": "sse"
    }
  }
}
```
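
Instead of editing by hand, you can merge the entry into an existing settings file without overwriting your other settings. This is a minimal sketch. The entry mirrors the example above, and `add_local_memory_server` is a hypothetical helper.

```python
import json
from pathlib import Path

# Entry copied from the example configuration above.
ENTRY = {"url": "http://localhost:8000/sse", "type": "sse"}


def add_local_memory_server(settings_path):
    """Insert or update mcpServers["local-memory"] while keeping every other setting."""
    path = Path(settings_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    settings.setdefault("mcpServers", {})["local-memory"] = dict(ENTRY)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return settings
```

For example, call `add_local_memory_server(Path.home() / ".gemini" / "settings.json")`. `Path.home()` resolves to `%USERPROFILE%` on Windows. The file is rewritten with two-space indentation, so any custom formatting is lost.
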
Start using it! Open Gemini CLI and try:

"Save this memory: My project uses Python 3.10." "Search my memories for 'project'."

🗺️ Roadmap

We are currently transitioning to Phase 4 (Memory Management).

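🇨🇳 Chinese Description

🌟 Design Motivation

“Your memory belongs to you, not the cloud.”

The motivation behind this project is simple: complete data sovereignty. In an age where everything is a subscription and privacy concerns keep growing, I wanted to build a solution that is:

Fully Local & Private: No data is ever uploaded to the cloud. No API fees, no risk of privacy leaks.
Permanent Storage: As long as your hard drive survives, your AI's memory survives. No need to worry about the provider going out of business or disappearing overnight.
Unlimited Capacity: The only limit is the size of your local disk.
Top Performance: Local GPU acceleration (TensorRT/CUDA) delivers millisecond-level memory storage and retrieval.

This is an MCP (Model Context Protocol) server that gives your AI tools (such as Gemini CLI and Claude Desktop) a persistent, searchable, continually evolving "external brain".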

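✨ Core Features

Hybrid Search Architecture: Combines LanceDB (vector search, for semantic understanding) with SQLite FTS5 (full-text search, for exact keyword matching) and ranks the merged results with the RRF (Reciprocal Rank Fusion) algorithm to balance recall and precision.
Hardware Acceleration: Built on ONNX Runtime with TensorRT/CUDA to make full use of your local GPU.
Standard MCP Toolset:
- save_memory: Save code snippets, document summaries, or personal facts (duplicate content is detected automatically).
- search_memory: Semantic or keyword retrieval (supports similarity-threshold filtering).
- list_memories: View recent memories.
- delete_memory: Delete outdated information.
- update_memory: Update an existing memory by ID.
Lazy Loading: An optimized startup sequence loads heavy models only when needed, so startup does not stall.
Zero Cost: Vector storage that used to require a paid service now runs for free on your own computer.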
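
RRF ranks each memory by summing 1 / (k + rank) across the result lists it appears in. Items that both the vector search and the keyword search place near the top therefore rise above items that only one of them favours. This is a minimal sketch. It assumes the conventional k = 60 and a hypothetical cosine-similarity cutoff applied to the vector hits before fusion. The IDs and scores are made up, and the project's actual constants and threshold may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists with Reciprocal Rank Fusion: score = sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def filter_by_similarity(scored, threshold=0.5):
    """Drop vector hits below a (hypothetical) similarity threshold, keeping their order."""
    return [doc_id for doc_id, sim in scored if sim >= threshold]


vector_hits = [("a", 0.91), ("b", 0.74), ("c", 0.32)]  # (id, cosine similarity), made up
keyword_hits = ["b", "d", "a"]  # keyword results, best first, made up

fused = rrf_fuse([filter_by_similarity(vector_hits), keyword_hits])
print(fused)  # prints ['b', 'a', 'd']
```

Here "c" falls below the threshold and is dropped. "b" edges out "a" because it ranks first for keywords and second for vectors (1/62 + 1/61), while "a" ranks first and third (1/61 + 1/63).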

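🛠️ Requirements

Operating System: Windows (thoroughly tested)
Python: 3.10 or higher.
Hardware: An NVIDIA GPU is recommended (for TensorRT/CUDA acceleration), but CPU-only operation is also supported.
MCP Client: Gemini CLI, Claude Desktop, or any IDE that supports MCP configuration.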

🚀 Installation & Configuration Guide

1. Clone the Project

```
git clone https://github.com/YanZiBin/Local-memory-mcp.git
cd Local-memory-mcp
```

2. Create a Python Environment (Conda Strongly Recommended)

Conda is recommended to avoid CUDA dependency conflicts.

```
conda create -n Local-memory-mcp python=3.10
conda activate Local-memory-mcp
```

3. Install Dependencies

```
pip install fastmcp lancedb onnxruntime-gpu transformers numpy uvicorn
```

(Note: If you do not have an NVIDIA GPU, replace onnxruntime-gpu with onnxruntime.)

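4. Download the Embedding Model

This project uses a quantized ONNX version of BAAI/bge-m3. Download the model files into the bge-m3-onnx folder in the project root.

You can use huggingface-cli or download the following files manually: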

- config.json
- model.onnx
- model.onnx_data
- vocab.txt
- sentence_transformers.onnx
- sentence_transformers.onnx_data
- tokenizer_config.json
- tokenizer.json

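Make sure all of them are inside the bge-m3-onnx folder.

🏃‍♂️ Running and Usage

Because this project loads large local models, we recommend starting it manually (SSE mode) for stability.

Start the Server: Open a terminal (CMD/PowerShell) and run: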
```
python server.py
```
Wait until you see: INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Connect a Client (using Gemini CLI as an example):

Edit your Gemini CLI configuration file (usually located at ~/.gemini/settings.json, or %USERPROFILE%\.gemini\settings.json on Windows):
```
{
  "mcpServers": {
    "local-memory": {
      "url": "http://localhost:8000/sse",
      "type": "sse"
    }
  }
}
```
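Start exploring! Open Gemini CLI and simply chat:

“Remember this for me: my project runs in a Python 3.10 environment.” “Search my memories for information about the project environment.”

🗺️ Development Roadmap

The project is currently transitioning to Phase 4: Memory Management.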

License: MIT | Author: YanZiBin
