Your AI-powered knowledge assistant - A privacy-first browser extension that captures and organizes your browsing history using ML embeddings for semantic search, local RAG, and Ask Me Anything capabilities.
Your data never leaves your device. All browsing history, page content, and ML embeddings stay 100% local in your browser.
This is a monorepo containing:
- /extension: Chromium MV3 extension (Vite + React + TypeScript)
- /backend: FastAPI backend (Python + PostgreSQL + ClickHouse)
- 100% Local & Private: Your data never leaves your device - all processing happens in your browser
- Ask Me Anything (AMA): Query your browsing history with natural language
- Local RAG: Retrieval-Augmented Generation with passage-level semantic search
- Semantic Search: Find pages by meaning, not just keywords
- Cross-Device Sync: Sync history between mobile and desktop with end-to-end encryption
- Local-First Storage: All browsing history stays on your device in IndexedDB
- ML-Powered Embeddings: 512-dimensional vectors using Universal Sentence Encoder (runs locally)
- Privacy-First: Denylist protection for 75+ sensitive domains (banking, healthcare, email)
- Smart Chunking: Sentence-aware passage generation for long documents
- Quality Eviction: Intelligent storage management based on recency, intent, and access patterns
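To make the semantic search idea concrete, here is a minimal sketch of ranking stored pages by cosine similarity against a query embedding. It is an illustration under assumptions, not the extension's actual code: the `PageDigest` shape and the function names `cosineSimilarity` and `rankBySimilarity` are hypothetical, and the real ranking is described elsewhere in this README as hybrid, so it likely combines more signals than this.

```typescript
// Hypothetical shape of a stored page digest; the real schema is not shown in this README.
interface PageDigest {
  url: string;
  embedding: number[]; // e.g. a 512-dimensional Universal Sentence Encoder vector
}

// Cosine similarity between two equal-length vectors; returns 0 if either is a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank pages by similarity to the query embedding, highest score first.
function rankBySimilarity(
  query: number[],
  pages: PageDigest[],
  topK = 5
): { url: string; score: number }[] {
  return pages
    .map((p) => ({ url: p.url, score: cosineSimilarity(query, p.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

The dimension check matters when vectors from different model versions or devices are mixed: comparing embeddings of different lengths should fail loudly rather than produce a meaningless score.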
cd extension
npm install
npm run dev # Development with hot reload
npm run build # Production build
1. Start Signaling Server:
cd signaling-server
npm install
npm start # WebSocket server on ws://localhost:8080
2. Install Native Messaging Host (Desktop only):
cd native-host
npm install
node install-manifest.js YOUR_EXTENSION_ID
# Restart Chrome after installation
3. Usage:
- Desktop: Click extension → Sync → Generate QR Code
- Mobile: Click extension → Sync → Scan QR Code → Camera opens
- Both devices sync automatically with E2E encryption
cd backend
docker-compose up -d # Start PostgreSQL, ClickHouse, Redis
pip install -r requirements.txt
uvicorn app.main:app --reload
- DESIGN_DOC.md: Complete technical design and onboarding document
  - System architecture and component details
  - Feature implementations (Search, Task Continuation, Cross-Device Sync)
  - Testing strategies and troubleshooting guides
  - API specifications and privacy architecture
Your Data Never Leaves Your Device:
- ✅ 100% Local Storage: All browsing history, page content, and ML embeddings stay on your device in IndexedDB
- ✅ Zero Data Collection: We never see, store, or have access to your browsing data
- ✅ No Cloud Sync: Your personal knowledge base is yours alone
- ✅ Offline-First: Full functionality without an internet connection (except the optional deals feature)
Additional Privacy Protections:
- Smart Denylist: Automatically blocks scraping of 75+ sensitive domains (banking, healthcare, email, government)
- Encrypted Storage: AES-GCM encryption for sensitive data
- Differential Privacy: Optional aggregated signals (for deals) use Laplace noise and k-anonymity
- No PII: Only anonymous IDs if you choose to use the optional deals feature
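As a rough illustration of the Laplace-noise part of the differential privacy item above, the sketch below adds noise with scale `sensitivity / epsilon` to a count before it would be reported. The function names (`sampleLaplace`, `privatizeCount`), the parameter defaults, and the injectable `uniform` argument are assumptions made for this example; the project's actual parameters and its k-anonymity step are not shown here.

```typescript
// Draw one sample from Laplace(0, scale) using inverse-CDF sampling.
// `uniform` must lie in [0, 1); it is injectable so the function can be tested deterministically.
function sampleLaplace(scale: number, uniform: number = Math.random()): number {
  const u = uniform - 0.5; // shift to [-0.5, 0.5)
  // Guard against log(0) when `uniform` is exactly 0.
  const magnitude = Math.max(1 - 2 * Math.abs(u), Number.MIN_VALUE);
  return -scale * Math.sign(u) * Math.log(magnitude);
}

// Add Laplace noise calibrated to sensitivity / epsilon to a raw count.
// Smaller epsilon means more noise and stronger privacy.
function privatizeCount(
  count: number,
  epsilon: number,
  sensitivity = 1,
  uniform: number = Math.random()
): number {
  if (epsilon <= 0) throw new Error("epsilon must be positive");
  return count + sampleLaplace(sensitivity / epsilon, uniform);
}
```

In normal use the `uniform` argument is left out, so each call draws fresh noise, for example `privatizeCount(42, 0.5)`.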
- Framework: React 18 + TypeScript
- Build: Vite 5 with MV3 plugin
- ML: TensorFlow.js + Universal Sentence Encoder
- Storage: IndexedDB (local page digests) + chrome.storage.local (preferences)
- Privacy: All data stored locally, differential privacy for signals
- API: FastAPI with async/await
- Database: PostgreSQL (deals catalog, points, signals)
- Deals: Server-side attribution for affiliate tracking
- Points: Reward system for user engagement
cd extension
npm test # Unit tests
npm run test:e2e # Brave compatibility tests
cd backend
pytest # Unit + integration tests
pytest --brave # Brave-specific test suite
- ✅ Phase 0: Dead Code Elimination & Foundation
- ✅ Phase 1: IndexedDB & ML Infrastructure
- ✅ Phase 2: On-Device Inference (TensorFlow.js + USE)
- ✅ Phase 3: Semantic Search Feature
- ✅ Phase 4: Task Continuation Feature
- ✅ Phase 5: Cross-Device Sync (QR pairing, E2E encryption)
- 🚧 Phase 6: Testing & QA
- ⏳ Phase 7: LLM Integration (WebLLM, Streaming)
- ⏳ Phase 8: Chrome Web Store Deployment
- Local-first architecture (IndexedDB)
- Privacy-first denylist (75+ sensitive domains)
- Semantic search with hybrid ranking
- Passage-level chunking and retrieval
- AMA with extractive answers and citations
- Quality-aware storage eviction
- Cross-device sync with E2E encryption
- QR code pairing (< 30 seconds)
- Smart merge conflict resolution
- Vector compatibility validation
- WebLLM integration (Phase 7)
- Streaming token generation (Phase 7)
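For readers curious what sentence-aware, passage-level chunking can look like, here is a small sketch that packs whole sentences into passages up to a character budget. It is one possible approach rather than the extension's implementation: `splitSentences`, `chunkBySentence`, the 500-character default, and the punctuation-based sentence splitter are all assumptions chosen for illustration.

```typescript
// Split text into sentences by ending each one at a run of ., !, or ?.
// This is a simple heuristic; abbreviations such as "e.g." will be split too.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Greedily pack whole sentences into passages of at most `maxChars` characters.
// A single sentence longer than `maxChars` becomes its own passage rather than being cut.
function chunkBySentence(text: string, maxChars = 500): string[] {
  const passages: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length > maxChars && current) {
      // Adding this sentence would exceed the budget: close the passage and start a new one.
      passages.push(current);
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current) passages.push(current);
  return passages;
}
```

Each resulting passage could then be embedded and indexed on its own, which is what allows retrieval to point at a specific part of a long page instead of the page as a whole.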
GNU Affero General Public License v3.0 (AGPL-3.0)
This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.
Key Points:
- ✅ Free to use, modify, and distribute
- ✅ Must disclose source code of modifications
- ✅ Network use is considered distribution (SaaS clause)
- ✅ Derivatives must also use AGPL-3.0
- ✅ Protects user freedom and privacy
Contributions welcome! This is an open-source project focused on privacy-first knowledge management.
