Turn your browser bookmarks into an interactive, privacy-first knowledge base.
Bookmarks RAG Knowledge Assistant is a local-first Retrieval-Augmented Generation (RAG) tool designed to unlock insights from your accumulated browser bookmarks. By ingesting Netscape-format HTML exports (supported by Chrome, Firefox, Safari, Edge), it transforms static links into a queryable vector database, allowing you to chat with your saved content using state-of-the-art local LLMs.
I built this project to solve the "digital hoarding" problem: we all bookmark hundreds of interesting articles, papers, and tools, but we rarely revisit them because searching through a nested folder structure is inefficient. The hypothesis was that a RAG system could surface these forgotten gems by allowing semantic queries like "What was that article about local-first software architecture?" rather than forcing you to remember the exact title or folder.
- Privacy-First Architecture: Zero data exfiltration. Runs 100% locally using Ollama and local embedding models.
- Advanced RAG Pipeline:
- Smart Ingestion: Parses and cleans HTML content from bookmarked URLs using
BeautifulSoupandreadability-lxml. - Semantic Chunking: Intelligently splits content to preserve context for better retrieval.
- Hybrid Search: Combines vector similarity search (embeddings) with structured metadata filtering.
- Smart Ingestion: Parses and cleans HTML content from bookmarked URLs using
- High-Performance Backend: Powered by FastAPI and DuckDB for sub-millisecond vector retrieval and async processing.
- Modern Reactive UI: A polished React 19 + Vite frontend with Tailwind CSS 4 for seamless bookmark management and chat.
- Built-in Evaluation: Includes a
ragas-based evaluation framework to benchmark retrieval accuracy and generation quality. - Technical Rigor: TDD with high test coverage, type-safe Python (mypy), and automated CI pipelines.
While the technology works flawlessly, I am archiving this repository to focus on more impactful solutions.
The Pivot Insight: After building V1 and using it daily, I realized the core user journey is flawed. The friction of exporting bookmarks to an HTML file and uploading them to a separate tool is too high for casual use. People don't forget bookmarks because they lack a search tool; they forget them because the retrieval mechanism isn't integrated into their daily workflow (the browser itself).
What I Would Build Instead: If I were to restart this project today, I would build a Browser Extension that captures context at the moment of saving, indexing the page content in the background and offering a "Chat with History" sidebar directly in the browser. This removes the manual export/upload step and integrates the intelligence directly where the user works.
- DuckDB is a Vector Powerhouse: Using DuckDB for both metadata storage and vector similarity search simplified the architecture immensely compared to managing a separate Postgres + pgvector or ChromaDB instance.
- The "Context Window" Trap: Simply retrieving the top-k chunks isn't enough for high-quality answers. Implementing a re-ranking step (or hybrid search) significantly improved the relevance of answers for broad queries.
- Evaluation is Critical: Building the
ragasevaluation pipeline early allowed for objective measurement of how chunking strategy changes affected answer quality.
Challenge: Bookmarks are often "time capsules"—links saved years ago frequently point to domains that have lapsed, content that has moved, or endpoints returning 404/500 errors. Lesson & Mitigation: Trust but verify. I implemented pre-ingestion checks (HEAD/GET requests) to validate link freshness. Insight: A robust system must surface a "last alive" timestamp to the user so they understand why a specific source was skipped, rather than silently failing.
Challenge: Many high-value sites block automated crawling via robots.txt or deploy sophisticated bot-detection (CAPTCHA, IP rate-limiting, JS challenges). Aggressive scraping risks IP bans.
Lesson & Mitigation:
- Respect Standards: Strictly adhere to
robots.txtand implement polite crawl delays. - Authentication Walls: A subset of bookmarks (paywalls, social media) are unreachable by stateless scrapers. Insight: Instead of fighting anti-bot measures, the system should maintain an "allow list" of scrape-friendly domains. When a page is disallowed or gated, it is safer to log the reason and exclude it from the vector index than to risk a block.
Challenge: The current pipeline is strictly "grounded"—it only answers questions for which relevant bookmarks exist. If the answer isn't in your saved links, it replies "I don't know." While this prevents hallucination, it limits utility. Lesson & Roadmap: A future iteration would benefit from a Hybrid Generative Architecture. This would involve:
- Prioritizing bookmark embeddings in the context window.
- Falling back to the LLM's general knowledge only when bookmark retrieval confidence is low. This approach balances personal context with general utility, preventing the "empty response" frustration while maintaining high relevance.
In summary, this project reinforced the importance of ethical scraping, transparent failure modes, and the need for systems that degrade gracefully when data is imperfect.
- Framework:
FastAPI(Async web API),Uvicorn(ASGI Server) - Database:
DuckDB(In-process SQL OLAP & Vector Store) - AI & ML:
- LLM Runtime:
Ollama(Local Llama 3, Mistral, etc.) - Embeddings:
sentence-transformers(Hugging Face),nomic-embed-text
- LLM Runtime:
- Data Processing:
BeautifulSoup4,readability-lxml,httpx(Async HTTP) - Testing:
pytest,pytest-asyncio,pytest-cov
- Core:
React 19,TypeScript - Build Tool:
Vite - Styling:
Tailwind CSS 4,clsx,tailwind-merge - Icons:
Lucide React - State/Networking:
Axios, React Hooks
- CI/CD:
GitHub Actions(Automated testing & linting) - Package Management:
uv(Python),npm(Node)
- Python 3.10+
- Node.js 18+ (for frontend)
- Ollama (running locally)
- uv (recommended for Python package management)
# Clone the repository
git clone https://github.com/elessarrr/Bookmarks-RAG-Knowledge-Assistant.git
cd Bookmarks-RAG-Knowledge-Assistant
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
# Start Ollama (ensure you have models pulled)
ollama pull llama3
ollama pull nomic-embed-textcd frontend
npm install
npm run devStart the backend API:
# In project root
uvicorn app.main:app --reloadOpen http://localhost:5173 in your browser.
# Backend tests
PYTHONPATH=. pytest
# Run smoke test (end-to-end)
PYTHONPATH=. pytest tests/smoke_test.pyTo evaluate RAG performance:
python evals/run_evals.pyResults are saved in evals/results/.
This repository demonstrates end-to-end AI product development, moving from a Product Requirements Document (PRD) to a shipping codebase with a rigorous evaluation framework. It highlights experience in:
- Architecting Local-First Systems: Designing privacy-preserving AI systems that don't rely on expensive cloud APIs.
- Implementing Production Patterns: Using
DuckDBfor vector storage,FastAPIfor async endpoints, andReactfor a responsive UI. - Validating Product Assumptions: The decision to archive this project reflects a product-first mindset—recognizing when a solution (standalone app) doesn't match the optimal user workflow (browser extension) and being willing to pivot based on that insight.
I am open to discussing the architectural decisions, the evaluation methodology, or the rationale behind the product pivot.
MIT License. See LICENSE for details.