A simple web-based PDF question-answering chatbot that runs fully locally, apart from calls to the Gemini API.
Upload any PDF → the app extracts its text and stores it as vector embeddings in ChromaDB → ask natural language questions → get answers generated by Google Gemini and grounded in the PDF through Retrieval-Augmented Generation (RAG).
Perfect for learning RAG, vector databases, and LLM integration with Node.js!
- Upload PDF directly from the browser
- Smart text extraction & cleaning using pdf-parse
- Intelligent chunking with overlap and duplicate removal
- Persistent vector storage with ChromaDB
- Simple, responsive chat interface
- Answers generated by Google Gemini 1.5 Flash (or Pro), grounded in your PDF
- No external paid vector DB needed
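
The chunking feature above can be illustrated with a minimal sketch. It is not the project's actual code: the `chunkText` name, the character-based sizes, and the default values are assumptions chosen for illustration. It collapses whitespace, slides a fixed-size window with overlap across the text, and skips chunks that exactly repeat an earlier one.

```javascript
// Hypothetical helper (not taken from the repository): split extracted PDF text
// into overlapping chunks and drop exact duplicates.
// chunkSize and overlap are in characters; the defaults are illustrative only.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  // Collapse the whitespace runs that PDF extraction tends to leave behind.
  const cleaned = text.replace(/\s+/g, " ").trim();
  const step = chunkSize - overlap; // how far the window advances each time
  const seen = new Set();
  const chunks = [];
  for (let start = 0; start < cleaned.length; start += step) {
    const chunk = cleaned.slice(start, start + chunkSize).trim();
    // Keep only non-empty chunks that have not been seen before.
    if (chunk && !seen.has(chunk)) {
      seen.add(chunk);
      chunks.push(chunk);
    }
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= cleaned.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each returned chunk would then be added to a ChromaDB collection as one document.
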
- Backend: Node.js + Express
- Frontend: HTML, CSS, Vanilla JS (no frameworks)
- PDF Processing: pdf-parse
- Vector Database: ChromaDB (via Docker)
- Embeddings: Chroma’s default embedding function (the all-MiniLM-L6-v2 Sentence Transformers model, which runs locally)
- LLM: Google Gemini (via REST API)
- File Upload: Multer
- Env Management: dotenv
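
To show how the LLM piece of this stack fits together, here is a hedged sketch of calling Gemini over REST with retrieved chunks as context. The function names (`buildPrompt`, `buildGeminiBody`, `askGemini`), the prompt wording, and the choice of `gemini-1.5-flash` are assumptions for illustration; the repository may structure this differently. It relies on the global `fetch` available in Node.js 18+.

```javascript
// Assumed model name; swap in the Pro model if preferred.
const GEMINI_MODEL = "gemini-1.5-flash";

// Hypothetical helper: build a grounded prompt from retrieved chunks and the question.
function buildPrompt(question, chunks) {
  // Number each chunk so the context is easy to inspect when debugging.
  const context = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Hypothetical helper: JSON body for the generateContent REST endpoint.
function buildGeminiBody(question, chunks) {
  return { contents: [{ parts: [{ text: buildPrompt(question, chunks) }] }] };
}

// Send the request and return the first candidate's text (empty string if none).
async function askGemini(question, chunks, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${GEMINI_MODEL}:generateContent`;
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildGeminiBody(question, chunks)),
  });
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  const data = await res.json();
  // Optional chaining guards against responses that contain no candidates.
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```

In an Express route, the chunks passed to `askGemini` would typically be the top results of a ChromaDB query for the user's question, and the API key would be read from the environment via dotenv.
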
- Node.js (v18 or higher)
- npm or yarn
- Docker (Docker Desktop on Windows/Mac)
- Google Gemini API Key → get it free at Google AI Studio
git clone https://github.com/unlimitedcode07/Pdf_reader_RAG_Using_ChromaDb.git
cd Pdf_reader_RAG_Using_ChromaDb