A complete, working implementation of Retrieval-Augmented Generation (RAG), built with production patterns
Live Demo • Architecture • Quick Start • Documentation
RagXGen is a portfolio project that demonstrates deep understanding of Retrieval-Augmented Generation by:
- **Explaining it** – clear, engineer-friendly documentation of how RAG works
- **Visualizing it** – interactive pipeline diagrams showing each step
- **Implementing it** – real, working code with production patterns
- **Evaluating it** – honest discussion of trade-offs and failure modes
This isn't a mock demo. It's a fully functional RAG system you can test with your own documents.
- **Document Upload** – upload PDFs and TXT files for processing
- **Semantic Search** – FAISS-powered vector similarity search
- **Chat Interface** – natural language Q&A with your documents
- **Transparency** – see retrieved chunks, similarity scores, and sources
- **Configurable** – adjust chunk size, overlap, and top-K retrieval
- **Modern UI** – dark mode, smooth animations, responsive design
```
┌─────────────────────────────────────────────────────────────┐
│                     FRONTEND (Next.js)                      │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Landing  │ │ What is  │ │Architect-│ │    Live Demo     │ │
│ │   Page   │ │   RAG?   │ │   ure    │ │ (Chat + Upload)  │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │
└─────────────────────────────┬───────────────────────────────┘
                              │ REST API
┌─────────────────────────────┴───────────────────────────────┐
│                      BACKEND (FastAPI)                       │
│ ┌──────────────────────────────────────────────────────────┐│
│ │                       RAG Pipeline                       ││
│ │ ┌─────────┐ ┌────────────┐ ┌─────────┐ ┌─────────────┐  ││
│ │ │ Document│→│  Chunking  │→│Embedding│→│ FAISS Store │  ││
│ │ │  Upload │ │ (1000 char)│ │ (OpenAI)│ │ (Similarity)│  ││
│ │ └─────────┘ └────────────┘ └─────────┘ └─────────────┘  ││
│ │                                                          ││
│ │ ┌─────────┐ ┌────────────┐ ┌─────────┐ ┌─────────────┐  ││
│ │ │  Query  │→│  Retrieve  │→│ Augment │→│  Generate   │  ││
│ │ │  Input  │ │ Top-K (4)  │ │  Prompt │ │(GPT-4o-mini)│  ││
│ │ └─────────┘ └────────────┘ └─────────┘ └─────────────┘  ││
│ └──────────────────────────────────────────────────────────┘│
└──────────────────────────────────────────────────────────────┘
```
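The two pipeline rows above (ingest, then query) can be sketched in a few lines of Python. This is a toy, self-contained version: a normalized bag-of-words vector stands in for the OpenAI `text-embedding-3-small` embeddings, and a NumPy dot product stands in for FAISS.

```python
import numpy as np

# Toy stand-in for OpenAI embeddings: a normalized bag-of-words vector.
VOCAB = sorted({"rag", "retrieval", "vector", "search", "faiss",
                "chunking", "llm", "prompt", "answer", "documents"})

def embed(text: str) -> np.ndarray:
    words = text.lower().split()
    v = np.array([words.count(w) for w in VOCAB], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n else v

class ToyVectorStore:
    """In-memory store; FAISS plays this role in the real backend."""
    def __init__(self):
        self.chunks, self.vectors = [], []

    def add(self, chunk: str) -> None:              # ingest: embed -> store
        self.chunks.append(chunk)
        self.vectors.append(embed(chunk))

    def search(self, query: str, top_k: int = 4) -> list[str]:
        # Dot product of unit vectors == cosine similarity.
        scores = np.array(self.vectors) @ embed(query)
        order = np.argsort(scores)[::-1][:top_k]
        return [self.chunks[i] for i in order]

# Ingest: chunk -> embed -> store
store = ToyVectorStore()
store.add("FAISS enables fast vector search over embedded chunks")
store.add("Chunking splits documents into pieces")

# Query: embed -> retrieve -> augment -> (generate with an LLM)
context = store.search("how does vector search work", top_k=1)
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: how does vector search work?"
```

The real backend swaps `embed` for an OpenAI API call and `ToyVectorStore` for a FAISS index, but the data flow is the same.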
- Node.js 18+
- Python 3.10+
- OpenAI API Key
```bash
git clone https://github.com/yourusername/ragxgen.git
cd ragxgen

# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
.\venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cp .env.example .env

# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-key-here

# Start the backend server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

The backend will be available at http://localhost:8000, with interactive API documentation at http://localhost:8000/docs.
```bash
# Open a new terminal and navigate to frontend
cd frontend

# Install dependencies
npm install

# Create env file
cp .env.example .env.local

# Start the development server
npm run dev
```

The frontend will be available at http://localhost:3000.
```
ragxgen/
├── backend/                     # Python FastAPI backend
│   ├── app/
│   │   ├── __init__.py
│   │   ├── main.py              # FastAPI application entry
│   │   ├── config.py            # Configuration management
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   └── schemas.py       # Pydantic models
│   │   ├── routers/
│   │   │   ├── __init__.py
│   │   │   ├── documents.py     # Document upload endpoints
│   │   │   └── query.py         # RAG query endpoint
│   │   └── services/
│   │       ├── __init__.py
│   │       ├── embeddings.py    # OpenAI embeddings service
│   │       ├── vector_store.py  # FAISS vector store
│   │       └── rag_pipeline.py  # Core RAG implementation
│   ├── requirements.txt
│   └── .env.example
│
├── frontend/                    # Next.js frontend
│   ├── src/
│   │   ├── app/
│   │   │   ├── layout.tsx       # Root layout
│   │   │   ├── page.tsx         # Landing page
│   │   │   ├── globals.css      # Global styles
│   │   │   ├── what-is-rag/
│   │   │   │   └── page.tsx     # What is RAG explanation
│   │   │   ├── architecture/
│   │   │   │   └── page.tsx     # Interactive architecture
│   │   │   ├── demo/
│   │   │   │   └── page.tsx     # Live RAG demo
│   │   │   ├── evaluation/
│   │   │   │   └── page.tsx     # Trade-offs & failure cases
│   │   │   └── about/
│   │   │       └── page.tsx     # Case study
│   │   ├── components/
│   │   │   └── layout/
│   │   │       ├── Navigation.tsx
│   │   │       └── Footer.tsx
│   │   └── lib/
│   │       ├── api.ts           # API client
│   │       └── utils.ts         # Utility functions
│   ├── package.json
│   ├── tailwind.config.ts
│   └── tsconfig.json
│
└── README.md
```
| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | (required) |
| `CHUNK_SIZE` | Characters per document chunk | `1000` |
| `CHUNK_OVERLAP` | Overlap between chunks | `200` |
| `TOP_K` | Default chunks to retrieve | `4` |
| `MODEL_NAME` | LLM model for generation | `gpt-4o-mini` |
| `EMBEDDING_MODEL` | Embedding model | `text-embedding-3-small` |
| `CORS_ORIGINS` | Allowed CORS origins | `http://localhost:3000` |
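One plausible shape for `config.py` is a small settings object that reads these variables with the defaults above. This stdlib-only version is a sketch; the actual repo may use Pydantic's settings management instead.

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Settings:
    # Each field falls back to the documented default when the env var is unset.
    openai_api_key: str = os.getenv("OPENAI_API_KEY", "")
    chunk_size: int = int(os.getenv("CHUNK_SIZE", "1000"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP", "200"))
    top_k: int = int(os.getenv("TOP_K", "4"))
    model_name: str = os.getenv("MODEL_NAME", "gpt-4o-mini")
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "text-embedding-3-small")
    cors_origins: list[str] = field(
        default_factory=lambda: os.getenv("CORS_ORIGINS", "http://localhost:3000").split(","))

settings = Settings()
```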
| Variable | Description | Default |
|---|---|---|
| `NEXT_PUBLIC_API_URL` | Backend API URL | `http://localhost:8000` |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/documents/upload` | Upload and process a document |
| `GET` | `/documents/session/{id}` | Get session information |
| `DELETE` | `/documents/session/{id}` | Delete a session |
| `POST` | `/documents/session/create` | Create a new session |
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/query/` | Execute a RAG query |
| `GET` | `/query/config` | Get the RAG configuration |
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/health` | Health check |
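Driving the query endpoint from Python takes only the standard library. The request field names below are assumptions based on the endpoint descriptions; check the live schema at http://localhost:8000/docs before relying on them.

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # NEXT_PUBLIC_API_URL default

def build_query_payload(question: str, session_id: str, top_k: int = 4) -> dict:
    # Field names are assumptions; verify against the OpenAPI docs at /docs.
    return {"question": question, "session_id": session_id, "top_k": top_k}

def post_query(question: str, session_id: str, top_k: int = 4) -> dict:
    """POST the payload to /query/ and return the decoded JSON response."""
    body = json.dumps(build_query_payload(question, session_id, top_k)).encode()
    req = request.Request(f"{API_URL}/query/", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```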
Traditional LLMs have three key limitations:
- **Hallucinations** – they confidently make up information
- **Stale Knowledge** – training cutoff means outdated information
- **No Source Attribution** – no way to verify where answers come from
RAG addresses these by retrieving relevant context from your documents:
```
User Question
      ↓
Embed question → Search vector store → Retrieve top-K chunks
      ↓
Inject chunks into prompt → Generate grounded answer
      ↓
Answer with sources
```
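The "inject chunks into prompt" step amounts to a templated prompt like the following. The exact template in `rag_pipeline.py` may differ; this is an illustrative sketch.

```python
def augment_prompt(question: str, chunks: list[str]) -> str:
    """Build a grounded prompt: numbered context chunks plus the user question."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks (`[1]`, `[2]`, …) lets the model cite sources, which is how the demo's transparency view can map answers back to retrieved chunks.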
| Component | Purpose | This Project |
|---|---|---|
| Document Chunking | Split docs into searchable pieces | RecursiveCharacterTextSplitter |
| Embeddings | Convert text to vectors | OpenAI text-embedding-3-small |
| Vector Store | Store and search vectors | FAISS (in-memory) |
| Retrieval | Find relevant chunks | Cosine similarity, Top-K |
| Generation | Produce final answer | GPT-4o-mini |
**Chunk size**
- Smaller (200-500 characters): higher precision, but less context per chunk
- Larger (1000-2000 characters): more context, but may include irrelevant info
- Recommendation: start with 1000, adjust based on results
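A fixed-size sliding-window chunker illustrates how size and overlap interact. This is a naive sketch; the project uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers to break on paragraph and sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks, each repeating the last
    `chunk_overlap` characters of the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # window advances by size minus overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

The overlap gives retrieval a safety margin: a fact that straddles a chunk boundary still appears whole in at least one chunk.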
**Top-K retrieval**
- Lower (1-2): focused, but may miss relevant info
- Higher (6-10): comprehensive, but noisier
- Recommendation: default to 4, adjust per use case
- Session data is stored in memory (lost on restart)
- No persistent storage of vector indices
- Limited to PDF and TXT files
- Single-turn conversations (no memory)
If deploying to production, consider:
- Add reranking with cross-encoder
- Implement hybrid search (semantic + keyword)
- Use HyDE for better retrieval
- Use managed vector DB (Pinecone, Weaviate)
- Add Redis caching
- Implement connection pooling
- Add comprehensive error handling
- Implement retry with exponential backoff
- Set up monitoring and alerting
- Stream responses for faster perceived latency
- Add conversation history
- Support more file formats
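For the retry item above, a minimal exponential-backoff-with-jitter helper might look like this. It is a sketch, not the project's actual error-handling code.

```python
import random
import time

def with_retry(fn, max_attempts: int = 5, base_delay: float = 0.5,
               retriable: tuple = (TimeoutError, ConnectionError)):
    """Call fn(), retrying retriable errors with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Double the delay each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            time.sleep(delay)
```

Wrapping the OpenAI embedding and generation calls this way keeps transient network errors from surfacing as failed queries.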
**Why LangChain?**
- Mature ecosystem with good documentation
- Built-in text splitters optimized for RAG
- Easy integration with various LLMs and vector stores

**Why FAISS?**
- No external dependencies (runs locally)
- Fast similarity search
- Good enough for demo/prototype scale

**Why FastAPI?**
- High-performance async Python
- Automatic OpenAPI documentation
- Excellent Pydantic integration

**Why Next.js?**
- Modern React patterns (Server Components)
- Built-in routing and layouts
- Great developer experience
This project is for educational and portfolio purposes.
Built by a software engineer passionate about AI/ML systems.
- GitHub: github.com/pananon
- LinkedIn: linkedin.com/in/harimangalp
- Email: contact@mangalcore.com
- Website: mangalcore.com
⭐ Star this repo if you found it helpful!