This repository features an end-to-end Retrieval-Augmented Generation (RAG) system built with a focus on clean software architecture and production-readiness.
Rather than relying on heavyweight, abstracted orchestration frameworks, this project uses a custom-built, modular Python backend. It securely ingests unstructured PDF documents, processes them with token-aware semantic chunking, and leverages Qdrant Cloud and OpenAI's GPT models to deliver grounded, context-aware AI responses.
This application is designed with a strict separation of concerns, keeping the codebase scalable, testable, and maintainable. The backend pipeline is decoupled into specialized modules:
- `ingestion.py`: Handles robust document parsing and text extraction using `PyPDF2`.
- `embeddings.py`: Performs token-aware semantic chunking with `tiktoken` to maximize OpenAI API efficiency and generates high-dimensional vector embeddings.
- `qdrant_client.py`: Manages secure cloud connections and vector indexing within the Qdrant database.
- `retrieval.py` & `generation.py`: Orchestrate similarity search and precise prompt engineering to feed context-rich data to `gpt-3.5-turbo`/`gpt-4`, mitigating LLM hallucinations.
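The token-aware chunking step in `embeddings.py` can be sketched as follows. This is an illustrative sketch, not the repo's actual code: `chunk_tokens` and its parameters are hypothetical, and the tokenizer is passed in as a pair of encode/decode callables (in the real pipeline these would come from `tiktoken.get_encoding("cl100k_base")`).

```python
from typing import Callable, List


def chunk_tokens(
    text: str,
    encode: Callable[[str], list],
    decode: Callable[[list], str],
    max_tokens: int = 512,
    overlap: int = 50,
) -> List[str]:
    """Split text into windows of at most max_tokens tokens,
    with `overlap` tokens shared between consecutive chunks.
    (Hypothetical sketch; names and defaults are illustrative.)"""
    tokens = encode(text)
    if not tokens:
        return []
    step = max(1, max_tokens - overlap)  # guard against overlap >= max_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(decode(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

With `tiktoken`, this would be called as `enc = tiktoken.get_encoding("cl100k_base")` followed by `chunk_tokens(text, enc.encode, enc.decode)`; overlapping windows keep sentence context intact across chunk boundaries.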
- AI & Embeddings: OpenAI API, `tiktoken`
- Vector Database: Qdrant Cloud
- Backend: Python, `numpy`, `python-dotenv`
- Data Processing: `PyPDF2`
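To make the retrieval step concrete, here is a minimal cosine-similarity search using `numpy`. In the actual pipeline Qdrant performs this search server-side over indexed vectors; `top_k` and its signature are assumptions for illustration.

```python
import numpy as np


def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k most similar documents by cosine similarity.
    (Illustrative sketch; in this project, Qdrant computes similarity
    server-side over the indexed embeddings.)"""
    q = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity: dot products divided by the product of L2 norms.
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)[:k].tolist()
```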
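The hallucination mitigation in `generation.py` comes down to grounding the prompt in retrieved context. A minimal sketch of that prompt assembly follows; the function name and instruction wording are assumptions, not the repo's actual prompt.

```python
from typing import Dict, List


def build_grounded_prompt(question: str, contexts: List[str]) -> List[Dict[str, str]]:
    """Assemble a chat-completion message list that restricts the model
    to the retrieved context, reducing hallucinated answers.
    (Hypothetical sketch; the real system prompt may differ.)"""
    context_block = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    system = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context_block}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list would then be passed as `messages` to the OpenAI chat completions API with `gpt-3.5-turbo` or `gpt-4`, with the retrieved Qdrant hits supplied as `contexts`.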
This project demonstrates my ability not only to implement cutting-edge Generative AI concepts but also to engineer them with scalable, enterprise-grade software design patterns.