A scalable, modular RESTful API built with FastAPI for intelligent search over .pdf and .docx documents. It extracts text from the documents, indexes it, and returns a ranked list of the most relevant documents using the BM25 algorithm.
- 🔍 Information retrieval using the BM25 ranking algorithm.
- 🧠 NLP preprocessing pipeline for enhanced matching.
- 🗃️ Document indexing for efficient retrieval.
- 📁 Support for `.pdf` and `.docx` files.
- ⚡ FastAPI backend with clean, modular architecture.
- 📥 Documents can be downloaded through the API.
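
To make the BM25 ranking step above concrete, here is a minimal, dependency-free sketch of the scoring formula. It is an illustration only and not necessarily how this project implements ranking: the function names `bm25_scores` and `rank`, the parameter defaults `k1=1.5` and `b=0.75`, and the sample corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Return one BM25 score per tokenised document in corpus_tokens."""
    if not corpus_tokens:
        return []
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in corpus_tokens:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Term-frequency saturation with document-length normalisation.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


def rank(query_tokens, corpus_tokens, top_k=3):
    """Return (document index, score) pairs, best first, skipping zero scores."""
    scores = bm25_scores(query_tokens, corpus_tokens)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k] if scores[i] > 0]


# Hypothetical, already tokenised corpus.
corpus = [
    ["invoice", "payment", "due"],
    ["meeting", "notes", "agenda"],
    ["payment", "reminder", "invoice", "invoice"],
]
results = rank(["invoice"], corpus)
```

With this sample corpus, the third document ranks first because it mentions the query term twice, and the second document is dropped because it never mentions it.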
git clone https://github.com/your-username/Document_Search_Api.git
cd Document_Search_Api

python -m venv venv

source venv/bin/activate (on macOS or Linux)
or
venv\Scripts\activate (on Windows)

pip install -r requirements.txt

uvicorn app:app --reload

| Method | Endpoint | Description |
|---|---|---|
| GET | `/search` | Returns a list of relevant documents |
| GET | `/download/{filename}` | Returns the file named by the `filename` path variable |
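
As a rough illustration of calling the two endpoints above, the following sketch builds request URLs using only the Python standard library. Several details are assumptions rather than documented behaviour: the server is taken to run on uvicorn's default address `http://127.0.0.1:8000`, the search term is assumed to be passed as a `query` parameter, the search response is assumed to be JSON, and the helper names are hypothetical. Check the app's interactive docs at `/docs` for the actual parameter names.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def search_url(query, base_url=BASE_URL):
    """Build the /search URL. The `query` parameter name is an assumption."""
    return f"{base_url}/search?{urlencode({'query': query})}"


def download_url(filename, base_url=BASE_URL):
    """Build the /download/{filename} URL with the filename percent-encoded."""
    return f"{base_url}/download/{quote(filename, safe='')}"


def search(query, base_url=BASE_URL):
    """Call /search and decode the body, assuming the API responds with JSON."""
    with urlopen(search_url(query, base_url)) as response:
        return json.load(response)
```

For example, `search_url("annual report")` produces `http://127.0.0.1:8000/search?query=annual+report`, and `download_url("annual report.pdf")` produces `http://127.0.0.1:8000/download/annual%20report.pdf`.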
- FastAPI - for building the RESTful API.
- PyMuPDF / python-docx - for extracting text from documents.
- spaCy - for NLP preprocessing.
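
The kind of preprocessing that typically precedes BM25 indexing can be sketched without any dependencies. This is a simplified stand-in and not the project's spaCy pipeline: it only lowercases, tokenises, and removes a small hand-picked stop-word list, whereas spaCy can also lemmatise. The names `STOP_WORDS`, `TOKEN_RE`, and `preprocess` are hypothetical.

```python
import re

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The Annual Report for 2023 is in the Finance folder.")
# tokens == ['annual', 'report', '2023', 'finance', 'folder']
```

Applying the same function to both documents and queries keeps their tokens comparable at search time.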
This project is open-source and available under the MIT License.
Feel free to reach out on LinkedIn or by email at zaidkhatri.work@gmail.com.