
zaidkhatri-dev/Document_Search_API


📄 Document-Based Information Retrieval API

A scalable and modular RESTful API built with FastAPI for ranked search over .pdf and .docx documents. It extracts text from the documents, indexes it, and returns a list of the most relevant documents using the BM25 ranking algorithm.


🚀 Features

  • 🔍 Information retrieval using the BM25 ranking algorithm.
  • 🧠 NLP preprocessing pipeline for better matching between queries and documents.
  • 🗃️ Document indexing for efficient retrieval.
  • 📁 Support for .pdf and .docx files.
  • ⚡ FastAPI backend with a clean, modular architecture.
  • 📥 Document download through the API.
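For reference, the standard Okapi BM25 score of a document D for a query Q with terms q_1, ..., q_n is shown below. Here f(q_i, D) is the frequency of q_i in D, |D| is the length of D in tokens, avgdl is the average document length, N is the number of indexed documents, and n(q_i) is the number of documents containing q_i. This README does not say which BM25 variant or parameter values the project uses; the Lucene-style IDF (the "+ 1" inside the logarithm, which keeps IDF non-negative) and the common defaults k_1 ≈ 1.2 to 2.0 and b = 0.75 are assumptions.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \,\frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)
```

The b term normalizes for document length (b = 0 disables it), and k_1 controls how quickly repeated occurrences of a term stop adding to the score.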

⚙️ Installation and Setup

1. Clone the repository

git clone https://github.com/zaidkhatri-dev/Document_Search_API.git
cd Document_Search_API

2. Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate  (on macOS/Linux)
    or
venv\Scripts\activate  (on Windows)

3. Install requirements

pip install -r requirements.txt

4. Run the API

uvicorn app:app --reload

📥 API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /search | Returns a list of relevant documents |
| GET | /download/{filename} | Returns the file named in the path variable |
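As a rough illustration of calling these endpoints, the standard-library sketch below only builds request URLs. The base URL assumes Uvicorn's default host and port, and the query parameter name `query` is a hypothetical placeholder, since the README does not document it; check the `/search` route in `app.py` for the real name. The helper functions are illustrative, not part of the project.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:8000"  # Uvicorn's default host and port


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a GET /search URL.

    "query" is a HYPOTHETICAL parameter name; the real one may differ.
    """
    return f"{base_url}/search?{urlencode({'query': query})}"


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build a GET /download/{filename} URL.

    The filename is percent-encoded so spaces and other reserved
    characters are transmitted safely in the path.
    """
    return f"{base_url}/download/{quote(filename)}"


if __name__ == "__main__":
    print(search_url("machine learning"))
    print(download_url("annual report.pdf"))
```

With the server running, either URL can be fetched with `urllib.request.urlopen` or a browser; the shape of the `/search` response is not documented here, so inspect it before parsing.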

🧠 How it works

[Flowchart image]


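The overall flow (extract text, preprocess it, index it, then rank documents against a query with BM25) can be sketched in plain Python as below. This is not the project's code: the regex tokenizer and tiny stop-word list are stand-ins for the spaCy pipeline, `BM25Index` is a hypothetical class, and k1 = 1.5, b = 0.75 are assumed defaults. Scoring follows the standard Okapi BM25 formula with a Lucene-style IDF.

```python
import math
import re
from collections import Counter

# Tiny stop-word list: a stand-in for spaCy's stop words and lemmatization.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class BM25Index:
    """In-memory Okapi BM25 index over {document name: text} (hypothetical)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        # Per-document term frequencies and lengths (in tokens).
        self.term_freqs = {name: Counter(preprocess(t)) for name, t in docs.items()}
        self.doc_lens = {name: sum(tf.values()) for name, tf in self.term_freqs.items()}
        self.n_docs = len(docs)
        self.avgdl = sum(self.doc_lens.values()) / self.n_docs if self.n_docs else 0.0
        # Document frequency: how many documents contain each term.
        self.doc_freq = Counter()
        for tf in self.term_freqs.values():
            self.doc_freq.update(tf.keys())

    def idf(self, term: str) -> float:
        n = self.doc_freq.get(term, 0)
        return math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)

    def score(self, name: str, query_terms: list[str]) -> float:
        tf, dl = self.term_freqs[name], self.doc_lens[name]
        total = 0.0
        for term in query_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue  # term absent: contributes nothing
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Return up to top_k (name, score) pairs with a positive score, best first."""
        terms = preprocess(query)
        scored = [(name, self.score(name, terms)) for name in self.term_freqs]
        scored = [pair for pair in scored if pair[1] > 0]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


if __name__ == "__main__":
    index = BM25Index({
        "ml.pdf": "Machine learning models learn from data.",
        "cooking.docx": "A recipe for bread and butter.",
        "dl.pdf": "Deep learning is a branch of machine learning.",
    })
    print(index.search("machine learning"))
```

Here "dl.pdf" outranks "ml.pdf" because it is shorter and repeats "learning". Without lemmatization, "learn" and "learning" count as different terms, which is the kind of mismatch an NLP preprocessing step is meant to reduce.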

📌 Technologies Used

  • FastAPI - for building the RESTful API.
  • PyMuPDF / python-docx - for extracting text from documents.
  • spaCy - for NLP preprocessing.
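A minimal sketch of how these libraries might be wired together, assuming text is extracted per file type and then lemmatized with spaCy. The function names are hypothetical, the `en_core_web_sm` model is an assumption (install it with `python -m spacy download en_core_web_sm`), and the third-party imports are deferred so the file-type check runs even without them installed.

```python
from pathlib import Path

_nlp = None  # spaCy pipeline, loaded lazily on first use


def extract_text(path: str) -> str:
    """Return the plain text of a .pdf or .docx file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        import fitz  # PyMuPDF

        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if suffix == ".docx":
        from docx import Document  # python-docx

        # Only body paragraphs; text inside tables is not included.
        return "\n".join(p.text for p in Document(path).paragraphs)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")


def spacy_tokens(text: str) -> list[str]:
    """Lemmatize and drop stop words, punctuation, and whitespace tokens."""
    global _nlp
    if _nlp is None:
        import spacy

        _nlp = spacy.load("en_core_web_sm")  # ASSUMED model name
    return [
        t.lemma_.lower()
        for t in _nlp(text)
        if not (t.is_stop or t.is_punct or t.is_space)
    ]
```

Deferring the imports keeps unsupported files failing fast with a clear `ValueError` before any heavy library is loaded; loading the spaCy model once and reusing it avoids the cost of reloading it for every document.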

📄 License

This project is open-source and available under the MIT License.


📬 Contact

Feel free to reach out on LinkedIn or by email at zaidkhatri.work@gmail.com.
