📄 Document-Based Information Retrieval API

A scalable and modular RESTful API built with FastAPI for performing intelligent search over .pdf and .docx documents. It extracts text from the documents, indexes it, and returns a ranked list of the most relevant documents using the BM25 algorithm.

🚀 Features

🔍 Information retrieval using the BM25 ranking algorithm.
🧠 NLP preprocessing pipeline for enhanced matching.
🗃️ Document indexing for efficient retrieval.
📁 Support for .pdf and .docx files.
⚡ FastAPI backend with clean, modular architecture.
📥 Document download through a dedicated endpoint.

⚙️ Installation and Setup

1. Clone the repository

git clone https://github.com/your-username/Document_Search_Api.git
cd Document_Search_Api

2. Create a virtual environment (Recommended)

python -m venv venv
source venv/bin/activate (on macOS/Linux)
    or
venv\Scripts\activate (on Windows)

3. Install requirements

pip install -r requirements.txt

4. Run the API

uvicorn main:app --reload

📥 API Endpoints

Method	Endpoint	Description
GET	/search	Returns a ranked list of relevant documents
GET	/download/{filename}	Returns the file named by the filename path variable
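
The two endpoints above can be called from any HTTP client. The snippet below is a minimal sketch of how the request URLs might be built, not the project's own client code. It assumes the server runs on uvicorn's default address (http://127.0.0.1:8000) and that the search text is passed as a query parameter named `query`; that parameter name and the helper names `search_url` and `download_url` are hypothetical. FastAPI serves interactive documentation at /docs by default, which shows the actual parameter names.

```python
from urllib.parse import quote, urlencode

# Assumption: the API is running locally on uvicorn's default host and port.
BASE_URL = "http://127.0.0.1:8000"


def search_url(query, base_url=BASE_URL):
    """Build the URL for GET /search.

    The parameter name "query" is a guess; check the route signature
    or the /docs page for the real one.
    """
    return f"{base_url}/search?{urlencode({'query': query})}"


def download_url(filename, base_url=BASE_URL):
    """Build the URL for GET /download/{filename}.

    The filename is a path variable, so it is percent-encoded to keep
    spaces and other special characters from breaking the path.
    """
    return f"{base_url}/download/{quote(filename, safe='')}"
```

Either URL can then be opened in a browser or fetched with a client such as curl. Encoding the filename matters because document names often contain spaces.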

🧠 How it works
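
At a high level, text is extracted from each document, preprocessed, indexed, and then ranked against the query with BM25. The snippet below is a small, dependency-free sketch of that ranking step, not the project's actual implementation. The names `tokenize`, `BM25Index`, and `STOP_WORDS` are hypothetical, the regex tokenizer is a simple stand-in for the spaCy pipeline, and the parameters `k1=1.5` and `b=0.75` are common defaults rather than values taken from this repository. The IDF uses the widely used variant that adds 1 inside the logarithm so scores stay non-negative.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the project uses spaCy instead).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


class BM25Index:
    """In-memory BM25 index over a mapping of filename -> extracted text."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.k1 = k1
        self.b = b
        self.names = list(documents)
        # Per-document term frequencies and lengths (in tokens).
        self.term_counts = [Counter(tokenize(documents[n])) for n in self.names]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(self.lengths) if self.lengths else 0.0
        # Number of documents that contain each term.
        self.doc_freq = Counter()
        for counts in self.term_counts:
            self.doc_freq.update(counts.keys())

    def idf(self, term):
        """Inverse document frequency with +1 inside the log (never negative)."""
        n = len(self.names)
        df = self.doc_freq.get(term, 0)
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def score(self, query_terms, doc_index):
        """BM25 score of one document for the given query terms."""
        counts = self.term_counts[doc_index]
        if self.avg_len:
            # Length normalization: longer-than-average documents are penalized.
            norm = 1 - self.b + self.b * (self.lengths[doc_index] / self.avg_len)
        else:
            norm = 1.0
        total = 0.0
        for term in query_terms:
            tf = counts.get(term, 0)
            if tf:
                total += self.idf(term) * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return total

    def search(self, query, top_k=5):
        """Return up to top_k filenames with a positive score, best first."""
        terms = tokenize(query)
        scored = [(self.score(terms, i), name) for i, name in enumerate(self.names)]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
        return [name for _, name in ranked[:top_k]]
```

Building the index once and reusing it for every query keeps search fast, since only the scoring loop runs per request. Documents that share no term with the query score zero and are left out of the results.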

📌 Technologies Used

FastAPI - for building the RESTful API.
PyMuPDF / python-docx - for extracting text from documents.
spaCy - for NLP preprocessing.

📄 License

This project is open-source and available under the MIT License.

📬 Contact

Feel free to reach out on LinkedIn or by email at zaidkhatri.work@gmail.com.
