An intelligent assistant that allows users to query PDF and Excel documents using natural language. Built using React + Vite + TypeScript on the frontend and Flask + LangChain + OpenAI on the backend. The system supports contextual retrieval, optional reasoning, and web search.
- ✅ Natural language queries on PDFs and Excel files
- 🔍 Internal document retrieval using vector search (FAISS)
- 🧠 Optional LLM-based reasoning
- 🌐 Optional real-time web search via OpenAI
- 📎 Source file traceability and download support
- 🧾 Feedback system for likes/dislikes with logging
- 🎙️ Voice-to-text input (speech recognition)
- 📁 Modular ingestion: Add new file types easily
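The modular ingestion mentioned above could be organized as a registry keyed by file extension. The sketch below is illustrative only: `INGESTORS`, `register`, and `ingest` are hypothetical names rather than the project's actual API, and the ingestion functions return placeholder strings instead of parsing real files.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical registry mapping a lowercase file extension to an ingestion function.
INGESTORS: Dict[str, Callable[[Path], List[str]]] = {}


def register(*extensions: str):
    """Decorator that registers an ingestion function for the given extensions."""
    def decorator(func: Callable[[Path], List[str]]):
        for ext in extensions:
            INGESTORS[ext.lower()] = func
        return func
    return decorator


@register(".pdf")
def ingest_pdf(path: Path) -> List[str]:
    # Placeholder: a real implementation would extract and chunk page text.
    return [f"pdf chunk from {path.name}"]


@register(".xlsx", ".xls")
def ingest_excel(path: Path) -> List[str]:
    # Placeholder: a real implementation would read rows, e.g. with pandas.
    return [f"excel chunk from {path.name}"]


def ingest(path: Path) -> List[str]:
    """Dispatch to the ingestor registered for the file's extension."""
    ingestor = INGESTORS.get(path.suffix.lower())
    if ingestor is None:
        raise ValueError(f"No ingestor registered for {path.suffix!r}")
    return ingestor(path)
```

Under this design, supporting a new file type would mean adding one decorated function, with no changes to the dispatch code.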
DO33\_Final/
├── backend/ # All backend code
│ ├── app.py # Flask API server
│ ├── config.py # Global config and flags
│ ├── ingestion/ # Data ingestion logic
│ │ ├── ingest\_pdf.py
│ │ ├── ingest\_excel.py
│ ├── vectorstore/ # Stored FAISS indexes
│ ├── retrieval/
│ ├── reasoning/
│ ├── websearch/
│ ├── llm\_response/
│ ├── utils/
│ └── data/ # Embedded document metadata (pkl, json)
├── Data/ # Raw files
│ ├── PDF/
│ └── EXCEL/
├── frontend/ # React frontend
│ ├── src/
│ ├── public/
│ ├── package.json
│ ├── vite.config.ts
│ └── tailwind.config.ts
├── .env
├── .gitignore
└── requirements.txt
python -m venv do33_env
source do33_env/bin/activate # or do33_env\Scripts\activate on Windows
pip install -r requirements.txt
Create a .env file in the backend root:
OPENAI_API_KEY=your_openai_api_key
python data_ingestion/generate_embeddings.py
python backend/app.py
Server will run at: http://localhost:5001
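The `OPENAI_API_KEY` entry has to reach the backend process as an environment variable. How the project loads it is not stated here; packages such as python-dotenv are commonly used for this. The sketch below shows a dependency-free alternative, with `parse_env_text`, `load_env_file`, and `require_api_key` as hypothetical helper names.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: Path) -> None:
    """Copy pairs from a .env file into os.environ without overriding existing values."""
    if not path.exists():
        return
    for key, value in parse_env_text(path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)


def require_api_key() -> str:
    """Return the OpenAI API key or fail early with a readable message."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Failing at startup when the key is missing tends to be easier to diagnose than an authentication error on the first query.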
cd frontend
npm install
npm run dev
Frontend will run at: http://localhost:8080
- Ask: "What is the problem with the XYZ component and how was it solved?"
- Enable reasoning or web search (optional)
- Receive a structured answer with source file buttons
- Click source buttons to open documents
- Leave feedback via 👍 / 👎
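The like/dislike feedback could be logged as one JSON object per line, which keeps the log append-only and easy to analyze later. This is a minimal sketch under assumptions: the field names, the `build_feedback_record` and `append_feedback` helpers, and the JSON Lines format are all hypothetical, since the project's actual log format is not described here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def build_feedback_record(question: str, answer: str, liked: bool) -> Dict[str, Any]:
    """Build one feedback entry; the field names are illustrative assumptions."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "rating": "like" if liked else "dislike",
    }


def append_feedback(log_path: Path, record: Dict[str, Any]) -> None:
    """Append the record to the log file as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A Flask route handling the 👍 / 👎 buttons could call `append_feedback` with the values sent by the frontend.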
Inside backend/config.py:
ENABLE_PDF = True
ENABLE_EXCEL = True
- Frontend: React, Vite, TypeScript, Tailwind CSS, Shadcn-UI
- Backend: Flask, OpenAI, LangChain, FAISS, Pandas
- Embeddings: BAAI/bge-base-en-v1.5 via HuggingFace
- LLM: GPT-4o or GPT-3.5-turbo (OpenAI)
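To illustrate what the embedding and vector search components do together, the sketch below ranks documents by cosine similarity to a query vector. It is a brute-force, pure-Python stand-in for the idea only: FAISS performs this kind of nearest-neighbor search over an index, and the toy 2-D vectors here replace real embedding output. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    documents: Sequence[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k (source name, score) pairs most similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in documents]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy example: file names and vectors are made up for illustration.
docs = [("a.pdf", [1.0, 0.0]), ("b.xlsx", [0.0, 1.0]), ("c.pdf", [1.0, 1.0])]
results = top_k([1.0, 0.0], docs, k=2)
```

Keeping the source file name next to each vector is what makes it possible to show source buttons alongside an answer.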
- Role-based access and authentication
- CSV and DOCX ingestion support
- Summary and analytics dashboard
- Row-level preview for Excel rows used in answers