A production-ready document Q&A system with semantic search and LLM integration - ingest, embed, and query your knowledge base.

accupara/document_qa_assistant

Document QA Assistant

Python | License: MIT | Code style: black

A robust pipeline for document ingestion, embedding, and question-answering powered by ChromaDB and OpenAI-compatible LLMs.

System Diagram (Example: Add your architecture diagram here)

Features

  • Multi-format Support: Process Markdown, PDF, and Word (.doc/.docx) files
  • Smart Chunking: Configurable text splitting with overlap
  • Semantic Search: HNSW-powered vector similarity (ChromaDB)
  • LLM Integration: Streaming responses with citation tracking
  • Production Ready:
    • Environment variable configuration
    • Structured logging (file + stdout)
    • Type hints & PEP8 compliance
    • PyInstaller executable support
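
To make the "Smart Chunking" feature concrete, here is a minimal sketch of fixed-size splitting with overlap. It is not the project's actual splitter: the function names `chunk_tokens` and `chunk_text` are hypothetical, and whitespace-separated words stand in for real tokens.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence, chunk_size: int = 512, overlap: int = 50) -> List[list]:
    """Split a token sequence into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> List[str]:
    """Assumption: whitespace splitting approximates tokenization."""
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), chunk_size, overlap)]
```

With `chunk_size=4` and `overlap=1`, each chunk repeats the last token of the previous one, so a sentence that straddles a boundary keeps some context in both chunks.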

Quick Start

# 1. Clone repo
git clone https://github.com/accupara/document_qa_assistant.git
cd document_qa_assistant

# 2. Set up environment (Linux/macOS)
make install-dev
cp .env.example .env  # Edit with your API keys

# 3. Add documents to ./documents/
# 4. Run!
make run

Configuration

Edit .env file

# Document Processing
CHUNK_SIZE=512      # Token size per chunk
CHUNK_OVERLAP=50    # Context overlap between chunks

# Vector DB
EMBEDDING_MODEL=all-MiniLM-L6-v2  # Sentence Transformer model

# LLM (OpenAI-compatible)
OPENAI_API_KEY=your-key-here
OPENAI_MODEL=gpt-3.5-turbo
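
As an illustration of how these variables might be consumed, the sketch below reads them into a typed settings object. It assumes the values are already present in the process environment (for example, loaded from .env by whatever loader the project uses). The `Settings` class, `load_settings` function, and the fallback defaults are hypothetical and simply mirror the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    chunk_size: int
    chunk_overlap: int
    embedding_model: str
    openai_api_key: str
    openai_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    settings = Settings(
        chunk_size=int(env.get("CHUNK_SIZE", "512")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "50")),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        openai_model=env.get("OPENAI_MODEL", "gpt-3.5-turbo"),
    )
    # An overlap as large as the chunk itself would leave no room to advance.
    if not 0 <= settings.chunk_overlap < settings.chunk_size:
        raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
    return settings
```

Validating the overlap at load time surfaces a bad .env value immediately rather than partway through ingestion.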

Usage

  1. Place documents in ./documents/
  2. Launch the interactive Q&A interface:
     make run
  3. Enter questions when prompted:
     Enter your question: What's the capital of France?
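
Once a question is entered, a system with citation tracking typically turns the retrieved chunks into a numbered prompt so the answer can point back to its sources. The sketch below shows one possible shape for that step. It is an assumption, not the project's code: `RetrievedChunk`, `build_prompt`, `citation_map`, and the file names in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetrievedChunk:
    source: str  # e.g. the file the chunk was ingested from
    text: str


def build_prompt(question: str, chunks: List[RetrievedChunk]) -> str:
    """Number each retrieved excerpt so the model can cite it as [n]."""
    lines = [
        "Answer the question using only the numbered excerpts below.",
        "Cite excerpts by number, e.g. [1].",
        "",
    ]
    for index, chunk in enumerate(chunks, start=1):
        lines.append(f"[{index}] ({chunk.source}) {chunk.text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


def citation_map(chunks: List[RetrievedChunk]) -> Dict[int, str]:
    """Map each citation number back to its source document."""
    return {index: chunk.source for index, chunk in enumerate(chunks, start=1)}
```

For example, passing a chunk from a hypothetical `geo.md` yields a prompt line starting with `[1] (geo.md)`, and `citation_map` lets the interface resolve a `[1]` in the answer back to that file.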
