Medical Expert AI Chat

An AI-powered medical question-answering service with async task processing, multi-LLM support, and real-time statistics.

Features

  • Async Architecture: Built with FastAPI and asyncio for high-concurrency I/O-bound workloads
  • Multi-LLM Support: Works with Claude (Anthropic), DeepSeek, OpenAI, and Mock provider
  • Concurrent Task Processing: Semaphore-based concurrency limiting (max = CPU cores)
  • Retry Logic: Exponential backoff for failed LLM calls
  • Real-time Statistics: Live metrics dashboard with task and performance monitoring
  • PostgreSQL Storage: Persistent message and log storage
  • Docker Ready: Complete containerization with docker-compose

Architecture

Components

  1. FastAPI Backend

    • REST API for message submission and retrieval
    • Statistics endpoint for real-time metrics
    • Static file serving for frontend
  2. Async Task Processing System

    • FastAPI BackgroundTasks spawn async tasks on-demand
    • Semaphore limits concurrent tasks to CPU cores (configurable)
    • Tasks process messages asynchronously via asyncio
    • Graceful shutdown handling
  3. LLM Provider Abstraction

    • Unified interface for multiple LLM providers
    • Support for Anthropic Claude, DeepSeek, OpenAI
    • Mock provider for testing without API keys
  4. PostgreSQL Database

    • Message storage with status tracking (pending → processing → completed/failed)
    • Application logging to stdout for operational debugging
    • Efficient indexing for performance
  5. Frontend

    • Simple HTML + Vanilla JS interface
    • Polling-based message status updates
    • Live statistics dashboard
    • Responsive design

Architecture Decision: AsyncIO Task Model

This system uses asyncio tasks with semaphore-based concurrency limiting instead of a traditional worker pool:

Why This Approach:

  • I/O-Bound Workload: 95%+ of time spent waiting for LLM API responses, not CPU computation
  • Lightweight: Async tasks need kilobytes of memory each, versus megabytes per process, enabling far higher concurrency
  • Fast Scaling: Can handle hundreds of concurrent requests efficiently
  • Simpler Implementation: No inter-process communication complexity, easier debugging
  • Natural Fit: FastAPI and LLM SDKs (Anthropic, OpenAI) are already async-native

How It Works:

  1. User submits question via POST /api/chat
  2. Message saved to database with status='pending'
  3. FastAPI BackgroundTasks spawns an async task via asyncio.create_task()
  4. Task acquires semaphore (blocks if at max_workers limit)
  5. Task calls LLM with retry logic, updates database
  6. Task releases semaphore on completion

Concurrency Limiting:

  • Semaphore set to MAX_WORKERS (defaults to CPU core count per spec)
  • Tasks block on semaphore acquisition when limit reached
  • No explicit queue; tasks wait on the semaphore internally

Trade-offs:

  • ✅ Better performance for I/O-bound tasks than multiprocessing
  • ✅ Lower memory footprint (~10-100KB per task vs ~10MB per process)
  • ✅ Simpler codebase, easier to maintain
  • ❌ No individual "worker" objects to track lifecycle
  • ❌ No concept of "idle workers" (a task either exists and is running, or does not exist)
  • ❌ No visible queue depth (blocking happens at semaphore level)

Quick Start

Option 1: Docker (Recommended)

# 1. Clone/navigate to project directory
cd one_doc_ex

# 2. Create .env file from example
cp .env.example .env

# 3. (Optional) Add your API keys to .env
nano .env  # Or your favorite editor

# 4. Start services
docker-compose up --build

# 5. Open browser to http://localhost:8000

Option 2: Local Development

# 1. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Set up PostgreSQL
# Install PostgreSQL 15+ and create database:
createdb medchat
createuser medchat_user -P  # Set password: medchat_password

# 4. Create .env file
cp .env.example .env
# Edit .env with your database URL and API keys

# 5. Initialize database
python scripts/init_db.py

# 6. Run application
uvicorn app.main:app --reload --port 8000

# 7. Open browser to http://localhost:8000

Configuration

All configuration via environment variables (.env file):

Server

  • SERVER_PORT: HTTP server port (default: 8000)
  • HOST: Bind host (default: 0.0.0.0)

Database

  • DATABASE_URL: PostgreSQL connection string

LLM Provider

  • LLM_PROVIDER: Provider to use (anthropic, deepseek, openai, mock)
  • LLM_MODEL: Model name (e.g., claude-3-5-sonnet-20241022)
  • LLM_TEMPERATURE: Temperature setting (default: 0.7)
  • LLM_MAX_TOKENS: Max tokens per request (default: 2000)

API Keys

  • ANTHROPIC_API_KEY: Anthropic API key (for Claude)
  • DEEPSEEK_API_KEY: DeepSeek API key
  • OPENAI_API_KEY: OpenAI API key

Worker Configuration

  • MAX_WORKERS: Max concurrent workers (default: CPU count)
  • WORKER_IDLE_TIMEOUT: Seconds before idle worker exits (default: 60)

Retry Configuration

  • RETRY_DELAY: Base delay between retries in seconds (default: 2)
  • MAX_RETRIES: Maximum retry attempts (default: 3)
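
Together these two settings describe an exponential backoff loop. A minimal sketch of how they might combine (`call_with_retries` is a hypothetical helper, not the actual code):

```python
import asyncio

async def call_with_retries(call, max_retries: int = 3, base_delay: float = 2.0):
    """Await call(), retrying with exponential backoff on failure."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == max_retries:
                raise  # out of attempts: the task can mark the message 'failed'
            # delay doubles on each attempt: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * (2 ** attempt))
```

With the defaults (RETRY_DELAY=2, MAX_RETRIES=3) the waits would be 2s, 4s, and 8s before giving up.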

Queue

  • QUEUE_MAXSIZE: Maximum queue size (default: 1000)
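
The settings above might be loaded with a small helper like this illustrative sketch (`load_int` is hypothetical; the real app/config.py may read them differently, e.g. via pydantic settings):

```python
import os

def load_int(name: str, default: int) -> int:
    """Read an integer setting from the environment, falling back to a default."""
    raw = os.getenv(name)
    return int(raw) if raw is not None else default

# The documented defaults from the sections above:
SERVER_PORT   = load_int("SERVER_PORT", 8000)
MAX_WORKERS   = load_int("MAX_WORKERS", os.cpu_count() or 4)
MAX_RETRIES   = load_int("MAX_RETRIES", 3)
RETRY_DELAY   = load_int("RETRY_DELAY", 2)
QUEUE_MAXSIZE = load_int("QUEUE_MAXSIZE", 1000)
```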

API Endpoints

POST /api/chat

Submit a new medical question.

Request:

{
  "question": "What are the symptoms of iron deficiency?"
}

Response:

{
  "messageId": "uuid-string"
}

GET /api/chat/{messageId}

Get status and response for a message.

Responses:

Pending (waiting for worker):

{
  "status": "pending"
}

Processing (actively being processed):

{
  "status": "processing"
}

Completed:

{
  "status": "completed",
  "response": "Iron deficiency commonly causes..."
}

Failed:

{
  "status": "failed",
  "error": "LLM request failed after retries"
}
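
Clients poll this endpoint until a terminal status arrives. A stdlib-only sketch of such a client (`poll_message` and its `fetch` hook are illustrative, not part of the project):

```python
import json
import time
import urllib.request

def poll_message(base_url: str, message_id: str, fetch=None,
                 interval: float = 1.0, timeout: float = 60.0) -> dict:
    """Poll GET /api/chat/{messageId} until status is 'completed' or 'failed'."""
    if fetch is None:  # default: a real HTTP GET against the running service
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())
    url = f"{base_url}/api/chat/{message_id}"
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(url)
        if body["status"] in ("completed", "failed"):
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"message {message_id} still {body['status']}")
        time.sleep(interval)
```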

GET /api/statistics

Get real-time system statistics.

Response:

{
  "messagesProcessed": 120,
  "messagesSucceeded": 110,
  "messagesFailed": 10,
  "totalRetries": 27,
  "averageProcessingTimeMs": 850,
  "averageTokensPerMessage": 430,
  "totalTokensUsed": 51600,
  "activeWorkers": 4
}

Note on Statistics:

  • activeWorkers: Shows current number of active async tasks processing messages

GET /health

Health check endpoint.

Testing

Run Tests

# Install dev dependencies
pip install -r requirements.txt

# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html

# Run specific test file
pytest tests/test_llm_providers/test_mock_provider.py

# Run with output
pytest -v -s

Test Structure

tests/
├── conftest.py              # Shared fixtures
├── test_api/                # API endpoint tests
├── test_services/           # Service layer tests
├── test_workers/            # Worker system tests
└── test_llm_providers/      # LLM provider tests

Development

Project Structure

one_doc_ex/
├── app/
│   ├── api/              # API endpoints
│   ├── llm_providers/    # LLM provider implementations
│   ├── models.py         # Database models
│   ├── prompts/          # System prompts
│   ├── services/         # Business logic services
│   ├── workers/          # Async worker system
│   ├── config.py         # Configuration
│   ├── database.py       # Database connection
│   ├── main.py           # FastAPI application
│   └── schemas.py        # Pydantic schemas
├── static/               # Frontend files
├── tests/                # Test suite
├── scripts/              # Utility scripts
└── docker-compose.yml    # Docker orchestration

Adding a New LLM Provider

  1. Create a new provider class in app/llm_providers/:
from app.llm_providers.base import BaseLLMProvider, LLMResponse

class MyProvider(BaseLLMProvider):
    async def generate(self, prompt: str, system_prompt: str) -> LLMResponse:
        # Call the provider's API and wrap the result in an LLMResponse
        raise NotImplementedError

    def validate_config(self) -> bool:
        # Return True only when the required API key and model are configured
        raise NotImplementedError
  2. Register in app/services/llm_service.py:
provider_map = {
    "my_provider": MyProvider,
    # ... existing providers
}
  3. Add configuration in app/config.py if needed

  4. Update .env.example with the new provider option

Monitoring

View Logs

# Docker logs
docker-compose logs -f app

# Message status
docker-compose exec postgres psql -U medchat_user -d medchat \
  -c "SELECT message_id, status, created_at FROM messages ORDER BY created_at DESC LIMIT 10;"

Statistics Dashboard

Access the built-in statistics dashboard at http://localhost:8000

Metrics updated every 5 seconds:

  • Messages processed, succeeded, failed
  • Success rate
  • Average response time
  • Pending messages (stands in for queue depth; there is no explicit queue)
  • Active tasks (activeWorkers)
  • Token usage

Troubleshooting

Database Connection Issues

# Check PostgreSQL is running
docker-compose ps postgres

# Check connection
docker-compose exec postgres psql -U medchat_user -d medchat -c "SELECT 1;"

# Reset database
docker-compose down -v
docker-compose up -d postgres
python scripts/init_db.py

LLM Provider Issues

# Test with mock provider (no API key needed)
LLM_PROVIDER=mock docker-compose up

# Verify API key is set
docker-compose exec app env | grep API_KEY

# Check provider logs
docker-compose logs app | grep -i "llm\|provider\|anthropic\|deepseek"

Worker Not Processing

# Check worker pool status via API
curl http://localhost:8000/api/statistics

# Check queue length
docker-compose exec postgres psql -U medchat_user -d medchat \
  -c "SELECT status, COUNT(*) FROM messages GROUP BY status;"

# Restart workers
docker-compose restart app

Performance Tuning

Adjusting Worker Count

# Increase for higher concurrency (I/O-bound workload)
MAX_WORKERS=16 docker-compose up

# Note: Can exceed CPU count for I/O-bound tasks

Adjusting Retry Settings

# Faster retries
RETRY_DELAY=1 MAX_RETRIES=5 docker-compose up

# Slower, more patient retries
RETRY_DELAY=5 MAX_RETRIES=3 docker-compose up

Database Performance

-- Add indexes if queries are slow
CREATE INDEX IF NOT EXISTS idx_messages_status_created
  ON messages(status, created_at DESC);

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM messages
  WHERE status = 'processing'
  ORDER BY created_at DESC;

Security Notes

  • API Keys: Never commit .env file with real API keys
  • Database: Change default credentials in production
  • HTTPS: Use reverse proxy (nginx) with SSL in production
  • Rate Limiting: Add rate limiting for public deployments
  • Authentication: Add auth middleware for production use

License

This project is for educational/assignment purposes.

Contributing

  1. Fork the repository
  2. Create feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open Pull Request

Acknowledgments

  • FastAPI for the excellent async web framework
  • Anthropic, DeepSeek, and OpenAI for LLM APIs
  • PostgreSQL for robust data storage
