A privacy-focused, AI-powered medical intake system that helps patients organize their health information before doctor visits. Built with a zero-persistence architecture: patient data is never written to a database or to disk.
Live Demo: https://caazzi-securemed.hf.space/
SecureMed Chat is an intelligent medical anamnesis assistant that generates contextual questions from a patient's symptoms and helps the patient prepare a comprehensive health summary for their healthcare provider. The system uses RAG (Retrieval-Augmented Generation) over a medical knowledge base so that the questions are relevant and medically informed.
    ┌─────────────────┐
    │ Gradio Frontend │ ──────► HuggingFace Spaces
    └────────┬────────┘
             │ HTTPS + API Key
             ▼
    ┌─────────────────┐
    │ FastAPI Backend │ ──────► GCP Cloud Run (2Gi Memory)
    └────────┬────────┘         - Auto-scaling with min 1 instance
             │                  - VPC Connector for secure DB access
        ┌────┴─────┐
        ▼          ▼
    ┌─────────────┐ ┌────────────────┐
    │ Vertex AI   │ │ ChromaDB Vector│ ──► GCP VM Instance
    │ LLM Models  │ │ Store          │     (Internal Network Only)
    └─────────────┘ └────────────────┘

- Backend: FastAPI with async/await patterns
- LLM: Google Vertex AI (Gemini 2.5 Flash Lite)
- Embeddings: Gemini Embedding Model
- Vector Store: ChromaDB for medical knowledge retrieval
- Frontend: Gradio with internationalization (EN/PT)
- PDF Generation: ReportLab (in-memory generation)
- Deployment:
  - API: GCP Cloud Run (Serverless)
  - Vector DB: GCP Compute Engine VM
  - UI: HuggingFace Spaces
- No Data Storage:
  - All patient information exists only in memory during the session
  - No database records of patient data
  - No file system persistence
- In-Memory PDF Generation:

      import io

      # PDFs are generated in memory and streamed directly
      buffer = io.BytesIO()
      # ... PDF generation ...
      pdf_bytes = buffer.getvalue()
      buffer.close()

- Structured Logging Without PII:

      import logging

      # Logs track operations but never patient data
      logging.info(f"Streaming initial questions for new session (lang={lang}).")
      # Never: logging.info(f"Patient complaint: {complaint}")

- API Key Authentication: All endpoints protected with X-API-KEY header
- Input Sanitization: All user inputs stripped and validated
- Network Isolation: ChromaDB accessible only via internal VPC
- Secret Management: Using GCP Secret Manager for API keys
- TLS/HTTPS: All communications encrypted in transit
- Session-Based Processing: Data exists only for request duration
- No User Accounts: No registration or login required
- Explicit Disclaimers: Clear messaging that output is not medical advice
- Data Minimization: Only essential information collected (age bracket, not exact age)
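
The API key check and input sanitization listed above could be sketched as follows, using only the standard library. The names `is_authorized` and `sanitize_text`, the `MAX_LEN` limit, and the control-character filtering are illustrative assumptions, not the project's actual code; in the real service this logic would presumably sit in a FastAPI dependency and in Pydantic models.

```python
import hmac
import unicodedata
from typing import Optional

MAX_LEN = 2000  # hypothetical upper bound for a free-text field


def is_authorized(provided_key: Optional[str], expected_key: str) -> bool:
    """Compare the X-API-KEY header value with the expected key in constant time."""
    if not provided_key or not expected_key:
        return False
    return hmac.compare_digest(provided_key.encode("utf-8"), expected_key.encode("utf-8"))


def sanitize_text(raw: str, max_len: int = MAX_LEN) -> str:
    """Drop control characters, strip surrounding whitespace, enforce a length limit."""
    cleaned = "".join(
        ch for ch in raw
        # Keep newlines and tabs; drop every other "C*" (control/format) category.
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C")
    ).strip()
    if not cleaned:
        raise ValueError("input is empty after sanitization")
    if len(cleaned) > max_len:
        raise ValueError(f"input exceeds {max_len} characters")
    return cleaned
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of the key matched.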
- User Input → Gradio interface collects symptoms
- Question Generation → RAG retrieves relevant medical context
- Streaming Response → Questions streamed to user in real-time
- Answer Collection → User provides detailed responses
- Summarization → LLM structures information into medical format
- PDF Generation → In-memory PDF creation and immediate download
- Session End → All data cleared from memory
gcloud run deploy securemed-chat-service \
--source . \
--project=securemed-chat \
--region=southamerica-east1 \
--vpc-connector=api-to-db-connector \
--memory=2Gi \
--min-instances=1 \
--service-account=securemed-cr-sa@securemed-chat.iam.gserviceaccount.com \
--set-env-vars=CHROMA_HOST=securemed-chat.southamerica-east1-a.c.securemed-chat.internal,CHROMA_PORT=8000 \
--set-secrets=SECUREMED_API_KEY=SECUREMED_API_KEY:latest

- Lazy Loading: Models initialized only on first request
- MMR Retrieval: Using Maximum Marginal Relevance for diverse context
- Streaming Responses: Real-time question delivery
- Optimized Workers: Gunicorn with 2 workers for optimal concurrency
- Multi-stage Docker: Minimized container size (~200MB)
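
To make the MMR bullet above concrete: Maximum Marginal Relevance picks documents one at a time, scoring each candidate as `lambda * similarity(query, doc) - (1 - lambda) * max similarity(doc, already selected)`. The sketch below is a plain-Python illustration under the assumption of cosine similarity on small embedding vectors; `mmr_select` and `lambda_mult` are hypothetical names, and the project itself most likely delegates this step to its vector-store retriever rather than implementing it by hand.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    docs: Sequence[Sequence[float]],
    k: int = 3,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k documents chosen greedily by MMR."""
    relevance = [cosine(query, d) for d in docs]
    candidates = list(range(len(docs)))
    selected: List[int] = []

    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.5, -0.5]]  # docs 0 and 1 are near-duplicates
print(mmr_select(query, docs, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr_select(query, docs, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the selection reduces to plain relevance ranking and returns both near-duplicates; with `0.5` the second pick is the less relevant but more distinct document, which is the "diverse context" the bullet refers to.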
| Endpoint | Purpose | Privacy Consideration |
|---|---|---|
| `/api/initial-questions-stream` | Generate symptom questions | No data persistence |
| `/api/follow-up-questions-stream` | Generate medical history questions | Context exists only in request |
| `/api/summarize-and-generate-pdf` | Create medical summary PDF | In-memory generation, immediate disposal |
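
A client call to one of these endpoints might look like the sketch below. Only the endpoint path and the `X-API-KEY` header come from this README; the base URL is a placeholder, and the HTTP method (POST), the JSON field names (`complaint`, `lang`), and the line-by-line streaming format are assumptions, since the request schema is not documented here.

```python
import json
import urllib.request
from typing import Iterator

BASE_URL = "https://example.invalid"  # placeholder, not the real service URL


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the API key header."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "X-API-KEY": api_key},
    )


def stream_lines(request: urllib.request.Request, timeout: float = 30.0) -> Iterator[str]:
    """Send the request and yield the response line by line as it arrives."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw_line in response:
            yield raw_line.decode("utf-8").rstrip("\n")


# Hypothetical payload; the real field names may differ.
request = build_request(
    "/api/initial-questions-stream",
    {"complaint": "persistent headache", "lang": "en"},
    api_key="replace-me",
)
```

Calling `stream_lines(request)` against a real deployment would then yield the streamed questions; it is left out of the example because the placeholder host does not resolve.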
The system supports multiple languages with complete UI and content translation:
- English (en): Default language
- Portuguese (pt): Full translation including PDF output
- Language auto-detected from browser settings
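
Browser-based language detection is commonly done by reading the `Accept-Language` request header. The sketch below shows one way to map that header onto the two supported languages; whether the project uses this header or a Gradio-level mechanism is not stated in this README, and `pick_language`, `SUPPORTED`, and `DEFAULT_LANG` are hypothetical names.

```python
from typing import List, Optional, Tuple

SUPPORTED = ("en", "pt")
DEFAULT_LANG = "en"


def pick_language(accept_language: Optional[str]) -> str:
    """Return the best supported language for an Accept-Language header value."""
    if not accept_language:
        return DEFAULT_LANG

    ranked: List[Tuple[float, int, str]] = []
    for position, part in enumerate(accept_language.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by original order in the header.
        ranked.append((-quality, position, tag))

    for neg_quality, _, tag in sorted(ranked):
        if neg_quality == 0:  # q=0 marks a language as not acceptable
            continue
        primary = tag.split("-")[0]  # "pt-BR" -> "pt"
        if primary in SUPPORTED:
            return primary
    return DEFAULT_LANG


print(pick_language("pt-BR,pt;q=0.9,en;q=0.8"))  # pt
print(pick_language("fr-FR,fr;q=0.9,en;q=0.5"))  # en
```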
- Principle of Least Privilege: Service accounts with minimal permissions
- Defense in Depth: Multiple security layers (API key, VPC, IAM)
- Input Validation: Pydantic models with field constraints
- Error Handling: Graceful degradation without exposing internals
- Rate Limiting: Cloud Run concurrency and max-instance limits bound the request load
- Secure Defaults: No default API keys in production
- Structured logging for operational insights
- No PII in logs or metrics
- Cloud Run automatic metrics (latency, errors, traffic)
- Health check endpoint at root path
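
One way to keep logs structured while guaranteeing that patient text never reaches them is to emit only an allow-listed set of operational fields and to drop the free-text message entirely. The sketch below illustrates that idea with the standard `logging` module; `SafeJsonFormatter`, `ALLOWED_FIELDS`, and the logger name are hypothetical and not taken from the project.

```python
import json
import logging
from typing import TextIO

ALLOWED_FIELDS = ("event", "lang", "duration_ms", "status")  # hypothetical allow-list


class SafeJsonFormatter(logging.Formatter):
    """Emit JSON containing only allow-listed fields; the message text is discarded."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "logger": record.name}
        for field in ALLOWED_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry, sort_keys=True)


def build_logger(stream: TextIO) -> logging.Logger:
    """Create a logger that writes allow-listed JSON lines to the given stream."""
    logger = logging.getLogger("securemed.sketch")
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(SafeJsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger
```

With this formatter, a call such as `logger.info("...", extra={"event": "initial_questions_stream", "lang": "pt", "complaint": "chest pain"})` produces a line containing `event` and `lang` but not `complaint`, because that key is not on the allow-list.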
We welcome contributions! Please:
- Test the live demo: https://caazzi-securemed.hf.space/
- Review the code for security and privacy improvements
- Suggest enhancements via issues or pull requests
- Additional language support
- Enhanced medical knowledge base
- Accessibility improvements (WCAG compliance)
- Performance optimizations
- Security audit findings
- Documentation improvements
- Not Medical Advice: System explicitly disclaims medical advisory capacity
- Data Protection: Designed with GDPR/LGPD principles (no data retention)
- Healthcare Integration: Not intended for direct EHR integration
- Age Verification: System designed for adult users (18+)
When reviewing the code, please pay special attention to:
- Privacy Leaks: Any inadvertent data persistence
- Security Vulnerabilities: Input validation, injection attacks
- Performance Bottlenecks: Async operations, memory usage
- Error Handling: Graceful failures, user experience
- Internationalization: Translation completeness and accuracy
- Why RAG over Fine-tuning?: Maintains flexibility and avoids training on patient data
- Why ChromaDB?: Lightweight, efficient for medical document retrieval
- Why Vertex AI?: Infrastructure that supports HIPAA compliance, regional deployment
- Why In-Memory Processing?: No patient data is persisted, so nothing remains to leak after a session ends
- Cold start: ~3-5 seconds (mitigated by min-instances=1)
- Question generation: <2 seconds
- PDF generation: <1 second
- Memory footprint: ~500MB per concurrent request
For questions about the architecture or to report security concerns, please open an issue with the appropriate label:
- `security` - Security vulnerabilities (use responsible disclosure)
- `privacy` - Privacy concerns or improvements
- `architecture` - Architectural suggestions
- `documentation` - Documentation improvements

Remember: This system is designed for informational purposes only and should not replace professional medical consultation. Always consult with qualified healthcare providers for medical advice.