A knowledge-graph-driven retrieval-augmented generation (RAG) system for clinical decision support in rare muscular dystrophies, focused on scenario-based differential diagnosis and management recommendations.

Core Components
Clinical Scenario Processor
- Accepts comprehensive patient presentations
- Extracts phenotypes and clinical features
- Maps to HPO (Human Phenotype Ontology) terms
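The phenotype-to-HPO mapping step can be pictured as a lookup over normalized free text. This is a minimal sketch, not the project's scenario_processor.py: the synonym table, the function names (normalize, map_to_hpo) and the naive substring matching are assumptions for illustration, and the HPO IDs should be checked against the current HPO release.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical synonym table: free-text phrase -> (HPO ID, HPO label).
# IDs are illustrative; verify them against the HPO release you load.
HPO_SYNONYMS: Dict[str, Tuple[str, str]] = {
    "proximal muscle weakness": ("HP:0003701", "Proximal muscle weakness"),
    "gowers sign": ("HP:0003391", "Gowers sign"),
    "calf pseudohypertrophy": ("HP:0003707", "Calf muscle pseudohypertrophy"),
    "calf muscle pseudohypertrophy": ("HP:0003707", "Calf muscle pseudohypertrophy"),
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation (e.g. apostrophes) with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def map_to_hpo(symptoms: List[str]) -> List[Tuple[str, str]]:
    """Return sorted (HPO ID, label) pairs for every known phrase found in the symptoms."""
    found: Dict[str, str] = {}
    for symptom in symptoms:
        text = normalize(symptom)
        for phrase, (hpo_id, label) in HPO_SYNONYMS.items():
            if phrase in text:
                found[hpo_id] = label  # keyed by ID, so duplicates collapse
    return sorted(found.items())
```

Applied to the three symptoms in the example scenario, this returns HP:0003391, HP:0003701 and HP:0003707. Plain substring matching ignores negation ("no Gowers sign" would still match) and misspellings, so a production mapper would likely use an HPO-aware concept recognizer instead.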
Knowledge Graph Engine
- Neo4j-based graph database
- Monarch Initiative knowledge graph (1.3M+ nodes, 14.7M+ relationships)
- Stores gene-disease-phenotype relationships using Biolink Model schema
- Enables complex clinical reasoning queries
- Falls back to an in-memory store if Neo4j is unavailable
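A phenotype-driven query against the Monarch graph could look like the sketch below. It assumes the Neo4j import keeps Biolink categories as node labels (for example biolink:Disease), Biolink predicates as relationship types, CURIEs such as HP:0003391 in an id property and display names in a name property; check these against the actual database first. The query text and find_candidate_diseases are illustrative, not the code in monarch_service.py.

```python
from typing import Any, Dict, List

# ASSUMPTION: Biolink categories are node labels, Biolink predicates are
# relationship types, and nodes carry `id` (CURIE) and `name` properties.
# Backticks are required in Cypher because these names contain a colon.
PHENOTYPE_TO_DISEASE_QUERY = """
MATCH (d:`biolink:Disease`)-[:`biolink:has_phenotype`]->(p:`biolink:PhenotypicFeature`)
WHERE p.id IN $hpo_ids
RETURN d.id AS disease_id,
       d.name AS disease_name,
       collect(DISTINCT p.id) AS matched_phenotypes
ORDER BY size(matched_phenotypes) DESC
LIMIT $limit
"""


def find_candidate_diseases(session: Any, hpo_ids: List[str], limit: int = 10) -> List[Dict[str, Any]]:
    """Return diseases linked to any of the given HPO IDs, most matched phenotypes first.

    `session` should behave like a neo4j Session: run(query, **params) returns
    an iterable of records supporting record["key"] lookups.
    """
    result = session.run(PHENOTYPE_TO_DISEASE_QUERY, hpo_ids=hpo_ids, limit=limit)
    return [
        {
            "disease_id": record["disease_id"],
            "disease_name": record["disease_name"],
            "matched_phenotypes": list(record["matched_phenotypes"]),
        }
        for record in result
    ]
```

With the official neo4j Python driver, the session would come from something like driver.session(database="monarch"); Session.run accepts query parameters as keyword arguments, which is the call shape used here.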
RAG System
- Processes clinical guidelines and literature
- Provides evidence-based recommendations
- Maintains citation traceability
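Citation traceability generally means each retrieved passage keeps its source metadata all the way into the prompt. The sketch below numbers passages as [n], returns a matching reference list, and flags citations in a generated answer that point at no retrieved passage. Chunk, build_cited_context and unresolved_citations are hypothetical names, and this is plain Python rather than the LangChain API.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage plus the metadata needed to cite it (hypothetical type)."""
    text: str
    source: str    # e.g. the guideline or article title
    location: str  # e.g. section or page within the source


def build_cited_context(chunks: List[Chunk]) -> Tuple[str, List[str]]:
    """Number each passage as [n] for the prompt and build the matching reference list."""
    context_lines: List[str] = []
    references: List[str] = []
    for i, chunk in enumerate(chunks, start=1):
        context_lines.append(f"[{i}] {chunk.text}")
        references.append(f"[{i}] {chunk.source}, {chunk.location}")
    return "\n".join(context_lines), references


def unresolved_citations(answer: str, n_chunks: int) -> List[int]:
    """Return citation numbers in the answer that do not match any retrieved passage."""
    cited = {int(num) for num in re.findall(r"\[(\d+)\]", answer)}
    return sorted(c for c in cited if not 1 <= c <= n_chunks)
```

The prompt would ask the model to cite passages by their [n] markers; any number returned by unresolved_citations marks a claim that cannot be traced back to a source.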
Differential Diagnosis Module
- Phenotype-based disease ranking
- Contextual variant interpretation
- Probability scoring with explanations
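A minimal phenotype-based ranking, assuming each disease is annotated with a set of HPO IDs: each disease is scored by Jaccard similarity to the patient's phenotypes, the scores are normalized to sum to 1, and each result carries its matched and unexplained phenotypes as the explanation. The normalized scores are relative weights, not calibrated probabilities; RankedDiagnosis, rank_diseases and the choice of Jaccard similarity are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class RankedDiagnosis:
    disease: str
    score: float            # share of total similarity; NOT a calibrated probability
    matched: List[str]      # patient phenotypes annotated to this disease
    unexplained: List[str]  # patient phenotypes the disease does not account for


def rank_diseases(patient: Set[str], annotations: Dict[str, Set[str]]) -> List[RankedDiagnosis]:
    """Rank diseases by Jaccard similarity between patient and disease HPO sets."""
    raw: Dict[str, float] = {}
    for disease, phenotypes in annotations.items():
        union = patient | phenotypes
        raw[disease] = len(patient & phenotypes) / len(union) if union else 0.0

    total = sum(raw.values())
    ranked = [
        RankedDiagnosis(
            disease=disease,
            score=raw[disease] / total if total else 0.0,
            matched=sorted(patient & phenotypes),
            unexplained=sorted(patient - phenotypes),
        )
        for disease, phenotypes in annotations.items()
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    ranked.sort(key=lambda r: (-r.score, r.disease))
    return ranked
```

With toy annotations where "Disease A" carries all three example phenotypes and "Disease B" only proximal muscle weakness, Disease A scores 0.75 and Disease B 0.25. Jaccard similarity penalizes diseases with many annotations the patient lacks; ontology-aware semantic similarity over the HPO hierarchy is a common alternative.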

Supported Clinical Scenarios
New Patient Presentation
- Input: Clinical features, lab values, family history
- Output: Differential diagnosis, recommended tests, initial management
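The New Patient Presentation input can be held in a small typed structure. This sketch assumes requests shaped like the example scenario JSON in this README (patient, symptoms, labs, question) plus an optional family_history list; ScenarioInput, parse_quantity and parse_scenario are hypothetical names, and only simple "number unit" strings such as "15000 U/L" are parsed. Pydantic, which the project already lists as a dependency, could replace the dataclass to get validation for free.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ScenarioInput:
    """Hypothetical typed form of a new-patient scenario."""
    age_years: Optional[float]
    sex: Optional[str]
    symptoms: List[str]
    labs: Dict[str, Tuple[float, str]]  # lab name -> (value, unit)
    family_history: List[str] = field(default_factory=list)
    question: str = ""


def parse_quantity(text: str) -> Tuple[float, str]:
    """Split a string such as '15000 U/L' into (15000.0, 'U/L')."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(.*?)\s*", text)
    if match is None:
        raise ValueError(f"Cannot parse quantity: {text!r}")
    return float(match.group(1)), match.group(2)


def parse_scenario(raw: str) -> ScenarioInput:
    """Parse scenario JSON (patient, symptoms, labs, question, optional family_history)."""
    data = json.loads(raw)
    patient = data.get("patient", {})
    age_years: Optional[float] = None
    if "age" in patient:
        value, unit = parse_quantity(patient["age"])
        # Ages given in months are converted; any other unit is treated as years.
        age_years = value / 12 if unit.lower().startswith("month") else value
    return ScenarioInput(
        age_years=age_years,
        sex=patient.get("sex"),
        symptoms=list(data.get("symptoms", [])),
        labs={name: parse_quantity(value) for name, value in data.get("labs", {}).items()},
        family_history=list(data.get("family_history", [])),
        question=data.get("question", ""),
    )
```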
Variant Interpretation
- Input: Genetic variant + clinical context
- Output: Pathogenicity assessment, phenotype prediction, treatment eligibility
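Pathogenicity assessment is commonly built on the ACMG/AMP evidence-combining rules (Richards et al., 2015), the guideline listed under references. The sketch below encodes those combining rules for criterion codes that have already been assigned; assigning the codes, strength modifiers such as PM2_Supporting, and later ClinGen refinements are out of scope. classify and _strength are hypothetical names, not the project's variant_interpreter.py.

```python
from collections import Counter
from typing import Iterable

# Evidence-strength prefixes from the ACMG/AMP scheme.
_PREFIXES = ("PVS", "PS", "PM", "PP", "BA", "BS", "BP")


def _strength(code: str) -> str:
    """Map a criterion code such as 'PM2' to its strength prefix ('PM')."""
    upper = code.upper()
    for prefix in _PREFIXES:
        if upper.startswith(prefix):
            return prefix
    raise ValueError(f"Unknown ACMG/AMP criterion: {code}")


def classify(codes: Iterable[str]) -> str:
    """Combine already-assigned criterion codes into a five-tier classification."""
    n = Counter(_strength(c) for c in codes)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps == 1 and 1 <= pm <= 2)
        or (ps == 1 and pp >= 2)
        or pm >= 3
        or (pm == 2 and pp >= 2)
        or (pm == 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    # Evidence pointing both ways is reported as uncertain.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"
```

For example, classify(["PVS1", "PM2"]) returns "Likely pathogenic"; adding two strong benign criteria makes the evidence conflicting, so the result becomes "Uncertain significance".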
Management Planning
- Input: Confirmed diagnosis + patient status
- Output: Age-appropriate surveillance, treatment options, prognostic counseling

Target Diseases
- Duchenne Muscular Dystrophy (DMD)
- Becker Muscular Dystrophy (BMD)
- Limb-Girdle MD Type R1 (LGMDR1/LGMD2A)
- LAMA2-Related Congenital MD (MDC1A)

Tech Stack
- Backend: FastAPI (Python)
- Graph DB: Neo4j
- Vector DB: ChromaDB/Pinecone
- LLM: OpenAI GPT-4 / Claude
- RAG: LangChain
- Frontend: React + Next.js

Project Structure

    ├── backend/
    │   ├── api/                  # FastAPI endpoints
    │   ├── core/                 # Core business logic
    │   │   ├── scenario_processor.py
    │   │   ├── differential_diagnosis.py
    │   │   └── variant_interpreter.py
    │   ├── knowledge_graph/      # Neo4j integration
    │   ├── rag/                  # RAG pipeline
    │   └── data/                 # Data ingestion scripts
    │
    ├── frontend/
    │   ├── components/           # React components
    │   ├── pages/                # Next.js pages
    │   └── utils/                # Helper functions
    │
    ├── data/
    │   ├── guidelines/           # Clinical guidelines
    │   ├── gene_data/            # Genetic databases
    │   └── scenarios/            # Test scenarios
    │
    ├── notebooks/
    │   └── prototype.ipynb       # Development notebook
    │
    ├── pyproject.toml            # Project dependencies (uv/pip)
    ├── uv.lock                   # Locked dependencies (uv)
    └── .env                      # Neo4j configuration (not in git)

Key Files
- backend/knowledge_graph/monarch_service.py: queries Monarch using the Biolink schema
- backend/knowledge_graph/monarch_mapper.py: maps Biolink labels and relationships to the project schema
- test_monarch_integration.py: smoke test script that verifies Monarch connectivity (uv run python test_monarch_integration.py)
- backend/knowledge_graph/seed_data.py: fallback data, used only when Neo4j/Monarch is unavailable
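The label translation done by monarch_mapper.py can be pictured as two lookup tables, one for node categories and one for relationship types. The Biolink identifiers on the left are standard Biolink Model terms; the project-side names on the right and the function names are assumptions for illustration, and the real mapper may differ.

```python
from typing import Dict, Iterable, Optional

# Biolink identifiers (left) are standard Biolink Model terms; the project-side
# names (right) are hypothetical placeholders for whatever the schema uses.
CATEGORY_MAP: Dict[str, str] = {
    "biolink:Disease": "Disease",
    "biolink:Gene": "Gene",
    "biolink:PhenotypicFeature": "Phenotype",
}
PREDICATE_MAP: Dict[str, str] = {
    "biolink:has_phenotype": "HAS_PHENOTYPE",
    "biolink:gene_associated_with_condition": "ASSOCIATED_WITH",
}


def map_node_labels(labels: Iterable[str]) -> Optional[str]:
    """Return the first project node type found among a node's Biolink labels."""
    for label in labels:
        if label in CATEGORY_MAP:
            return CATEGORY_MAP[label]
    return None  # e.g. only generic labels such as biolink:NamedThing


def map_relationship(rel_type: str) -> Optional[str]:
    """Translate a Biolink predicate into a project relationship name, if known."""
    return PREDICATE_MAP.get(rel_type)
```

Taking a list of labels matters because Monarch nodes can carry several Biolink categories at once, including generic ones that have no project equivalent.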

Prerequisites
- Python 3.9+
- Node.js 16+ (for the frontend, once implemented)
- Neo4j 4.4+ (Neo4j Desktop recommended)
- uv (recommended Python package manager)

Installation

Using uv (recommended):

    # Install uv if you haven't already
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Create a virtual environment and activate it
    uv venv
    source .venv/bin/activate    # macOS/Linux
    # .venv\Scripts\activate     # Windows

    # Install project dependencies (from pyproject.toml)
    uv sync

    # Or install the core dependencies directly
    uv pip install neo4j pydantic python-dotenv

    # Install with dev dependencies (for notebooks)
    uv sync --extra dev

Alternative: using pip

    python -m venv venv
    source venv/bin/activate
    pip install neo4j pydantic python-dotenv

Neo4j Setup

The project uses Neo4j with the Monarch Initiative knowledge graph database.
Option A: Neo4j Desktop (recommended)
- Download and install Neo4j Desktop
- Create a new database instance (or use an existing one)
- Import the Monarch Initiative dump into a database named monarch
- Start the database

Option B: Docker

    docker run -p 7474:7474 -p 7687:7687 \
      -e NEO4J_AUTH=neo4j/your-password \
      neo4j:latest

Create a .env file in the project root:

    NEO4J_URI=neo4j://127.0.0.1:7687
    NEO4J_USER=neo4j
    NEO4J_PASSWORD=your-password
    NEO4J_DATABASE=monarch

Important:
- Keep .env in .gitignore (it is already listed) so credentials are never committed.
- Commit the uv.lock file (generated by uv sync) to ensure reproducible builds.

Test the connection:

    # Activate the virtual environment
    source .venv/bin/activate

    # Test the connection to the Monarch database
    python test_monarch_database.py

    # Or test the general Neo4j connection
    python test_neo4j_connection.py

    # Optional: run the Monarch integration smoke tests
    python test_monarch_integration.py

Note: The system automatically falls back to an in-memory knowledge store if Neo4j is unavailable or not configured.
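The fallback behaviour can be sketched as a factory that returns a Neo4j driver when the .env settings are present and the server responds, and an in-memory store otherwise. GraphDatabase.driver, Driver.verify_connectivity and python-dotenv's load_dotenv are real APIs; InMemoryStore and get_knowledge_backend are hypothetical stand-ins for the project's own classes.

```python
import os
from typing import Any, Tuple

try:  # the neo4j driver is treated as optional in this sketch
    from neo4j import GraphDatabase
except ImportError:
    GraphDatabase = None

try:  # python-dotenv copies .env entries into os.environ if it is installed
    from dotenv import load_dotenv
    load_dotenv()
except ImportError:
    pass


class InMemoryStore:
    """Hypothetical stand-in for the seed-data fallback store."""
    def __init__(self) -> None:
        self.nodes: dict = {}


def get_knowledge_backend() -> Tuple[str, Any]:
    """Return ("neo4j", driver) if Neo4j is configured and reachable, else ("memory", store)."""
    uri = os.getenv("NEO4J_URI")
    user = os.getenv("NEO4J_USER")
    password = os.getenv("NEO4J_PASSWORD")
    if GraphDatabase is None or not (uri and user and password):
        return "memory", InMemoryStore()

    driver = None
    try:
        driver = GraphDatabase.driver(uri, auth=(user, password))
        driver.verify_connectivity()  # raises if the server is unreachable or auth fails
        return "neo4j", driver
    except Exception:  # broad on purpose: any connection problem triggers the fallback
        if driver is not None:
            driver.close()
        return "memory", InMemoryStore()
```

Sessions against the Monarch database would then be opened with driver.session(database=os.getenv("NEO4J_DATABASE", "monarch")). The broad except clause is a deliberate choice in this sketch so that a missing or misconfigured server degrades to the fallback instead of crashing the API.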

Example scenario input:

    {
      "patient": {
        "age": "7 years",
        "sex": "male"
      },
      "symptoms": [
        "Progressive proximal muscle weakness",
        "Gowers sign positive",
        "Calf pseudohypertrophy"
      ],
      "labs": {
        "CK": "15000 U/L"
      },
      "question": "What is the diagnosis and management?"
    }

Development Roadmap
- Project planning and architecture
- Knowledge graph schema design
- Neo4j integration with Monarch Initiative database
- Clinical scenario processor
- In-memory knowledge store (fallback)
- Adapter layer for Monarch schema integration
- RAG pipeline implementation
- API development
- Frontend interface
- Testing with real scenarios

References
- Birnkrant DJ, et al. Diagnosis and management of Duchenne muscular dystrophy. Lancet Neurol. 2018
- TREAT-NMD Standards of Care Guidelines
- ACMG/AMP Variant Interpretation Guidelines