This repository provides a self-hosted Data Warehouse designed to ingest, validate, and store patient and organizational data. It serves as the primary source of truth for the Texas Hearing Institute, with a schema optimized for direct integration with Power BI for clinical reporting and advanced analytics.
Once the system is deployed, visit the Web Portal at: http://localhost

- Local Access: http://localhost
- Organizational Access: Replace `localhost` with the server's local IP address (e.g., `http://10.0.0.50`).
- Documentation & Monitoring:
  - API Specs: http://localhost/api/docs
  - Task Queue: http://localhost:15672 (User: `guest` / Pass: `guest`)
The system is composed of seven specialized microservices orchestrated via Docker. This architecture ensures high availability, data integrity, and background processing capabilities.
- Gateway (thi-proxy): An Nginx-based reverse proxy that handles all incoming traffic on Port 80, routing requests to either the Frontend or the API.
- Frontend (thi-frontend): A Next.js web application for data management, file uploads, and warehouse monitoring.
- API (thi-backend): A FastAPI server that orchestrates metadata, handles file registry logic, and communicates with the task queue.
- Worker (thi-celery-worker): A dedicated Python worker that performs the "heavy lifting" of the ETL (Extract, Transform, Load) process, including schema validation and SQL generation.
- Warehouse (thi-db): A PostgreSQL 16 database instance optimized for analytical queries and Power BI connectivity.
- Queue (thi-rabbitmq): An AMQP message broker that ensures reliable communication between the API and the background workers.
- Storage (thi-seaweedfs): An S3-compatible object storage layer used for archiving raw data assets before they are transformed into the relational warehouse.
The following specifications are recommended for stable production operation within an organizational network.
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 2 Cores | 4 Cores+ |
| RAM | 4 GB | 8 GB+ |
| Storage | 10 GB | 50 GB+ (SSD preferred) |
- Standard Operation: The idle stack consumes approximately 1.3 GB of RAM.
- Peak Requirements: During the Next.js build phase or large-scale data ingestion, memory usage may temporarily increase to 3-4 GB.
The system is delivered as a containerized stack driven by the Make utility. Ensure that Docker Desktop (or OrbStack) and the Make utility are installed on the host machine.
For a production-ready background deployment, execute:
```
make deploy
```

This command builds the required images, initializes all microservices, executes database migrations, and verifies the health of the API layer.
- Service Status: Run `make ps` to view the uptime and health status of all containers.
- Resource Usage: Run `docker stats` to view real-time CPU and memory consumption across the stack.
- Live Logs: Run `docker compose logs -f` for a combined stream of all application events.
Compose substitutes values from the repo-root .env (from .env.example) and from your shell. The tables below list the most common host-visible settings, their defaults, and which part of the stack uses them. For full detail (Better Auth, client/server files, image build args), see Environment variables at the end of this document.
```
PUBLIC_PORT=8080 DB_PORT=5433 make deploy
```

| Variable | Component | Description | Default |
|---|---|---|---|
| `PUBLIC_PORT` | Proxy | Host port mapped to nginx (website and same-origin `/api` in the browser). | `80` |
| `API_SUBPATH` | Proxy / frontend / backend | URL prefix for the API (nginx strips this and forwards to FastAPI). | `/api` |
| `API_PORT` | Backend | Port FastAPI listens on inside the backend container; nginx reaches the backend on this port. | `8000` |
| Variable | Component | Description | Default |
|---|---|---|---|
| `DB_PORT` | Postgres | Host port for the database (e.g. Power BI, tools on the machine). | `5432` |
| `DB_NAME` | Postgres | Primary database name. | `postgres` |
| `DB_USER` | Postgres | Database user. | `postgres` |
| `DB_PASSWORD` | Postgres | Database password. | `password` |
| `RABBITMQ_PORT` | RabbitMQ | Host port for AMQP. | `5672` |
| `RABBITMQ_MGMT_PORT` | RabbitMQ | Host port for the management UI. | `15672` |
| `STORAGE_S3_PORT` | SeaweedFS | Host port for S3-compatible access. | `8333` |
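The host ports in the tables above can be sanity-checked before wiring up any clients. A minimal sketch; the port numbers are the defaults listed above, so adjust for any overrides in your `.env`:

```python
import socket

# Default host ports from the tables above; override to match your .env.
DEFAULT_PORTS = {
    "proxy (PUBLIC_PORT)": 80,
    "postgres (DB_PORT)": 5432,
    "rabbitmq (RABBITMQ_PORT)": 5672,
    "rabbitmq mgmt (RABBITMQ_MGMT_PORT)": 15672,
    "seaweedfs s3 (STORAGE_S3_PORT)": 8333,
}

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        status = "open" if port_open("127.0.0.1", port) else "closed"
        print(f"{name:40s} {status}")
```

This only confirms that something is listening on each port; it does not authenticate against the services.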
If port 80 is already in use, point the stack at a different host port:
- Open or create the repo-root `.env` (copy from `.env.example` if needed).
- Set `PUBLIC_PORT` (e.g. `PUBLIC_PORT=8080`). Compose maps `${PUBLIC_PORT:-80}:80` on the proxy service; you do not need to edit `docker-compose.yml` for this.
- Set `BETTER_AUTH_URL` to the URL users will type in the browser (include the non-default port), e.g. `http://localhost:8080` or `http://10.0.0.50:8080`.
- Run `make deploy` (or restart the stack) so containers pick up the new values.
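The steps above can be scripted. A minimal sketch that rewrites `KEY=VALUE` lines in a dotenv-style file; `set_env_vars` is a hypothetical helper, not part of the repository, and it assumes the file contains only simple `KEY=VALUE` lines:

```python
from pathlib import Path

def set_env_vars(env_path: Path, updates: dict) -> None:
    """Set or replace KEY=VALUE lines in a dotenv-style file (hypothetical helper)."""
    lines = env_path.read_text().splitlines() if env_path.exists() else []
    keys_seen = set()
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key in updates:
            out.append(f"{key}={updates[key]}")  # replace existing entry in place
            keys_seen.add(key)
        else:
            out.append(line)
    for key, value in updates.items():
        if key not in keys_seen:
            out.append(f"{key}={value}")  # append keys not yet present
    env_path.write_text("\n".join(out) + "\n")

if __name__ == "__main__":
    # Example: move the site to port 8080 and keep the auth URL in sync.
    set_env_vars(Path(".env"), {
        "PUBLIC_PORT": "8080",
        "BETTER_AUTH_URL": "http://localhost:8080",
    })
```

After updating the file, re-run `make deploy` so the containers pick up the new values.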
The warehouse is optimized for direct connectivity with Power BI Desktop or Service.
- Open Power BI Desktop.
- Navigate to Get Data > PostgreSQL Database.
- Provide the following connection parameters:
| Parameter | Recommended Value |
|---|---|
| Server | `localhost` (or the server's local IP address) |
| Database | `postgres` (or the configured `DB_NAME`) |
| Authentication | Select the Database tab |
| Port | `5432` (or the configured `DB_PORT`) |
| Username | `postgres` (or the configured `DB_USER`) |
| Password | `password` (or the configured `DB_PASSWORD`) |
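As a quick sanity check before opening Power BI, the same parameters can be assembled into a libpq-style connection string for testing with `psql` or another client. A sketch using the defaults from the table; `WAREHOUSE_HOST` is a hypothetical override name for illustration, not a variable the stack defines:

```python
import os

def warehouse_conninfo() -> str:
    """Build a libpq-style connection string from the same parameters
    Power BI asks for, reading overrides from the environment."""
    server = os.environ.get("WAREHOUSE_HOST", "localhost")  # hypothetical name; or the server's local IP
    port = os.environ.get("DB_PORT", "5432")
    dbname = os.environ.get("DB_NAME", "postgres")
    user = os.environ.get("DB_USER", "postgres")
    password = os.environ.get("DB_PASSWORD", "password")
    return f"host={server} port={port} dbname={dbname} user={user} password={password}"
```

For example, `psql "$(python build_conninfo.py)"` style usage would confirm the credentials before configuring the Power BI connector.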
Before deploying to a production organizational environment, the default credentials MUST be overridden in the repo-root `.env` file (Compose substitutes these values into `docker-compose.yml` at startup).
The system automatically synchronizes credentials across the following service layers:
- Database Cluster: `DB_USER` and `DB_PASSWORD` are shared between the core database, the API, and the processing workers.
- Message Broker: `RABBITMQ_USER` and `RABBITMQ_PASS` are shared between the broker and its clients.
- Storage Layer: `STORAGE_KEY` and `STORAGE_SECRET` are shared between the file server and the ingestion engine.
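A pre-deployment check can flag credentials that are still set to a known default. A sketch: the `password` and `guest` defaults below come from this README, while the empty `STORAGE_SECRET` default is an assumption, and the helper itself is hypothetical:

```python
import os

# Defaults per this README; an empty STORAGE_SECRET default is an assumption.
INSECURE_DEFAULTS = {
    "DB_PASSWORD": "password",
    "RABBITMQ_PASS": "guest",
    "STORAGE_SECRET": "",
}

def find_default_credentials(env=None):
    """Return the names of credential variables still set to (or missing and
    falling back to) a known default value."""
    env = dict(os.environ) if env is None else env
    flagged = []
    for name, default in INSECURE_DEFAULTS.items():
        if env.get(name, default) == default:
            flagged.append(name)
    return flagged
```

Running such a check (and aborting deployment when the returned list is non-empty) is a cheap way to enforce the "MUST be overridden" rule above.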
```
make install
```

- Full Stack: `make dev` (executes the entire stack within Docker).
- Hybrid Development: `make dev-local` (executes the database and queue in Docker while running application code on the host machine).
```
make test
```

| File | When |
|---|---|
| `.env` at repo root | Docker Compose and `make dev` (copy from `.env.example`). |
| `client/.env.local` | Next.js running on your machine (copy from `client/.env.example`). |
| `server/.env` | FastAPI / Celery on your machine (copy from `server/.env.example`; keys match `server/core/config.py`). |
| Variable | Meaning |
|---|---|
| `PUBLIC_PORT` | Host port for the website (nginx → port 80 in the container). |
| `API_SUBPATH` | URL prefix for the API (default `/api`). Nginx and the browser both use this path. |
| `API_PORT` | Port FastAPI listens on inside Docker; nginx sends `/api` traffic here. |
| `API_PUBLISH_HOST` | Host address the API port is bound to on the machine (default `127.0.0.1`). |
| `API_PUBLISH_PORT` | Host port that reaches FastAPI; `make dev` points Next at this. |
| Variable | Meaning |
|---|---|
| `DB_USER`, `DB_PASSWORD`, `DB_NAME` | Postgres user, password, and database name for the stack. |
| `DB_PORT` | Postgres port on the host (e.g. for Power BI). Next may use this when building a DB URL. |
| `DATABASE_URL` | Optional; full Postgres URL for Better Auth in thi-frontend. If unset, Next builds a URL from `DB_*`. |
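The `DATABASE_URL` fallback described above can be sketched as follows. This is illustrative only; the exact assembly in thi-frontend may differ:

```python
import os

def database_url(env=None) -> str:
    """Return DATABASE_URL if set, else assemble one from DB_* values
    (a sketch of the fallback described above, not the frontend's actual code)."""
    env = dict(os.environ) if env is None else env
    if env.get("DATABASE_URL"):
        return env["DATABASE_URL"]
    user = env.get("DB_USER", "postgres")
    password = env.get("DB_PASSWORD", "password")
    host = env.get("DB_HOST", "db")  # Compose service name per this README
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```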
| Variable | Meaning |
|---|---|
| `BETTER_AUTH_SECRET` | Signing secret; required when building the frontend image and when running it. |
| `BETTER_AUTH_URL` | Public site URL without a path (e.g. `http://localhost`). Default uses `PUBLIC_PORT`. |
| `NEXT_PUBLIC_BETTER_AUTH_URL` | Optional; overrides the browser auth client base. Empty = same origin as the page. |
| Variable | Meaning |
|---|---|
| `RABBITMQ_USER`, `RABBITMQ_PASS` | Broker login; backend and worker connect with these. |
| `RABBITMQ_PORT`, `RABBITMQ_MGMT_PORT` | AMQP and management UI ports on the host. |
| `STORAGE_KEY`, `STORAGE_SECRET` | Credentials for SeaweedFS / S3-style access. |
| `STORAGE_S3_PORT`, `STORAGE_MASTER_PORT`, `STORAGE_FILER_PORT` | SeaweedFS service ports on the host. |
You normally do not put these in `.env`; Compose or the Dockerfile sets them.
| Variable | Role |
|---|---|
| `INTERNAL_API_URL` | On thi-frontend: base URL for server-side calls to FastAPI (`http://backend:…` + `API_SUBPATH`). |
| `NEXT_PUBLIC_API_URL` | On thi-frontend: same value as `API_SUBPATH` for the browser. |
| `RUNNING_IN_DOCKER` | On thi-frontend: `1` so SSR uses `INTERNAL_API_URL`. |
| `DB_HOST` | On thi-frontend: Postgres hostname (default `db`). |
| `BACKEND_PORT` | In the nginx template environment: same as `API_PORT`. |
| Backend user / password / host / port / dbname | SQLAlchemy environment for the API and worker; Compose sets host=`db` and maps user/db from `DB_*`. |
| `broker_url` | AMQP URL for the API and worker; Compose builds it from `RABBITMQ_*`. |
| `ORIGIN_URL` | On API/worker containers: CORS-related (`config.py` reads `origin_url`). |
| `USE_S3`, `S3_ENDPOINT`, `S3_KEY`, `S3_SECRET`, `S3_BUCKET` | Object storage; in Compose `S3_ENDPOINT` is the SeaweedFS service. |
| `API_SERVER_URL` | On celery: base URL to call the API (`http://backend:${API_PORT}`). |
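The `broker_url` that Compose assembles from `RABBITMQ_*` has the usual Celery AMQP shape. A sketch; the trailing `//` (the default vhost) is an assumption about how Compose builds the URL:

```python
def broker_url(user: str = "guest", password: str = "guest",
               host: str = "rabbitmq", port: int = 5672) -> str:
    """AMQP URL in the shape Celery expects; Compose builds something
    equivalent from RABBITMQ_* (default-vhost suffix is an assumption)."""
    return f"amqp://{user}:{password}@{host}:{port}//"
```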
Optional: set `DOCKER_CONTAINER=1` instead of relying on `RUNNING_IN_DOCKER` for the same SSR API behavior in `HttpDataService`.
| Variable | Meaning |
|---|---|
| `NEXT_PUBLIC_API_URL` | Browser path to the API (e.g. `/api`). |
| `NEXT_DEV_PROXY_API_ORIGIN` | Full URL to FastAPI (e.g. `http://127.0.0.1:8000`) when the path above is relative. |
| `NEXT_PUBLIC_BACKEND_ORIGIN` | Full URL to FastAPI for SSE (EventSource). |
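The interplay between a relative `NEXT_PUBLIC_API_URL` and `NEXT_DEV_PROXY_API_ORIGIN` can be sketched as below. This is illustrative of the resolution described above, not the frontend's actual code:

```python
def resolve_api_base(public_api_url: str, dev_proxy_origin: str = "") -> str:
    """If the API path is relative, prefix it with the dev proxy origin;
    absolute URLs pass through unchanged (illustrative sketch)."""
    if public_api_url.startswith(("http://", "https://")):
        return public_api_url
    return dev_proxy_origin.rstrip("/") + "/" + public_api_url.lstrip("/")
```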
All keys are read in `server/core/config.py`. Common ones:

| Variable | Meaning |
|---|---|
| `user`, `password`, `host`, `port`, `dbname` | Postgres settings for SQLAlchemy. |
| `broker_url` | RabbitMQ URL. |
| `origin_url` | CORS (`Settings.ORIGIN_URL`). |
| `USE_S3`, `S3_ENDPOINT`, `S3_KEY`, `S3_SECRET`, `S3_BUCKET`, `S3_REGION` | Object storage. |
| `API_SERVER_URL` | Worker → API (default `http://backend:8000`). |
| `DLT_DESTINATION`, `DLT_DATASET`, `DUCKDB_TEMP_DIR` | ETL / DLT settings. |
| Variable | Meaning |
|---|---|
| `BETTER_AUTH_SECRET` | Build ARG and runtime env (see above). |
| `NEXT_PUBLIC_SUPABASE_URL`, `NEXT_PUBLIC_SUPABASE_ANON_KEY` | Build ARGs; baked into the client bundle unless you override at build time. |
| `NEXT_PUBLIC_API_URL` | Build-time default `/api`; Compose overwrites it at runtime for the container. |
| `NEXT_JS_DISABLE_ESLINT`, `NEXT_TELEMETRY_DISABLED` | Builder-only. |
| `NODE_ENV`, `PORT`, `HOSTNAME` | Runtime Node process settings (`production`, `3000`, `0.0.0.0`). |