Interactive web dashboard for analyzing Amazon CloudFront logs across two Indiana University platforms:
- HRA (humanatlas.io) — Human Reference Atlas tool usage analytics
- CNS (cns.iu.edu) — Cyberinfrastructure for Network Science website analytics
Live: hra-viz.vercel.app
| Layer | Technology |
|---|---|
| Frontend | Next.js 16 (App Router) + TypeScript + Tailwind CSS v4 |
| Charts | Apache ECharts 6 via echarts-for-react |
| Data Processing | DuckDB (SQL on Parquet) + Python |
| ML Pipeline | Prophet, scikit-learn, NLP clustering |
| External Data | PubMed API, GitHub API (cns-iu/cns-website repo) |
| Deployment | Vercel (static export) |
```
app/
  page.tsx                    # Landing page (pick HRA or CNS)
  hra/                        # HRA dashboard (7 pages)
    page.tsx                  # Overview
    tools/page.tsx            # Usage + Reliability
    features/page.tsx         # Tool Behaviour
    geo/page.tsx              # Geography
    journeys/page.tsx         # Journeys & Opportunities
    insights/page.tsx         # Data-Driven Insights
    ml/page.tsx               # ML Lab
  cns/                        # CNS dashboard (6 pages)
    page.tsx                  # Overview
    traffic/page.tsx          # Traffic Trends
    content/page.tsx          # Content & Documents
    geo/page.tsx              # Geography
    errors/page.tsx           # Errors + Security
    referrers/page.tsx        # Referrer Analysis
components/                   # Shared components
  charts/                     # 70+ chart components
  Navbar.tsx                  # Site toggle + nav links
lib/chartTheme.ts             # Colors, tooltip styles, helpers
data_processing/
  generate_hra_data.py        # HRA: DuckDB SQL -> 51 JSON files
  hra_ml_insights.py          # HRA: Prophet forecasts, clustering, churn
  fetch_hra_publications.py   # HRA: PubMed API -> publications.json
  generate_cns_data.py        # CNS: DuckDB SQL -> 25 JSON files
  fetch_cns_github.py         # CNS: GitHub API -> pubs, events, funding, news
  run_all.sh                  # Run entire pipeline
  requirements.txt            # Python dependencies
tests/
  test_data_integrity.py      # 58 pytest tests (data shapes, cross-checks, file existence)
data/                         # Place parquet files here (auto-detected by scripts)
  hra/                        # HRA CloudFront parquet logs
  cns/                        # CNS CloudFront parquet logs
public/data/
  hra/                        # HRA JSON output (generated by pipeline)
  cns/                        # CNS JSON output (generated by pipeline)
```
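`tests/test_data_integrity.py` cross-checks the generated JSON before deployment. A hedged sketch of the kind of shape check involved (the keys and the `validate_daily` helper are hypothetical, not the actual test code):

```python
# Hypothetical shape check in the spirit of tests/test_data_integrity.py.
import json

def validate_daily(records):
    """Validate an assumed daily-traffic JSON schema."""
    assert isinstance(records, list) and records, "expected a non-empty list"
    for row in records:
        assert {"date", "requests"} <= set(row), "missing required keys"
        assert isinstance(row["requests"], int) and row["requests"] >= 0
    return True

# The real tests would load this from public/data/hra/ or public/data/cns/.
sample = json.loads('[{"date": "2024-01-01", "requests": 120}]')
ok = validate_daily(sample)  # -> True
```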
```bash
# Install
npm install
pip install -r data_processing/requirements.txt

# Run full pipeline (data + build)
./data_processing/run_all.sh

# Or selectively
./data_processing/run_all.sh --hra-only
./data_processing/run_all.sh --cns-only
./data_processing/run_all.sh --skip-fetch   # skip PubMed/GitHub API calls

# Development
npm run dev

# Testing (58 data integrity tests)
pytest tests/ -v
pytest tests/ -k "hra"   # HRA only
pytest tests/ -k "cns"   # CNS only
```

| Source | Script | Output |
|---|---|---|
| HRA CloudFront logs (15.8M rows) | generate_hra_data.py | 51 JSON files |
| HRA ML pipeline | hra_ml_insights.py | 10 JSON files |
| PubMed (NCBI E-utilities) | fetch_hra_publications.py | 54 publications |
| CNS CloudFront logs (16.4M rows) | generate_cns_data.py | 25 JSON files |
| cns-iu/cns-website GitHub repo | fetch_cns_github.py | 405 pubs, 999 events, 81 grants, 187 news |
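The PubMed row above goes through NCBI E-utilities. A hypothetical sketch of the kind of request `fetch_hra_publications.py` might build (the `esearch.fcgi` endpoint is the real E-utilities URL; the helper name and search term are assumptions):

```python
# Build a PubMed esearch query URL (JSON response). Endpoint is real;
# the search term and helper are illustrative assumptions.
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, retmax=100):
    """Return an esearch URL for the given PubMed query term."""
    return ESEARCH + "?" + urlencode({
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,
    })

url = esearch_url('"Human Reference Atlas"')
```

A follow-up `esummary.fcgi` call with the returned PMIDs would yield the metadata that ends up in `publications.json`.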