deka27/hra-viz

HRA + CNS Analytics Dashboard

Interactive web dashboard for analyzing Amazon CloudFront logs across two Indiana University platforms:

  • HRA (humanatlas.io) — Human Reference Atlas tool usage analytics
  • CNS (cns.iu.edu) — Cyberinfrastructure for Network Science website analytics

Live: hra-viz.vercel.app

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 16 (App Router) + TypeScript + Tailwind CSS v4 |
| Charts | Apache ECharts 6 via echarts-for-react |
| Data Processing | DuckDB (SQL on Parquet) + Python |
| ML Pipeline | Prophet, scikit-learn, NLP clustering |
| External Data | PubMed API, GitHub API (cns-iu/cns-website repo) |
| Deployment | Vercel (static export) |

Project Structure

app/
  page.tsx                 # Landing page (pick HRA or CNS)
  hra/                     # HRA dashboard (7 pages)
    page.tsx               #   Overview
    tools/page.tsx         #   Usage + Reliability
    features/page.tsx      #   Tool Behaviour
    geo/page.tsx           #   Geography
    journeys/page.tsx      #   Journeys & Opportunities
    insights/page.tsx      #   Data-Driven Insights
    ml/page.tsx            #   ML Lab
  cns/                     # CNS dashboard (6 pages)
    page.tsx               #   Overview
    traffic/page.tsx       #   Traffic Trends
    content/page.tsx       #   Content & Documents
    geo/page.tsx           #   Geography
    errors/page.tsx        #   Errors + Security
    referrers/page.tsx     #   Referrer Analysis
  components/              # Shared components
    charts/                #   70+ chart components
    Navbar.tsx             #   Site toggle + nav links
  lib/chartTheme.ts        # Colors, tooltip styles, helpers

data_processing/
  generate_hra_data.py     # HRA: DuckDB SQL -> 51 JSON files
  hra_ml_insights.py       # HRA: Prophet forecasts, clustering, churn
  fetch_hra_publications.py # HRA: PubMed API -> publications.json
  generate_cns_data.py     # CNS: DuckDB SQL -> 25 JSON files
  fetch_cns_github.py      # CNS: GitHub API -> pubs, events, funding, news
  run_all.sh               # Run entire pipeline
  requirements.txt         # Python dependencies

tests/
  test_data_integrity.py   # 58 pytest tests (data shapes, cross-checks, file existence)

data/                      # Place parquet files here (auto-detected by scripts)
  hra/                     # HRA CloudFront parquet logs
  cns/                     # CNS CloudFront parquet logs

public/data/
  hra/                     # HRA JSON output (generated by pipeline)
  cns/                     # CNS JSON output (generated by pipeline)

Quick Start

# Install
npm install
pip install -r data_processing/requirements.txt

# Run full pipeline (data + build)
./data_processing/run_all.sh

# Or selectively
./data_processing/run_all.sh --hra-only
./data_processing/run_all.sh --cns-only
./data_processing/run_all.sh --skip-fetch   # skip PubMed/GitHub API calls

# Development
npm run dev

# Testing (58 data integrity tests)
pytest tests/ -v
pytest tests/ -k "hra"       # HRA only
pytest tests/ -k "cns"       # CNS only
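
The data integrity tests cover data shapes, cross-checks, and file existence. As a rough illustration of what a shape check could look like, here is a minimal sketch. The helper name `check_json_records`, the file name `monthly_requests.json`, and the keys `month` and `requests` are all hypothetical; the actual tests in `tests/test_data_integrity.py` may be organized quite differently.

```python
import json
from pathlib import Path


def check_json_records(path: Path, required_keys: set) -> list:
    """Return a list of problems found in a JSON file that should hold a list of records."""
    if not path.exists():
        return [f"{path} is missing"]
    records = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(records, list) or not records:
        return [f"{path} should contain a non-empty list"]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"{path} record {i} is not an object")
            continue
        missing = required_keys - set(record)
        if missing:
            problems.append(f"{path} record {i} lacks {sorted(missing)}")
    return problems


# Inside a pytest test, the check could be asserted like this
# (file name and keys are assumptions, not the real pipeline output):
#
#     def test_monthly_requests_shape():
#         path = Path("public/data/hra/monthly_requests.json")
#         assert check_json_records(path, {"month", "requests"}) == []
```

Returning a list of problems rather than raising on the first one lets a single test run report every malformed record at once.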

Data Sources

| Source | Script | Output |
|---|---|---|
| HRA CloudFront logs (15.8M rows) | generate_hra_data.py | 51 JSON files |
| HRA ML pipeline | hra_ml_insights.py | 10 JSON files |
| PubMed (NCBI E-utilities) | fetch_hra_publications.py | 54 publications |
| CNS CloudFront logs (16.4M rows) | generate_cns_data.py | 25 JSON files |
| cns-iu/cns-website GitHub repo | fetch_cns_github.py | 405 pubs, 999 events, 81 grants, 187 news |
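
The generate scripts turn Parquet logs into JSON files using DuckDB SQL. The sketch below shows the general pattern under stated assumptions: the column name `sc_status`, the output file `status_counts.json`, and all three function names are hypothetical and not taken from the repository. It relies only on DuckDB's `read_parquet` table function and the standard Python DB-API style calls (`execute`, `description`, `fetchall`).

```python
import json
from pathlib import Path
from typing import Optional


def find_parquet_glob(data_dir: Path) -> Optional[str]:
    """Return a glob pattern for Parquet files directly under data_dir, or None if none exist."""
    if any(data_dir.glob("*.parquet")):
        return str(data_dir / "*.parquet")
    return None


def query_status_counts(parquet_glob: str) -> list:
    """Count requests per HTTP status. The column name `sc_status` is an assumption."""
    import duckdb  # third-party; imported lazily so the other helpers work without it

    con = duckdb.connect()  # in-memory database
    cur = con.execute(
        "SELECT sc_status AS status, COUNT(*) AS requests "
        "FROM read_parquet(?) GROUP BY sc_status ORDER BY requests DESC",
        [parquet_glob],
    )
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def write_json(records: list, out_path: Path) -> None:
    """Write records as pretty-printed JSON, creating parent directories as needed."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # default=str covers values such as dates that json cannot serialize natively
    out_path.write_text(json.dumps(records, indent=2, default=str), encoding="utf-8")


if __name__ == "__main__":
    pattern = find_parquet_glob(Path("data/hra"))
    if pattern is None:
        print("No Parquet files found in data/hra")
    else:
        # Hypothetical output name; the real pipeline writes its own set of files.
        write_json(query_status_counts(pattern), Path("public/data/hra/status_counts.json"))
```

Querying the files in place through a glob avoids loading millions of log rows into Python; only the aggregated result is materialized and serialized.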
