Building foundation models and multi-agent systems at scale

drkaushiksarkar/README.md

Kaushik Sarkar

AI and product leader building intelligent systems across health, development finance, and climate sectors. Two decades scaling technology-driven products across 17 countries.

Foundation models. Multi-agent orchestration. Billion-row data infrastructure.

LinkedIn X


What I build

Agentic AI -- Multi-agent orchestration with MCP servers, tool-calling agents, autonomous task decomposition, and edge deployment for low-connectivity field surveillance

Foundation models -- End-to-end training: continued pretraining, SFT, DPO, mixture-of-experts routing. Distributed training on multi-GPU clusters with DeepSpeed and FSDP

Data at scale -- Apache Iceberg lakehouse federating 1.78B rows from 85 organizations (including WHO, World Bank, NOAA, IHME, and OECD). 268M vector embeddings. 33M knowledge graph triples

Production systems -- Disease early warning platforms operational in national health programs. Malaria forecasting with Terraform-managed infrastructure. Real-time surveillance dashboards


Technical surface

AI/ML and Foundation Models

PyTorch, LLM Fine-Tuning, LoRA/QLoRA, DPO/RLHF, DeepSpeed, vLLM, ONNX, Hugging Face, Weights & Biases, scikit-learn, XGBoost, TensorFlow, MLflow

Agentic AI and RAG

MCP Servers, LangChain, LangGraph, LlamaIndex, RAG, Multi-Agent, n8n, OpenSearch, FAISS, Pinecone

Cloud and Infrastructure

AWS, SageMaker, Bedrock, Glue, Athena, Fargate, Lambda, Terraform, Docker, Kubernetes, GitHub Actions

Data Engineering

Apache Iceberg, Apache Airflow, Apache Spark, Apache Kafka, PostgreSQL, Redis, DuckDB, Pandas, Polars, dbt

Languages and Frameworks

Python, TypeScript, FastAPI, Next.js, React, Node.js, GraphQL, R

MLOps and Governance

MLOps, AI Governance, Responsible AI, Model Monitoring, Prometheus, Grafana


GitHub activity

Organizations

IMACS: Foundation models, multi-agent platforms, SAGE engine -- 22 repositories

FHF: Climate-informed disease forecasting and early warning -- 30 repositories, 8 contributors

UDH: Digital health infrastructure and open-source tooling


Selected work

Platform and AI systems

| Repository | Description |
| --- | --- |
| MCP imacs-sage | SAGE -- multi-agent orchestration platform for global health intelligence |
| FM imacs-sage-playground | Foundation model playground with 7B-parameter MoLE expert routing |
| Agents AI-Sandbox | RAG pipelines, multi-agent prototyping, tool-calling experimentation |
| ML malaria-intelligence-platform | Multi-country malaria analytics with climate-driven forecasting |

Data infrastructure and production

| Repository | Description |
| --- | --- |
| Data sage-warehouse-master | Analytical warehouse with MCP server, enterprise API, and FM training pipeline |
| MCP spectra-enterprise | Health intelligence with multi-agent orchestration and autonomous data fusion |
| TS disease-surveillance-platform | Full-stack autonomous surveillance with agent-driven anomaly detection |
| Terraform malaria-forecasting-system | Autonomous pipeline orchestration with self-healing deployment |
| ERA5 climate-disease-forecast | Agent-based ensemble modeling with ERA5 reanalysis integration |

Research

Three tracks targeting NeurIPS 2026:

| Paper | Focus |
| --- | --- |
| EGDA | Novel training paradigm for domain-specialized foundation models |
| CHIB | Multi-dimensional evaluation benchmark for health AI |
| MoLE | Expert routing in mixture-of-experts architectures |

Prior work established AI-driven early warning systems for infectious disease outbreaks across low- and middle-income countries -- now operational in multiple national health programs.


Scale

1.78B data rows under management
268M vector embeddings
33M knowledge graph triples
85 source organizations federated
40,000+ indicators catalogued
58,000+ geographic entities
17 countries served
1807--2100 temporal coverage

Delhi, India

Pinned

  1. climate-disease-forecast Public

    Climate-disease correlation engine linking environmental data to health outcomes

    Python

  2. global-ews-catalogue Public

    Systematic global catalogue of 79 early warning systems with capability mapping, coverage analysis, and gap identification

    Python

  3. spectra-enterprise Public

    High-performance spectral analysis platform for real-time disease pattern detection and forecasting

    Python

  4. medical-imaging-dx Public

    Enterprise medical imaging diagnostics with automated triage, CNN-based classification, and clinical decision support integration

    Python

  5. disease-surveillance-platform Public

    Distributed disease surveillance system with real-time alerting and geospatial analytics

    Python

  6. health-analytics-studio Public

    Interactive analytics studio for health data exploration, visualization, and insight generation

    Python