iemAnshuman/README.md

Hi, I'm Anshuman Agrawal

HPC & Deep Learning Systems Researcher
Optimizing the "plumbing" of AI — from kernels to clusters.


Current Focus

I research low-level optimization for Deep Learning workloads, focusing on bridging the gap between high-level PyTorch APIs and hardware reality. My work involves:

  • Kernel Optimization: Writing custom OpenAI Triton kernels to beat eager execution (Fused Attention, Softmax).
  • Quantization: Implementing 4-bit/INT8 inference pipelines (AWQ/GPTQ) for deploying 7B+ models on consumer GPUs.
  • Distributed Systems: Analyzing NCCL communication primitives and distributed training bottlenecks (DDP/FSDP).

Tech Stack

| Domain | Tools & Frameworks |
| --- | --- |
| HPC & Kernels | OpenAI Triton · CUDA (Concepts) · NVIDIA Nsight Compute · TensorRT |
| Deep Learning | PyTorch · HuggingFace (Transformers/PEFT) · AutoGPTQ · ONNX Runtime |
| Infrastructure | Docker · Linux (Kernel/eBPF) · Bash · Slurm |
| Core | Python (AsyncIO) · C++ · PostgreSQL · NumPy |

Active Experiments

  • high-performance-deep-learning: My primary research repo containing custom Triton kernels, quantization benchmarks, and distributed system simulations.
  • Neuro-Hedge: A vectorized Monte Carlo simulation engine for Reinforcement Learning.

Email · Research Blog

Popular repositories

  1. neuro-ranker-distill (Public)

    Python 8 1

  2. Todoist-CLI (Public)

    Python 2

  3. Data_Structures (Public)

    C 1

  4. AI-Projects (Public)

    Jupyter Notebook 1

  5. oop-uni-codes (Public)

    Java 1

  6. f1-podium-predictor (Public)

    1