HPC & Deep Learning Systems Researcher
Optimizing the "plumbing" of AI — from kernels to clusters.
I research low-level optimization for Deep Learning workloads, focusing on bridging the gap between high-level PyTorch APIs and hardware reality. My work involves:
- Kernel Optimization: Writing custom OpenAI Triton kernels that outperform PyTorch eager execution (fused attention, softmax).
- Quantization: Implementing 4-bit/INT8 inference pipelines (AWQ/GPTQ) for deploying 7B+ models on consumer GPUs.
- Distributed Systems: Analyzing NCCL communication primitives and distributed training bottlenecks (DDP/FSDP).
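To make the quantization work concrete: the sketch below shows plain per-tensor absmax INT8 weight quantization in NumPy. It is a deliberately minimal illustration, not AWQ or GPTQ themselves (those add activation-aware scaling and Hessian-guided rounding on top of this basic scheme); the function names are mine, not from any repo here.

```python
import numpy as np

def quantize_int8(w):
    # Per-tensor absmax quantization: the scale maps the largest
    # magnitude weight onto the symmetric INT8 range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an FP32 approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(w)
# Round-to-nearest bounds the reconstruction error by half a scale step.
err = np.abs(dequantize(q, scale) - w).max()
```

Real 4-bit pipelines additionally quantize in small groups (e.g. 128 weights per scale) to keep that error bound tight across outlier channels.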
| Domain | Tools & Frameworks |
|---|---|
| HPC & Kernels | OpenAI Triton · CUDA (Concepts) · NVIDIA Nsight Compute · TensorRT |
| Deep Learning | PyTorch · HuggingFace (Transformers/PEFT) · AutoGPTQ · ONNX Runtime |
| Infrastructure | Docker · Linux (Kernel/eBPF) · Bash · Slurm |
| Core | Python (AsyncIO) · C++ · PostgreSQL · NumPy |
- high-performance-deep-learning: My primary research repo containing custom Triton kernels, quantization benchmarks, and distributed system simulations.
- Neuro-Hedge: A vectorized Monte Carlo simulation engine for Reinforcement Learning.
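Neuro-Hedge's internals aren't reproduced here, but the core idea of a vectorized Monte Carlo engine can be sketched generically: simulate all paths at once as one big array instead of looping per path. The geometric Brownian motion model and function name below are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def simulate_gbm(s0, mu, sigma, n_paths, n_steps, dt, seed=0):
    # Vectorized geometric Brownian motion: every path advances
    # together via a single (n_paths, n_steps) array of normal draws,
    # so there is no Python-level loop over paths or time steps.
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    return s0 * np.exp(np.cumsum(log_inc, axis=1))

# 10,000 one-year paths at daily resolution, in one array operation.
paths = simulate_gbm(100.0, 0.05, 0.2, n_paths=10_000, n_steps=252, dt=1 / 252)
```

The same array layout drops in cleanly as a batched RL environment: each row is one rollout, and a policy can be evaluated against all of them per step.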
