aryanputta/README.md


// work

transformer-servex
Production KV cache optimization · MoE routing · IO-aware attention for long-context LLMs

cuda-netopt
ML-driven TCP/UDP packet scheduling · DQN network routing · CUDA queue scoring

AeroMimic
Behavior cloning from expert pilots · real-time MAV autonomy · onboard inference stack

aerosurrogate-control-stack
CFD surrogate modeling · constrained optimization · robustness replacing FEM solvers



// stack



// metrics


Pinned

  1. KVCacheX (Public)

    Memory-aware LLM inference optimizer for KV cache compression, eviction, and scheduling.

    Python

  2. adaptive-compute-runtime (Public)

    Adaptive C++/CUDA runtime that profiles workloads at submission time and dynamically routes them to CPU, GPU, or batched execution based on arithmetic intensity and transfer cost.

    C++

  3. Helios (Public)

    Hardware-aware compute runtime in C++ and CUDA for real sparse, dense, and graph workloads.

    C++

  4. IBM/aiu-trace-analyzer (Public)

    A tool to post-process JSON trace files for IBM-AIU performance analysis. It enhances the traces with additional statistics extracted from the trace data itself and (optionally) by combining it wit…

    Python · 11 stars · 6 forks