
Rahul Jangra

Applied Machine Learning Engineer focused on LLM training dynamics, evaluation, and inference efficiency.

I work on the parts of ML systems that usually break first:
training stability, memory bottlenecks, retrieval failures, and evaluation blind spots.


What I Actually Work On

Large Language Models

  • Instruction tuning (SFT, DPO) and failure analysis
  • Attention behavior, KV caching, and long-context issues (KV-cache sketch after this list)
  • Dataset quality, collapse modes, and reward hacking
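
A minimal single-head sketch of the KV-cache decode step referenced above: keys and values for past tokens are appended to a cache and reused instead of being recomputed each step. The weight matrices (w_q, w_k, w_v) and the cache layout are illustrative placeholders, not code from any repository here.

```python
# Illustrative single-head decode step that reuses cached keys/values.
import torch
import torch.nn.functional as F

def decode_step(x_new, w_q, w_k, w_v, cache):
    """x_new: (1, d_model) hidden state for the newest token.
    cache: dict with 'k' and 'v' tensors of shape (seq_len, d_head)."""
    q = x_new @ w_q                                   # (1, d_head)
    k = x_new @ w_k
    v = x_new @ w_v
    cache["k"] = torch.cat([cache["k"], k], dim=0)    # append, don't recompute
    cache["v"] = torch.cat([cache["v"], v], dim=0)
    scores = (q @ cache["k"].T) / cache["k"].shape[-1] ** 0.5
    attn = F.softmax(scores, dim=-1)                  # attend over all cached tokens
    return attn @ cache["v"], cache                   # (1, d_head) context vector
```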

ML Systems & Performance

  • PyTorch internals (autograd, checkpointing, mixed precision; see the sketch after this list)
  • Distributed training (DDP, FSDP, memory tradeoffs)
  • Inference optimization (vLLM-style serving, quantization, latency vs throughput)
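
A compact sketch of the two memory levers named above, using standard PyTorch APIs (torch.utils.checkpoint and torch.cuda.amp). The `model.blocks`, `batch`, and `loss_fn` names are placeholders standing in for an actual training setup.

```python
# Sketch: gradient checkpointing + fp16 autocast with loss scaling (CUDA assumed).
import torch
from torch.utils.checkpoint import checkpoint

scaler = torch.cuda.amp.GradScaler()

def forward_with_checkpointing(model, x):
    # Recompute each block's activations during backward instead of storing them,
    # trading extra compute for lower activation memory.
    for block in model.blocks:
        x = checkpoint(block, x, use_reentrant=False)
    return x

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        out = forward_with_checkpointing(model, batch["input"])
        loss = loss_fn(out, batch["target"])
    scaler.scale(loss).backward()   # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```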

Evaluation (Underrated, Critical)

  • Task-specific evaluation harnesses
  • Regression detection across model versions (see the sketch after this list)
  • Retrieval quality diagnostics for RAG systems
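
A hedged sketch of what version-to-version regression detection can look like: compare per-task scores for a candidate checkpoint against a baseline and flag drops beyond a tolerance. The task names and scores below are made up for illustration.

```python
# Flag tasks where the candidate model scores worse than the baseline.
def find_regressions(baseline: dict, candidate: dict, tol: float = 0.01):
    """baseline/candidate map task name -> metric (higher is better)."""
    regressions = {}
    for task, base_score in baseline.items():
        new_score = candidate.get(task)
        if new_score is None:
            continue  # task not run for the candidate; report separately if needed
        if base_score - new_score > tol:
            regressions[task] = {"baseline": base_score,
                                 "candidate": new_score,
                                 "delta": new_score - base_score}
    return regressions

# Example: flags 'gsm8k' because it dropped by more than the tolerance.
print(find_regressions({"gsm8k": 0.61, "mmlu": 0.70},
                       {"gsm8k": 0.55, "mmlu": 0.71}))
```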

Selected Work

Instruction Tuning & Model Behavior Analysis

  • Analyzed instruction collapse during fine-tuning on noisy datasets
  • Compared SFT vs DPO using task-specific win-rate metrics (win-rate sketch after this list)
  • Identified normalization and RoPE scaling effects on training stability
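
A minimal sketch of the win-rate comparison idea, assuming pairwise outputs from two checkpoints and some preference signal; `judge` is a placeholder for a human label or an LLM judge, not a specific tool used here.

```python
# Fraction of prompts where model A's output is preferred over model B's.
def win_rate(outputs_a, outputs_b, judge):
    """Ties count as half a win for each side."""
    wins = 0.0
    for a, b in zip(outputs_a, outputs_b):
        verdict = judge(a, b)        # expected to return 'a', 'b', or 'tie'
        if verdict == "a":
            wins += 1.0
        elif verdict == "tie":
            wins += 0.5
    return wins / len(outputs_a)
```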

Retrieval-Augmented Generation (RAG)

  • Designed retrieval pipelines with hybrid search (dense + sparse; sketched after this list)
  • Evaluated hallucination sources caused by embedding drift and chunking
  • Built retrieval-quality metrics beyond end-task accuracy
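
A small sketch of hybrid scoring under simple assumptions: normalize a dense (embedding-similarity) score and a sparse (BM25-style) score per query, then interpolate. The 0.5 weight and the example scores are illustrative.

```python
# Combine dense and sparse retrieval scores with a weighted sum after normalization.
import numpy as np

def hybrid_scores(dense_scores: np.ndarray,
                  sparse_scores: np.ndarray,
                  alpha: float = 0.5) -> np.ndarray:
    """Min-max normalize each signal, then interpolate."""
    def norm(s):
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
    return alpha * norm(dense_scores) + (1 - alpha) * norm(sparse_scores)

# Rank documents by the combined score (highest first).
scores = hybrid_scores(np.array([0.82, 0.40, 0.75]), np.array([12.0, 3.0, 9.5]))
ranking = np.argsort(-scores)
```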

Training & Inference Optimization

  • Reduced GPU memory usage using gradient checkpointing and mixed precision
  • Benchmarked inference latency improvements via KV cache reuse and quantization
  • Investigated tradeoffs between throughput, batch size, and response time (benchmark sketch below)
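
A rough benchmarking sketch for that throughput/latency tradeoff: time one batched generation call at several batch sizes. `generate_batch` is a placeholder for the actual inference entry point (for example, a vLLM or Hugging Face generate wrapper).

```python
# Measure batch latency and requests/second at different batch sizes.
import time

def benchmark(generate_batch, prompts, batch_sizes=(1, 4, 16)):
    results = []
    for bs in batch_sizes:
        batch = prompts[:bs]
        start = time.perf_counter()
        generate_batch(batch)                    # one full batched generation
        elapsed = time.perf_counter() - start
        results.append({"batch_size": bs,
                        "latency_s": elapsed,                 # time to finish the batch
                        "throughput_req_per_s": bs / elapsed})
    return results
```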

Technical Stack (Weighted)

Core

  • Python, PyTorch (deep internals focus)

LLMs

  • Transformers, attention variants, KV caching
  • SFT, DPO, reward modeling concepts

Systems

  • Distributed training (DDP, FSDP)
  • Mixed precision, memory optimization

Inference

  • Quantization (INT8 / GPTQ-style)
  • Serving and performance tuning

Evaluation

  • Custom metrics, regression testing
  • Dataset and retrieval diagnostics

Philosophy

Most ML failures are not caused by models being too small.
They come from poor evaluation, unstable training, and unexamined assumptions.

I care about:

  • Why something fails
  • How to detect it early
  • How to fix it without scaling blindly

Notes

  • Repositories here are focused experiments, not tutorials
  • Each project prioritizes analysis, metrics, and failure cases
  • I value depth over breadth and signal over noise

Pinned Repositories

  • CodeBunny (Python)
  • Portfolio (JavaScript)
  • POS-Webapp: private invoice app (JavaScript)
  • Research (Jupyter Notebook)
  • SchoolMate (HTML)