I'm an AI/BI Engineer at Bluecascade and an active researcher in multi-modal learning, currently a 4th-semester BS Artificial Intelligence student (CGPA 3.81/4.0) at Emerson University, Multan. I build systems that don't just predict: they reason, explain, and generalize.
My work spans the full AI lifecycle: from novel architecture design and model training to MLOps pipelines, BI dashboards, workflow automation, and production deployment. I've contributed to PhD-level multi-modal AI research integrating vision, language, and audio, and I'm currently pursuing work on conflict-aware continual multi-modal learning for medical AI.
"The best model is not the one with the most parameters; it's the one that generalizes to the real world."
Full-time | Real-world production systems
- Architecting and deploying end-to-end AI and Business Intelligence solutions for clients
- Automated an intra-departmental decision-making pipeline using n8n + Airtable + JavaScript, cutting decision turnaround time from ~60 seconds to under 20 seconds, a >65% latency reduction in a live production environment
- Building BI dashboards and data flows that surface actionable intelligence from raw operational data
- Managing ML model deployment with MLOps practices including CI/CD integration and containerization
Independent | Advanced AI Research
- Contributed to a PhD-level research project focused on multi-modal learning combining vision and NLP
- Designed and implemented a novel Transformer-based architecture for dual-image medical report generation featuring:
  - Bilinear Cross-View Fusion: outer-product feature interaction between two image views for automatic importance weighting, replacing naive concatenation
  - Gated Cross-Attention Decoder: a learnable scalar gate σ(w) per decoder layer that dynamically balances self-attention vs. cross-attention contributions, allowing the model to learn how much to attend to image features vs. previously generated tokens
  - Three independent Transformer decoders (indication / findings / impression) sharing a single fused image encoder, enabling section-aware, clinically structured report generation
  - Dual decoding strategies: greedy decoding for speed and beam search (configurable beam size) for quality, evaluated against BLEU, METEOR, ROUGE-1/L, CIDEr, and RadGraph-F1 (clinical correctness metric)
- Integrated CheXNet (DenseNet121) pretrained on chest X-rays as the vision backbone with support for ImageNet weights, custom CheXNet weights, and fine-tuning modes
- Implemented Grad-CAM explainability for visual grounding of model attention on X-ray images
- Used Bio_ClinicalBERT tokenizer for clinical text tokenization, with masked loss and masked accuracy to handle variable-length padded sequences correctly
- Built ML/DL pipelines on real-world datasets covering financial anomaly detection, medical image cancer classification, and NLP-based text classification
- Deployed models via Flask REST APIs and optimized data preprocessing pipelines for production throughput
- Worked on stock market anomaly detection using time-series modeling
- Built and deployed ML models for real-time restaurant data analysis
- Improved accuracy of recommendation systems using regression and classification algorithms with feature engineering on structured tabular data
- Developed a QA chatbot and language translation models using NLP pipelines
- Created a prompt-to-music generation tool leveraging generative AI
- Implemented real-time object detection using YOLO architecture
Investigating how multi-modal AI models (vision + language) can learn sequentially from new medical data without catastrophically forgetting prior knowledge, with a specific focus on conflict detection between modalities during continual learning. This targets a core open problem in clinical AI: how to keep models up-to-date in dynamic hospital environments without full retraining.
Multi-Modal Deep Learning | Medical AI | NLP
A full Transformer-based system that takes two X-ray views (frontal + lateral) and generates structured radiology reports (indication, findings, impression) with clinical correctness evaluation.
Architectural novelties:
- Bilinear cross-view image fusion (outer-product interaction)
- Gated cross-attention with per-layer learnable α gate
- Triple-decoder architecture for section-aware generation
- Evaluated with RadGraph-F1 for entity-relation clinical accuracy
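
A minimal pure-Python sketch of the first two ideas in the list above, for illustration only. The project itself is built on TensorFlow; the function names `bilinear_fuse` and `gated_mix` are hypothetical, and the convex blend `alpha * cross + (1 - alpha) * self` is one assumed way a scalar gate α = σ(w) could balance the two attention streams, not necessarily the exact formulation used in the model.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real gate weight w into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bilinear_fuse(frontal: List[float], lateral: List[float]) -> List[float]:
    """Outer product of two view feature vectors, flattened row by row.

    Every frontal feature is multiplied with every lateral feature, so the
    fused vector has len(frontal) * len(lateral) pairwise interaction terms
    instead of the len(frontal) + len(lateral) entries of a concatenation.
    """
    return [f * l for f in frontal for l in lateral]


def gated_mix(self_attn: List[float], cross_attn: List[float], w: float) -> List[float]:
    """Blend self-attention and cross-attention outputs with a scalar gate.

    Assumed form: alpha = sigmoid(w); out = alpha * cross + (1 - alpha) * self.
    In a real decoder, w would be one trainable parameter per layer.
    """
    if len(self_attn) != len(cross_attn):
        raise ValueError("self_attn and cross_attn must have the same length")
    alpha = sigmoid(w)
    return [alpha * c + (1.0 - alpha) * s for s, c in zip(self_attn, cross_attn)]


fused = bilinear_fuse([1.0, 2.0], [3.0, 4.0, 5.0])
print(fused)   # [3.0, 4.0, 5.0, 6.0, 8.0, 10.0]

mixed = gated_mix([0.0, 0.0], [2.0, 4.0], w=0.0)
print(mixed)   # [1.0, 2.0], since sigmoid(0) = 0.5 weights both streams equally
```

With a large positive `w` the gate approaches 1 and the layer leans on image features; with a large negative `w` it leans on the previously generated tokens.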
TensorFlow | Bio_ClinicalBERT | CheXNet | Grad-CAM | Beam Search | RadGraph
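
The difference between greedy decoding and beam search with a configurable beam size can be sketched in a few lines. This is a generic, pure-Python illustration, not the project's decoder: `beam_search`, `toy_model`, and the token table are hypothetical stand-ins for a real model's next-token log-probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

Sequence = Tuple[str, ...]


def beam_search(
    next_log_probs: Callable[[Sequence], Dict[str, float]],
    beam_size: int,
    max_len: int,
    end_token: str = "<end>",
) -> Sequence:
    """Return the highest-scoring sequence found with the given beam size.

    beam_size=1 reduces to greedy decoding.
    """
    beams: List[Tuple[Sequence, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Sequence, float]] = []
        for seq, score in beams:
            if seq and seq[-1] == end_token:
                candidates.append((seq, score))  # finished: carry over as-is
                continue
            for token, log_p in next_log_probs(seq).items():
                candidates.append((seq + (token,), score + log_p))
        # Keep only the beam_size best partial sequences.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_size]
        if all(seq and seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]


# Hypothetical toy distribution where the locally best first token ("a")
# leads to a worse overall sequence than the runner-up ("b").
TABLE: Dict[Sequence, Dict[str, float]] = {
    (): {"a": math.log(0.6), "b": math.log(0.4)},
    ("a",): {"x": math.log(0.55), "y": math.log(0.45)},
    ("b",): {"z": math.log(0.9), "w": math.log(0.1)},
}


def toy_model(prefix: Sequence) -> Dict[str, float]:
    """After two tokens, the toy model always emits <end> with probability 1."""
    return TABLE.get(prefix, {"<end>": 0.0})


print(beam_search(toy_model, beam_size=1, max_len=3))  # ('a', 'x', '<end>')
print(beam_search(toy_model, beam_size=2, max_len=3))  # ('b', 'z', '<end>')
```

Greedy decoding commits to `a` (probability 0.6) and ends with a sequence probability of 0.33, while a beam of two keeps `b` alive and finds the 0.36 sequence, at the cost of scoring more candidates per step.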
VS Code Extension | NLP | Developer Tools
A VS Code extension that reviews code in real time using Hugging Face models for linting suggestions and optimization hints, bringing AI pair-programming directly into the editor.
JavaScript | Python | VS Code API | Hugging Face | NLP
Flask Web App | NLP | Education AI
An interactive web application, with a chatbot interface, that solves and explains math problems ranging from basic arithmetic to advanced calculus.
Flask | HTML | Data Visualization | NLP | Transfer Learning
| Certification | Issuer |
|---|---|
| Artificial Intelligence, Deep Learning & Communication | NAVTTC |
| AI Agents & Transformers | Hugging Face |
- Conflict-Aware Continual Multi-Modal Learning: medical AI research (active)
- Releasing new open-source projects on GitHub soon
- Deepening MLOps practices: model versioning, monitoring, and deployment pipelines
- Exploring agentic AI systems using LLMs with tool-use and memory