This repository contains completed assignments for the Visual Learning and Recognition (VLR) course, covering object detection, generative models, and transformers for vision.
It includes three assignments:
- Assignment 1: Object Detection using FCOS (Fully Convolutional One-Stage Object Detection)
  - Implementation of the FCOS detector with FPN (Feature Pyramid Network)
  - Tasks include feature extraction, box regression, and centerness prediction
  - Code for training and testing the detector on the PASCAL VOC dataset
- Assignment 2: Generative Models
  - GAN implementation with different loss functions (Original GAN, LSGAN, WGAN-GP)
  - VAE implementation with different latent space dimensions and beta values
  - Diffusion models with DDPM and DDIM sampling techniques
- Assignment 3: Transformers for Vision
  - Transformer for image captioning using the COCO dataset
  - Vision Transformer (ViT) implementation for image classification
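
As a small illustration of the centerness prediction task listed under Assignment 1, here is a minimal sketch of the centerness target as defined in the FCOS paper. It is not taken from the assignment code: the function name `fcos_centerness` and the choice to return 0 for locations on or outside the box are assumptions made for this example.

```python
import math


def fcos_centerness(left, top, right, bottom):
    """Centerness target for one location inside a ground-truth box.

    Arguments are the distances from the location to the four box edges.
    Hypothetical helper for illustration; not part of the assignment code.
    """
    # Assumption: a location on or outside the box gets centerness 0.
    # FCOS itself treats such locations as background.
    if min(left, top, right, bottom) <= 0:
        return 0.0
    horizontal = min(left, right) / max(left, right)
    vertical = min(top, bottom) / max(top, bottom)
    return math.sqrt(horizontal * vertical)


# A location at the exact box center has equal distances to opposite edges.
print(fcos_centerness(5, 5, 5, 5))  # 1.0
```

The value falls toward 0 as the location moves from the box center toward an edge, which FCOS uses to down-weight low-quality detections far from object centers.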
Please refer to the individual README files in each assignment directory for:
- Detailed setup instructions and dependencies
- Dataset preparation and requirements
- Implementation details
- Training and evaluation procedures