Garbage Classification using CNN

A deep learning project for classifying garbage images into 6 categories: cardboard, glass, metal, paper, plastic, and trash. The model is built using PyTorch Lightning with a custom CNN architecture optimized through Bayesian hyperparameter search.

Dataset

Classes: 6 (cardboard, glass, metal, paper, plastic, trash)
Image Size: 128×128 pixels (SimpleCNN; the finetuned MobileNetV3 Small uses 224×224)
Split: train/validation/test

Model Architecture

Custom SimpleCNN with the following configurable components:

Convolutional layers with flexible feature strategies (same, doubling, halving)
Batch normalization support
Multiple activation functions (ReLU, GELU, SiLU, Mish)
Dropout regularization
Max pooling
Dense layers with configurable neurons

Best Model Configuration:

Total Parameters: 547,334 (all trainable)
Model Size: 2.19 MB
Base Features: 64
Layers: 5 convolutional layers
Filter Size: 5×5
Strategy: same (constant number of feature maps in every layer)
Conv Activation: Mish
Dense Neurons: 128
Dense Activation: ReLU
Batch Normalization: Enabled
Dropout: 0.2
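The reported parameter count can be reproduced by hand. The sketch below assumes that each conv block is a "same"-padded convolution followed by batch norm, the activation, and 2×2 max pooling, with a single hidden dense layer before the 6-way output. Under those assumptions it recovers exactly 547,334 parameters and 2.19 MB at FP32. `channel_schedule` and `count_params` are hypothetical helpers, not functions from the repository. The channel rules for "doubling" and "halving" are a guess at the strategy semantics.

```python
# Hypothetical re-derivation of the SimpleCNN parameter count (a sketch, not the repo's code).
# Assumptions: each conv block is Conv2d(padding="same") -> BatchNorm2d -> activation -> MaxPool2d(2),
# a 3-channel 128x128 input, one hidden dense layer, and a 6-way output layer.


def channel_schedule(base_features: int, num_layers: int, strategy: str) -> list[int]:
    """Output channels per conv layer for each feature strategy (assumed semantics)."""
    if strategy == "same":
        return [base_features] * num_layers
    if strategy == "doubling":
        return [base_features * 2**i for i in range(num_layers)]
    if strategy == "halving":
        return [max(1, base_features // 2**i) for i in range(num_layers)]
    raise ValueError(f"unknown strategy: {strategy}")


def count_params(base_features=64, num_layers=5, filter_size=5, strategy="same",
                 num_dense=128, batchnorm=True, img_size=128, in_channels=3, num_classes=6):
    total = 0
    channels_in, spatial = in_channels, img_size
    for channels_out in channel_schedule(base_features, num_layers, strategy):
        total += channels_in * channels_out * filter_size**2 + channels_out  # conv weights + bias
        if batchnorm:
            total += 2 * channels_out  # BatchNorm2d affine weight and bias (running stats are buffers)
        channels_in = channels_out
        spatial //= 2  # MaxPool2d(2) halves height and width
    flat = channels_in * spatial * spatial  # flattened feature size fed to the dense layer
    total += flat * num_dense + num_dense  # hidden dense layer
    total += num_dense * num_classes + num_classes  # classifier layer
    return total


n = count_params()
print(n, f"{n * 4 / 1e6:.2f} MB")  # FP32 stores 4 bytes per parameter; prints: 547334 2.19 MB
```

Most of the parameters sit in the four 64→64 5×5 convolutions (about 102k each) and the 1024→128 dense layer (about 131k). This is why switching to "doubling" or increasing `num_dense` would grow the model quickly.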

Hyperparameter Optimization

Search Strategy

Bayesian optimization with early termination (Hyperband) to efficiently explore the hyperparameter space.

Swept Hyperparameters:

base_features: [16, 32, 64]
strategy: ['same', 'doubling', 'halving']
filter_size: [3, 5]
conv_activation: ['relu', 'gelu', 'silu', 'mish']
dense_activation: ['relu', 'gelu', 'silu', 'mish']
num_dense: [64, 128, 256, 512]
include_batchnorm: [true, false]
dropout: [0.2, 0.3, 0.4, 0.5]
lr: log-uniform [0.0001, 0.01]
batch_size: [16, 32, 64]
use_augmentation: [true, false]

Fixed Parameters:

Max epochs: 20
Early stopping: Hyperband (min_iter=5)
Augmentation: Finetuning*
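With `min_iter=5` and a 20-epoch budget, Hyperband has only a few chances to stop a run. The sketch below assumes W&B's default bracket multiplier `eta=3`, so checks fall at epochs `min_iter * eta**k`. The stopping rule in `should_stop` keeps the top 1/eta of values seen at a bracket. It is a simplified stand-in for illustration, not W&B's exact criterion.

```python
# Simplified sketch of Hyperband-style early termination (not W&B's exact implementation).
# Assumption: W&B's default bracket multiplier eta=3, so brackets fall at min_iter * eta**k.


def bracket_epochs(min_iter: int, max_epochs: int, eta: int = 3) -> list[int]:
    """Epochs at which a run's metric is compared against earlier runs."""
    epochs, e = [], min_iter
    while e <= max_epochs:
        epochs.append(e)
        e *= eta
    return epochs


def should_stop(value: float, history: list[float], eta: int = 3) -> bool:
    """Stop if the run falls outside the top 1/eta of val_accuracy values seen at this bracket."""
    if not history:
        return False  # nothing to compare against yet
    ranked = sorted(history + [value], reverse=True)  # higher accuracy is better
    keep = max(1, len(ranked) // eta)
    return value < ranked[keep - 1]


print(bracket_epochs(min_iter=5, max_epochs=20))  # [5, 15]
```

Under these assumptions, a run can only be cut at epoch 5 or epoch 15. Any run that survives epoch 15 trains for the full 20 epochs.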

Optimization Strategy

Bayesian Optimization: Rather than random or grid search, Bayesian optimization chooses each new configuration based on the results of previous runs.
Early Termination: Hyperband stops poorly performing runs early, with the first check at epoch 5 (min_iter=5), reducing computational cost.
Run Cap: Limited to 50 runs to balance exploration and resource usage
Metric: Maximized validation accuracy
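The search space and strategy above map naturally onto a W&B sweep definition. The sketch below expresses it as a Python dict, equivalent to what a `config.yaml` for `wandb sweep` could contain. The metric key `val_accuracy` is taken from the correlation analysis. `max_epochs` is a hypothetical parameter name, and the repository's actual config.yaml may be organized differently.

```python
# Sketch of a W&B sweep configuration matching the search space described above.
# Assumptions: the metric is logged as "val_accuracy" and parameter names match the list;
# "max_epochs" is a hypothetical name for the fixed epoch budget.
sweep_config = {
    "method": "bayes",  # Bayesian optimization
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "early_terminate": {"type": "hyperband", "min_iter": 5},
    "run_cap": 50,  # stop the sweep after 50 runs
    "parameters": {
        "base_features": {"values": [16, 32, 64]},
        "strategy": {"values": ["same", "doubling", "halving"]},
        "filter_size": {"values": [3, 5]},
        "conv_activation": {"values": ["relu", "gelu", "silu", "mish"]},
        "dense_activation": {"values": ["relu", "gelu", "silu", "mish"]},
        "num_dense": {"values": [64, 128, 256, 512]},
        "include_batchnorm": {"values": [True, False]},
        "dropout": {"values": [0.2, 0.3, 0.4, 0.5]},
        "lr": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
        "use_augmentation": {"values": [True, False]},
        "max_epochs": {"value": 20},  # fixed parameter (hypothetical name)
    },
}
```

With `wandb` installed, `wandb.sweep(sweep=sweep_config, project=...)` registers such a sweep and returns its ID. `wandb agent` then consumes that ID.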

Correlation Analysis

| Hyperparameter | Correlation with val_accuracy |
|---|---|
| lr | -0.330 |
| dropout | -0.239 |
| num_dense | -0.100 |
| base_features | +0.011 |

Key Insights:

Learning Rate: The largest correlation in magnitude (-0.33, moderate) suggests that lower learning rates, near the 0.0001 lower bound of the search range, perform better for this task
Dropout: A weaker negative correlation (-0.24) indicates that lower dropout rates are preferred; the best run used 0.2, the lowest value searched
Dense Neurons: A weak negative correlation (-0.10) suggests little benefit from dense layers larger than the 128 neurons used by the best run
Base Features: A near-zero correlation (+0.011) suggests that convolutional width matters less than the other hyperparameters, at least within the 16-64 range searched
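Correlations like those in the table are typically Pearson coefficients between a hyperparameter's sampled values and the final metric across runs. This is what W&B's parameter importance panel reports, assuming that is where the table came from. Because `lr` was sampled log-uniformly, correlating `log10(lr)` instead of raw `lr` can give a noticeably different value. The lists below are made-up placeholders to exercise the function, not the project's sweep results.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; assumes equal lengths and non-constant inputs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative placeholder values, NOT the project's sweep results.
lrs = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
val_acc = [0.80, 0.79, 0.76, 0.70, 0.55]

print(round(pearson(lrs, val_acc), 3))  # correlation on the raw learning rate
print(round(pearson([math.log10(lr) for lr in lrs], val_acc), 3))  # on the log scale it was sampled in
```

With only about 50 runs, correlations of the size reported above are noisy. They also ignore interactions between hyperparameters, so they are best read as hints rather than firm rankings.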

Results

Best Run: `expert-sweep-2`

Training Performance (Epoch 19):

Train Accuracy: 95.98%
Train Loss: 0.134

Per-Class Training F1-Scores:

Cardboard: 0.973
Paper: 0.980
Plastic: 0.953
Glass: 0.951
Trash: 0.944
Metal: 0.942

Validation Performance:

Validation Accuracy: 80.04%
Validation Loss: 0.667

Per-Class Validation F1-Scores:

Paper: 0.875
Cardboard: 0.872
Metal: 0.808
Trash: 0.769
Glass: 0.751
Plastic: 0.709

Test Set Performance

| Metric | Value |
|---|---|
| Test Accuracy | 85.46% |
| Test Loss | 0.409 |

Per-Class Test Metrics:

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Cardboard | 0.914 | 0.925 | 0.919 |
| Paper | 0.972 | 0.883 | 0.926 |
| Glass | 0.779 | 0.907 | 0.838 |
| Plastic | 0.790 | 0.865 | 0.826 |
| Metal | 0.853 | 0.725 | 0.784 |
| Trash | 0.769 | 0.690 | 0.727 |
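Precision, recall, and F1 in the table follow the standard per-class definitions, with F1 = 2PR / (P + R). Below is a dependency-free sketch that computes them from label lists. `per_class_metrics` is a hypothetical helper. scikit-learn's `classification_report`, already listed in the requirements, reports the same quantities.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} computed from ground-truth and predicted labels."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    predicted = Counter(y_pred)  # how often each class was predicted
    actual = Counter(y_true)  # how often each class truly occurs
    metrics = {}
    for c in classes:
        precision = tp[c] / predicted[c] if predicted[c] else 0.0
        recall = tp[c] / actual[c] if actual[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


# Sanity check against the Trash row of the test table: F1 = 2PR / (P + R)
p, r = 0.769, 0.690
print(round(2 * p * r / (p + r), 3))  # 0.727
```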

Key Observations

1. Generalization Gap

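The model shows a ~16 percentage-point gap between training (95.98%) and validation (80.04%) accuracy, indicating some overfitting. Test accuracy (85.46%) is higher than validation accuracy, which suggests the model generalizes reasonably well to unseen data, although part of that difference may reflect sampling variation between the two splits.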

2. Class-Specific Performance

Best Performance: Paper (F1: 0.926) and Cardboard (F1: 0.919) are classified most accurately
Challenging Classes: Trash (F1: 0.727) and Metal (F1: 0.784) are harder to classify, likely due to visual similarity with other categories

3. Activation Functions

Mish for the convolutional layers paired with ReLU for the dense layers was the combination selected by the best run. This suggests that smoother activations may benefit feature extraction, while standard ReLU suffices in the classifier head.
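For reference, the activations compared in the sweep can be written as scalar functions. These follow the standard definitions used by PyTorch's `nn.ReLU`, `nn.GELU` (exact, erf-based form), `nn.SiLU`, and `nn.Mish`. Unlike ReLU, the other three are smooth and let a small negative signal through for x < 0, which is the property the observation above refers to.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def softplus(x: float) -> float:
    # Numerically stable log(1 + e^x)
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)): smooth, slightly negative for x < 0
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    # SiLU / Swish: x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}  relu={relu(x):+.3f}  mish={mish(x):+.3f}  silu={silu(x):+.3f}  gelu={gelu(x):+.3f}")
```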

4. Architecture Design

The 'same' strategy (a constant number of feature maps) with 64 base features proved sufficient, suggesting that progressively widening the network is not necessary for a dataset of this size.

5. Regularization Trade-offs

Lower dropout (0.2) performed best despite the generalization gap, suggesting that the model benefits more from retaining information during training than from aggressive regularization. Because 0.2 was the lowest value in the search space, even lower dropout rates were not evaluated.

Scientific Report & Explainability

For a deep dive into the inner workings of our Convolutional Neural Networks, please refer to the detailed Scientific Report.

Key highlights include:

Feature Map Analysis: Visualizing how the first layer detects edges and boundaries.
Guided Backpropagation: Understanding which parts of an image contribute most to classification decisions (Model Explainability).
Training Dynamics: A comparative study of hyperparameter effects on both scratch and fine-tuned models using parallel coordinates and accuracy distributions.

> Read the Full Scientific Report

Finetuning with Pretrained Models

We further improved performance by finetuning a pre-trained MobileNetV3 Small model.

Best Run: `wild-sweep-6`

Validation Accuracy: 98.17%
Hyperparameters:
- lr: 0.000398
- batch_size: 64
- img_size: 224
- freeze_strategy: finetune_all
- backbone_lr_factor: 0.1
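A `backbone_lr_factor` of 0.1 implies that the pretrained backbone trains at one tenth of the head's learning rate (0.000398 × 0.1 = 3.98e-5). The sketch below shows how such parameter groups could be built. It assumes the torchvision layout, where the MobileNetV3 head lives under `classifier.` and everything else is backbone. `make_param_groups` is a hypothetical helper, and the repo may split parameters differently.

```python
def make_param_groups(named_params, lr, backbone_lr_factor, head_prefix="classifier."):
    """Split parameters into a backbone group (lr * factor) and a head group (full lr).

    named_params: iterable of (name, parameter) pairs, e.g. model.named_parameters().
    With freeze_strategy=finetune_all, every parameter is trainable, so none are skipped.
    """
    head, backbone = [], []
    for name, param in named_params:
        (head if name.startswith(head_prefix) else backbone).append(param)
    return [
        {"params": backbone, "lr": lr * backbone_lr_factor},  # pretrained feature extractor
        {"params": head, "lr": lr},  # newly initialised classifier
    ]
```

With PyTorch, the groups can be passed directly to an optimizer, for example `torch.optim.AdamW(make_param_groups(model.named_parameters(), 3.98e-4, 0.1))`. AdamW is an assumption here, since the README does not name the optimizer. The smaller backbone rate limits how far the ImageNet features drift while the new head adapts.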

Test Set Performance (Finetuned)

| Metric | Value |
|---|---|
| Test Accuracy | 98.41% |
| Test Loss | 0.075 |

Per-Class Test Metrics:

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Cardboard | 1.000 | 1.000 | 1.000 |
| Paper | 0.984 | 1.000 | 0.992 |
| Glass | 0.980 | 1.000 | 0.990 |
| Metal | 0.987 | 0.975 | 0.981 |
| Plastic | 0.969 | 0.990 | 0.979 |
| Trash | 1.000 | 0.828 | 0.906 |

Model Comparison

| Model | Test Accuracy | Best F1 Class | Worst F1 Class |
|---|---|---|---|
| SimpleCNN (Scratch) | 85.46% | Paper (0.93) | Trash (0.73) |
| MobileNetV3 (Finetuned) | 98.41% | Cardboard (1.00) | Trash (0.91) |

Sample Predictions

A 10×3 grid of sample test images with their ground-truth labels and predictions. Green titles indicate correct predictions; red titles indicate misclassifications.

Future Work

Experiment with larger models (ResNet50, EfficientNetV2)
Explore advanced data augmentation strategies
Try ensemble methods
Address class imbalance, if present

Requirements

pytorch-lightning
torch
torchvision
wandb
scikit-learn
matplotlib
pandas
torchinfo

Usage

Training

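```bash
# Register the sweep defined in config.yaml; W&B prints the sweep ID to pass to the agent
wandb sweep config.yaml
# Launch an agent for up to 50 runs (replace the entity/project/sweep ID with the one printed above)
wandb agent 23f2001173-indian-institute-of-technology-madras/garbage_clf-src_scripts/rzpn3bhn --count 50
```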

Inference

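```python
from src.lightning.model import ModelLightning
from src.models.simple_cnn import SimpleCNN

# Rebuild the architecture with the same hyperparameters used for training
model = SimpleCNN(...)
# idx_to_class must map class indices to class names in the same order used during training
pl_model = ModelLightning.load_from_checkpoint(
    'checkpoints/best_model_epoch=12_val_accuracy=0.8049.ckpt',
    model=model,
    idx_to_class=idx_to_class,
)
pl_model.eval()  # disable dropout and use batch-norm running statistics for inference
```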

Project Structure

```
garbage_clf/
├── config.yaml              # Hyperparameter sweep configuration
├── config_finetune.yaml     # Finetuning sweep configuration
├── report.md                # Scientific report
├── checkpoints/             # Model checkpoints
├── data/                    # Dataset (train/val/test)
├── notebooks/              # Analysis and visualization notebooks
├── src/
│   ├── data/               # Data loading and preprocessing
│   ├── lightning/          # PyTorch Lightning modules
│   ├── models/             # Model architectures
│   └── scripts/            # Training scripts
└── wandb/                  # Weights & Biases logs
```

License

MIT

Developed using PyTorch Lightning and optimized with Weights & Biases

Performance Benchmark

We compared the inference performance of the best fine-tuned model across different environments and optimization levels.

Benchmark Setup

Batch Size: 64
Image Size: 224×224
Precision: FP32 (PyTorch/ONNX), FP16 (ONNX/TensorRT), INT8 (TensorRT)
Device: NVIDIA GPU (RTX 4050 Laptop) vs CPU

Results

| Model | Latency (ms/batch) | Throughput (img/sec) | Speedup vs PyTorch GPU |
|---|---|---|---|
| PyTorch (CPU) | 323.84 | 198 | 0.07x |
| PyTorch (GPU) | 23.78 | 2692 | 1.0x (Baseline) |
| ONNX FP32 | 22.77 | 2810 | 1.04x |
| ONNX FP16 | 11.70 | 5468 | 2.03x |
| TensorRT FP16 | 9.27 | 6905 | 2.56x |
| TensorRT INT8 | 9.44 | 6778 | 2.51x |
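Throughput in the table is batch size divided by per-batch latency (for example, 64 / 0.32384 s ≈ 198 img/s for PyTorch CPU). Speedup is the PyTorch GPU latency divided by each row's latency. A minimal timing harness along these lines is sketched below. `run_batch` is a hypothetical zero-argument callable wrapping one inference call, and the warm-up and iteration counts are assumptions, not the settings used for the table. For GPU backends, pass `sync=torch.cuda.synchronize` so that asynchronous kernels are included in the measured time.

```python
import time
from statistics import mean


def throughput(batch_size: int, latency_ms: float) -> float:
    """Images per second given the latency of one batch in milliseconds."""
    return batch_size / (latency_ms / 1000)


def benchmark(run_batch, batch_size=64, warmup=10, iters=50, sync=None):
    """Time a batched inference callable; returns (latency in ms/batch, throughput in img/s).

    run_batch: zero-argument callable that runs one batch of inference (hypothetical hook).
    sync: optional callable that waits for async GPU work, e.g. torch.cuda.synchronize.
    """
    for _ in range(warmup):  # warm up kernels, caches, and lazy initialisation
        run_batch()
    if sync:
        sync()
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        run_batch()
        if sync:
            sync()  # without this, GPU timings would only measure kernel launch
        timings.append(time.perf_counter() - start)
    latency_ms = mean(timings) * 1000
    return latency_ms, throughput(batch_size, latency_ms)
```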

Key Observations

TensorRT Acceleration: TensorRT FP16 delivers the best performance, a 2.56x speedup over the PyTorch GPU (FP32) baseline, and is the recommended deployment target.
ONNX Runtime: Exporting to ONNX and running in FP16 yields a 2.03x speedup over PyTorch, making it a portable, fast alternative when TensorRT is not available.
Quantization Impact:
- FP16: Maintained high accuracy (~98%) while at least doubling throughput.
- INT8: Although INT8 is theoretically faster, on this hardware and batch size it was slightly slower than FP16 (9.44 vs 9.27 ms/batch) and suffered a significant accuracy drop (to ~64%). FP16 is therefore preferred for this model.
CPU vs GPU: GPU inference is roughly 13.6x (PyTorch GPU) to 35x (TensorRT FP16) faster than CPU inference, highlighting the need for hardware acceleration in real-time processing.

Deployment Recommendation

For optimal performance and reliability, deploy the TensorRT FP16 engine. If TensorRT is not feasible, ONNX FP16 is a robust runner-up.

GitHub Repository: https://github.com/nevrohelios/garbage_clf
