A deep learning project for classifying garbage images into 6 categories: cardboard, glass, metal, paper, plastic, and trash. The model is built using PyTorch Lightning with a custom CNN architecture optimized through Bayesian hyperparameter search.
- Dataset
- Model Architecture
- Hyperparameter Optimization
- Results
- Key Observations
- Scientific Report & Explainability
- Finetuning with Pretrained Models
- Performance Benchmark
- Sample Predictions
- Future Work
- Requirements
- Usage
- Project Structure
- License
- Classes: 6 (cardboard, glass, metal, paper, plastic, trash)
- Image Size: 128×128 pixels
- Split: train/validation/test
Custom SimpleCNN with the following configurable components:
- Convolutional layers with flexible feature strategies (same, doubling, halving)
- Batch normalization support
- Multiple activation functions (ReLU, GELU, SiLU, Mish)
- Dropout regularization
- Max pooling
- Dense layers with configurable neurons
Best Model Configuration:
- Total Parameters: 547,334 (all trainable)
- Model Size: 2.19 MB
- Base Features: 64
- Layers: 5 convolutional layers
- Filter Size: 5×5
- Strategy: same (constant feature maps across layers)
- Conv Activation: Mish
- Dense Neurons: 128
- Dense Activation: ReLU
- Batch Normalization: Enabled
- Dropout: 0.2
Bayesian optimization with early termination (Hyperband) to efficiently explore the hyperparameter space.
Swept Hyperparameters:
base_features: [16, 32, 64]strategy: ['same', 'doubling', 'halving']filter_size: [3, 5]conv_activation: ['relu', 'gelu', 'silu', 'mish']dense_activation: ['relu', 'gelu', 'silu', 'mish']num_dense: [64, 128, 256, 512]include_batchnorm: [true, false]dropout: [0.2, 0.3, 0.4, 0.5]lr: log-uniform [0.0001, 0.01]batch_size: [16, 32, 64]use_augmentation: [true, false]
Fixed Parameters:
- Max epochs: 20
- Early stopping: Hyperband (min_iter=5)
- Augmentation: Finetuning*
- Bayesian Optimization: Instead of random or grid search, used Bayesian optimization to intelligently sample the hyperparameter space based on previous results
- Early Termination: Implemented Hyperband to stop poorly performing runs after 5 epochs, reducing computational cost
- Run Cap: Limited to 50 runs to balance exploration and resource usage
- Metric: Maximized validation accuracy
| Hyperparameter | Correlation with val_accuracy |
|---|---|
| lr | -0.330 |
| dropout | -0.239 |
| num_dense | -0.100 |
| base_features | +0.011 |
Key Insights:
- Learning Rate: Strong negative correlation (-0.33) suggests that lower learning rates (around 0.0001) perform better for this task
- Dropout: Moderate negative correlation (-0.24) indicates that lower dropout rates are preferred, with 0.2 being optimal
- Dense Neurons: Weak negative correlation suggests diminishing returns beyond 128 neurons
- Base Features: Near-zero correlation indicates that model capacity at the convolutional level is less critical than other factors
Training Performance (Epoch 19):
- Train Accuracy: 95.98%
- Train Loss: 0.134
Per-Class Training F1-Scores:
- Cardboard: 0.973
- Paper: 0.980
- Plastic: 0.953
- Glass: 0.951
- Trash: 0.944
- Metal: 0.942
Validation Performance:
- Validation Accuracy: 80.04%
- Validation Loss: 0.667
Per-Class Validation F1-Scores:
- Paper: 0.875
- Cardboard: 0.872
- Metal: 0.808
- Trash: 0.769
- Glass: 0.751
- Plastic: 0.709
| Metric | Value |
|---|---|
| Test Accuracy | 85.46% |
| Test Loss | 0.409 |
Per-Class Test Metrics:
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Cardboard | 0.914 | 0.925 | 0.919 |
| Paper | 0.972 | 0.883 | 0.926 |
| Glass | 0.779 | 0.907 | 0.838 |
| Plastic | 0.790 | 0.865 | 0.826 |
| Metal | 0.853 | 0.725 | 0.784 |
| Trash | 0.769 | 0.690 | 0.727 |
The model shows a ~10% performance gap between training (95.98%) and validation (80.04%), indicating some overfitting. However, test accuracy (85.46%) exceeds validation accuracy, suggesting the model generalizes reasonably well.
- Best Performance: Paper (F1: 0.926) and Cardboard (F1: 0.919) are classified most accurately
- Challenging Classes: Trash (F1: 0.727) and Metal (F1: 0.784) are harder to classify, likely due to visual similarity with other categories
Mish activation for convolutional layers paired with ReLU for dense layers provided optimal performance, suggesting that smoother activations benefit feature extraction while standard ReLU suffices for classification.
The 'same' strategy (constant feature maps) with 64 base features proved sufficient, indicating that aggressive feature expansion is not necessary for this dataset size.
Lower dropout (0.2) performed best despite the generalization gap, suggesting that the model benefits from retaining more information during training rather than aggressive regularization.
For a deep dive into the inner workings of our Convolutional Neural Networks, please refer to the detailed Scientific Report.
Key highlights include:
- Feature Map Analysis: Visualizing how the first layer detects edges and boundaries.
- Guided Backpropagation: Understanding which parts of an image contribute most to classification decisions (Model Explainability).
- Training Dynamics: A comparative study of hyperparameter effects on both scratch and fine-tuned models using parallel coordinates and accuracy distributions.
> Read the Full Scientific Report
We further improved performance by finetuning a pre-trained MobileNetV3 Small model.
- Validation Accuracy: 98.17%
- Hyperparameters:
lr: 0.000398batch_size: 64img_size: 224freeze_strategy:finetune_allbackbone_lr_factor: 0.1
| Metric | Value |
|---|---|
| Test Accuracy | 98.41% |
| Test Loss | 0.075 |
Per-Class Test Metrics:
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Cardboard | 1.000 | 1.000 | 1.000 |
| Paper | 0.984 | 1.000 | 0.992 |
| Glass | 0.980 | 1.000 | 0.990 |
| Metal | 0.987 | 0.975 | 0.981 |
| Plastic | 0.969 | 0.990 | 0.979 |
| Trash | 1.000 | 0.828 | 0.906 |
| Model | Test Accuracy | Best F1 Class | Worst F1 Class |
|---|---|---|---|
| SimpleCNN (Scratch) | 85.46% | Paper (0.93) | Trash (0.73) |
| MobileNetV3 (Finetuned) | 98.41% | Cardboard (1.00) | Trash (0.91) |
10×3 grid showing sample test images with ground truth and predictions. Green titles indicate correct predictions, red indicates misclassifications.
- Experiment with Larger Models (ResNet50, EfficientNetV2)
- Advanced data augmentation strategies
- Ensemble methods
- Addressing class imbalance if present
pytorch-lightning
torch
torchvision
wandb
scikit-learn
matplotlib
pandas
torchinfo
wandb sweep config.yaml
wandb agent 23f2001173-indian-institute-of-technology-madras/garbage_clf-src_scripts/rzpn3bhn --count 50from src.lightning.model import ModelLightning
from src.models.simple_cnn import SimpleCNN
model = SimpleCNN(...)
pl_model = ModelLightning.load_from_checkpoint(
'checkpoints/best_model_epoch=12_val_accuracy=0.8049.ckpt',
model=model,
idx_to_class=idx_to_class
)garbage_clf/
├── config.yaml # Hyperparameter sweep configuration
├── checkpoints/ # Model checkpoints
├── data/ # Dataset (train/val/test)
├── notebooks/ # Analysis and visualization notebooks
├── src/
│ ├── data/ # Data loading and preprocessing
│ ├── lightning/ # PyTorch Lightning modules
│ ├── models/ # Model architectures
│ └── scripts/ # Training scripts
└── wandb/ # Weights & Biases logs
MIT
Developed using PyTorch Lightning and optimized with Weights & Biases
We compared the inference performance of the best fine-tuned model across different environments and optimization levels.
- Batch Size: 64
- Image Size: 224x224
- Precision: FP32 (PyTorch/ONNX), FP16 (ONNX/TensorRT), INT8 (TensorRT)
- Device: NVIDIA GPU (RTX 4050 Laptop) vs CPU
| Model | Latency (ms/batch) | Throughput (img/sec) | Speedup vs PyTorch GPU |
|---|---|---|---|
| PyTorch (CPU) | 323.84 | 198 | 0.07x |
| PyTorch (GPU) | 23.78 | 2692 | 1.0x (Baseline) |
| ONNX FP32 | 22.77 | 2810 | 1.04x |
| ONNX FP16 | 11.70 | 5468 | 2.03x |
| TensorRT FP16 | 9.27 | 6905 | 2.56x |
| TensorRT INT8 | 9.44 | 6778 | 2.51x |
- TensorRT Acceleration: TensorRT FP16 provides the best performance, achieving a 2.5x speedup over optimized PyTorch GPU inference. It is highly recommended for deployment.
- ONNX Runtime: Converting to ONNX and using FP16 mode yields a significant 2x speedup over PyTorch, making it a portable and fast alternative if TensorRT is not available.
- Quantization Impact:
- FP16: Maintained high accuracy (~98%) while doubling throughput.
- INT8: While theoretically faster, on this specific hardware/batch size configuration, it did not provide a speedup over FP16 and suffered a significant accuracy drop (to ~64%). Therefore, FP16 is preferred for this model.
- CPU vs GPU: GPU inference is approximately 13-35x faster than CPU inference, highlighting the necessity of hardware acceleration for real-time processing.
For optimal performance and reliability, deploy the TensorRT FP16 engine. If TensorRT is not feasible, ONNX FP16 is a robust runner-up.
GitHub Repository: https://github.com/nevrohelios/garbage_clf

