Skip to content

MitvaGami/RoadScan-AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

Project Objective

This project demonstrates a complete computer vision pipeline for identifying potholes in road images that have been heavily corrupted with multiple types of noise. The goal is to take a single, unclean image and perform two key tasks:

  1. Object Detection: Identify the precise location of the pothole and draw a bounding box around it.

  2. Image Classification: Classify the entire image as either containing a "Pothole" or "Normal".

This is achieved through a multi-stage process involving advanced image restoration, enhancement, segmentation, and finally, training and deploying a Convolutional Neural Network (CNN).

Project Pipeline Overview

The workflow is divided into four main stages:

  1. Image Restoration: A sophisticated pipeline to remove aggressive, combined noise (Salt & Pepper, Gaussian, Speckle, Poisson) from the input image.

  2. Image Enhancement: After cleaning, the image contrast and sharpness are improved to make features more distinct and visually clear.

  3. Segmentation & Object Detection: Traditional computer vision techniques are applied to the restored image to segment the pothole from the road and draw a bounding box.

  4. AI-Based Classification: A deep learning model (CNN) is trained on a clean dataset to learn the features of potholes. This trained model is then used to classify our final, enhanced image.

Key Technologies & Libraries

  • OpenCV (cv2): The core library for all image processing tasks.

    • Usage: Reading/writing images, color space conversions (RGB, HSV), filtering (medianBlur, GaussianBlur), morphological transformations (morphologyEx), and contour detection (findContours).

    • Reason: It is the industry-standard, high-performance library for classical computer vision.

  • BM3D (bm3d): A specialized, state-of-the-art denoising library.

    • Usage: Employed as the main algorithm to remove Gaussian-like and complex statistical noise.

    • Reason: BM3D (Block-matching and 3D filtering) is significantly more powerful than standard filters like Gaussian blur or even Non-Local Means. It excels at preserving textures and edges while removing heavy noise, making it ideal for this challenging restoration task.

  • NumPy: The fundamental library for numerical operations in Python.

    • Usage: Manipulating image arrays, performing mathematical operations (log transform), and preparing data for the ML model.

    • Reason: It provides the efficient, multi-dimensional array objects that image data is stored in.

  • Matplotlib: A comprehensive library for creating static, animated, and interactive visualizations.

    • Usage: Displaying images at each stage of the pipeline to visually analyze the results.

    • Reason: Essential for debugging, analysis, and presenting the final results in the report.

  • TensorFlow / Keras: A high-level deep learning framework.

    • Usage: Building, training, and evaluating the Convolutional Neural Network (CNN) for the classification task.

    • Reason: It provides an intuitive and powerful API for creating complex neural network architectures.

  • Scikit-learn (sklearn): A machine learning library providing various utility tools.

    • Usage: Splitting the dataset (train_test_split), calculating class weights for imbalanced data, and generating final evaluation reports (confusion_matrix, classification_report).

    • Reason: It offers crucial, pre-built functions that simplify the machine learning workflow.

Part 1: Image Restoration & Enhancement

The initial noisy image is unusable for any analysis. The restoration pipeline is the most critical step to recover meaningful information.

Techniques Used:

  1. Adaptive Median Filter: The first step targets Salt & Pepper noise. This filter is robust to extreme outliers (pure black/white pixels) and removes them effectively without significant blurring.

  2. Log Transform: This mathematical transformation stabilizes the image's variance and helps convert multiplicative noise (like Speckle noise) into additive noise, which is easier for subsequent filters like BM3D to handle.

  3. BM3D Denoising: This is the core of the restoration. It operates in the log domain to aggressively remove the combined Gaussian, Speckle, and Poisson noise.

  4. Inverse Log Transform: Converts the image back to its original intensity scale.

  5. Morphological Operations (in HSV): To preserve color, the image is converted to the HSV (Hue, Saturation, Value) space. Opening and Closing operations are applied only to the V (Value/Intensity) channel to remove any final small noise artifacts and smooth object boundaries without affecting color.

  6. Contrast Limited Adaptive Histogram Equalization (CLAHE): A sophisticated contrast enhancement technique that improves local contrast, preventing the over-amplification of noise that standard histogram equalization can cause.

  7. Unsharp Masking: A final sharpening step that increases the acuity of edges by subtracting a blurred version of the image from itself.

How it helped the final prediction: Without this entire pipeline, the input to the AI model would be an ambiguous, noisy mess. The restoration and enhancement process clarifies the defining features of a pothole—its sharp edges, dark interior, and texture—making it possible for the CNN to learn and later recognize these patterns. A model trained on noisy data would have extremely poor performance.

Part 2: AI/ML Model for Classification

A Convolutional Neural Network (CNN) was designed to classify images as "Pothole" or "Normal".

Model Architecture

The model is a sequential stack of layers designed to learn increasingly complex features from the images:

  • Convolutional Blocks (x3): Each block consists of:

    • Conv2D layers with ReLU (Rectified Linear Unit) activation function. ReLU is used for its efficiency and ability to mitigate the vanishing gradient problem. Filters increase from 32 to 64 to 128 to learn more complex patterns in deeper layers.

    • BatchNormalization: Stabilizes and accelerates the training process by normalizing the inputs to each layer.

    • MaxPooling2D: Downsamples the feature maps, reducing computational load and making the model more robust to variations in the position of features.

    • Dropout: A regularization technique that randomly deactivates a fraction of neurons during training to prevent the model from becoming overly specialized (overfitting).

  • Fully Connected Layers:

    • Flatten: Converts the 2D feature maps from the convolutional blocks into a 1D vector.

    • Dense layers with ReLU activation for high-level reasoning.

    • A final Dense layer with a single neuron and a Sigmoid activation function. Sigmoid is essential for binary classification as it outputs a value between 0 and 1, representing the probability of the image belonging to the positive class ("Pothole").

Training Strategy

  • Optimizer: Adam with a low learning rate (0.0001) was used for stable convergence.

  • Loss Function: binary_crossentropy is the standard loss function for binary classification problems.

  • Class Weights: The dataset was slightly imbalanced. Class weights were computed to penalize misclassifications of the minority class more heavily, ensuring the model learns both classes effectively.

  • Callbacks:

    • EarlyStopping: Monitors the validation loss and stops training if it doesn't improve for 10 epochs, preventing overfitting and saving time.

    • ReduceLROnPlateau: Reduces the learning rate if the validation loss plateaus, allowing for finer-grained optimization.

Model Performance & Evaluation

The model was trained on a clean dataset and evaluated on an unseen test set. The following metrics were achieved:

  • Test Accuracy: 66.42%

  • Test Precision: 63.16% (Of all the images the model predicted as "Pothole", 63.16% actually were.)

  • Test Recall: 72.73% (The model successfully identified 72.73% of all the actual potholes in the test set.)

Confusion Matrix Analysis:

  • True Positives (Pothole correctly classified): 48

  • True Negatives (Normal correctly classified): 43

  • False Positives (Normal misclassified as Pothole): 28

  • False Negatives (Pothole misclassified as Normal): 18

The results show a moderately effective model. The recall is higher than precision, indicating the model is better at finding potholes than it is at being certain about its predictions (it makes more false alarms). The relatively high number of false positives and negatives suggests that differentiating between normal road damage and actual potholes is a challenging task.

Crucially, this 66.42% accuracy is only possible because the model was trained on clean data and the final prediction was made on a fully restored and enhanced image. If the model were to classify the original noisy image, the accuracy would be close to random chance (50%). This directly proves the value of the image processing pipeline.

How to Run the Code

The entire project is contained within a single Python script designed to be run in a Google Colab environment.

  1. Setup: Ensure your datasets (Noisy_Pothole_Combined3 and Pothole_Dataset) are uploaded to your Google Drive in the specified paths.

  2. Installation: The first code cell installs the necessary bm3d library.

  3. Execution: Run the cells sequentially. The script will:

    • Load a noisy image.

    • Perform the complete restoration and enhancement pipeline.

    • Run the segmentation-based object detection.

    • Load the clean dataset, build, and train the CNN classifier.

    • Use the trained model to classify the final enhanced image.

    • Display visualizations at each major step.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors