[ ] Content
For the detection part, we trained two models: a YOLOv5 model and a PSENet model. The PSENet model was trained only on the Common Images, while the YOLOv5 model was trained on all the images. At the start we used only a YOLOv5 model, but after some experimentation we felt we should use a model architecture better suited to text detection, so we switched our focus to PSENet. We experimented with many different hyperparameters and augmentations, and then generated the bounding boxes for the test images.
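As a rough illustration of the kind of augmentation pipeline we experimented with for detection, here is a minimal sketch using albumentations; the specific transforms and parameters are assumptions for illustration, not our exact configuration:

```python
import albumentations as A

# Illustrative detection-time augmentation pipeline; transforms and
# probabilities here are examples, not the exact values we used.
train_transform = A.Compose(
    [
        A.RandomBrightnessContrast(p=0.3),
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=10, p=0.3),
    ],
    # Keep bounding boxes consistent with the transformed image.
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["labels"]),
)

# Usage sketch: `image` is an HxWx3 array, `boxes` a list of [x1, y1, x2, y2],
# and `class_ids` the matching label list.
# augmented = train_transform(image=image, bboxes=boxes, labels=class_ids)
```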
The YOLOv5 model (trained from scratch) was conservative: its bounding boxes were tight, but it missed some unconventional text shapes and formats. Our PSENet model (also trained from scratch) did not perform as well as YOLOv5 overall, but it was able to predict text boxes that YOLOv5 missed. To take advantage of both models, we used Non-Max Suppression to combine the bounding boxes from the two models. This eliminated redundant boxes and improved the quality of the predicted bounding boxes.
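A minimal sketch of how the two detectors' boxes can be merged with Non-Max Suppression; the box format, confidence values, and IoU threshold below are assumptions for illustration:

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop heavily overlapping ones.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box more than the threshold
        order = order[1:][iou <= iou_threshold]
    return keep

# Example: pool the boxes from both models, then suppress duplicates.
yolo_boxes, yolo_scores = np.array([[10, 10, 100, 40]]), np.array([0.9])
pse_boxes = np.array([[12, 11, 98, 42], [200, 50, 300, 80]])
pse_scores = np.array([0.7, 0.8])
all_boxes = np.vstack([yolo_boxes, pse_boxes])
all_scores = np.concatenate([yolo_scores, pse_scores])
kept = non_max_suppression(all_boxes, all_scores, iou_threshold=0.5)
print(all_boxes[kept])  # one box per distinct text region
```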
During our analysis of the labels provided by the host, we found that 50 unique characters had fewer than 3 occurrences across the roughly 234,000 boxes in the training data. On further analysis we found that these characters were unnecessary and hindered training of the model, so we excluded labels containing any of these characters from training. The eliminated data was less than 0.0007% of the whole dataset, and removing it improved the model's capability. Our CNN model (with a DenseNet architecture) performed well during training and generalized similarly to unseen data. We also tried CRNN models, but the CNN model performed better.
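A minimal sketch of this rare-character filtering step; the label tuple format is an assumption, since the actual annotations are parsed from the host-provided files:

```python
from collections import Counter

def filter_rare_character_labels(labels, min_count=3):
    """Drop boxes whose transcription contains a character that appears
    fewer than `min_count` times across the whole training set.

    labels: list of (image_id, box, text) tuples (illustrative format).
    """
    # Count every character over all transcriptions.
    char_counts = Counter(ch for _, _, text in labels for ch in text)
    rare = {ch for ch, n in char_counts.items() if n < min_count}
    # Keep only boxes whose text contains no rare character.
    return [item for item in labels if not any(ch in rare for ch in item[2])]

# Usage sketch: `training_labels` would hold (image_id, box, transcription)
# entries parsed from the host-provided annotations.
# filtered_labels = filter_rare_character_labels(training_labels, min_count=3)
```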
Our future goal is to enhance the recognizer model by adopting better architectures and to build an end-to-end model deployed in a mobile app that can perform bounding box detection and text recognition in real time.
- Run main.py in the YOLOV5 folder to train the model, create the predictions, and generate the output.csv file.
- Run the shell command “python detector/train.py” in the PSENet folder to train the model.
- Run the shell command “python detector/predict.py” in the PSENet folder to create predictions and generate the output.csv file.
- Run “Non Max Supression on PSENET and Yolo.ipynb” to apply Non-Max Suppression to the bounding boxes from the YOLOv5 and PSENet outputs.
- Place the training data and test data in the official data folder.
- Run the “Code to train the Train and Predict with Recognizer.ipynb” file to train the model and predict on the test data. It contains all the necessary shell commands.
For the detection step on a Tesla K40 GPU, the inference time over 500 images is around 3 minutes for the YOLOv5 model and around 2 minutes for the PSENet model. Considering the latency of real-time use, we used the DenseNet50V2 architecture for recognition, which is only about 15 MB in size and gives us the ability to integrate it into edge devices. All the libraries used in our source code are standard, open-source libraries. We used a GPU for training and deliberately used a CPU for prediction so that we could account for the latency the model would have on CPU-only hardware.
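A minimal sketch of forcing recognizer prediction onto the CPU to gauge that latency, assuming a TensorFlow/Keras setup; the saved-model filename and input shape are hypothetical:

```python
import os
import time

# Hide all GPUs before importing TensorFlow so prediction runs on the CPU only.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import numpy as np
import tensorflow as tf

# Load the trained recognizer weights (hypothetical path).
model = tf.keras.models.load_model("recognizer_densenet.h5")

# Time a single example crop through the model on the CPU.
crop = np.random.rand(1, 64, 256, 3).astype("float32")  # illustrative input shape
start = time.time()
_ = model.predict(crop, verbose=0)
print(f"CPU inference time: {time.time() - start:.3f} s")
```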
We used Andrew Ng’s iterative approach of idea -> code -> experiment -> inference, and it helped us a lot in building this model. We first started with a YOLO model and then moved on to architectures better suited for text detection, considering PSENet first. After that, we decided to try other architectures such as DB and EAST, but did not have enough time left to implement them. Using Non-Max Suppression to efficiently combine the bounding boxes generated by PSENet and YOLOv5 was key for us in getting good bounding boxes. In the recognition part, we tried various model architectures and found the one best suited to our case. It was a great experience to experiment with different approaches to build a high-scoring model.

