Text detection in Deep Learning capstone project

Hello guys, I want to use YOLO (or another object detection algorithm) for the text localisation problem in the Deep Learning capstone project, but I don't know how to proceed. There are no proper tutorials available. Since there is no library for YOLO, do I need to implement it from scratch, or is there some other workaround?

One more question: when we scale these images (images with text on them) to fit a certain CNN input size, do we also have to scale the coordinates of the words on them? Also, do these coordinates represent pixel indices? If so, they should be integers, but in the dataset they are decimals (floating point).
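For context on what I mean by scaling the coordinates, here is a minimal NumPy sketch (function and variable names are mine, not from any library): the box coordinates are multiplied by the same width/height ratios used to resize the image. They can stay as floats throughout; rounding to integers is only needed at the point where you actually index into pixels.

```python
import numpy as np

def resize_with_boxes(image_hw, boxes, target_hw):
    """Scale [x1, y1, x2, y2] boxes by the same ratios used to resize the image.

    image_hw, target_hw: (height, width) of the original and resized image.
    Coordinates remain floats; round only when indexing pixels.
    """
    sy = target_hw[0] / image_hw[0]
    sx = target_hw[1] / image_hw[1]
    boxes = np.asarray(boxes, dtype=np.float32)
    return boxes * np.array([sx, sy, sx, sy], dtype=np.float32)

# A 400x600 image resized to 200x300 halves every coordinate.
print(resize_with_boxes((400, 600), [[10, 20, 110, 60]], (200, 300)))
# -> [[ 5. 10. 55. 30.]]
```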

You can refer this tutorial for YOLOv3:

For text recognition, you need to extract all the detected bounding boxes from the detector above, resize each cropped image to the resolution your recognition model expects, and feed it to your text recognition model.
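The crop-and-resize step above can be sketched as follows. This is a dependency-free illustration (in practice you would use `cv2.resize` instead of the nearest-neighbour resampling shown here; the function name and the default input size are my own choices):

```python
import numpy as np

def crop_and_resize(image, boxes, out_hw=(32, 100)):
    """Crop each detected box from the image and resize it to the
    recognizer's input size (height, width).

    Detector boxes are floats, so round them to ints before slicing.
    Nearest-neighbour resize via index sampling keeps this NumPy-only;
    cv2.resize is the usual choice in a real pipeline.
    """
    H, W = image.shape[:2]
    crops = []
    for x1, y1, x2, y2 in boxes:
        x1, y1 = max(0, int(round(x1))), max(0, int(round(y1)))
        x2, y2 = min(W, int(round(x2))), min(H, int(round(y2)))
        crop = image[y1:y2, x1:x2]
        rows = np.linspace(0, crop.shape[0] - 1, out_hw[0]).round().astype(int)
        cols = np.linspace(0, crop.shape[1] - 1, out_hw[1]).round().astype(int)
        crops.append(crop[rows][:, cols])
    return crops
```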


Hey, thanks for the prompt reply.
I have a doubt regarding the dataset given to us for the text detection problem. I have seen that many algorithms like YOLO and SSD work well with axis-aligned rectangular bounding boxes, but I haven't seen any example where they are used to detect rotated bounding boxes. Since most of the text in the given dataset has rotated bounding boxes, should I adjust the coordinates of those boxes to make them rectangular, or should I regress all four corner coordinates of the bounding box instead of the width, height, and centre coordinates used in YOLO? Could you offer some suggestions here?

To work with slanted bounding boxes, you could try a network like the EAST text detector:
https://bitbucket.org/tomhoag/opencv-text-detection/
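If you do want to stick with an upright-box detector like YOLO as a baseline, one simple option for the "adjust the coordinates to make them rectangular" approach is to take the min/max over the four corners of each rotated quadrilateral. A hedged sketch (function name is mine; the 4-point quad format matches how datasets such as ICDAR typically annotate text):

```python
import numpy as np

def quad_to_axis_aligned(quad):
    """Convert a rotated text quadrilateral (4 corner points, usually floats)
    into the axis-aligned [x1, y1, x2, y2] box that encloses it.

    This loses the rotation information but lets you train an upright-box
    detector; the enclosing box is looser than the original quad.
    """
    pts = np.asarray(quad, dtype=np.float32).reshape(4, 2)
    x1, y1 = pts.min(axis=0)
    x2, y2 = pts.max(axis=0)
    return [float(x1), float(y1), float(x2), float(y2)]

print(quad_to_axis_aligned([(10, 5), (50, 15), (45, 35), (5, 25)]))
# -> [5.0, 5.0, 50.0, 35.0]
```

The alternative you mention, regressing all four corners directly, is essentially what EAST-style detectors do, which is why they handle slanted text natively.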

Original repo: