How I do OCR – Part 2

In a previous article, I talked about how I collect data for detecting text in images.

Now, using the collected data set, let's try to train YOLOv5, one of the most popular networks for segmentation and object detection.

To do this, we will use the free version of Google Colab.

Mounting Google Drive from a Google account

from google.colab import drive

drive.mount('/content/drive', force_remount=True)

The root folder of the drive should contain an archive with the training data, which can be found on GitHub.

Also download the files text_segment.yaml and hyp.scratch-low.yaml and place them in the root folder.
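Before going further, it can help to confirm that Colab actually sees these files. The following is a minimal sketch, assuming the drive is mounted at /content/drive as above. The helper name missing_files is my own, and the training archive is not checked because its file name is not given here.

```python
from pathlib import Path

# Config files that should sit in the Drive root folder.
REQUIRED = ["text_segment.yaml", "hyp.scratch-low.yaml"]


def missing_files(root, names):
    """Return the names from `names` that do not exist directly under `root`."""
    root = Path(root)
    return [name for name in names if not (root / name).is_file()]


# Assumes Drive was mounted at /content/drive; prints [] when nothing is missing.
print(missing_files("/content/drive/MyDrive", REQUIRED))
```

An empty list means both files are in place; any name that is printed still needs to be uploaded.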

Cloning YOLOv5 from GitHub and installing its requirements

!git clone

!pip install -r /content/yolov5/requirements.txt

I chose yolov5n-seg, the lightest and fastest model, and changed its configuration file yolov5n-seg.yaml to reduce the size of the model and increase inference speed.
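Which fields of yolov5n-seg.yaml were changed is not stated here. As a rough sketch, assuming the change was to width_multiple, the field the stock YOLOv5 configs use to scale channel counts (the nano config sets it to 0.25), the code below shows what halving it would do. Rounding up to a multiple of 8 mirrors what the YOLOv5 model parser does; the value 0.125 is a hypothetical example, not necessarily the setting used for this model.

```python
import math


def scaled_channels(base_channels, width_multiple, divisor=8):
    """Scale a layer's channel count and round up to a multiple of `divisor`."""
    return math.ceil(base_channels * width_multiple / divisor) * divisor


# Channel counts as written in the stock YOLOv5 backbone definition.
BASE = [64, 128, 256, 512, 1024]

stock = [scaled_channels(c, 0.25) for c in BASE]    # nano setting
halved = [scaled_channels(c, 0.125) for c in BASE]  # hypothetical halved width

print("width_multiple=0.25 :", stock)
print("width_multiple=0.125:", halved)
```

The parameter count of a convolution grows with the product of its input and output channels, so halving the width shrinks those layers to roughly a quarter of their original size.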

Unpack the images and annotation files to /content/yolov5/datasets/text_detection

from zipfile import ZipFile

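with ZipFile('/content/drive/MyDrive/', 'r') as zipObj:
    # Extract the archive into the dataset folder
    zipObj.extractall('/content/yolov5/datasets/text_detection')

with ZipFile('/content/drive/MyDrive/', 'r') as zipObj:
    # Extract the archive into the dataset folder
    zipObj.extractall('/content/yolov5/datasets/text_detection')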

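After unpacking, a quick sanity check can catch images that have no annotation file. This is a sketch under an assumption about the layout: YOLOv5 finds each label file by swapping the images directory in the image path for labels and the extension for .txt, so the code assumes images/ and labels/ subfolders inside the dataset folder. The actual layout of the archives is not shown here, and the helper name is my own.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def images_without_labels(dataset_root):
    """Return sorted relative paths of images that lack a matching .txt label."""
    root = Path(dataset_root)
    images_dir = root / "images"  # assumed layout
    labels_dir = root / "labels"  # assumed layout
    if not images_dir.is_dir():
        return []
    unlabeled = []
    for image in images_dir.rglob("*"):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        relative = image.relative_to(images_dir)
        label = (labels_dir / relative).with_suffix(".txt")
        if not label.is_file():
            unlabeled.append(relative.as_posix())
    return sorted(unlabeled)


print(images_without_labels("/content/yolov5/datasets/text_detection"))
```

YOLOv5 treats an image without a label file as a background image, so a non-empty result is not necessarily an error, but a long list usually means the annotations landed in the wrong place.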

Everything is ready to train the network. Let's get started.

%cd /content/yolov5

!python "/content/yolov5/segment/" --img 800 --batch 12 --epochs 1200 --data "/content/drive/MyDrive/text_segment.yaml" \
--hyp "/content/drive/MyDrive/hyp.scratch-low.yaml" --project '/content/drive/MyDrive/text_detect/' --name 'weights' \
--weights "/content/drive/MyDrive/"

The authors of YOLOv5 recommend training the network for 300 epochs or more. Training one epoch in the free version of Google Colab takes approximately 1 hour.

On my home computer running Windows 10, training one epoch on an RTX 2060 graphics card with 12 GB of memory takes about 6 minutes.

The problem with the free version of Google Colab is the lack of multithreading and of an SSD drive.
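To put the per-epoch times into perspective, here is a back-of-the-envelope estimate for the full run. It is only a sketch: it assumes every epoch takes the same time and that training runs for all 1200 epochs passed to --epochs, ignoring Colab session limits and any early stopping.

```python
def total_hours(epochs, minutes_per_epoch):
    """Total wall-clock training time in hours, assuming a constant epoch time."""
    return epochs * minutes_per_epoch / 60


EPOCHS = 1200  # value passed to --epochs in the training command

colab_hours = total_hours(EPOCHS, 60)  # about 1 hour per epoch on free Colab
local_hours = total_hours(EPOCHS, 6)   # about 6 minutes per epoch on the RTX 2060

print(f"Free Colab: {colab_hours:.0f} h (~{colab_hours / 24:.0f} days)")
print(f"RTX 2060:   {local_hours:.0f} h (~{local_hours / 24:.0f} days)")
```

Under these assumptions the full run works out to about 50 days on free Colab versus about 5 days on the local card.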

After a very long time in Google Colab, we can test our model.

!python "segment/" --imgsz 800 --iou-thres 0.25 --conf-thres 0.5 --hide-labels --hide-conf --line-thickness 2 --device "cpu" --weights "/content/drive/MyDrive/text_detect/weights/weights/" --source "/content/yolov5/test"
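For readers unfamiliar with the two thresholds in the command above: --conf-thres discards low-confidence detections, and --iou-thres sets how much overlap non-maximum suppression tolerates before dropping the weaker of two boxes. The following is a simplified pure-Python illustration of that idea, not YOLOv5's actual implementation; the function names and the sample boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_thres=0.5, iou_thres=0.25):
    """Greedy NMS over (box, confidence) pairs; returns the kept pairs."""
    # Step 1: drop detections at or below the confidence threshold.
    candidates = [d for d in detections if d[1] > conf_thres]
    # Step 2: visit the rest from most to least confident.
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in candidates:
        # Keep a box only if it does not overlap a kept box too much.
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, conf))
    return kept


sample = [
    ((10, 10, 110, 40), 0.90),   # strong detection of a text line
    ((15, 12, 115, 42), 0.70),   # near-duplicate of the first box
    ((10, 60, 110, 90), 0.80),   # a separate text line
    ((200, 10, 260, 40), 0.30),  # below the confidence threshold
]
print(filter_detections(sample))
```

With the thresholds from the command, only the first and third sample boxes survive: the second overlaps the first too heavily and the fourth is not confident enough.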

In my opinion, the result is simply excellent, considering that this is the lightest model and that it has also been cut in half.

After converting to TensorFlow Lite, the model weighs approximately 5 MB, and inference on a Redmi Note 10 Android phone using the GPU takes 300 ms.
