Recognizing objects on Android using TensorFlow: from data preparation to launch on the device


Training a neural network for pattern recognition is a long and resource-intensive process, especially when all you have at hand is an inexpensive laptop rather than a computer with a powerful graphics card. In this case Google Colaboratory comes to the rescue, offering completely free use of a Tesla K80-class GPU (more details)

This article describes the process of preparing data, training a TensorFlow model in Google Colaboratory, and running it on an Android device.

Data preparation

As an example, let’s try to train a neural network to recognize white dice on a black background. To start, we need to create a dataset sufficient for training (for now, let’s settle on ~100 photos).

We will use the TensorFlow Object Detection API for training and prepare all the necessary data on a laptop. We need the environment and dependency manager conda; installation instructions are here.

Let’s create an environment for work:

conda create -n object_detection_prepare pip python=3.6

And activate it:

conda activate object_detection_prepare

Install the dependencies that we need:

pip install --ignore-installed --upgrade tensorflow==1.14
pip install --ignore-installed pandas
pip install --ignore-installed Pillow
pip install lxml
conda install pyqt=5

Create a folder object_detection and put all our photos into object_detection/images.

Google Colab has a memory limit, so you need to lower the resolution of the photos before marking up the data; otherwise you may run into a “tcmalloc: large alloc ….” error during training.

Create a folder object_detection/preprocessing and add the prepared scripts to it.

To resize the photos, use the script:

python ./object_detection/preprocessing/ -i ./object_detection/images --imageWidth=800 --imageHeight=600

This script will go through the specified photo folder, resize the images to 800×600, and put them in object_detection/images/resized. You can then replace the original photos in object_detection/images with them.

To mark up the data, use the labelImg tool.

Clone the labelImg repository into object_detection.

Go to the labelImg folder

cd [FULL_PATH]/object_detection/labelImg 

and execute the command:

pyrcc5 -o libs/resources.py resources.qrc

After that, you can start marking up the data (the longest and most boring stage):


In “Open dir” specify the object_detection/images folder and go through all the photos, highlighting the objects to recognize and indicating their class. In our case, these are the face values of the dice (1, 2, 3, 4, 5, 6). Save the metadata (*.xml files) in the same folder.

Create a folder object_detection/training_demo, which we will upload to Google Colab a bit later for training.

We will split our photos (with their metadata) into training and test sets in an 80/20 ratio and move them into the corresponding folders object_detection/training_demo/images/train and object_detection/training_demo/images/test.
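The split can be done by hand or with a small script. A minimal sketch of the idea, using only the standard library (the helper name and filenames are hypothetical; each photo's matching .xml file should be moved together with it):

```python
import random

def split_dataset(filenames, train_ratio=0.8, seed=42):
    """Shuffle the photo filenames and split them into train/test sets."""
    files = sorted(filenames)
    random.Random(seed).shuffle(files)  # fixed seed keeps the split reproducible
    cut = int(len(files) * train_ratio)
    return files[:cut], files[cut:]

# For ~100 photos this gives an 80/20 split:
photos = [f"dice_{i:03d}.jpg" for i in range(100)]
train, test = split_dataset(photos)
print(len(train), len(test))  # 80 20
```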

Create a folder object_detection/training_demo/annotations for the metadata files needed for training. The first one is label_map.pbtxt, which maps each object class to an integer value. In our case, it is:


item {
    id: 1
    name: '1'
}

item {
    id: 2
    name: '2'
}

item {
    id: 3
    name: '3'
}

item {
    id: 4
    name: '4'
}

item {
    id: 5
    name: '5'
}

item {
    id: 6
    name: '6'
}

Remember the metadata we got during data markup? To use it for training, it must be converted to the TFRecord format. For the conversion we will use scripts from source [1].

We will carry out the conversion in two stages: xml -> csv and csv -> record

Let’s go to the preprocessing folder:

cd [FULL_PATH]/object_detection/preprocessing

1. From xml to csv

Training data:

python xml_to_csv.py -i [FULL_PATH]/object_detection/training_demo/images/train -o [FULL_PATH]/object_detection/training_demo/annotations/train_labels.csv

Test data:

python xml_to_csv.py -i [FULL_PATH]/object_detection/training_demo/images/test -o [FULL_PATH]/object_detection/training_demo/annotations/test_labels.csv
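Under the hood, the xml -> csv stage simply extracts the bounding boxes from the labelImg annotations. A minimal stdlib-only sketch of the idea (the sample annotation and helper name are illustrative, not the actual script from source [1]):

```python
import xml.etree.ElementTree as ET

# A labelImg annotation for a single object (shortened for the example).
SAMPLE_XML = """
<annotation>
  <filename>dice_001.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>6</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>260</xmax><ymax>210</ymax></bndbox>
  </object>
</annotation>
"""

def xml_to_rows(xml_text):
    """Extract one csv row (filename, width, height, class, box) per object."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.findtext("name"),
                     int(box.findtext("xmin")), int(box.findtext("ymin")),
                     int(box.findtext("xmax")), int(box.findtext("ymax"))))
    return rows

print(xml_to_rows(SAMPLE_XML))
```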

2. From csv to record

Training data:

python generate_tfrecord.py --label_map_path=[FULL_PATH]/object_detection/training_demo/annotations/label_map.pbtxt --csv_input=[FULL_PATH]/object_detection/training_demo/annotations/train_labels.csv --output_path=[FULL_PATH]/object_detection/training_demo/annotations/train.record --img_path=[FULL_PATH]/object_detection/training_demo/images/train

Test data:

python generate_tfrecord.py --label_map_path=[FULL_PATH]/object_detection/training_demo/annotations/label_map.pbtxt --csv_input=[FULL_PATH]/object_detection/training_demo/annotations/test_labels.csv --output_path=[FULL_PATH]/object_detection/training_demo/annotations/test.record --img_path=[FULL_PATH]/object_detection/training_demo/images/test

That completes the data preparation; now we need to choose the model to train.

The models available for retraining can be found here.

We will choose the ssdlite_mobilenet_v2_coco model, so that we can later run the trained model on an Android device.

Download the archive with the model and unpack it into object_detection/training_demo/pre-trained-model.

You should end up with something like
object_detection/training_demo/pre-trained-model/ssdlite_mobilenet_v2_coco_2018_05_09

Copy the pipeline.config file from the unpacked archive into object_detection/training_demo/training and rename it to ssdlite_mobilenet_v2_coco.config.

Next, we need to configure it for our task. To do this:

1. Specify the number of classes

model.ssd.num_classes: 6

2. Specify the batch size (the amount of data per training iteration), the number of iterations, and the path to the saved model from the archive we downloaded

train_config.batch_size: 18
train_config.num_steps: 20000
train_config.fine_tune_checkpoint: "./training_demo/pre-trained-model/ssdlite_mobilenet_v2_coco_2018_05_09/model.ckpt"

3. Specify the number of photos in the test set (object_detection/training_demo/images/test)

eval_config.num_examples: 64

4. Specify the paths for the training dataset

train_input_reader.label_map_path: "./training_demo/annotations/label_map.pbtxt"
train_input_reader.tf_record_input_reader.input_path: "./training_demo/annotations/train.record"

5. Specify the paths for the test dataset

eval_input_reader.label_map_path: "./training_demo/annotations/label_map.pbtxt"
eval_input_reader.tf_record_input_reader.input_path: "./training_demo/annotations/test.record"

In the end, you should get something like this.

Next, archive the training_demo folder and upload the resulting archive to Google Drive.

An alternative to working with the archive

This article explains how to mount Google Drive into the Google Colab virtual machine, but remember to change all the paths in the configs and scripts accordingly.

This completes the data preparation; let’s move on to training.

Model training

In Google Drive, select training_demo.zip, click on “Get shareable link”, and save the file id from the received link: [YOUR_FILE_ID_HERE]

The easiest way to use Google Colab is to create a new notebook in Google Drive.

By default, training runs on the CPU. To use the GPU, you need to change the runtime type.

A ready-made notebook can be found here.

Training consists of the following steps:

1. Clone the TensorFlow Models repository:

!git clone https://github.com/tensorflow/models.git

2. Install protobuf and compile the necessary files in object_detection:

!apt-get -qq install libprotobuf-java protobuf-compiler                                               
%cd ./models/research/
!protoc object_detection/protos/*.proto --python_out=.
%cd ../..

3. Add the necessary paths to the PYTHONPATH environment variable:

import os
os.environ['PYTHONPATH'] += ":/content/models/research/"
os.environ['PYTHONPATH'] += ":/content/models/research/slim"
os.environ['PYTHONPATH'] += ":/content/models/research/object_detection"
os.environ['PYTHONPATH'] += ":/content/models/research/object_detection/utils"

4. To get the file from Google Drive, install PyDrive and log in:

!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

5. Download the archive (you need to specify the id of your file) and unzip it:


drive_file_id = '[YOUR_FILE_ID_HERE]'
training_demo_zip = drive.CreateFile({'id': drive_file_id})
training_demo_zip.GetContentFile('training_demo.zip')
!unzip -q training_demo.zip


6. Start the training process:

!python ./models/research/object_detection/legacy/train.py --logtostderr --train_dir=./training_demo/training --pipeline_config_path=./training_demo/training/ssdlite_mobilenet_v2_coco.config

Description of parameters

–train_dir=./training_demo/training – the directory where the training results will be stored

–pipeline_config_path=./training_demo/training/ssdlite_mobilenet_v2_coco.config – path to the config

7. Convert the training result into a frozen graph that can be used for inference:

!python /content/models/research/object_detection/export_inference_graph.py --input_type image_tensor --pipeline_config_path /content/training_demo/training/ssdlite_mobilenet_v2_coco.config --trained_checkpoint_prefix /content/training_demo/training/model.ckpt-[CHECKPOINT_NUMBER] --output_directory /content/training_demo/training/output_inference_graph_v1.pb

Description of parameters

–pipeline_config_path /content/training_demo/training/ssdlite_mobilenet_v2_coco.config – path to the config

–trained_checkpoint_prefix /content/training_demo/training/model.ckpt-[CHECKPOINT_NUMBER] – the path to the checkpoint that we want to convert.

–output_directory /content/training_demo/training/output_inference_graph_v1.pb – directory where the converted model will be saved

The checkpoint number [CHECKPOINT_NUMBER] can be found in the folder /content/training_demo/training/. After training, files like model.ckpt-1440.index and model.ckpt-1440.meta should appear there; 1440 is both the [CHECKPOINT_NUMBER] and the training iteration number.

To visualize the training results, there is a special script in the notebook. The figure below shows the result of recognizing an image from the test dataset after ~20,000 training iterations.

8. Convert the trained model to tflite.
To use TensorFlow Lite, the model must be converted to the tflite format. To do this, convert the training result into a frozen graph that supports conversion to tflite (the parameters are the same as in the previous step):

!python /content/models/research/object_detection/export_tflite_ssd_graph.py --pipeline_config_path /content/training_demo/training/ssdlite_mobilenet_v2_coco.config --trained_checkpoint_prefix /content/training_demo/training/model.ckpt-[CHECKPOINT_NUMBER] --output_directory /content/training_demo/training/output_inference_graph_tf_lite.pb

To convert to tflite we need additional information about the model. To get it, download the model from output_inference_graph_tf_lite.pb:

Then open it in the Netron tool. We are interested in the names and dimensions of the input and output nodes of the model.

Knowing them, you can convert the pb model to tflite format:

!tflite_convert --output_file=/content/training_demo/training/model_q.tflite  --graph_def_file=/content/training_demo/training/output_inference_graph_tf_lite_v1.pb/tflite_graph.pb --input_arrays=normalized_input_image_tensor  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' --input_shapes=1,300,300,3 --enable_select_tf_ops --allow_custom_ops  --inference_input_type=QUANTIZED_UINT8 --inference_type=FLOAT --mean_values=128 --std_dev_values=128

Description of parameters

–output_file=/content/training_demo/training/model_q.tflite – path to the conversion result

–graph_def_file=/content/training_demo/training/output_inference_graph_tf_lite_v1.pb/tflite_graph.pb – path to the frozen graph to be converted

–input_arrays=normalized_input_image_tensor – name of the input node, which we found above

–output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3' – names of the output nodes, which we found above

–input_shapes=1,300,300,3 – dimensions of the input data, which we found above

–enable_select_tf_ops – use the extended TensorFlow Lite runtime

–allow_custom_ops – allow operations that have no built-in TensorFlow Lite implementation

–inference_type=FLOAT – data type for all arrays in the model except the input

–inference_input_type=QUANTIZED_UINT8 – data type for the model’s input arrays

–mean_values=128 –std_dev_values=128 – mean value and standard deviation of the input data, needed when using QUANTIZED_UINT8
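The mean/std parameters define how the converter maps quantized uint8 input values back to real values: real_value = (quantized_value - mean_value) / std_dev_value. A quick sketch of the arithmetic:

```python
MEAN = 128.0
STD = 128.0

def dequantize(q: int) -> float:
    """Map a uint8 pixel value to the real-valued range the model expects."""
    return (q - MEAN) / STD

# With mean=128 and std=128, uint8 pixels [0, 255] map to roughly [-1, 1]:
print(dequantize(0))    # -1.0
print(dequantize(128))  # 0.0
print(dequantize(255))  # 0.9921875
```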

We archive the folder with the training results and upload it to Google Drive:

!zip -r ./training_demo/training_result.zip ./training_demo/training/

training_result = drive.CreateFile({'title': 'training_result.zip'})
training_result.SetContentFile('./training_demo/training_result.zip')
training_result.Upload()

If you get an “Invalid client secrets file” error, you need to re-authorize Google Drive.

Running the model on an Android device

The Android application is based on the official object detection example, but it has been completely rewritten in Kotlin using CameraX. The full code can be viewed here.

CameraX already provides a mechanism for analyzing incoming camera frames via ImageAnalysis. The recognition logic lives in ObjectDetectorAnalyzer.

The whole process of image recognition can be divided into several stages:

1. As input we receive an image in the YUV format. For further work, it must be converted to RGB:

val rgbArray = convertYuvToRgb(image)
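The per-pixel conversion behind convertYuvToRgb can be sketched in Python using the standard BT.601 coefficients (the actual Kotlin code operates on the whole camera buffer; this only illustrates the formula for one pixel):

```python
def yuv_to_rgb(y: int, u: int, v: int) -> tuple:
    """Convert a single YUV pixel to RGB using BT.601 coefficients."""
    r = y + 1.370705 * (v - 128)
    g = y - 0.698001 * (v - 128) - 0.337633 * (u - 128)
    b = y + 1.732446 * (u - 128)
    clamp = lambda c: max(0, min(255, int(c)))  # keep each channel in [0, 255]
    return clamp(r), clamp(g), clamp(b)

# A neutral pixel (U = V = 128) carries no chroma, so it stays gray:
print(yuv_to_rgb(128, 128, 128))  # (128, 128, 128)
```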

2. Next, transform the image (rotate it if necessary and resize it to the model’s input dimensions, 300×300 in our case). To do this, draw the pixel array onto a Bitmap and apply the transformation to it:

val rgbBitmap = getRgbBitmap(rgbArray, image.width, image.height)
val transformation =  getTransformation(rotationDegrees, image.width, image.height)
Canvas(resizedBitmap).drawBitmap(rgbBitmap, transformation, null)

3. Convert the bitmap into an array of pixels and feed it to the detector:

ImageUtil.storePixels(resizedBitmap, inputArray)
val objects = detect(inputArray)

4. For visualization, pass the recognition result to RecognitionResultOverlayView and scale the coordinates according to the aspect ratio:

val scaleFactorX = measuredWidth / result.imageWidth.toFloat()
val scaleFactorY = measuredHeight / result.imageHeight.toFloat()

result.objects.forEach { obj ->
    val left = obj.location.left * scaleFactorX
    val top = obj.location.top * scaleFactorY
    val right = obj.location.right * scaleFactorX
    val bottom = obj.location.bottom * scaleFactorY

    canvas.drawRect(left, top, right, bottom, boxPaint)
    canvas.drawText(obj.text, left, top - 25f, textPaint)
}
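The same scaling written out in Python, for clarity (a hypothetical helper mirroring the Kotlin code above):

```python
def scale_box(box, image_size, view_size):
    """Scale a detection box from model-image coordinates to view coordinates."""
    left, top, right, bottom = box
    image_w, image_h = image_size
    view_w, view_h = view_size
    sx = view_w / image_w  # scaleFactorX
    sy = view_h / image_h  # scaleFactorY
    return (left * sx, top * sy, right * sx, bottom * sy)

# A box detected on the 300x300 model input, drawn on a 1080x1920 view:
print(scale_box((30, 60, 150, 240), (300, 300), (1080, 1920)))
```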

To run our model in the application, you need to replace the model file in assets with training_demo/training/model_q.tflite (renaming it to detect.tflite) and the label file labelmap.txt, which in our case contains:

1
2
3
4
5
6

Since the official guide uses SSD Mobilenet V1, where label indexing starts from 1 rather than 0, you need to change labelOffset from 1 to 0 in the collectDetectionResult method of the ObjectDetector class.
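A sketch of why the offset matters (the labels list is hypothetical, matching our six dice classes):

```python
# In the official SSD Mobilenet V1 example, label indexing starts from 1,
# so the demo code subtracts labelOffset = 1 from each class index.
# Our custom labelmap starts from index 0, so the offset must be 0.
labels = ["1", "2", "3", "4", "5", "6"]

def class_label(class_index: int, label_offset: int) -> str:
    return labels[class_index - label_offset]

print(class_label(0, 0))  # with labelOffset = 0, class index 0 maps to "1"
```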

That’s all.
The video below shows the trained model running on an old Xiaomi Redmi 4X:

In the process, the following resources were used:

