Do Androids Dream of Neural Processing Modules? Porting the TensorFlow Lite Model to a Mobile App

Even though artificial intelligence is the most hyped topic in modern IT, and the previous mega-story with the Internet of Things and Edge Computing is still fresh in memory, I was surprised to find no clear "Hello world"-style tutorials on adding machine learning to Android apps. Well, of course some exist, but they are not at all entry level. Besides, they suggest taking other people's models to tell cats from dogs, recognize handwritten letters, and so on. But taking a regression model and working with it – no (or I didn't find it). I didn't find it in books either. If such material exists, please share. In the meantime, I will add my own model to the application and write this text in parallel.

The history of my application is in publications 1, 2, 3. In short, it is the RuLearn program for memorizing vocabulary in foreign languages, or anything else that requires mechanical memorization. Its effectiveness rests on Ebbinghaus's "forgetting curve", but, as it turns out, it would be good to adapt the repetition intervals to the complexity of the material being studied. That is, the student learns new words, and the application learns from their mistakes and adjusts the repetition algorithm in an optimal way. Machine learning in human learning suggests itself.

Modern ARM processors in mobile phones contain an NPU for working with neural networks. It is probably still the least used piece of hardware in your phone. So no client-server architecture is needed here: by the end of our development, machine learning will happen on the end device (bonus – nobody on the server side will be able to laugh at the user's mistakes). At the intermediate stage, however, we will have to use a model created on the desktop. That is what we will do today.

The native library for machine learning on Android devices is TensorFlow Lite. In article 3, I described how the model was created for the desktop version of TensorFlow; all that remains is to save it and convert it for use on a mobile device.

# model is the trained Keras model from article 3; convert it to the TF Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# save the converted model to a .tflite file
with open('ruLearnModel.tflite', 'wb') as f:
    f.write(tflite_model)

Next, we have two options:

  1. Use native TensorFlow Lite calls in the mobile app. A so-so option if you built the model on the desktop with Keras, which hides the TensorFlow API from you, and you feel unsure of yourself. Everything turns out to be quite simple once you figure it out, but keep in mind that TF Lite is C++ code underneath. If an error occurs, no debugger will help you understand what went wrong: a pointer flies off where it shouldn't, takes down the library and the app, and leaves a cryptic entry in Logcat saying that something bad happened (it's not clear what). I will return to this approach in the next article, when I describe training the model on the device and we already have a working model on Android (a minimal sketch of this approach is shown right after this list); for now, we will go the other way.

  2. Use a "wrapper" class that Android Studio can generate for you. A class named after your model will be created in your app, and you work with the model through its methods. Oddly enough, this option is poorly documented (or, again, I haven't found the documentation – please share if you know where it is described). I will try to correct this misunderstanding.
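
For reference, a minimal sketch of option 1 – calling the TF Lite Interpreter directly – might look roughly like this. It assumes the .tflite file has been copied into the app's assets; the helper name and the sample feature values are purely illustrative:

import android.content.Context;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.support.common.FileUtil;
import java.io.IOException;
import java.nio.MappedByteBuffer;

// Hypothetical helper: one prediction through the plain Interpreter API
float predictWithInterpreter(Context context) throws IOException {
    // Load the model; assumes ruLearnModel.tflite was copied to the app's assets
    MappedByteBuffer modelBuffer = FileUtil.loadMappedFile(context, "ruLearnModel.tflite");
    Interpreter interpreter = new Interpreter(modelBuffer);

    // One row of 4 float32 features in, one float32 value out
    float[][] input = new float[][]{{17f, 13f, 5f, 434943f}};
    float[][] output = new float[1][1];
    interpreter.run(input, output);

    interpreter.close();
    return output[0][0];
}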

To generate the wrapper, you don't need to update Android Studio to the latest version; the import option is in the context menu, here:

(Screenshot: somewhere far away in distant submenus)


When importing, three things will happen:

  1. The model will be copied to the Android Studio project. You won't need to extract it from assets while the app is running.

  2. The required dependencies will be added to the build.gradle(:app) file.

  3. The wrapper class for your model will be created, named after the model file. My model file is called ruLearnModel.tflite, so the class will be called RuLearnModel.

Give Android Studio time to download and index what it wants. After that, you will see a message that the model has no metadata (don't bother adding it or searching for information about it – for your own projects you don't need it; you know more about your model than anyone else). You will also see code snippets for calling TF Lite from Kotlin and Java. Since the on-device training examples on the TensorFlow website are in Java, we will use that language here too:

try {
    RuLearnModel model = RuLearnModel.newInstance(context);

    // Creates inputs for reference.
    TensorBuffer inputFeature0 = TensorBuffer.createFixedSize(new int[]{1, 4}, DataType.FLOAT32);
    inputFeature0.loadBuffer(byteBuffer);

    // Runs model inference and gets result.
    RuLearnModel.Outputs outputs = model.process(inputFeature0);
    TensorBuffer outputFeature0 = outputs.getOutputFeature0AsTensorBuffer();

    // Releases model resources if no longer used.
    model.close();
} catch (IOException e) {
    // TODO Handle the exception
}

You won't see the RuLearnModel class in the class hierarchy of your project, and the model file will not appear in assets. However, through a file manager you can see that an ml subdirectory has been created under main and the model has been copied there. After compilation, the generated class for the model appears in app -> build -> generated -> ml_source_out -> debug -> package name -> ml.

The code above is ready to use once you import all the necessary libraries, with the exception of byteBuffer – this is where the input parameters must be passed. Note that TensorFlow Lite expects the input as float32, and int[]{1, 4} is the shape of my model's input: 4 parameters in one row. So the parameters need to be converted to a float array and passed to the TF Lite input:

// replace the line
inputFeature0.loadBuffer(byteBuffer);

// with
float[] featureArray = new float[]{
        (float) id, (float) cur_rating, (float) n_repeat, (float) s_lapsed
};
inputFeature0.loadArray(featureArray);

The variables id, cur_rating, n_repeat, and s_lapsed are the parameters of my model from article 3. Next, you can pass the same parameters to the trained model in Python and to the code in Android Studio – the results should match.
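
If you do want to keep the original loadBuffer() call instead, a minimal sketch of filling a ByteBuffer by hand might look like this, assuming the same four input variables as above:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// 4 features * 4 bytes per float32, in the device's native byte order
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(4 * 4).order(ByteOrder.nativeOrder());
byteBuffer.putFloat((float) id);
byteBuffer.putFloat((float) cur_rating);
byteBuffer.putFloat((float) n_repeat);
byteBuffer.putFloat((float) s_lapsed);
byteBuffer.rewind();  // reset the position before handing the buffer to TF Lite

inputFeature0.loadBuffer(byteBuffer);

In practice, loadArray() is simpler and leaves less room for byte-order mistakes, which is why the replacement above is used.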

Along the way, we sidestepped one trap: data normalization. The model in my example did not handle unnormalized data well (i.e. features on very different scales – compare: id = 17, cur_rating = 13, n_repeat = 5, and s_lapsed = 434943!). So before training, the features had to be brought to a comparable scale. This can be done in different ways, but if you used StandardScaler from sklearn, you would somehow have to transfer the per-column mean and standard deviation of the dataset to the mobile app and repeat the transformation on every request to the model. In my case, the normalization layer is built into the network itself, and it transforms the input data as needed. Convenient! Here is how it is set up:

import numpy as np
import tensorflow as tf

# X is the training feature matrix; the Normalization layer learns
# per-column mean and variance from it
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(np.array(X))

model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)])
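
For comparison, if the scaling had been done with StandardScaler on the desktop, the mobile app would have to repeat it manually for every request. A minimal sketch of that alternative follows; the mean and std arrays stand in for hypothetical values exported from sklearn and are not the real statistics of my dataset:

// Hypothetical per-column statistics exported from sklearn's StandardScaler
// (the numbers below are placeholders, not the real values from my dataset)
float[] mean = new float[]{120.0f, 7.5f, 3.0f, 200000.0f};
float[] std  = new float[]{60.0f, 4.0f, 2.5f, 150000.0f};

// Apply (x - mean) / std to each feature before feeding a row to the model
float[] normalize(float[] features) {
    float[] scaled = new float[features.length];
    for (int i = 0; i < features.length; i++) {
        scaled[i] = (features[i] - mean[i]) / std[i];
    }
    return scaled;
}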

Now let's check how the model performs on the entire database. This solves two problems at once: we find out how much the TF Lite prediction differs from the desktop TF, and we measure how long it takes. It would seem that you could read the whole table – it takes up a couple of dozen kilobytes in memory – and pass it in one buffer of the form TensorBuffer.createFixedSize(new int[]{N, 4}, DataType.FLOAT32), where N is the number of rows; after all, you can do that in TensorFlow on the desktop. But it was not to be. Our model class (many thanks!) gives a meaningful error message saying that such a buffer cannot be loaded. I did not find a way around this (tell me if you know one), but I wasn't too upset: this code only needs to run once as an experiment. So we record the time, tediously read the table row by row, run the model for each row, and remember the result. Naturally, the model is initialized once in onCreate and closed in onDestroy of the Activity.

When the table has been processed, we see how much time has passed. I tested it on a 2017 phone with a Qualcomm Snapdragon 625, without hardware acceleration, in the debug build variant, and calling the model once per row – that is, as inefficiently as possible. It came out to 167 milliseconds. We can conclude that even on fairly old devices the model for my application will run without any noticeable impact on the user experience.
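
A rough sketch of that timing loop is below. It assumes the table has already been read into a List<float[]> called rows (the reading code is omitted) and that model is the RuLearnModel instance opened in onCreate; both names are just for illustration:

import android.os.SystemClock;
import org.tensorflow.lite.DataType;
import org.tensorflow.lite.support.tensorbuffer.TensorBuffer;
import java.util.List;

// rows: the whole table already in memory as float[4] feature vectors
// model: the RuLearnModel instance opened in onCreate
long start = SystemClock.elapsedRealtime();

TensorBuffer inputFeature0 = TensorBuffer.createFixedSize(new int[]{1, 4}, DataType.FLOAT32);
float[] predictions = new float[rows.size()];

for (int i = 0; i < rows.size(); i++) {
    inputFeature0.loadArray(rows.get(i));                          // one row, 4 features
    RuLearnModel.Outputs outputs = model.process(inputFeature0);
    predictions[i] = outputs.getOutputFeature0AsTensorBuffer().getFloatArray()[0];
}

long elapsedMs = SystemClock.elapsedRealtime() - start;  // total inference time in ms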

At the same time, I wrote the prediction results to a file. Afterwards, I downloaded the file and compared what the model predicted on the mobile device with the predictions of the desktop TensorFlow version. You can use the same Python code as when evaluating the model's performance:

from sklearn.metrics import mean_squared_error
print(f"Mean squared error is: {mean_squared_error(predictionTF_Lite, predictionTF)}")
---------------------------------------------
Mean squared error is: 6.711172283236772e-15

Or you can load both result sets into one csv and admire the fruits of today's efforts in Excel.

That's enough for today; next time we will train the model on the device. Thank you for your attention! If you have materials on this topic, please share them in the comments.
