TinyML. Compressing the neural network
Programmers now face a difficult task: how do you fit something as bulky as a neural network into, say, a bracelet? How do you optimize the model's power consumption? What do such optimizations cost, how justified is it to put models on small devices, and why can't we do without it?
So what is the point?
Let's imagine an expensive industrial sensor: 1000 measurements per second, temperature and vibration sensing, data transmission over 10 km, and a powerful processor capable of 20 million operations per second! Its job is to send temperature, vibration, and other readings to a server in order to prevent equipment breakdowns. But here's the catch: 99% of the data it sends is useless, and transmitting it is a pure waste of electricity. And a production site can have tens or hundreds of such sensors.
In reality, we are not interested in the raw data from this device but in the insights it contains: is everything working as usual? Are there any emergencies? Will repairs be needed soon? So why not deploy the neural network on the sensor itself and, instead of an endless stream of data, send only an occasional signal such as “Everything is fine” or “Performance anomalies!”? This is exactly what TinyML is about.
Everything revolves around compressing the model as much as possible so that it fits into a small device. Almost anything can serve as the “device”: a kettle, an industrial sensor, an iron, a phone, a bracelet, and so on.
Advantages of the approach
First, it saves resources. Constant communication with the server is not required, so the device can do without a permanent WiFi or Bluetooth connection, which significantly saves energy.
Second, speed. Transferring data to a server takes too long when the result is needed here and now.
Third, savings on cloud computing. In the cloud approach, data has to be sent to the server not only to train the model but also to get predictions. Imagine a face-swap app on your phone that constantly required an Internet connection, the way a navigator does: very inconvenient and costly. That is why such technologies are already built into our phones (and this is TinyML at work).
Fourth, security. Sending data anywhere is always risky; it is much safer to compute the result on the device and send out only the prediction.
Fifth, the neural networks themselves run faster, because working with int inside the network is faster than working with float. I will talk about this below.
Quantization
A trick called quantization helps to shove a fat neural network into a slender sensor. The essence of the method is simple: reduce the space that numbers occupy in memory. Conventional networks store their values as thick 32-bit floats. What happens if we replace them with lean 8-bit ints? They take up four times less space, but quality also declines.
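To make the idea concrete, here is a minimal NumPy sketch of 8-bit quantization applied to an array of weights. The weights are random made-up numbers, and the helper names quantize_int8 and dequantize are assumptions for illustration; this is a simple symmetric scheme, not a description of what TensorFlow Lite does internally.

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 weights onto int8 codes using one shared scale."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=1024).astype(np.float32)  # made-up weights

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("float32 size:", weights.nbytes, "bytes")
print("int8 size:   ", q.nbytes, "bytes")
print("max abs error:", float(np.max(np.abs(weights - restored))))
```

The int8 copy needs a quarter of the memory, and the price is a rounding error of at most about half the scale per weight.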
The seemingly unattainable dream is to use just 1 bit, which would give a gigantic gain in size. It's a pity that it's impossible.
Or is it? Binarized convolutional neural networks are at your service. You can read more about them in the paper linked in the list of sources.
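As a rough illustration of the storage side of binarization, the sketch below keeps only the sign of each weight and packs the signs into bits. The weights are random made-up numbers; real binarized networks also need a special training procedure, which this sketch does not cover.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=1024).astype(np.float32)  # made-up weights

# Binarize: keep only the sign of each weight (+1 or -1).
signs = np.where(weights >= 0, 1, -1).astype(np.int8)

# Store one bit per weight: 1 encodes +1, 0 encodes -1.
packed = np.packbits(signs > 0)

# Unpack to check that the signs survive the round trip.
unpacked = np.unpackbits(packed)[:weights.size].astype(np.int8) * 2 - 1

print("float32 size:", weights.nbytes, "bytes")
print("1-bit size:  ", packed.nbytes, "bytes")
print("round trip ok:", bool(np.array_equal(signs, unpacked)))
```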
From theory to practice
And now, to liven things up, a little code. Let's build the simplest possible model, one that predicts the sine of a number.
# Imports needed by this snippet
import math
import numpy as np
import matplotlib.pyplot as plt

# Generate the data for our network: 1000 random x values between 0 and 2*pi
x_values = np.random.uniform(
    low=0, high=2*math.pi, size=1000).astype(np.float32)
# Shuffle
np.random.shuffle(x_values)
y_values = np.sin(x_values).astype(np.float32)
# Add noise to make it "like real life"
y_values += 0.1 * np.random.randn(*y_values.shape)
plt.plot(x_values, y_values, 'b.')
plt.show()
The data is ready. Next we split it into training, validation, and test sets (I will skip this and some other parts of the code to save your time; the complete notebook is linked in the list of sources).
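For readers who do not want to open the notebook, the skipped split could look roughly like this. The 60/20/20 proportions and the names x_test and y_test are assumptions made for this sketch; x_train, y_train, x_validate, and y_validate match the names used in the training code.

```python
import math
import numpy as np

SAMPLES = 1000
rng = np.random.default_rng(0)
x_values = rng.uniform(0, 2 * math.pi, SAMPLES).astype(np.float32)
y_values = (np.sin(x_values) + 0.1 * rng.standard_normal(SAMPLES)).astype(np.float32)

# Assumed proportions: 60% training, 20% validation, 20% test.
train_end = SAMPLES * 6 // 10
validate_end = SAMPLES * 8 // 10

# np.split cuts each array at the given indices into three chunks.
x_train, x_validate, x_test = np.split(x_values, [train_end, validate_end])
y_train, y_validate, y_test = np.split(y_values, [train_end, validate_end])

print(len(x_train), len(x_validate), len(x_test))
```

Because the x values are drawn at random, slicing by position is enough here; no extra shuffling is needed before the split.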
Time to build our neural network.
# Imports needed by this snippet
import tensorflow as tf
from tensorflow import keras

# Create the neural network
model = tf.keras.Sequential()
model.add(keras.layers.Dense(16, activation='relu', input_shape=(1,)))
model.add(keras.layers.Dense(16, activation='relu'))
model.add(keras.layers.Dense(1))
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
# Train the model
history = model.fit(x_train, y_train, epochs=500, batch_size=64,
                    validation_data=(x_validate, y_validate))
# Save it (MODEL_TF is the output path, set in the skipped part of the code)
model.save(MODEL_TF)
Now let’s see what happened
So, the base model is ready. The time has come to “squeeze” it in several different ways. TFLiteConverter, created specifically to make your networks lighter, will help us with this.
# Convert our network to the TensorFlow Lite format WITHOUT quantization
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_TF)
model_no_quant_tflite = converter.convert()
# Save it
with open(MODEL_NO_QUANT_TFLITE, "wb") as f:
    f.write(model_no_quant_tflite)
# Convert our network to the TensorFlow Lite format WITH quantization
def representative_dataset():
    for i in range(500):
        yield [x_train[i].reshape(1, 1)]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Require that everything in the network is converted to int8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
# Provide a representative dataset so that quantization is calibrated correctly
converter.representative_dataset = representative_dataset
model_tflite = converter.convert()
with open(MODEL_TFLITE, "wb") as f:
    f.write(model_tflite)
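Why does the converter need a representative dataset at all? To map floats onto int8 it has to know what range of values actually flows through the network. The sketch below shows the general idea with a plain min/max calibration; the helper names affine_params, quantize, and dequantize are assumptions for illustration, and the converter's real calibration procedure may differ.

```python
import math
import numpy as np

def affine_params(samples, qmin=-128, qmax=127):
    """Pick a scale and zero point so the observed range maps onto int8."""
    lo = min(float(np.min(samples)), 0.0)  # the range must contain 0.0
    hi = max(float(np.max(samples)), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Float values to int8 codes."""
    return np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """int8 codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Stand-in for the 500 calibration samples taken from the training set.
rng = np.random.default_rng(0)
x_calibration = rng.uniform(0, 2 * math.pi, 500).astype(np.float32)

scale, zero_point = affine_params(x_calibration)
q = quantize(x_calibration, scale, zero_point)
restored = dequantize(q, scale, zero_point)

print("scale:", scale, "zero point:", zero_point)
print("max abs error:", float(np.max(np.abs(x_calibration - restored))))
```

If the calibration samples did not cover the values seen in production, inputs outside the observed range would be clipped, which is why the samples should be representative.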
In total, we have three neural networks: the regular one, one converted to TensorFlow Lite without quantization, and one converted to TensorFlow Lite with quantization. It's time to compare how much space they take up.
# Requires: import pandas as pd
# size_tf, size_no_quant_tflite and size_tflite are the model sizes in bytes
pd.DataFrame.from_records(
    [["TensorFlow", f"{size_tf} bytes", ""],
     ["TensorFlow Lite", f"{size_no_quant_tflite} bytes", f"(reduced by {size_tf - size_no_quant_tflite} bytes)"],
     ["TensorFlow Lite Quantized", f"{size_tflite} bytes", f"(reduced by {size_no_quant_tflite - size_tflite} bytes)"]],
    columns=["Model", "Size", ""], index="Model")
So, compared to the original network, converting to TensorFlow Lite reduces the size by 32%, and quantization by as much as 40%! A very impressive result, but at what cost did we achieve it?
The loss of quality is almost negligible, within the margin of error. Keep in mind, though, that we are testing a very simple model; on larger models the result may not be so rosy!
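The size variables used in the comparison table come from the skipped parts of the code. One way to build such a comparison is to read file sizes from disk, as in the sketch below. The helper name size_report is an assumption, and the dummy files with arbitrary sizes only stand in for the real model files.

```python
import os
import tempfile

def size_report(files):
    """Return (name, size, reduction vs. the previous entry) for each (name, path)."""
    rows, previous = [], None
    for name, path in files:
        size = os.path.getsize(path)
        saved = "" if previous is None else f"(reduced by {previous - size} bytes)"
        rows.append((name, f"{size} bytes", saved))
        previous = size
    return rows

# Dummy files stand in for the real model files; their sizes are arbitrary.
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, n_bytes in [("first", 300), ("second", 200), ("third", 150)]:
        path = os.path.join(tmp, name + ".bin")
        with open(path, "wb") as f:
            f.write(bytes(n_bytes))
        files.append((name, path))
    report = size_report(files)

for row in report:
    print(*row)
```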
Conversion to C code
We did all the previous steps in Python in a notebook, but what we really want is to deploy the model on a microcontroller, right? For that, the model has to be converted into the kind of code microcontrollers are used to working with: C. IMPORTANT! I ran the code below on Ubuntu; if you want to do this on Windows, you will have to look for workarounds.
# Install xxd
!apt-get update && apt-get -qq install xxd
# Now convert
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
# Rename the variables
REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}
# Let's take a look at the model converted to C code
!cat {MODEL_TFLITE_MICRO}
I show only the first few lines of the converted model here; in total there are about 400 of them.
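If xxd is not available (on Windows, for example), the same kind of file can be produced with a few lines of Python. This is a sketch similar in spirit to xxd -i, not a byte-for-byte replica of its output; the function name to_c_array is an assumption, and the sample bytes are placeholders for the contents of the .tflite file.

```python
def to_c_array(data, var_name="g_model", per_line=12):
    """Render bytes as a C array definition plus a length variable."""
    lines = [f"unsigned char {var_name}[] = {{"]
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"unsigned int {var_name}_len = {len(data)};")
    return "\n".join(lines)

# Placeholder bytes; in practice, read the .tflite file in "rb" mode instead.
sample = bytes(range(16))
c_source = to_c_array(sample)
print(c_source)
```

Since the variable name is passed in directly, the separate sed renaming step is not needed with this approach.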
Microcontroller code
We have a model; now let's see what our C code will look like on the microcontroller itself. Before deploying to the microcontroller, the code can (and should) first be run on a computer. I'll say right away that I pulled out only the most interesting pieces of code; if you want to run it on your own computer, the full file is linked in the list of sources.
// Define the input value and the expected output
float x = 0.0f;
float y_true = sin(x);
// Set up logging
tflite::MicroErrorReporter micro_error_reporter;
// Load the previously saved model
const tflite::Model* model = ::tflite::GetModel(g_model);
Next, we allocate memory on the device. How much should we take? In practice the question is solved by brute force: pick an arbitrary but reasonable number. If the model fits and everything works, try reducing it. Repeat until the system stops working, then go back to the last size that worked.
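The trial-and-error search can be sped up by halving the interval instead of shrinking step by step. The sketch below uses bisection, which is a substitute for the manual procedure described above, not part of the original example. The function model_runs is a hypothetical stand-in for "build with this memory size, run, and see whether it works", and REQUIRED_BYTES is a made-up number that would be unknown in practice.

```python
def smallest_working_size(works, low, high):
    """Bisect for the smallest size in (low, high] for which works(size) is True.

    Assumes works(low) is False, works(high) is True, and that every size
    above a working one also works.
    """
    while high - low > 1:
        mid = (low + high) // 2
        if works(mid):
            high = mid
        else:
            low = mid
    return high

# Hypothetical stand-in for a real on-device test; the threshold is made up.
REQUIRED_BYTES = 1337

def model_runs(arena_size):
    return arena_size >= REQUIRED_BYTES

best = smallest_working_size(model_runs, low=0, high=8192)
print(best)
```

In practice it is reasonable to leave some headroom above the smallest working size rather than use it exactly.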
And finally, this is what the sine prediction looks like on the microcontroller.
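x = 5.f;
y_true = sin(x);
// Quantize the float input into the int8 input tensor
input->data.int8[0] = x / input_scale + input_zero_point;
// Run inference
interpreter.Invoke();
// Dequantize the int8 output back to float
y_pred = (output->data.int8[0] - output_zero_point) * output_scale;
// Check that the prediction is within epsilon of the true value
TF_LITE_MICRO_EXPECT_NEAR(y_true, y_pred, epsilon);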
Edge Impulse
We should also mention Edge Impulse, a platform from people who are closely involved with TinyML.
It takes over much of the work of deploying models to microcontrollers: you just connect something like an Arduino to your computer and roll a model onto it in a couple of clicks. I haven't used it myself, and I doubt that anything very serious can be built on it, but for those who want to play around a little, it is definitely the place to go.
Well, in place of a conclusion: TinyML is gaining momentum. In some areas (bracelets that monitor people with heart disease, detection of tongue cancer from a photograph using a built-in neural network, and so on) it simply has no alternatives. The number of such devices is predicted to grow by about 20% per year, which means we will be hearing about this technology more and more often.
If you want to know more on the topic, then join our NoML Community – https://t.me/noml_community…
List of sources
Notebook used to create the model https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/hello_world/train/train_hello_world_model.ipynb#scrollTo=l4-WhtGpvb-E
C code for deploying the model https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/hello_world/hello_world_test.cc
Binarized convolutional neural networks https://arxiv.org/abs/1602.02830
TinyML community website https://www.tinyml.org/
A specialized YouTube channel (here you can find overview videos and the application of technology in industry) https://www.youtube.com/channel/UC9iWqvsWjhowkHWVJquHwkg
Edge Impulse development platform https://www.edgeimpulse.com/
A book entirely dedicated to TinyML https://www.amazon.com/_/dp/1492052043?tag=oreilly20-20