What matters more to you in a DL project: convenience or performance? Let’s look at the problem through the eyes of an engineer who develops complex systems with elements of artificial intelligence. How does a typical toolkit in this space handle training and inference?
In this article, we will run a couple of neural networks in MATLAB and compare ResNet performance against open-source frameworks. So, if you want to discuss how (besides convenience) a commercial framework can win over open source, welcome!
Where does a MATLAB neural network start?
Our customers want to use neural networks as components of their machine vision systems, predictive analytics, digital signal analysis, and many other areas. We often use MATLAB because it lets a very small team develop and deploy a complex system, integrate everything we need with other MATLAB toolboxes, create both a GUI for a laboratory bench and a website in one sitting, and connect external Python code if necessary.
Yet even though we use MATLAB in every ML/DL project, the Russian-language Wikipedia comparison of deep learning software does not contain a single line about MATLAB.
This is a good occasion to talk about this framework and compare it with the rest.
If you have never seen how to run a neural network in MATLAB, here is a small example. Let’s say you want to classify this picture:
What code will allow us to do this?
net = resnet50();                     % Load the downloaded pretrained network
im = imread('cat.png');               % Read the image
sz = net.Layers(1).InputSize;         % Input size expected by the network
im = im(1:sz(1),1:sz(2),1:sz(3));     % Fit the image to that size (crops the top-left corner)
label = classify(net, im)             % Which class does our object belong to?
And this is what we get in return:
label =
  categorical
     Egyptian cat
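Note that the indexing in the snippet above keeps only the top-left corner of the picture, and it fails if the image is smaller than the network input. A minimal sketch of one alternative, assuming Image Processing Toolbox is available for imresize (the grayscale handling is our own addition, not part of the original example):

```matlab
% Sketch: rescale the whole image to the network input size instead of cropping.
% Assumes Deep Learning Toolbox with the ResNet-50 support package and
% Image Processing Toolbox; 'cat.png' is the file used in the text.
net = resnet50();
im  = imread('cat.png');
sz  = net.Layers(1).InputSize;      % [224 224 3] for ResNet-50

if size(im, 3) == 1                 % grayscale input: replicate to three channels
    im = repmat(im, 1, 1, 3);
end
im = imresize(im, sz(1:2));         % rescale rows and columns to the input size

label = classify(net, im)
```

Rescaling changes the aspect ratio of non-square pictures, so a centered crop followed by a resize may be preferable in practice.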
In front of the classifier is an Egyptian cat, and in front of us is a high-level framework. Here you can do fit-predict, or you can download and run a ready-made neural network. Let’s experiment with noisy images.
% Get the "class, confidence" pairs
predict_scores = predict(net, im);
[scores,indx] = sort(predict_scores, 'descend');
classNames = net.Layers(end).ClassNames;
classNamesTop = classNames(indx(1:5));
% Set up the display area
h = figure;
h.Position(3) = 2*h.Position(3);   % Double the window width
ax1 = subplot(1,2,1);
ax2 = subplot(1,2,2);
% Draw the image and the top-5 bar chart
image(ax1,im);
barh(ax2,scores(5:-1:1))
xlabel(ax2,'Confidence')
yticklabels(ax2,classNamesTop(5:-1:1))
ax2.YAxisLocation = 'right';
Let’s wrap the code in a small loop and get:
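The loop itself is not shown in the text. A minimal sketch of what it might look like, assuming Gaussian noise added with imnoise from Image Processing Toolbox; the noise levels are hypothetical values chosen for illustration:

```matlab
% Sketch: classify the same image at increasing noise levels.
% Assumes Deep Learning Toolbox (ResNet-50 support package) and
% Image Processing Toolbox; the variance values are illustrative only.
net = resnet50();
im  = imread('cat.png');
sz  = net.Layers(1).InputSize;
im  = im(1:sz(1), 1:sz(2), 1:sz(3));         % same crop as in the text

variances = [0 0.01 0.05 0.1 0.2];           % hypothetical noise levels
for v = variances
    noisy = imnoise(im, 'gaussian', 0, v);   % zero-mean Gaussian noise
    [label, scores] = classify(net, noisy);  % top class and all class scores
    fprintf('variance %.2f: %s (%.1f%%)\n', v, char(label), 100*max(scores));
end
```

The plotting code from the previous snippet can be placed inside the loop body to draw the top-5 chart for each noise level.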
Quite a small script, but so many possibilities. Many popular pre-trained networks are already included in MATLAB Deep Learning Toolbox, and those that are not can be imported from ONNX or from TensorFlow 2, or run via the interface with Python. You can easily add a depth-map estimation module based on MiDaS or increase the image resolution with ESRGAN, and your robot will post hi-res 3D masterpieces to Instagram.
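As an illustration of the ONNX route, here is a minimal sketch. It assumes the Deep Learning Toolbox Converter for ONNX Model Format support package is installed; 'model.onnx' is a hypothetical file name, and newer MATLAB releases recommend importNetworkFromONNX in place of importONNXNetwork:

```matlab
% Sketch: import a classification network exported to ONNX.
% 'model.onnx' is a placeholder; requires the ONNX converter support package.
modelFile = 'model.onnx';
net = importONNXNetwork(modelFile, 'OutputLayerType', 'classification');

% Inspect the imported layers before using the network for prediction.
analyzeNetwork(net)
```

Layers that the importer cannot translate may be replaced with placeholders or generated custom layers, so it is worth checking the analysis report before running inference.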
This is not all the functionality: in MATLAB, you can draw network graphs and do version control of models, or you can simply train models inside Simulink blocks – you just need to specify a dataset for them.
That’s good, of course, but will the network run fast enough on your platform?
Show me where to speed up
Most of the time in a project is spent on understanding the task and the data: nobody can give back the time spent on solving a task the business does not need, and junk data mostly produces garbage models.
We reason on the basis of the diagram below, which is a loose reinterpretation of CRISP-DM:
What is hard to speed up: the left side of the diagram represents business-related costs. It can be accelerated by the company’s experience, good practices and patterns, productive communication with the customer and colleagues, the convenience of the framework, version control, all sorts of code and documentation generators, and so on. It is rather difficult to compare frameworks objectively in this area.
What is much easier to speed up: the entire right side of the diagram was until recently the prerogative of data scientists, but over the past 10 years it has been well automated (due to the democratization of the industry, to progress in methods, and to things like AutoML). In this article, we restrict ourselves to comparing the inference speed and, briefly, the training speed of neural networks.
What do we expect from a high-level framework in terms of development speed?
High-speed execution of the model
The framework should optimize the model code, make it easy to transform the model, and make it easy to compare the performance of different versions.
Model auto-fitting and easy hyperparameter enumeration
Graphical graph editor, interface debugger (between neural network layers, for example), scripts, AutoML, tutorials and templates, friendly community
Simple and convenient tools for working with data
Exploratory analysis, visualization, user-friendly GUI, refactoring, versioning, reporting
Development and support of the project
Project assembly, integration of all data sources and external components, project unpacking on all necessary target platforms, simulation and debugging in any scenarios
Comparing speed on ResNet-50
So MATLAB has neural networks too, which is great: you don’t have to write backpropagation by hand. What’s next? Recently, MathWorks employee David Willingham presented a report in which he compares the performance of three frameworks: MATLAB, PyTorch and TensorFlow.
A first impression of the differences can be had from the Wikipedia table mentioned above. There, the frameworks differ only in the license and in support for parallel computing libraries (OpenCL and OpenMP), which is secondary for us.
We will study the inference comparison shortly, but first let’s compare the training speed of ResNet-50. Sometimes you have to retrain for a new dataset or train a custom architecture.
So, for comparison, we took a standard image classification network: ResNet, the 50-layer variant. According to some benchmarks, training on ImageNet-1k on an NVIDIA M40 GPU for 90 epochs should take 14 days (https://arxiv.org/abs/1709.05011); according to others, an A100 gets through 50 epochs in ten hours (AIME). In training, our tools show approximately the same results. Perhaps the truth is that it all comes down to the hardware.
But when the network is already trained, its performance will depend on a host of factors that are interesting to study. It also needs to be run for each user and for each example (photo, new measurement from the sensor, speech signal …), on any platform, from the server to the on-board computers of an autonomous robot. Sending through all sorts of interfaces and accessing databases can take the lion’s share of time …
Let’s compare the performance of the models. First – performance on the CPU.
Performance on the CPU
Let’s compare running the ResNet network on the CPU in this scenario: you are working on an embedded platform with no accelerators, or on a system without CUDA. This covers a lot of robotics, for example, from student omni-wheel carts to space satellites.
It turns out that our network runs in MATLAB on the CPU a little faster than in the other frameworks. The CPU benchmark was run on a 3.6 GHz Intel Xeon, and such solid clock speeds are rarely found on board. On a more mundane CPU, the code may behave slightly differently.
What if we compile everything with MATLAB Coder? The compiled network in MATLAB runs 2x faster than in TensorFlow and 3x faster than in PyTorch. Admittedly, this is not the heaviest neural network, and it is well studied besides, so there are many optimization tricks that MathWorks could have built into the pre-trained network included in MATLAB.
If you have a robotics project with a CPU on board, then MATLAB is, in principle, the only boxed product that allows you to train models and run them on any embedded platform.
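To get a rough number for your own CPU, a measurement along these lines can be used. This is a minimal sketch, not the benchmark from the report: it feeds a random array in place of a real image and times a single-image predict call pinned to the CPU.

```matlab
% Sketch: single-image inference latency on the CPU.
% Assumes Deep Learning Toolbox with the ResNet-50 support package.
net = resnet50();
sz  = net.Layers(1).InputSize;
X   = rand(sz, 'single') * 255;      % random stand-in for one input image

f = @() predict(net, X, 'ExecutionEnvironment', 'cpu');
f();                                 % warm-up call, excluded from the timing
t = timeit(f);                       % typical time of repeated calls, in seconds
fprintf('CPU latency: %.1f ms per image\n', 1000 * t);
```

The result depends heavily on the core count and clock speed, so it should be measured on the target hardware rather than on a workstation.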
Will the effect persist on the GPU? Let’s find out.
Using a GPU allows you to achieve phenomenal speeds when training or running neural networks … when it is given relatively large amounts of data.
When a single example is passed through the network on the GPU in one pass (batch size 1) and the network is run from a script, MATLAB still has room for improvement: judging by the graphs, you could just as well run a batch of one on the CPU and see no difference. I promised to discuss situations where MATLAB loses, and this is exactly such a case.
Transferring data from host memory to GPU memory takes a long time, and parallel processing of data (in the form of a batch) pays off more when the batch contains more data. But since the model and the batch of data must both fit in the memory of the video card, the maximum batch size is also limited.
With 32 examples in a batch, things look much better. Now the video card is well loaded, and all frameworks show approximately the same speed-up relative to running on the CPU. But that’s not all: you can go faster still. For this, MATLAB offers two ways to compile a neural network for the GPU.
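The effect of batch size can be checked with a few lines. The sketch below assumes a CUDA-capable GPU and Parallel Computing Toolbox, and uses random data in place of real images; it compares the per-image time for a batch of 1 and a batch of 32.

```matlab
% Sketch: per-image GPU inference time for batch sizes 1 and 32.
% Assumes a supported NVIDIA GPU, Parallel Computing Toolbox and
% Deep Learning Toolbox with the ResNet-50 support package.
net = resnet50();
sz  = net.Layers(1).InputSize;

for batchSize = [1 32]
    X = rand([sz batchSize], 'single') * 255;    % random stand-in images
    f = @() predict(net, X, ...
        'ExecutionEnvironment', 'gpu', ...
        'MiniBatchSize', batchSize);
    f();                                         % warm-up call
    t = timeit(f);                               % includes host-to-GPU transfer
    fprintf('batch %2d: %.2f ms per image\n', batchSize, 1000 * t / batchSize);
end
```

Because predict returns its result in host memory, the timing covers the transfer in both directions as well as the computation itself.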
MEX functions are wrappers around external code. They are called exactly like MATLAB/Simulink functions, but execute compiled C/C++ or other external code along the way. The MEX toolchain generates C++ code for the GPU and, for prediction tasks, does not require a GPU Coder license.
The ResNet neural network code packed in MEX runs on the GPU almost 2 times faster on one image (on one batch) than in PyTorch or TensorFlow.
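In practice the MEX route can be as short as one extra option on predict. This is a minimal sketch, assuming a supported GPU, a configured C/C++ compiler and the GPU Coder Interface for Deep Learning support package; the random input stands in for a real image.

```matlab
% Sketch: compare plain and MEX-accelerated GPU prediction on one image.
% Requirements (GPU, compiler, support package) are assumptions about
% the local setup, not something the snippet checks.
net = resnet50();
sz  = net.Layers(1).InputSize;
X   = rand(sz, 'single') * 255;              % random stand-in for one image

fPlain = @() predict(net, X, 'ExecutionEnvironment', 'gpu', 'Acceleration', 'none');
fMex   = @() predict(net, X, 'ExecutionEnvironment', 'gpu', 'Acceleration', 'mex');

fPlain();                                    % warm-up
fMex();                                      % first call generates and compiles the MEX function

fprintf('plain: %.1f ms, mex: %.1f ms\n', 1000 * timeit(fPlain), 1000 * timeit(fMex));
```

The first MEX call is slow because of code generation, so it has to be kept out of the measurement.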
The second way to get optimized code is GPU Coder, a dedicated GPU code generator. To apply it, you just need to tick one checkbox in the code generator settings (it’s called Generate GPU code), after which we get optimized embedded CUDA code that does not require any special environment to run on the target platform.
A neural network automatically optimized for TensorRT runs very fast: almost 3 times faster than the same network on a GPU in TensorFlow or PyTorch. And this is with a single batch.
In their report, MathWorks notes that over six months the execution of ResNet was accelerated almost 2 times. Apparently, both in the MATLAB core and in the architecture of the neural network, there is still something to optimize. This inspires optimism: we expect the developers to keep up this pace of improvement in the next versions.
In general, MATLAB meets all our expectations of a high-level framework for developing systems with elements of artificial intelligence (the list of criteria is above). It makes it easier for us to work with data, to select classical machine learning algorithms, to plug in deep neural networks, and to develop our own topologies if necessary. So we use it in AI projects for a wide variety of clients.
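The same settings can be applied from a script instead of the GUI. The sketch below assumes GPU Coder, a CUDA toolkit and TensorRT are installed; the function name resnet_predict and the network name myresnet are hypothetical. The entry-point function goes into its own file, resnet_predict.m:

```matlab
function out = resnet_predict(in) %#codegen
% Entry point for code generation: loads ResNet-50 once and runs prediction.
persistent mynet;
if isempty(mynet)
    mynet = coder.loadDeepLearningNetwork('resnet50', 'myresnet');
end
out = predict(mynet, in);
end
```

A build script then generates a CUDA MEX function that targets TensorRT:

```matlab
% Sketch: generate and call a TensorRT-backed MEX function with GPU Coder.
cfg = coder.gpuConfig('mex');                % GPU code generation, MEX output
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');

codegen -config cfg resnet_predict -args {ones(224,224,3,'single')} -report

in  = rand(224, 224, 3, 'single') * 255;     % random stand-in for one image
out = resnet_predict_mex(in);                % 1-by-1000 vector of class scores
```

Passing 'lib' or 'exe' to coder.gpuConfig instead of 'mex' produces code for deployment outside MATLAB rather than a function callable from the MATLAB prompt.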
Please tell us in the comments about your experience with neural networks in MATLAB, about benchmarks or preferences for frameworks. We wish everyone very productive AI projects completed in a comfortable environment!