OrangePi AiPro – Guide and Review
I don't know what to call a review/guide about this board. The most Chinese board? The most mysterious? The most controversial? In any case, one of the most interesting!
At the moment my top Edge boards for Computer Vision / LLM are probably like this:
Jetson Orin (Nano is weak, but the rest are ok)
Intel-based (doesn't matter what)
Hailo vs RockChip (I don't know which is better)
The board we are going to talk about can probably compete for the fifth position (most likely with Qualcomm). But it does not seem very eager to compete, especially outside of China.
We will talk about the Huawei-based board – OrangePi Ai Pro. The company is banned in half the world. But it is filled with Chinese taxpayers' money. And it makes decent products!
The board is released only for the Chinese market and is a mystery to those who do not know Chinese. However, unlike regular Chinese boards, it is not terrible. Well, almost.
Disclaimer + context
Nobody sent me a board. Getting strange boards and seeing how well ML works on them is my entertainment. And then I drag them into production. Over the past 15 years, we have worked on Blackfins and other exotic things. It seems we were almost pioneers with Jetsons in 2014 and with RockChips for Computer Vision 3-4 years ago.
I make videos about all this on my YouTube channel, and I write articles here, on Medium, or on my own channel.
I will be relying on many other boards for this comparison. My opinions on many of them can be found in this 2022 review.
This article is also available on Medium, or you can watch it as a video on YouTube.
Purchase
The board is obviously not officially shipped outside of China. But you can get it, of course. Some time ago there was a lot of confusion with the OrangePi 5 Pro, but now everything has more or less settled down. In Russia, it seems to be sold at a markup of about 50 percent compared to what you pay when ordering from China to the EU.
It seems that in China you can buy it for $120-140.
Chipset
Formally, the board is based on the Ascend processor (Ascend310B4). The 310 series itself was launched back in 2018. The B4, it seems, was made 2 years ago (but that's not certain). A year ago an official reference board appeared: the Atlas 200I DK A2. It already has normal Linux and Python.
But until now, everything you could buy was around 500 euros, which is a bit pricey. And OrangePi is not bad for the price.
Documentation
Here comes the first quest. The official documentation website looks something like this. Nothing in English. And all the files, as usual, are only on Baidu.
While I was writing this article, an English-language landing page appeared on the OrangePi website. But the download links are empty for now.
The main guide can be found on several sites, for example here and here. It's obviously in Chinese. But there are no images or files there. And the rest of the files are harder to get.
There is a semi-official website. Not everything is there, and not everyone can download from it. There is also a super useful website about the Atlas 200I DK A2. The software is all the same. But I haven't tried to take the system image itself from there. It differs in size by 1GB, so I wouldn't use that one.
But, of course, you can download everything from Baidu. When I mentioned on my channel that I had received the board, the guys from KAVABANGA.TEAM shared with me the images they had downloaded from Baidu. I'm posting them.
Information points
Community
Git repositories
Chinese
European
Other
Flashing
The documentation recommends writing the image with Balena Etcher. But I couldn't get that to work. In the end, I wrote it with RPi Imager. The Ascend website has its own tool, but it's in Chinese and I haven't tried it.
Let's launch
The launch is straightforward. I prefer SSH. I love the Chinese vendors because they post the login and password on every corner.
You can connect via USB, via COM port or directly plug in the monitor.
Let's run the first example
I really like how the Chinese have been making examples lately. Even in boards that are almost impossible to work with, making a beautiful example that runs in 2 clicks is a matter of honor. This was the case in:
Grove Vision AI (the board itself is more of a toy, but the example runs out of the box)
MilkV, a $5 board. One example works out of the box, and to run the rest you'll have to go through hell.
MAIX-III. All examples are simply run from Python. But each network is a baked C++ binary with the network weights and code inside.
In short, a beautiful first example has been making me suspicious lately. But here it is not bad:
cd samples/notebooks/
./start_notebook.sh
(the notebook can be configured to be reachable from the outside, but I prefer to set up a tunnel)
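The tunnel mentioned above can be sketched roughly like this. It is a minimal sketch, not the board's official procedure: the port 8888 (the usual Jupyter default), the user name and the IP address are all assumptions, so check what start_notebook.sh actually prints on your board.

```python
import shlex


def tunnel_command(user, host, remote_port=8888, local_port=8888):
    """Build an `ssh -L` command that forwards a local port to the board.

    remote_port=8888 is an assumption (the usual Jupyter default);
    use the port that the notebook server reports on start.
    """
    return [
        "ssh",
        "-N",  # only forward ports, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical login and address: replace with the ones for your board.
    cmd = tunnel_command("user", "192.168.1.50")
    print(shlex.join(cmd))
```

Run the printed command on your workstation, then open the notebook at localhost on the local port in a browser.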
And that's it. You have an interface from which you can launch a dozen examples!
It seems to me that this is the easiest and most convenient way to run neural networks on remote boards for beginners now. A professional, of course, will try to connect via VSCode. But when you need to do something quickly and understand what and how – it is convenient directly.
This is where 99% of guides about any AI boards end. Ours is just beginning.
Let's talk about how to fit your model into the board!
Export
If you search the main documentation, you won't find the words ONNX, TensorFlow or PyTorch. It would seem that this could make me sad. But the official example does seem to describe the export. And it even seems to run right on the board.
This is rare. It seems that only Nvidia prefers to convert on the boards. Almost all other manufacturers prefer to export on the host machine.
In practice, it does not start. At least not on an 8GB board. The problem is described here.
But this solution does not work. Apparently, because the reference board is still different. In the example from OrangePi, you can see attempts to play with the swap. But it did not work for me. The export process leads to a reboot.
When I was already preparing the article, I found this approach to fix the problem. Force the model compilation to run on one core by adding before export:
export TE_PARALLEL_COMPILER=1
export MAX_COMPILE_CORE_NUMBER=1
And it really helps, but the compilation is monstrously long:
To run it, you need to initialize all variables and run:
. /usr/local/Ascend/ascend-toolkit/set_env.sh
export PYTHONPATH=/usr/local/Ascend/thirdpart/aarch64/acllite:$PYTHONPATH
atc --model=yolov5s.onnx --framework=5 --output=yolo5s_bs1 --input_format=NCHW --input_shape="input_image:1,3,640,640" --log=error --soc_version=Ascend310B4
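If you export more than one model, the same call can be wrapped in a small script. The sketch below only reuses the flags and environment variables shown above; the helper names atc_command and single_core_env are mine, and it assumes set_env.sh has already been sourced in the shell that starts Python.

```python
import os
import shlex


def atc_command(model, output, input_shape, soc_version="Ascend310B4"):
    """Build the atc call from the article for a given model file."""
    return [
        "atc",
        f"--model={model}",
        "--framework=5",  # the value used above for the ONNX model
        f"--output={output}",
        "--input_format=NCHW",
        f"--input_shape={input_shape}",
        "--log=error",
        f"--soc_version={soc_version}",
    ]


def single_core_env(base_env=None):
    """Copy the environment and add the two single-core compile settings."""
    env = dict(os.environ if base_env is None else base_env)
    env["TE_PARALLEL_COMPILER"] = "1"
    env["MAX_COMPILE_CORE_NUMBER"] = "1"
    return env


if __name__ == "__main__":
    cmd = atc_command("yolov5s.onnx", "yolo5s_bs1", "input_image:1,3,640,640")
    env = single_core_env()
    print(shlex.join(cmd))
    # On a machine with the toolkit installed, the actual run would be:
    # import subprocess; subprocess.run(cmd, env=env, check=True)
```

The run itself is left commented out so the script can be checked on a machine without the toolkit.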
So we'll leave the compilation as homework for the lucky owners of 16GB boards.
Real export
For us, mere mortals, another way is available: installing the CANN-Toolkit on the host machine and exporting through it. Please note that the system must have at least 16 GB of RAM. Or better yet, more. Official guide.
Download the latest toolkit. For me it was version 7.
Initially I downloaded version 6.2 and nothing worked. While I was preparing the publication, version 8.0 came out. It is not yet clear whether it is compatible or not. So for now it is better to use the 7th.
We install the dependencies. A lot of them.
sudo apt-get update
sudo apt-get install -y gcc g++ make cmake zlib1g zlib1g-dev openssl libsqlite3-dev libssl-dev libffi-dev libbz2-dev libxslt1-dev unzip pciutils net-tools libblas-dev gfortran libblas3
Install Python3 + pip3 if you don't have them yet. If you do, don't forget:
pip3 install --upgrade pip
But whoever wrote the next piece of the guide should have their hands torn off. People should be taught to do this in Docker. Many hardware manufacturers are already doing this. For example, Rockchip started doing it this way. Hailo does it this way. You can't do it like this:
pip3 install attrs
pip3 install numpy
pip3 install decorator
pip3 install sympy
pip3 install cffi
pip3 install pyyaml
pip3 install pathlib2
pip3 install psutil
pip3 install protobuf
pip3 install scipy
pip3 install requests
pip3 install absl-py
Without specifying versions, without even a ready-made requirements.txt… In short, at the very least numpy <2.0 has to be installed now. I started with 1.26.0. Every day the difference will get bigger. Now all that remains is:
chmod +x Ascend-cann-toolkit_7.0.RC1_linux-x86_64.run
./Ascend-cann-toolkit_7.0.RC1_linux-x86_64.run --install
And in general everything is ready. Before exporting, you need to set up the environment:
source /home/ubuntu/Ascend/ascend-toolkit/set_env.sh
export LD_LIBRARY_PATH=/home/ubuntu/Ascend/ascend-toolkit/7.0.RC1/tools/ncs/lib64:$LD_LIBRARY_PATH
export PATH=/home/ubuntu/Ascend/ascend-toolkit/7.0.RC1/tools/ncs/bin/:$PATH
Different guides have slightly different values. This is what worked for me. It was installed in /home/ubuntu/Ascend/
Works with some warnings. But it works!
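As for the unpinned pip installs above, one way to make the setup more reproducible is to generate a requirements file with at least the numpy constraint in it. This is a minimal sketch: the only pin taken from this article is numpy <2.0, the other packages are left unpinned because I don't know which versions the toolkit really needs, and the helper name is mine.

```python
# Packages from the install list above, in the same order.
PACKAGES = [
    "attrs", "numpy", "decorator", "sympy", "cffi", "pyyaml",
    "pathlib2", "psutil", "protobuf", "scipy", "requests", "absl-py",
]

# The only constraint known from the article; add more as you discover them.
PINS = {"numpy": "<2.0"}


def render_requirements(packages, pins):
    """Return requirements.txt content, applying a version spec where known."""
    lines = [f"{name}{pins.get(name, '')}" for name in packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_requirements(PACKAGES, PINS), end="")
```

Save the output as requirements.txt and install it with pip3 install -r requirements.txt. Once the export works, pip3 freeze will capture the exact versions that worked for you.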
Impression of export
This is far from the worst board 🙂 The final guide is quite simple and clear. There are no magical errors at all.
But there is no ready guide anywhere. What exists is either written for another board or is a very fragmentary one from OrangePi. The system files need to be collected from all over the Internet. And that is even though I understand well what to look for. Searching for errors in Chinese, where Google can't cope, is a pain.
I would give it 6-7 out of 10. I finished it in about 5-6 hours.
DISCLAIMER. From my experience with other boards. This guide works for now. But I'm sure that it will break in less than half a year.
Layer tolerance
All basic convolutional networks convert well. Even some transformers worked (I checked Dinov2). The limit is where you need to work with text. YoloWorld was not exported out of the box.
And the situation with LLMs is not entirely clear. Neither Torch nor Torch.jit is supported. The following are supported:
ONNX
TensorFlow
Caffe
MindSpore
But, as you understand, most LLMs export poorly even to ONNX. So, it seems to me, the bigger limitation will be at that level. Everything is ok for pure convolutions, but all the logic must be moved out of the model.
Exporting Whisper from here was only partially successful.
Let's look at the speed
The board is fast! All tests below were done in FP16 (in int8 it should be 2 or 4 times faster). Yolov5 (times in seconds):
preprocess time:
0.0060651302337646484
inference time:
0.04085516929626465
nms time:
0.0065386295318603516
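To read these numbers as frame rates, the arithmetic can be sketched as follows. It assumes the three stages run one after another for each frame with no pipelining, which is my assumption and not something the example states.

```python
# Timings printed above, in seconds.
timings = {
    "preprocess": 0.0060651302337646484,
    "inference": 0.04085516929626465,
    "nms": 0.0065386295318603516,
}


def fps(seconds_per_frame):
    """Frames per second for a given per-frame latency in seconds."""
    return 1.0 / seconds_per_frame


# End-to-end latency if the stages are strictly sequential.
total = sum(timings.values())

print(f"inference only: {timings['inference'] * 1000:.1f} ms, "
      f"{fps(timings['inference']):.1f} FPS")
print(f"end to end:     {total * 1000:.1f} ms, {fps(total):.1f} FPS")
```

That gives roughly 24 FPS for the network alone and roughly 19 FPS once preprocessing and NMS are included.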
Look at the comparison with other boards here (unfortunately, I haven't added a few new ones yet). The following are faster:
Jetson Orin
RK3588 in 12-thread mode (because of the tricky 3-core NPU you have to be creative there); in a single thread it is slower
Hailo-8 (but Hailo is int8 and depends heavily on how fast the bus is). For example, the current RPi Ai Kit will be slower.
MAIX-III (but it is absolutely unusable)
Intel processors (but quite modern)
At the same time, in terms of simplicity, usability, speed, price and support, the only direct competitor is RK3588 (and I like it more, but more on that later).
And some Dinov2 comparisons that I tested here:
Dinov2S (224*224) - 46ms
Dinov2B (224*224) - 111ms
ResNet-50 - 12ms
A couple of important notes:
A platform can always behave differently on other architectures. Don't expect the ratio against, say, RockChip to be the same on other networks.
Increasing the batch size does not result in a speedup.
Let's look at consumption
I'm a little confused. The power consumption in standby mode is about 7W, and the power consumption with the NPU fully loaded is 9W. This is completely different from all other boards I've tested.
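For a rough feel of what this means per frame, the figures can be combined with the Yolov5 inference time from the speed section. This is a back-of-the-envelope sketch with two assumptions of mine: standby draw is taken as 7 W, and the 9 W figure is assumed to hold during that Yolov5 run. I did not measure energy directly.

```python
IDLE_W = 7.0   # standby power, read as about 7 W (assumption)
LOAD_W = 9.0   # power with the NPU fully loaded
INFERENCE_S = 0.04085516929626465  # Yolov5 FP16 inference time, seconds


def energy_joules(power_w, seconds):
    """Energy in joules for a constant power draw over a time interval."""
    return power_w * seconds


# Whole-board energy for one inference, and the part above standby.
total_j = energy_joules(LOAD_W, INFERENCE_S)
extra_j = energy_joules(LOAD_W - IDLE_W, INFERENCE_S)

print(f"whole board per inference: {total_j:.2f} J")
print(f"above standby per inference: {extra_j:.2f} J")
```

The point of the split is that most of the energy per frame goes to the standby draw, not to the NPU load.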
Temperature
The temperature is also very strange. I tested it without a cooler. The board is very hot, even in standby mode. And the temperature is almost the same during network inference.
General opinion about usability
Oh. Now this is interesting.
I will judge it against everything else that I have dealt with.
When choosing a board, it is important first of all to look at what it solves and where it can be used.
It is clear that if the board is to be used in any critical infrastructure (hospitals, smart cities, etc.), then no one in Europe or America will use Chinese boards. There we use Jetsons, Hailos, Intels, and maybe five other boards.
It is quite normal for private businesses to use Chinese boards. And that is why all sorts of simple things (animal monitoring, smart stores, private parking, etc.) often use RockChips.
But there are problems with the OrangePi AiPro:
Even for Europe/America/Japan it cannot be used everywhere.
Even Russia is not the main market for the board. It is necessary to invent a lot of additional things for it to work correctly and be supplied well.
They don't even try to write documentation in English.
And it seems that these three problems are enough to choose RockChips everywhere except for very large batches (there you need to look at slightly different parameters).