How to choose an Edge AI board in 2024

I have a hobby: testing different boards for AI.

Why? I have been doing Computer Vision for over 15 years. I started with classic CV; now it is transformers everywhere. These days I mostly manage teams: I figure out how to properly combine the product and the math.

A lot of what I work on is Computer Vision on the edge. At some point I realized I was missing information. Want to read something about a new board? There is nothing but an enthusiastic press release about it. If you are lucky, there is also a video of the official examples being run. Usually there is not even that.

At some point I started testing everything myself, to understand what is possible and what is not.
Most of these tests are in my Telegram channel.

Every year or two I write a survey article, and this is one of them. Here I will try to lay out the criteria that matter when choosing an AI board, and briefly review the main boards on the market (mostly with links to my reviews).

My previous review (from 2022) can be found here on Habr.

In this review I will change the structure a little:

  • First I will go over the criteria for choosing a board, and for each criterion give a few typical examples of where it matters a lot and where it matters little.

  • Then I will go over the main boards.

Well, yes. The article was originally published elsewhere; this is the author's translation, with minor edits and additions made after publication. There is also a video version on YouTube:

Criteria

The board selection criteria can be divided into several categories:

  • Product

  • Engineering

  • Research

The boundaries between these categories are fuzzy. Power consumption, for example, can be called both a product and an engineering criterion. But it is still worth breaking things down somehow, to show which parts of the company each zone belongs to.

Product

Product criteria are how the consumer or your product sees the board:

  • Board manufacturing cost. A board based on the SG2002 may cost $5, while a board based on a Jetson Orin can run up to $1000, with a continuum of options in between.

  • Cost of developing the ML that runs on the board. On a Jetson it will be minimal; on some microcontroller it will be maximal.

  • Do you need your own hardware production? Some products are sold only as chips, Hailo-15 for example; you will not find off-the-shelf boards based on it. You will also need your own production if you need a specific set of connectors, or the lowest possible price.

  • Ability to produce a large batch. Everyone knows about Jetson supply problems. Only those with commitments from Nvidia itself have no issues…

  • In which country the product will be released. If you are releasing a product in Russia and are not associated with the government, getting Jetsons will be expensive and difficult. If you are releasing a product in America, you will not be able to buy Huawei, and there is no point anyway: it will not sell. If you are making a product for hospitals in Europe, you most likely will not be able to use RockChip (both because of certification and because of restrictions on equipment suppliers).

  • What is the board's power consumption? If you want to build facial recognition into a battery-powered doorbell, that's one level of boards. And if you can spend hundreds of watts on recognition, that's another level.

When you think about product criteria, you need to roughly limit the parameters for each item. What volumes do you want? What is the price range, form factor, country, etc.

Engineering

Engineering criteria describe what it is like to work with the board. What is your team prepared to deal with?

  • Operating system. Someone may want Windows, but that is rather rare. Linux? Which one: Ubuntu, Yocto, or Buildroot? Or maybe you do not need Linux at all, and bare metal, as on an ESP32, is enough? Or perhaps MicroPython?
    Obviously, this strongly affects what the team needs to be able to do, how convenient fleet management is, and how easy the hardware is to manufacture.

  • Is this a separate device? Some accelerators for neural networks are separate boards. Some are integrated into the processor. It is clear that these are different inference options for different tasks.

  • How powerful is the CPU? Running neural networks is often not all the algorithms need, so check whether the CPU can handle the rest (see the sketch at the end of this section):

    • Can it keep up with image preprocessing?

    • Can it keep up with decoding and encoding video?

    • Can it process 3D?

  • Manufacturer support. Such boards are often quite limited and the documentation is incomplete. Will development require consulting the manufacturer? Open sources are not always enough.
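
To make the CPU question concrete, here is a minimal sketch of the kind of check I mean: it times a typical preprocessing step against a target frame rate. It assumes OpenCV and NumPy are available; the 4K frame and the 640x640 network input size are made-up placeholders.

```python
# Rough check: can the CPU keep up with preprocessing at the target FPS?
# Assumes OpenCV + NumPy; the fake 4K frame and 640x640 input are placeholders.
import time

import cv2
import numpy as np

TARGET_FPS = 30
frame = np.random.randint(0, 255, (2160, 3840, 3), dtype=np.uint8)  # fake 4K frame

n_iters = 100
start = time.perf_counter()
for _ in range(n_iters):
    resized = cv2.resize(frame, (640, 640))            # typical detector input size
    blob = resized.astype(np.float32) / 255.0          # normalize to [0, 1]
    blob = np.transpose(blob, (2, 0, 1))[None]         # HWC -> NCHW
elapsed = (time.perf_counter() - start) / n_iters

print(f"preprocessing: {elapsed * 1000:.1f} ms/frame, "
      f"budget at {TARGET_FPS} FPS: {1000 / TARGET_FPS:.1f} ms")
```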

Research

These are the criteria for the AI you put inside. They are the ones most often overlooked when choosing a board, and that is a mistake: they can change development time tenfold.

  • Inference speed. This is a key parameter for many applications. Clearly, if the board only gives you 1 FPS for detection, nothing will save you when you need to detect objects at 1000 FPS. (A minimal benchmark sketch follows this list.)

  • Supported layers. Complexity of export tools. What does the manufacturer provide? Is quantization necessary? What about LLM support?

  • Memory capacity, memory speed.
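
For the inference speed criterion, I never trust vendor TOPS numbers without a trivial benchmark loop. A minimal sketch with ONNX Runtime on the CPU; the model path and input shape are placeholders, and on a real board you would swap in the vendor's own runtime:

```python
# Minimal FPS benchmark with ONNX Runtime (CPU). Model path and shape are placeholders.
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
x = np.random.rand(1, 3, 640, 640).astype(np.float32)  # adjust to the model's real input

# Warm-up, then timed runs
for _ in range(5):
    sess.run(None, {inp.name: x})

n_iters = 100
start = time.perf_counter()
for _ in range(n_iters):
    sess.run(None, {inp.name: x})
elapsed = time.perf_counter() - start

print(f"{n_iters / elapsed:.1f} FPS, {1000 * elapsed / n_iters:.2f} ms per frame")
```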

The Jetson family

Jetson now seems to be perceived as the default edge platform. I first built Caffe on one back in 2015, and they have only gotten better since.

Orin series from Nvidia

The current series is Jetson Orin. There are three device types in the series (Nano, NX, AGX), differing in price and compute capabilities. The older Jetsons are still in use, but much less often than the current ones.

There are several subtypes within AGX and NX devices. Again, varying in price and speed:

Here's the whole table

Jetson is, first and foremost, a GPU board. There are a few versions where the GPU delivers less performance than the NPU, but even there the GPU is much more convenient (no quantization needed). The CPU in a Jetson is too weak to run networks on. The NPU is not present in every model yet.

The key difference between the models is the NPU. The Nano has none, the NX has one or two (depending on the model), and on the AGX they run at almost three times the frequency.

Right here you can check out my review of the previous-generation Jetson Nano. Performance-wise it is already lagging significantly, but most of the ideas and the workflow logic remain the same.

The modern NPU works well only in INT8, but thanks to layer fallback, individual layers can be computed on the GPU instead.
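
As an illustration of that fallback mechanism, here is a rough sketch of how an engine is typically built with trtexec on a Jetson: INT8 on the DLA (the NPU), with unsupported layers falling back to the GPU. The model path is a placeholder and the exact flags depend on the TensorRT version shipped with your JetPack.

```python
# Sketch: build a TensorRT engine that runs INT8 on the DLA and falls back to the
# GPU for unsupported layers. Paths are placeholders; check your TensorRT version.
import subprocess

subprocess.run(
    [
        "trtexec",
        "--onnx=model.onnx",
        "--saveEngine=model.engine",
        "--int8",                 # DLA wants INT8 (or FP16)
        "--fp16",                 # allow FP16 kernels for GPU-fallback layers
        "--useDLACore=0",         # target the first DLA core
        "--allowGPUFallback",     # layers the DLA cannot run go to the GPU
    ],
    check=True,
)
```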

Jetson's pros:

  1. A huge infrastructure around it: TensorRT, Triton, CUDA, etc. Almost everything that runs on a desktop runs here without problems.

  2. A huge amount of information online. There is already a thread somewhere for almost any problem.

  3. Support for modern models. Yes, some things may not work, but most LLMs, VLMs and so on already run here. This is a qualitative difference from 95% of boards.

  4. High speed. If you convert TOPS to USD, Jetson may not be the best value, but among boards of this class it is clearly one of the most performant.

  5. Ability to write low-level code via TensorRT.

Jetson's cons:

  1. Price. Jetsons are expensive. An assembled device based on the NX will run about 1000 USD, which is not cheap.

  2. Availability. There are only a few ways to ensure an uninterrupted supply of Jetsons: you either need guarantees from Nvidia itself, or you need to be a government of some kind (link to drones). In all other cases you cannot guarantee batches of thousands of devices.

  3. Power consumption. Nvidia keeps reporting that each new Jetson is more energy efficient. Maybe so, but each new one also simply consumes more. For the NX it is now about 40 W, which is not small.

  4. The NPU is primarily INT8 oriented.

x86 (Intel, AMD)

When talking about x86, we first need to talk about Intel: their AI support is far better. An NPU is present in the latest chips, and OpenVINO has supported Intel GPUs for a long time (where performance is quite good).

The main drawback of such computers is high power consumption. At the same time, performance is comparable to Jetson, the devices are much easier to get, and there are a lot of them: from an N100 box for 70 bucks to boxes approaching $1000.
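
A minimal OpenVINO sketch of what this looks like in practice: the same model can be pointed at the CPU, the integrated GPU, or the NPU, depending on what the chip and drivers expose. The model path and input shape are placeholders.

```python
# Sketch: run one model on whichever Intel device is available (CPU / iGPU / NPU).
# Model path is a placeholder; device names depend on the chip and installed drivers.
import numpy as np
import openvino as ov

core = ov.Core()
print("available devices:", core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

model = core.read_model("model.onnx")
device = "GPU" if "GPU" in core.available_devices else "CPU"
compiled = core.compile_model(model, device)

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # adjust to the model's input
result = compiled([x])[compiled.output(0)]
print(result.shape)
```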

Pros:

  1. Availability

  2. Good community and support. Basic CPU inference works out of the box: ONNX Runtime, OpenVINO, TorchScript, etc.

  3. The ability to run all modern networks efficiently (some only via PyTorch, but still).

Cons

  1. Not every device has an NPU or a GPU

  2. For some networks, support and speed are worse than on Nvidia

  3. Power consumption is often higher than a Jetson's.

  4. Prices are at Jetson's level

Other CPUs (ARM, RISC-based)

To close out the “classics” thread, I will also mention other CPUs here. Boards suitable for embedded development that use ARM/RISC cores are usually significantly slower than x86. That does not stop them from being fast enough for many problems. RockChip, MediaTek, Huawei (discussed a bit further down) have more than decent processors that can carry the ML in many situations (CV, NLP, etc.). And of course, “import onnxruntime” working out of the box is super simple and convenient.
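
That "import onnxruntime and go" experience is literally a few lines. A minimal sketch; the model path and thread count are placeholders, and pinning threads to the big cores of a big.LITTLE SoC usually helps:

```python
# Sketch: plain CPU inference on an ARM SBC with ONNX Runtime.
# Model path and thread count are placeholders.
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # e.g. the four big cores of a big.LITTLE SoC

sess = ort.InferenceSession("model.onnx", sess_options=opts,
                            providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()], [o.name for o in sess.get_outputs()])
```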

But of course, in terms of energy efficiency and peak speed, a bare CPU loses to most NPU modules, to x86, and to good GPUs (such as Intel's).

RockChip

RockChip probably has the most videos on my channel (1,2,3,4,5,6,7). There are really a lot of these chips now, and they are very good for ML tasks: a solid NPU with broad support for different networks. A good half of modern edge boards are based on them.

  • OrangePi

  • Radxa (RockPi)

  • Banana Pi

  • NanoPC

  • Khadas

  • FireFly

Just a few boards

And many others.

They are built on different chips:

  • RK3588 – the most powerful and top-end in terms of performance (plus a set of derivative versions: RK3588S, RK3582)

  • RK3568 – one of the older chips. Fairly slow and not great value now, but back when we chose it, it was faster than a Raspberry Pi and one and a half times cheaper.

  • RK3566 – a super cheap chip (Linux + NPU)

  • RK3576 – similar to the 3588, but with a somewhat simpler processor

  • RV1106, RV1103 and several others – chips without full Linux or Python inference

  • RK3399Pro – the oldest chip with an NPU, now barely supported.

  • ETC…

RockChip's pros:

  1. Price

  2. Availability. Can be purchased from dozens of different manufacturers. Individually and in large quantities.

  3. The range of supported networks. They are of course inferior to Jetson, but you can get almost any network running. Right now the boundary runs roughly at VLMs: LLMs are supported, but VLMs are not. Some transformers work, and Whisper is only available from one team, under a GPL license.

  4. Many different form factors. You can buy a completely finished board or design your own from scratch.

  5. FP16 also runs on the NPU. This is very important, since not every network can be quickly and easily made to work in INT8 (see the export sketch after this list).

  6. There is some low-level access to the NPU – you can do a lot of math on it manually.
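
A rough sketch of what that FP16 export looks like with rknn-toolkit2, skipping INT8 calibration entirely; the model path and target platform are placeholders, and the exact config arguments vary between toolkit versions:

```python
# Sketch: convert an ONNX model for the RockChip NPU without INT8 quantization,
# so it runs in FP16. Paths/platform are placeholders; details vary by
# rknn-toolkit2 version.
from rknn.api import RKNN

rknn = RKNN()
rknn.config(target_platform="rk3588")
rknn.load_onnx(model="model.onnx")
rknn.build(do_quantization=False)        # keep FP16, no calibration dataset needed
rknn.export_rknn("model.rknn")
rknn.release()
```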

Cons:

  1. Quality. Many vendors ship rough system images for RockChips, and RockChip's own code is not exactly high quality either.

  2. It is a Chinese chip. In the US and EU this may run into various restrictions.

  3. Not every network can be launched.

  4. Complex NPU architecture on the higher-end models. There is no built-in inference server, so you have to run multi-threaded inference yourself to get maximum throughput.

Qualcomm

Recently there have been rumors that Qualcomm is going to buy Intel. And it must be said, it is a serious competitor in edge computing.

You can buy a box like this for 400-600 bucks

At the moment I do not have a video about this board on my channel, and the last time I developed for Qualcomm was three years ago; a lot has changed since then. As soon as I get the chance to test the RB3, I will add it to the channel.

Pros

  1. Fast inference

  2. Cheap boards in large quantities

  3. Fairly good neural network support and documentation. Three years ago there were problems; now there are fewer.

Cons

  1. A lot of bureaucracy. Getting access to the development environment can take up to a month. As a private individual, you cannot buy a board and sign all the contracts.

  2. No open information; everything is under NDA. You cannot predict in advance whether your system will work. For example, I am almost sure that LLMs do not fully work (VLMs included).

  3. As far as I know, there is no low-level access to the NPU.

VeriSilicon

These are truly many-faced chips. You most likely have not heard of boards from VeriSilicon, because there are none: the company sells chip designs. Many vendors ship NPUs designed by VeriSilicon. For example:

  1. NXP is one of the largest electronics companies. I have a Debix review on my channel.

  2. Amlogic is one of the leaders in compact processors (though it seems the latest chips no longer use the VeriSilicon NPU). On my channel there is a review of the Khadas VIM3, based on the Amlogic A311D.

  3. STM32. Of course this does not apply to all of their chips, only the highest-performance ones. On my channel there is a short interview with their representative.

  4. Synaptics.

  5. BroadCom.

I'm sure I'm missing a lot.

Since the company provides the hardware design and a set of low-level libraries, the experience with two different vendors can be fundamentally different. Look at the videos linked above: with a good vendor, such as NXP, usage is quite intuitive.

Pros

  1. This is a fairly energy efficient architecture

  2. There are a lot of vendors, they sell in different form factors

  3. Chips are quite cheap

Cons

  1. With some vendors, model export works poorly

  2. These are not super fast boards

  3. Not all networks are supported. No LLM, VLM, etc. And, unfortunately, this cannot be corrected in any way – there is no low-level access to the NPU.

External accelerators

In this section I will talk about accelerators in general, although each of them deserves a separate article, or even several. What do accelerators have in common? They solve the problem of adding missing AI power to an existing system, and they plug in as a separate device.

The most popular accelerators

The most important thing about an accelerator is how it is connected. Mostly these are PCIe (M.2) or USB devices. What matters (a back-of-the-envelope example follows this list):

  1. How much data can be pushed through the link. If you run your networks on large images or video, this can be very limiting. Accelerators range from one PCIe 2.0 lane to four PCIe 4.0 lanes. The host matters too: an RPi, for example, has only one lane (officially PCIe 1.0, in practice PCIe 2.0).

  2. What is the latency for transmission? If you need to react quickly, this is critical.

  3. Is the processor fast enough to prepare data and send it over the bus? On slow boards, calculations can be much slower even when there is a fast accelerator.
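
A back-of-the-envelope example for the first point; the per-lane throughput below is an approximate effective figure, not a spec guarantee:

```python
# Back-of-the-envelope: does one PCIe 2.0 lane fit a raw 1080p RGB stream?
# ~500 MB/s effective per PCIe 2.0 lane is an approximation, not a guaranteed number.
PCIE2_X1_MBPS = 500                      # approx. effective throughput, MB/s
frame_bytes = 1920 * 1080 * 3            # raw RGB frame, ~6.2 MB
for fps in (30, 60, 120):
    stream_mbps = frame_bytes * fps / 1e6
    print(f"{fps:>3} FPS raw 1080p: {stream_mbps:6.0f} MB/s "
          f"({stream_mbps / PCIE2_X1_MBPS:.0%} of one PCIe 2.0 lane)")
```

At 30 FPS the raw stream already takes roughly a third of the lane; at 120 FPS it no longer fits, which is exactly why the connection type is the first thing to check.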

What accelerators are the most popular?

Hailo

Hailo-8, Hailo-10. I have two videos about it on the channel (1, 2), and a third is coming soon. I have used it in practice and have consulted several companies about it, so in the videos you will find real feedback and a detailed review.

Pros:

  • Good support, open community

  • Easy to buy

  • There are guides for RPi

  • Fast

  • Very good guides for exporting models. Lots of quantization algorithms out of the box

Cons:

  • A Hailo costs more than a RockChip (but less than a Jetson)

  • Quantization is mandatory, with no way around it.

  • Many transformer-based models do not work (LLM/VLM/Whisper), and perhaps others. Hailo promises support (they released the Hailo-10 specifically for LLMs), but there is no way to run them yet.

There is also the Hailo-15. Unlike its siblings, it is a standalone processor rather than an external module. Its pros and cons are broadly similar, but a few points stand out:

  1. It is slower than Hailo-8 and Hailo-10.

  2. Its CPU is weak. However, if you do not need to handle many cameras or do heavy preprocessing, it will be enough.

  3. The processor-NPU bus is fast.

  4. It's cheaper.

  5. There are no ready-made boards; you have to develop from the reference design (at least that was the case recently).

Axelera

The channel only has an interview with a representative. I have not tested it myself, so I cannot vouch for what was said.

Pros:

  • Very fast. Judging by the documentation, one of the fastest external boards.

  • Abundance of form factors

Cons:

  • Quite expensive, though perhaps that only applies to single boards.

  • As of spring/summer, it was only possible to place pre-orders for development versions. But there were working samples at the exhibitions.

  • Not all transformers worked (LLM, VLM, etc, again, could have changed)

SIMA

There is no review on the channel, but there is an interview that I recorded. I would say SIMA is less about edge in the sense of "compute next to the camera" and more about "compute on a local server". It is more of an alternative to a GPU than an accelerator for a small board. They do support up to 8 channels, though.

Pros:

  • One of the only makers of such boards who has more or less promised LLMs and the like.

  • Fast. One of the fastest connections (many PCIe 4.0 lanes).

Cons:

Other plug-in accelerators

There is very little information about most of them (except Coral):

  1. Kneron (USB)

  2. Coral (USB, M.2, PCI-E) – I would say that they are already too old

  3. Gyrfalcon – it seems they have already folded

  4. BrainChip

  5. kinara.ai

I am sure I have forgotten plenty of them, so write them in the comments.

Ambarella

One of the most mysterious platforms. You will not find reviews of it on the Internet, yet it is used in many cheap, super mass-market devices: lots of DVRs, DJI cameras, and so on. At the same time, there is a total policy of secrecy. I know many teams working with it, but I never got the chance myself, largely because of that closedness. When we at Cherry Labs were deciding in 2018 whether to switch to this platform, the blocker was exactly that: the inability to test it properly.

Ambarella is also aiming at LLMs

I really hope that I will be able to try it someday. But at the moment I don’t consider myself competent enough to talk about it. Should be cheap, with int8 quantization.

Huawei

The channel has a review of the Orange Pi AIpro, built on the corresponding chip, and I wrote an article about it on Habr.

Orange Pi Ai Pro

Overall, the board is quite decent and open. Good speed.

Pros:

Cons:

  • No LLM support

  • Focused on the Chinese market, with a small presence in Russia and India.

  • Documentation is in Chinese. It cannot be bought in Europe or America because of sanctions.

Sophgo

This is a maker of AI accelerators that has become very popular recently. You may have come across it in the MAIX-CAM, Milk-V, SiFive boards, hw100k, reCamera, etc.

And again it turns out that the board maker matters far more than the chip vendor. Look at my video about the Milk-V: using the accelerator is nearly impossible. There is no documentation, most of the examples do not build, and there is a lot of C++ code.

Milk-V Duo

Now compare with a video about the MAIX-CAM (not mine): a qualitative difference. At the same time, I would not call the MAIX-CAM a product-grade solution; it is more of a small-batch hobby device. It is hard to give a general overview of all the platforms (I have not tested SiFive at all and have not found good reviews), but in a nutshell:

Pros:

Cons:

  • Will likely require a lot of support work if the batch is large

  • You will most likely have to make do with fairly limited memory

  • No LLMs or modern networks.

  • Most vendors are Chinese, so most likely it cannot be used everywhere in Europe/USA

MediaTek

Another big company, used to working only with big customers. Somewhat reminiscent of Qualcomm, and that spoils the whole experience.

The quite good, fast and cheap Radxa NIO 12L turns out to be almost useless:

  • The only export tool you get access to is as old as mammoth dung

  • MediaTek refuses to provide access to the new export tools.

  • Radxa does not even seem to know the new ones exist

You can only test it if your company signs an NDA with MediaTek, which is not easy.

And this is the feeling I had after testing it

Does this make the Genio a bad chip? My expectation is probably not: it is a good chip at a reasonable price, better than NXP in performance and worse than Qualcomm. But there is no way to test it.

Hobby boards

There are many manufacturers making small, decent boards aimed at enthusiasts who do not know what Computer Vision is but want to add eyes to their devices. It is very difficult to test such boards, since they are about "other things": often there are not even detailed instructions, just a "friendly interface". Nevertheless, the boards do their job. From what I have tested on my channel:

The already mentioned MAIX-CAM probably also belongs in this category.

Outdated boards

It is funny, but for many people the topic of accelerators is still brand new, and they often do not realize that several generations have already come and gone. People write to me asking how to run something on Intel's Myriad, a chip for which Intel has already ended production and support. There are many such boards, yet they are still often used in production.

For example, the OAK series from Luxonis uses various Intel accelerators (it seems they will soon switch to Qualcomm, but the RVC4 has not been officially released yet, so this is just my guess).

Google Coral, released almost 7 years ago, evokes approximately the same feeling.

And other small things like the K210, the original ESP32 (1, 2), the MAIX-II, GAP8, etc.

If anyone is interested, I have provided links to the reviews that I have. But these are not the boards I would recommend in 2024/2025.

Microcontrollers

Since we have touched on the original ESP32, we can touch on microcontrollers in general. On my channel I mostly avoid such boards, except for the MAX78000 and the ESP32. But you need to understand that there are a lot of them now:

They all share the following traits:

  • There is almost never a normal operating system. Development is either C/C++ or MicroPython, or via Edge Impulse, if it supports the board.

  • Typically only a handful of networks run on each platform: literally one or two.

  • Speed is usually very low; you have to use heavily optimized models.

  • The boards almost always require INT8 quantization (see the sketch after this list).

  • Manufacturer documentation is almost always mediocre.

  • Many of the boards require a lot of C++ code.

  • Some are completely unsuitable for images and are positioned for voice only.
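
For reference, this is roughly what that mandatory INT8 step looks like with the TFLite converter, whose output most microcontroller runtimes (TFLite Micro, and often Edge Impulse under the hood) consume. The saved-model path, input shape, and calibration generator are placeholders.

```python
# Sketch: full-INT8 quantization with the TFLite converter, which is what most
# microcontroller runtimes expect. Paths and calibration data are placeholders.
import numpy as np
import tensorflow as tf

def representative_data():
    # ~100 real input samples should go here; random data is just a stand-in.
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```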

Edge Impulse often solves these problems. But keep in mind that it may not fully support a given board, and the convenient interface it provides can limit both the achievable AI capabilities and the board's available features.

Other boards

There are many other boards that I have not yet had the energy to test:

  • MAIX-III – this one I did test, but it seems so-so to me

  • Texas Instruments (BeagleBoard, etc.) – quite popular boards. I have now found a way to test them; maybe I will get to it.

  • Kneron – a lot of marketing about it

  • MAIX-IV (AX650N, axera-tech) – I wouldn’t expect anything cool from it. Feels like this is a continuation of the third version

  • AMD Kria – an FPGA. I am skeptical about the idea of ML on FPGAs. All my friends who did projects on this board cursed at it for a very long time, so I decided not to test it.

  • Arm Ethos (U55, U65) – Arm has also decided to get into ML. So far they only make very weak accelerators, but I hope the day comes when they move into a beefier segment. I have not checked anything yet.

  • Renesas – they also have a lot of boards, they seem to be a fairly large vendor, but haven’t looked yet

  • Syntiant

  • BrainChip – I tried to communicate with them at some exhibition, but everyone was busy. And it seems like it’s not easy to get a piece of hardware

  • MemryX

  • deepx.ai — I asked them for a price, but they gave me something like “a couple of thousand USD,” so I gave up

  • Horizon X3M – seems to be no longer in production

  • Kendryte k510/k230 — It seems the guys have switched to mining. But the boards are still on sale.

  • Sony IMX500 – a very strange device: the compute is combined with the image sensor, like an OAK-D but on a single chip. RPi released a camera based on it. Knowledgeable people say the chip is hard to work with, and performance should not be anything special.


Do you have any questions? Or maybe there is some interesting board that I haven’t considered here. Write!
