Do I need to adopt a new ML model?

Have you ever been in this situation? A new neural network comes out, and management starts demanding that you implement it. Half of your colleagues enthusiastically talk about the new layer that improved the network's accuracy. YOLOv(N+1)? LLAMA100?

Everyone in the company is excited about the introduction of the new model into production.

The original version of this article was published on Medium, so in a sense this is a translation with improvements, but it is still my own text.


Qwen2 and YOLOv10 were released recently. Is it time to switch? Nothing in ML generates more hype than new model releases: every video about a new YOLO gets a lot of views, and everyone wants to swap out their models. So what should you do? Deploy it?

In this article I will try to show how to avoid the hype and make informed decisions. We will focus on replacing a network with a similar one. If a network can do something fundamentally new, the decision most likely belongs to the product owners, and the question becomes: "Do we need this new functionality?"

So we will talk about situations like replacing YOLOv8 with YOLOv10, or ResNet with ConvNeXt.

The goal

Let's start with the most important thing: determine the purpose of introducing a new network. "Novelty for the sake of novelty" pays off very poorly, and a researcher with sparkling eyes is not a business case. What would be more appropriate goals?

  1. Your current neural network does not work well enough: the system needs at least X to function, but you only have Y.

  2. You are continuously optimizing your models, and a separate budget is allocated for this from above (there may be no logic behind it).

  3. Your accuracy is directly tied to your income or expenses.

  4. You want to optimize speed, and you have a time budget to test several additional models.

  5. LLM-based models: you need up-to-date knowledge about the world, new objects, and recent events.

  6. You need to puff out your cheeks and simulate AI activity.

  7. You have not chosen a model yet and need to analyze the possible options.

Maybe there are a few more adequate reasons, but in general it all comes down to situations where there is a direct connection between accuracy and money. We won't even consider reasons 2 and 6: they have nothing to do with ML, although they happen often...
There is also little to add about knowledge of the world: if it is critical, the model has to be replaced.

Let's start with cases related to accuracy.

Accuracy is always directly related to speed. You need to compare models of the same speed class and make decisions within it.

If you see that a new super-large transformer has been released, you only need to test it if a comparably large model is already running in your product. Of course, you can compare it with ResNet-18, but then the main question is why you didn't make your model heavier before, if you could afford to.
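Before any accuracy comparison, it is worth a quick check that the candidate even fits your latency budget. Below is a minimal sketch, assuming PyTorch and torchvision; the models and input size are placeholders for whatever actually runs in your product:

```python
import time
import torch
import torchvision.models as models

def mean_latency_ms(model, input_size=(1, 3, 224, 224), warmup=10, runs=50):
    """Rough CPU latency of a single forward pass, in milliseconds."""
    model.eval()
    x = torch.randn(*input_size)
    with torch.no_grad():
        for _ in range(warmup):      # warm up caches and lazy initialization
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1e3

# Compare only models in the same speed class.
for name, m in [("resnet50", models.resnet50()),
                ("convnext_tiny", models.convnext_tiny())]:
    print(f"{name}: {mean_latency_ms(m):.1f} ms")
```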

You are ΔX% short of product-level accuracy

Sometimes you have a clear understanding: "improving the accuracy by ΔX will make the product perform better!" But...

It is important to understand that swapping a model for a similar-performance one of roughly the same generation will almost never provide a significant boost.

Replacing ResNet (2015) with ConvNeXt (2022) may reduce the classification error, but it will almost never "cut the error in half". For a comparable model size, the gain is somewhere around 5-10 percentage points. If you swap similar models from adjacent years, the difference will be minimal. For models with a large gap in years:

Between ResNet and ConvNeXt: 7 years of progress

ResNet-50 (76.1%) vs ConvNeXt-T (82.1% ImageNet top-1). Is that a lot?
It reduces the error by 6 percentage points from the original ~24%, which means roughly 25% fewer errors, while ¾ of the errors stay exactly where they were. That is good, but it is usually not a 2x or 3x gain in accuracy. It is even sadder if your accuracy is already above 99%: ResNet-50 gave 99.6%, the new model will probably give 99.7%.
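The same arithmetic, spelled out (a tiny sketch using the ImageNet top-1 numbers quoted above):

```python
# Relative error reduction when swapping ResNet-50 for ConvNeXt-T.
resnet_acc, convnext_acc = 0.761, 0.821   # ImageNet top-1 accuracy

resnet_err = 1.0 - resnet_acc             # ~0.239
convnext_err = 1.0 - convnext_acc         # ~0.179

removed = (resnet_err - convnext_err) / resnet_err
print(f"errors removed:   {removed:.0%}")      # ~25%
print(f"errors remaining: {1 - removed:.0%}")  # ~75%
```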

But what about the beautiful graphs? YOLOv10 is the best!!

This is the point where you are supposed to say WOOOOOW

But look at the numerous comparisons of YOLOv8 | YOLOv9 | YOLOv10, etc. (1, 2). The accuracy gain is not where you expect it:

  1. Detection boundaries are a little more accurate.

  2. Detection works a little better for very small objects.

  3. Detection of larger objects is a little more stable.

It can be the same with LLMs. Better metrics on a specific dataset? That may come from support for languages you don't need in production, while for 95% of your distribution nothing changes. And quality is primarily determined by the dataset: an LLM will not understand medicine better if there were no medical datasets in its training.

The bottom line: a tangible ΔX improvement in accuracy is very rarely achieved just by changing the model. It may improve the metrics a little, but that's all. Are there more tangible ways to improve? See below.

Accuracy improvements as a process

You want to improve the model's quality because errors directly affect your income. Congratulations! This is a very rare situation for Data Scientists: you can actually spend time optimizing networks and processes. But let me remind you that before choosing a more powerful model, you need to check several other approaches that often yield more:

  1. Have you investigated the errors? Is it possible to collect more examples of the typical failures? Feeding the system's characteristic errors back into the dataset is the best way to improve accuracy; it can reduce the error even by orders of magnitude.

  2. Have you tried training more powerful models? This lets you estimate the maximum achievable accuracy and how close you are to it. And if such a model is much more accurate, set up distillation (see the sketch after this list).

  3. Have you tried optimizing augmentations for your task? People often forget that, by looking at the errors, you can substantially improve accuracy through augmentations, or, conversely, by disabling augmentations that hurt quality.

  4. Have you tried optimizing the LR / optimizer / loss function? This is also often forgotten, yet more accuracy may be hiding here than in changing the model.

If you have already tried all these things, it makes sense to start testing other models.
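For point 2, here is a minimal sketch of distilling a large "teacher" into your production "student", assuming PyTorch; the temperature and weighting values are illustrative assumptions, not a recipe from this article:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T=4.0, alpha=0.5):
    """Blend hard-label cross-entropy with soft teacher targets."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * T * T                      # standard T^2 scaling (Hinton et al.)
    return alpha * hard + (1 - alpha) * soft

# Inside the training loop (teacher frozen, student trainable):
# with torch.no_grad():
#     teacher_logits = teacher(images)
# loss = distillation_loss(student(images), teacher_logits, labels)
```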

Selecting a model from scratch / optimizing performance for an existing pipeline

In fact, these are the only two cases where it makes sense to iterate over several models. But again: I would advise you to at least optimize your work with the dataset and augmentations before getting into this.

What else should not be forgotten

Quantization. Different modules very often lose different amounts of accuracy after quantization, and the top model may stop performing top-notch. Don't forget to re-check everything after quantization.
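A minimal sketch of that re-check, using PyTorch post-training dynamic quantization as an example; `model`, `evaluate`, and `val_loader` are hypothetical stand-ins for your own model and evaluation code:

```python
import torch

def evaluate(model, loader):
    """Hypothetical accuracy evaluation over a validation loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# Compare the float model with its quantized version on the SAME data.
acc_fp32 = evaluate(model, val_loader)
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
acc_int8 = evaluate(quantized, val_loader)
print(f"fp32: {acc_fp32:.4f}, int8: {acc_int8:.4f}")
```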

Different architectures. Sometimes a different architecture simply works better for a particular task. I have rarely seen this in classification, but in detection and segmentation, especially a couple of years ago, it was common. Another example is LLMs: decoder-only text generation versus encoder-decoder models.

Preprocessing. Sometimes the right choice of data preparation can affect speed and accuracy. The standard YOLOv8/YOLOv5 preprocessing looks like this:

Note how the image is padded to a square at the top and bottom

But if you run it on rectangles, and train on rectangles too, it will be faster:

The area is almost a third smaller!
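A back-of-the-envelope check of that claim (a sketch; the 3:2 input aspect ratio and stride-32 rounding are my assumptions for illustration):

```python
# Square letterbox vs rectangular input for a 3:2 frame (e.g. 1536x1024).
def round_to_stride(x, stride=32):
    """YOLO-style input sides must be multiples of the network stride."""
    return round(x / stride) * stride

square_area = 640 * 640                       # letterboxed to 640x640
rect_h = round_to_stride(640 * 1024 / 1536)   # keep aspect ratio -> 416
rect_area = 640 * rect_h

print(f"rectangle is {1 - rect_area / square_area:.0%} smaller")  # ~35%
```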

There are many other ways that data processing can improve your model.

The difference between test and production. Remember that a correct training set matters much more than the correct model. Keep the training dataset up to date.

Want to maximize quality within a fixed performance budget? Remember that there are ways to search for the optimal architecture for your hardware. Several companies specialize in this, and there are several open-source solutions. But the former is usually expensive, and the latter is tedious and unstable.

A short summary

Does your model need to be updated? In general, yes, eventually. But there is actually more hype here than real work.

And a little footer

This article is also available in video format, if that is easier for you:

And I write more about ML, hardware, and optimizations here.
