Continuous learning for production systems


Life cycle of machine learning


The Agile software development methodology, popularized around 2010 by the Agile Software Development Manifesto, promotes adaptive planning, evolutionary development, rapid delivery and continuous improvement as key features that provide a quick and flexible response to the ever-accelerating changes in the market and its requirements.

Because linear waterfall models borrowed from the manufacturing and construction industries have been unable to provide a competitive advantage in the ever more complex and rapidly changing world of software, the Agile model and Scrum have become the de facto standard for modern software development.

But what happens when we make the transition to Software 2.0? In a 2017 post, Andrej Karpathy foresaw fundamental changes in the world of software development:

Sometimes people refer to neural networks as “another tool in the machine learning arsenal.” They have their pros and cons, they work in some places and not in others, and sometimes you can win Kaggle competitions with their help. Unfortunately, those who claim this do not see the forest for the trees. Neural networks are not just another classifier, they mark a fundamental change in how we will write software. They are Software 2.0.

Machine learning is now gradually conquering and transforming every industry, with predictive modules already integrated into the production processes of almost all products and services.



However, current machine learning models are inefficiently trained from scratch and statically deployed on every iteration, and because of this they resemble the familiar sequential waterfall model (rigid and poorly adaptable).

Essentially, the parallel rests on the idea that data is the new requirement for the software (it arrives and changes over time), and the training process is the “design and develop” phase that leads to the software product (the desired prediction function).

What if we could bring what we’ve learned over the last fifty years of Software 1.0 to Software 2.0?

Continuous Learning as Agile Machine Learning

It turns out it can be done! Over the past few years, we have witnessed significant progress in a field of machine learning called “lifelong learning” (Continual Learning), the main idea of which is to keep training models as new data (requirements) emerge. This provides huge benefits, similar to those of the Agile methodology:

  • Efficiency: Since the process is continuous, we do not need to start from scratch every time, spending huge computational resources on retraining the model on what it already knows.
  • Adaptability: Because the learning process is fast, efficient and flexible, we can offer an unprecedented level of adaptation and specialization.
  • Scalability: The computation and memory overhead stays bounded (and low) throughout the life cycle of a product or service, allowing us to scale up intelligence as we process new data.
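The three properties above can be illustrated with a toy incremental learner. This is only a minimal sketch, not the method discussed later in the post: it assumes a nearest-class-mean classifier, and the names `RunningMeanClassifier`, `update` and `predict` are hypothetical. Each new batch is folded into per-class running sums, so nothing is retrained from scratch and no past batch has to be kept in memory.

```python
from collections import defaultdict


class RunningMeanClassifier:
    """Toy continual learner (illustrative only): one running mean per class."""

    def __init__(self, n_features):
        self.sums = defaultdict(lambda: [0.0] * n_features)
        self.counts = defaultdict(int)

    def update(self, batch):
        """Fold a new batch of (features, label) pairs into the model."""
        for features, label in batch:
            acc = self.sums[label]
            for i, value in enumerate(features):
                acc[i] += value
            self.counts[label] += 1

    def predict(self, features):
        """Return the label whose class mean is closest to `features`."""
        def squared_distance(label):
            n = self.counts[label]
            return sum((s / n - x) ** 2
                       for s, x in zip(self.sums[label], features))
        return min(self.counts, key=squared_distance)


model = RunningMeanClassifier(n_features=2)
day_1 = [((0.0, 0.1), "cat"), ((1.0, 0.9), "dog")]
day_2 = [((0.2, 0.0), "cat"), ((0.9, 1.1), "dog"), ((5.0, 5.0), "bird")]
for batch in (day_1, day_2):
    model.update(batch)  # the batch can be discarded after this call

prediction = model.predict((4.8, 5.2))
print(prediction)
```

The model's memory footprint depends only on the number of classes and features, not on how many batches it has seen. A new class (here "bird") that first appears on the second day is absorbed without touching the earlier data.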

Although continuous learning is often regarded as a convenient property to be explored in order to create Artificial General Intelligence (AGI) agents, with practical use limited to embedded computing platforms (without the cloud), in this post I want to argue that over the coming years it will become a ubiquitous and mandatory property of every machine learning system used in production environments.


The Importance of Lifelong Learning

At conferences, you can hear statements like this:

“Machine learning systems are incredibly inefficient compared to the brain!”


“Machine learning methods are incredibly data intensive!”


“Machine learning algorithms are only for supercomputers!”

Of course they are. For example, in the context of vision, it has been established that it takes three to five years for a child to develop a sufficiently good visual system, after which it keeps adapting to the environment throughout life. Why should the situation be different for machine learning systems?

We expect a machine learning system to train in minutes and learn an ideal model of the outside world. Instead, we should aim to create a continuous learning system that builds up its predictive capabilities on top of what it has previously learned, compensating for earlier biases and gaps in the data, and that adapts effectively to new environmental conditions as new data emerge.

In my opinion, much as with Software 1.0, where after more than fifty years of software development experience we recognized that a complex system cannot be built with a purely linear development model, we will come to the same realization for Software 2.0.

It turns out that some people in our industry are already beginning to recognize these changes. For example, at Google, with TensorFlow Extended used in Google Play and other Google services:

TFX: A production-scale machine learning platform based on TensorFlow.

Perhaps, to some extent, at Tesla:

Building the Software 2.0 Stack, Andrej Karpathy (Tesla)

And in many other companies providing MLaaS, for example, Amazon SageMaker, IBM Watson and so on, or in startups like Neurala and Cogitai.

Why are many companies starting to invest in lifelong learning? Because it’s much cheaper! Let’s look at a practical example.

Simple Example: Decrease AWS Bills by 45% or More

So, for the sake of simplicity, let’s imagine that you have a web company and you need to recognize the content of images published on your web platform by its users.

Unfortunately, you don’t have the data in advance, but you do get small sets of new labeled images (for example, from user tags) at the end of each day (iteration cycle), and you want to adapt your predictive models as quickly as possible to improve the user experience and recommend the best platform content.

By modern standards, this would mean that you need to retrain the entire machine learning model from scratch on all accumulated data and redeploy it in place of the old model. However, this is extremely wasteful in terms of computational resources and memory, since you keep retraining the model on the same data.

What if we just trained on the new images instead? In our recent paper “Fine-Grained Continual Learning” (take the values below with a grain of salt, as they are not given in the paper but projected for this post), we showed that a fairly simple continual learning strategy, AR1*, tested in a scenario with 391 training sets, can:

  • reduce the required computing resources by an average of about 45% over the life cycle: initially its advantage over the retrain-and-deploy strategy is 0%, and at the end, for the 391st training set, it needs about 92% less computing resources. It should be noted, however, that the retrain-and-deploy strategy requires an increasing number of epochs (from 4 to 50) for each set, whereas AR1* consistently uses 4 epochs.
  • reduce the memory overhead by an average of about 49% over the life cycle: since we do not need to keep all the accumulated training data in memory, but only the current training set, the reduction relative to the retrain-and-deploy strategy grows from 0% at the first training set to about 99% at the 391st.
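The endpoint figures in the two bullets above can be reproduced with a back-of-the-envelope calculation. This is a sketch under stated assumptions, not the author's stated derivation: compute is measured in epochs only (4 versus a number growing from 4 to 50), all training sets are assumed to have equal size, and the function names `compute_saving` and `memory_saving` are hypothetical.

```python
def compute_saving(continual_epochs, retrain_epochs):
    """Fraction of training epochs saved at one step (counting epochs only)."""
    return 1.0 - continual_epochs / retrain_epochs


def memory_saving(step):
    """Fraction of stored training sets saved at a 1-based step.

    Assumes equal-size sets: retraining keeps all `step` accumulated sets,
    while the continual strategy keeps only the current one.
    """
    return 1.0 - 1.0 / step


first_step = (compute_saving(4, 4), memory_saving(1))
last_step = (compute_saving(4, 50), memory_saving(391))
print(f"first set: compute {first_step[0]:.0%}, memory {first_step[1]:.0%}")
print(f"391st set: compute {last_step[0]:.0%}, memory {last_step[1]:.1%}")
```

Under these assumptions both savings start at 0% on the first training set and reach 92% (compute) and roughly 99.7% (memory) on the 391st, which matches the endpoints quoted above.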

At the same time, at the end of the life cycle of our object recognition system, accuracy is only about 20 percentage points lower.


Continual learning accuracy in three scenarios of increasing complexity with 79, 196 and 391 training sets. For each experiment, the average of ten runs is given. The colored areas show the standard deviation of each curve. The accuracy of the cumulative upper bound, not shown in the graph, is about 85%. The results in tabular form are posted online.

As continual learning strategies get better and better at closing the accuracy gap with the inefficient retrain-and-deploy strategy (“Cumulative”), we can save more than 45% of computing resources and approximately 49% of memory over the life cycle.

Moreover, it is worth noting that in this simple example we have considered only a limited number of incremental training sets (391), and a continual learning strategy shows its full advantage when the number of training sets is larger. This means that the life-cycle savings can keep growing: the longer the life cycle, the greater the efficiency gain.

To summarize, continuous learning systems are not yet ready to replace modern machine learning systems (with retraining and deployment). However, I believe that a variety of mixed strategies will prove very useful in many real-world applications, offering a good balance between speed and efficiency of adaptation on the one hand and accuracy on the other.

Looking ahead to the future of Software 2.0, I see no other way to effectively patch a system, improve its performance, and adapt it to the requirements of an ever-changing international market.
