Many companies have adopted machine learning piecemeal, acquiring and developing models, algorithms, tools and services for specific tasks. This approach is inevitable at the start, when the business is just learning what the technology can do. But the result is a hodgepodge of isolated, manually launched, non-standardized processes and components. These produce inefficient, cumbersome services that bring less value than they could, or block further development entirely.
The VK Cloud team has translated an article about why standardizing and automating ML processes matters and how the MLOps approach can help.
Why is MLOps needed?
If a company plans to scale ML applications across the organization, it needs to automate and standardize its tools, processes, and workflows. It’s important to build and run machine learning models quickly, to spend less time manually training and monitoring models, and to spend more time on innovations that bring value and profit to the company.
Developers need access to the data their ML models are built on, must be able to work across different lines of business, and should share the same technology stack transparently. In other words, to make ML models work efficiently and flexibly, you need to adopt best practices from software development. In the context of machine learning, this primarily means MLOps: a set of development practices for building and operating models.
MLOps is needed to automate the repetitive work of data scientists and ML engineers, from developing and training a model to deploying and operating it. By automating all these steps, companies gain flexibility, while users and customers gain ease of use, speed, and reliability. These automated processes help reduce risk and free developers from routine tasks, giving them more time to innovate. All this affects the final result: according to McKinsey’s 2021 global survey, companies that can scale AI-powered projects can increase their bottom line by as much as 20%.
“It’s not uncommon for companies that successfully develop complex ML solutions to come up with different ML tools in specific areas of the business,” says Vincent David, senior director of machine learning at Capital One. “But you can often see parallels: different ML systems do similar things a little differently. Companies looking to get the most out of their machine learning investments are consolidating and amplifying their best ML solutions. As a result, they develop standardized, foundational, accessible tools and platforms for all, and ultimately create solutions that compare favorably with others in the market.”
MLOps depends on close collaboration between data scientists, ML engineers, and site reliability engineers (SREs), who together ensure that ML models are reproducible, monitored, and properly operated. Over the past few years, Capital One has developed MLOps best practices that apply across the industry. These practices balance the needs of different users, rely on cloud technology stacks and foundational platforms, emphasize open-source tools, and achieve the desired level of governance and availability for data and models.
How to meet the needs of all users
Typically, ML applications are operated by two types of users: technical experts (data scientists and ML engineers) and non-technical experts (business analysts). These user groups have different tasks:
- technical experts often need complete freedom of action to use all available tools and create models for one purpose or another;
- everyone else wants easy-to-use tools to access the data they need to create value in their own workflows.
The challenge is to create consistent processes and workflows that suit both groups. Vincent David recommends meeting with application design teams and subject matter experts who work with different use cases.
“To understand these problems, we look at specific cases: this is how users get solutions that are useful for their own work and for the company as a whole. The bottom line is to understand how to create the right features, finding a compromise between the needs of the business and different stakeholders within the same enterprise.”
A unified technology stack
Collaboration between development teams is a critical element of successful MLOps. But it can be difficult to organize if each team has its own technology stack. With a unified technology stack, developers can standardize and reuse components, features, and tools across models like Lego bricks.
“This makes it easy to combine different features so that developers don’t have to spend time moving from one model or system to another.”
A cloud stack lets you take advantage of the cloud model of distributed computing. It provides developers with infrastructure on demand and continually adds new features and services. Capital One’s decision to migrate fully to the public cloud has had a significant impact on the speed and efficiency of development. Code is now released to production much more often, and ML platforms and models can be reused across the company.
Open-source machine learning tools are the core ingredient of a powerful cloud platform and a unified technology stack. They eliminate the need for companies to spend precious technical resources reinventing the wheel, allowing models to be built and deployed at high speed.
David says that in addition to using open-source tools and packages, Capital One also develops and releases its own solutions. For example, to work with data streams that are too large to track manually, Capital One created an open-source data profiling tool. It uses machine learning to identify and protect sensitive data such as bank account and credit card numbers. In addition, Capital One recently released the rubicon-ml library, which helps collect and store information about model training and execution, and supports searching through models and re-running them. Developing and releasing its own open-source solutions lets the company create flexible ML capabilities that both its own employees and other companies can adapt to their needs. All this makes the company an organic part of the open-source community.
Data availability and governance are a high priority
A typical ML system includes two environments:
- analytical – a data warehouse that users can work with;
- operational – real-time data processing.
For many companies, the latency between these environments is a major concern. If data scientists and engineers need near-real-time access to data from the production environment, it is important to set up the necessary control mechanisms.
So ML developers need integrated access to both environments without sacrificing the quality of governance.
“In an ideal world, companies achieve full integration between data warehouses in production and analytics environments. This provides all the control mechanisms and governance frameworks that data scientists, engineers and other stakeholders involved in maintaining and developing the model need,” David explains.
Management and governance of ML models are equally important. As the source data gradually changes over time, models begin to drift. Because of this, engineers need to monitor models and correct for drift.
MLOps practices help automate the management and training of models and their associated workflows. When a company moves to MLOps, it determines for each machine learning scenario which parameters need to be monitored, how often, and how much drift is allowed before the model must be retrained. After that, it sets up tools to detect these triggers automatically and retrain the models at the chosen frequency.
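A drift trigger of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the tooling Capital One uses: it assumes drift is measured with the Population Stability Index (PSI) over equal-width bins. The `DriftPolicy` class, the `needs_retraining` helper, the feature name, and the 0.2 threshold (a common rule of thumb) are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass


def population_stability_index(reference, current, bins=10):
    """PSI between two numeric samples, binned over the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


@dataclass
class DriftPolicy:
    """Hypothetical per-scenario policy: what to monitor and how much drift is allowed."""
    feature: str
    max_psi: float  # assumed threshold; tune per use case


def needs_retraining(policy, reference, current):
    """Return True when drift on the monitored feature exceeds the allowed level."""
    return population_stability_index(reference, current) > policy.max_psi


# Usage: compare training-time data against recent production data.
policy = DriftPolicy(feature="transaction_amount", max_psi=0.2)
training_sample = [float(v) for v in range(100)]
stable_sample = list(training_sample)
shifted_sample = [v + 50.0 for v in training_sample]

print(needs_retraining(policy, training_sample, stable_sample))
print(needs_retraining(policy, training_sample, shifted_sample))
```

In a real pipeline a scheduler would run such a check at the chosen frequency and start a retraining job whenever it returns True.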
When machine learning first appeared, companies prided themselves on their ability to develop new and unique solutions for different lines of business. Today they aim instead to scale ML in a well-managed, flexible way that can handle constant updates to data sources, ML models, features, pipelines, and numerous other aspects of the ML model lifecycle. Because MLOps makes processes standardized, reproducible and adaptable in large-scale ML environments, enterprise machine learning has a bright future.
The VK Cloud team develops ML Platform. It helps to build the process of working with ML models from design to deployment, to control the quality of experiments and models. We give new users a bonus of 3000 rubles for testing.