The Future of the DevOps Engineer

The past few years have been eventful, and as a large organization we felt the effects firsthand: we had to solve all sorts of problems very quickly. I want to share our experience and the conclusions we drew from it, which may seem controversial to some, out of place to others, and very important to the rest.

Historical background

We analyzed various studies covering a long period (mostly reports from DORA and Puppet), and here is what our “archeoIT excavations” turned up.

The term “DevOps” came into common usage at the DevOpsDays conference in 2009. In 2011, job postings on LinkedIn began to list the skills that defined the functions and tasks of this profession, and people began actively searching for the word. In 2013, nobody yet identified as DevOps or SRE: 75% of survey respondents considered themselves admins and engineers, 17% IT managers, and 8% IT consultants. The following year, the first specialists who described themselves as DevOps began to appear, and their number grew over the years. In 2018 it peaked, and then began a decline that continues to this day.

Earlier studies described DevOps as dealing with automation, CI/CD tools, monitoring, logging, and other tasks. Since 2018, however, the SRE role has appeared in the Puppet and DORA reports. Google, which had acquired DORA by that time, explained right away that it would not separate DevOps and SRE, because the two concepts are complementary, and that it would write them as DevOps/SRE. As a reminder, an SRE is expected to have all the skills of a DevOps engineer, plus development skills, and to be more involved in the reliability, availability, and scalability of systems.

In 2020, another role appeared – platform engineer. Note how the graphs correlate:

Perhaps they will converge in the future, although it is unclear how soon this will happen.

Requirements for a platform engineer include all the skills of DevOps and SRE, as well as the ability to move narrow, team-specific solutions onto shared platforms, making them simpler, more accessible, and more convenient for everyone involved.

Trends

Probably everyone is familiar with the Gartner hype cycle. I have highlighted the key trends that had the greatest impact on DevOps.

Many of the practices you are familiar with are already close to the plateau of productivity; they are probably implemented in your companies and are bringing benefits. SRE is past its peak and is beginning to slide into the trough of disillusionment, but I am sure that this role will also reach the plateau. At the peak right now is SLSA, a new set of security practices that take into account what is happening in the world. Platform engineering and AI/ML are approaching the peak. GitOps is still just at the beginning of its journey.

Now let's look at how these trends impact DevOps today and in the near future.

AI assistants

Everyone is talking about this technology now, and great hopes are placed on it. Here is what major companies are offering today:

These assistants are already helping to lower the entry threshold and write code faster. We have a similar product of our own, GigaCode. Here's what it is used for today:

Here's an assessment of how much AI assistants can currently help DevOps/SRE work:

The set of functions here is very general; it may differ somewhat in your company, but on average it will look like this.

As for the prospective use of AI assistants in DevOps/SRE, the overall picture is as follows:

We face these tasks every day, and “co-pilots” can save us a lot of time on routine tasks.

Cognitive load

The industry quickly began to drift away from the original ideas of DevOps:

  • DevOps has become a job title when it should be a culture.

  • Instead of creating synergies, many organizations simply shifted the workload to developers or DevOps engineers.

  • DevOps theory has not been implemented in practice.

Many see high cognitive load as the key problem. The higher it is, the faster burnout sets in and the sooner people become disillusioned with their work. What causes it? People have to work with things that are loosely connected to one another, poorly documented, unclear, and complex, and then assemble all of this into an end-to-end process that produces a specific result.

Cognitive load can be extraneous (external) or intrinsic (internal). The first is related to how information is presented, the second to its inherent complexity. Load is studied through physiological reactions, behavioral patterns, and subjective surveys.

Daniel Bryant presented this chart at PlatformCon 2022, showing how the cognitive load on a developer has increased over the years:

There are so many tools available today, everyone is moving to cloud computing, and along the way, complex products like Terraform, Kubernetes, and others are becoming popular. As a result, your work toolbox today may well look like this:

Even for the simplest tasks, a developer has to master many very complex tools. And here I want to share our experience.

Platform Engineering

In our company, DevOps has also become a separate profession (or a dedicated team). The figure shows the main team topologies that have developed in our company, and all of them match known anti-patterns, that is, examples of how not to do it. On top of that, security, which sits outside the teams altogether, keeps introducing promising new practices of its own.

What should you do? Move toward platform engineering, starting with the three pillars on which any platform rests:

  1. Improve the developer experience by building an internal developer platform.

  2. The platform is not a set of tools and approaches.

  3. The platform is a kind of unifying, end-to-end solution that allows you to avoid diving into an abundance of tools. A single product that connects the entire SDLC.

Internal developer platform

This concept has emerged as a trend within platform engineering. An internal developer platform is the sum of all the technologies and tools that it ties together, designed to simplify the customer journey and give developers simple self-service capabilities. Many companies are moving in this direction right now.

What is not a platform?

  • PaaS-like solutions marketed under a “DevOps” label, such as Heroku, that is, anything that does not cover the entire software life cycle.

  • Catalogs of services, orders, accesses and the like. These are just interfaces to platforms.

  • Anything that is available out of the box, even if it is called a platform. Any platform is created for the specifics of a particular company; there are no universal options.

Done well, the platform can drive significant increases in productivity and development speed, creating a more efficient working environment for developers and operations teams.

Main functions of the platform:

  • infrastructure orchestration;

  • application configuration;

  • deployment management;

  • environment management;

  • role model management.
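To make the last item more concrete, here is a minimal sketch of what role model management could look like inside a platform. It is only an illustration: the role names, environment names, `ROLE_MODEL` table, and `is_allowed` function are all hypothetical and are not taken from any real platform.

```python
from enum import Enum


class Action(Enum):
    VIEW = "view"
    DEPLOY = "deploy"
    CONFIGURE = "configure"


# Hypothetical role model: role -> environment -> allowed actions.
ROLE_MODEL = {
    "developer": {
        "dev": {Action.VIEW, Action.DEPLOY, Action.CONFIGURE},
        "prod": {Action.VIEW},
    },
    "release-manager": {
        "dev": {Action.VIEW},
        "prod": {Action.VIEW, Action.DEPLOY},
    },
}


def is_allowed(role: str, environment: str, action: Action) -> bool:
    """Return True if the role may perform the action in the environment."""
    # Unknown roles or environments fall through to an empty permission set.
    return action in ROLE_MODEL.get(role, {}).get(environment, set())


if __name__ == "__main__":
    print(is_allowed("developer", "dev", Action.DEPLOY))
    print(is_allowed("developer", "prod", Action.DEPLOY))
```

The point of keeping such a table in the platform rather than in each tool is that a developer gets one consistent answer to "may I deploy here?" regardless of which underlying system performs the deployment.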

We started from the reference architecture diagram by Humanitec:

The Developer Control Plane is the developer's workplace. It includes an IDE, APIs, AI assistants, and a portal for interacting with colleagues, so developers don't need to go anywhere else. All they need is for everything to start working: for resources, test environments, access, and so on to appear. The key element of this scheme is therefore the orchestrator, which accepts instructions at a very high level of abstraction describing what the developer wants and translates them into the full complexity of the underlying tools and services.
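The orchestrator idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how Humanitec or our platform implements it: `WorkloadSpec`, `ENVIRONMENT_RULES`, and `plan` are hypothetical names, and the "actions" are plain strings rather than calls to real tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadSpec:
    """High-level request written by a developer (hypothetical format)."""
    name: str
    image: str
    needs: tuple = ()  # abstract dependencies, e.g. ("postgres",)


# Hypothetical per-environment rules owned by the platform team.
ENVIRONMENT_RULES = {
    "dev": {"replicas": 1, "resources": {"postgres": "container"}},
    "prod": {"replicas": 3, "resources": {"postgres": "managed-cluster"}},
}


def plan(spec: WorkloadSpec, environment: str) -> list:
    """Translate an abstract spec into ordered, concrete platform actions."""
    rules = ENVIRONMENT_RULES[environment]
    target = f"{spec.name}-{environment}"
    steps = [f"create namespace {target}"]
    for dependency in spec.needs:
        # The developer states *what* is needed; the rules decide *how*.
        flavor = rules["resources"].get(dependency)
        if flavor is None:
            raise ValueError(f"no rule for {dependency!r} in {environment!r}")
        steps.append(f"provision {dependency} as {flavor}")
    steps.append(f"deploy {spec.image} with {rules['replicas']} replica(s)")
    steps.append(f"grant team access to {target}")
    return steps


if __name__ == "__main__":
    spec = WorkloadSpec(name="payments", image="payments:1.4.2", needs=("postgres",))
    for env in ("dev", "prod"):
        print(env, plan(spec, env))
```

The same spec produces different concrete plans for `dev` and `prod`. The developer never touches the per-environment rules, which is where the reduction in cognitive load comes from.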

Platform Engineering Team

A successful platform engineering team requires a full DevOps/SRE skill set and deep knowledge of systems integration. It focuses on standardization, automation, and self-service capabilities, and lowers the barrier to entry for tools. In other words, the team’s job is to pave golden paths:

  • optimize work processes;

  • expand logging and performance monitoring, search for potential problems;

  • visualize workflows to improve platform architecture;

  • work closely with product owners;

  • increase the speed of integrations;

  • solve common problems;

  • combine all tools into a seamless customer journey;

  • train employees.

How we see it at Sber:

This is the team topology in the platform development concept. We strive to free product teams from the need for their own DevOps specialists so that they can focus on creating value and earning money with their products, while all the complexity stays hidden inside the internal developer platform. Product teams are currently responsible for CD, and to reduce their cognitive load they are forced to build highly specialized point solutions. That is a dead end when the tools are centralized, because those tools can change and new ones can be added, leaving the point solutions behind. We believe that DevOps teams can take part in developing the platform through a community working together with the internal platform team, and contribute their solutions there so that they have a future.

ML engineering

Above, I showed how AI assistants help save developers' time. In almost all cases, the platform relieves us of complex routine tasks even more effectively. What should a DevOps/SRE specialist spend the freed time on? Obviously, they can help develop the platform. But Sber also uses machine learning very actively: almost all of our products already ship with models. We can see a separate methodology for delivering them taking shape nearby, one that will differ from classic DevOps. Training and retraining models requires data and the skills to work with it. Yet there are practically no engineers on the market who can work with both conventional distributions and models, while ML engineers are in great demand at Sber.

Here, the additional functions that are currently emerging, and in which knowledge needs to be built up, are highlighted in green.

To sum up: a DevOps/SRE specialist can help develop the platform and can learn how to roll out models correctly. But the platform itself must support all of this, so another layer of abstraction must appear in it:

In closing, I will reiterate our key findings regarding the future of DevOps:

  • Platforms and AI assistants reduce the cognitive load and the amount of routine work for DevOps/SRE specialists. The freed time can be spent on developing new practices and gaining knowledge.

  • Machine learning models are being shipped with products everywhere and are becoming an integral part of them. We need to learn how to roll them out correctly.

  • There are two promising areas for a future DevOps/SRE specialist: platform engineer and ML engineer.

  • Working with models within the platform requires new functionality, and to do this, platform teams need machine learning skills.
