What will be presented at DevOops 2024

SLO: what not to do

Sergey Bukharov

Dodo Engineering

Sergey will talk about how SLOs took shape at Dodo Engineering: where the team started, where it ended up, how it adapted textbook practices to its own reality, and what came of it. Don’t expect ready-made recipes: this is a talk about a long-distance race over rakes, in which the speaker will first and foremost share his experience and the mistakes he made.


Here we go again, or How to keep an incident from recurring

Kirill Borisov

VK

Kirill will cover the most popular root cause analysis methods: 5 Whys, the fishbone diagram, and CAST, along with the subtleties of applying each. He will compare the tools and give recommendations on choosing the right one for a specific situation. Using a single incident as an example, we will derive its root causes with each method and see which one describes the causes most fully.

Incidents should be analyzed against the full set of root causes, looking for overlaps across different incidents. Kirill will give practical recommendations on how to approach this process.


Dealing with metastable failure states

Vadim Martynov

Yandex

Rate limiters, product degradation, server-side and client-side throttling, congestion control for databases, geo-distribution: these are the tools Vadim has used to protect services against overload and against sliding into metastable failure states. They are good and useful, but each has its drawbacks.

The talk proposes another solution that protects services and databases, requires no manual configuration, and helps use system resources efficiently.


DevOps at the factory: expectations vs reality

Ilya Oleksiv

Sibur Digital

Mikhail Fufaev

Sibur Digital

One of the most important steps in setting up the processes of any IT company is building and debugging the deployment process for its products. But the speakers work at a digital factory, and in an industrial company deployment is complicated by the fact that the target environments are heterogeneous, independent, and often lack direct network connectivity. Is it even possible to build an efficient and sustainable release process when traditional DevOps practices run into such limitations?

The speakers will describe the difficult path to a simple and understandable deployment process.


How to distribute trillions of files

Konstantin Lebedev

Mayflower

While using GlusterFS as a distributed file system on volumes of more than 50 million files, Mayflower found that it could no longer maintain the cluster in a reasonable time. The team therefore went back to choosing a modern distributed storage system, taking new requirements and technologies into account.

At first glance, SeaweedFS looked like a very attractive option, since it is written in Go and built on a warm BLOB storage design. But it was not at all clear how it would behave in production. Konstantin will describe how it turned out.


Synchronization of production. Speed, reliability and simplicity of the DevOps artery

Vladimir Medin

Sber

Vladimir will explain how Sber built a simple, reliable, distributed system on a hybrid tech stack that delivers gigabytes of distribution packages, Docker images, and deployment scripts from the development segment to the production segment in a matter of minutes. The system works so smoothly that users do not even think about its existence, although they previously had to perform dozens of routine operations and wait up to several days for the results of their work to reach the production environment.


The monitoring is green, but nothing works for users. How to monitor the client side

Daniel Khaliulin

T-Bank

Legend has it that the phrase “everything works for me” instantly eases the suffering of clients and sometimes even miraculously fixes failures. Be that as it may, monitoring only the server side is no longer enough to ensure the reliability of modern applications. As systems grow more complex, client-side monitoring is moving from “nice to have” to “must have”.

The talk will examine client-side monitoring. You will hear which data is especially important to track and what bumps T-Bank took while building observability into its main mobile application, which serves more than 25 million unique customers per month.


Using HAProxy to load balance between locations

Maxim Kupriyanov

A talk on how to use HAProxy, the well-known open-source load balancer, to automatically redistribute load between several sites during sudden traffic surges.


Zero-downtime deployment and databases

Andrey Tsvetsikh

T-Bank, DevBrothers

Microservices have long been firmly established in our lives. They allow you to implement scalable and fault-tolerant solutions. But when deploying a new version to a cluster, errors sometimes occur related to updating the database.

Andrey will look at popular methods of deploying to a cluster. He will show the typical problems that arise when updating a database and ways to solve them. We will also see how updating NoSQL databases differs from updating traditional relational databases.


Culture

Mentoring as part of a DevOps culture

Tatiana Serdinova

TAGES

The modern economy is a knowledge economy. Increasing collaboration and data sharing among technical departments in a company is one of the key cultural principles of DevOps.

This is where mentoring comes to the rescue, and Tatiana will talk about it in detail. You will learn what mentoring is, who mentors are, and whom they mentor.


Combo screwups, or the butterfly effect of screwups

Grigory Koshelev

Kontur

Stories of investigating screwups caused by chains of unlikely events combined with a scattering of harmless bugs.


SRE vs ITIL

Andrey Zarubin

Raiffeisen Bank

The purpose of the talk is to dispel the hype around SRE on the one hand and the conservatism around ITSM on the other. Andrey will talk about the principles of SRE and basic ITIL practices. He will also explain how, in his opinion, they should be combined using DevOps CALMS, and what the industry currently offers.


R&D platform. Chapter 1: Getting organized

Maxim Zalysin

Positive Technologies

As in life, before starting a big project you need to put things in order, and sometimes putting things in order is itself the first step towards results. In his talk, Maxim will explain how the DevOps team at Positive Technologies began moving towards an “R&D Platform”, taking requirements, expectations, and reality into account.


Updating infrastructure dependencies without pain: secrets of our DevOps kitchen with Renovate

Vlada Zubareva

Mayflower

Like any DevOps team, Mayflower creates and maintains many Ansible roles, Terraform modules, and its own Docker containers. These components are actively used by various teams of the company to configure the infrastructure. However, updating versions and communicating changes between teams in a timely manner can be a major challenge.

Vlada will tell you how Mayflower organized the management of internal roles and modules, and how the Renovate tool helps automate and simplify the update process on a daily basis, ensuring the stability and consistency of the infrastructure.


We tried Platform Engineering. The prank was a success

Alexander Kozhemyakin

VK

This is the story of how platform development was approached from different angles. What are the pitfalls of a heterogeneous infrastructure? How do you learn, and teach others, to reach agreement on technical decisions? Why build a platform at all?

The talk answers these and other questions.


How to build a Development Platform from scratch in a single company

Sergey Kiselev

MTS Web Services

Sergey and his colleagues are developing MTS Web Services (the new MTS Cloud) and working on building a unified development culture. The goal is a transparent and understandable architecture that reduces the time it takes to onboard new developers. They want to build a solid ecosystem of libraries and approaches that can be reused across all Cloud development teams.

The story is told from the Development Platform's point of view and covers building everything from scratch. We will talk about architecture decision records (ADRs) and how they are used. We will definitely touch on internal open source (innersource) and the cultural aspects of adopting it. Finally, we will discuss fighting boilerplate with code generation. All of this comes as a series of stories, since the new Cloud is being written right now.


Removing damage from development team resources

Alexander Krylov

Bimeister

We will discuss approaches to redistributing resources among the teams involved in the development cycle. Why do this at all? To free up the resources of some teams and build the competencies of others by shifting their focus to targeted activities.

Alexander will share what obstacles you may encounter when introducing or changing processes, which arguments help overcome resistance, and what you stand to gain as a result.


The path from “IT standards” to “technical capabilities”

Evgeniy Kharchenko

Raiffeisen Bank

The story of how DevOps practices were introduced at Raiffeisen Bank, how they transformed from mandatory standards to an engineering culture and subsequently turned into “technical capabilities” with maturity levels, multiple criteria and automated checks in IT on the scale of 258 teams employing about 3,700 IT specialists.

The talk touches on engineering culture and on motivating engineers and teams to develop in this direction, and it offers a solution to the problem of implementing and measuring technical practices in the enterprise.


Security

Vulnerabilities as data streams

Yulia Volkova

CodeScoring

A talk on how the world of vulnerabilities works from a data point of view. Yulia will talk about NVD, FSTEC, GitHub Advisory, OSV, and security mailing lists, and about how they all share a single (though not always) life cycle.

Why can't we just magically create one tool for all systems and languages? Why do different tools sometimes produce different results, and what do PURL and CPE have to do with it?


DSM – BPF For the Little Ones

Lev Khakimov

MTS Web Services

Every year, more network solutions based on BPF and eBPF appear: Cilium keeps developing, Calico is moving to eBPF, and Service Mesh solutions built on this technology are emerging. For most engineers, this has meant moving from the classic network stack to a magical “black box”. In this talk we will lift the veil on the technology and see how popular networking solutions work.


DevSecOps in an hour

Andrey Moiseev

MTS Web Services

It often happens that you are a one-person team and still have to ensure the security of software development. In this talk, Andrey will walk through a basic pipeline for checking software security. We will take the GitLab security templates as the pipeline and customize them. We will look at how to quickly build a minimal DevSecOps pipeline, apply SCA, SAST, and secret management practices, and think about what to do with it all next.


Features of certificate management in container environments

Anna Archer

Clearway Integration

To provide a secure communication channel and reliable authentication, certificates are needed. And failure to update at least one certificate in a timely manner can lead to serious failures. In container environments, where certificates can appear in the thousands per day, automation is essential.

In this talk, we will look at high-profile outages caused by certificate problems and at how we learned from others' mistakes to manage millions of certificates without failures.


We patch flaws in application images before, during and after runtime

Anatoly Karpenko

Luntry

The usual situation is that you only received the image itself (provided by the vendor, legacy or open source). You scanned it and – “surprise, surprise” – it turned out that it does not comply with best security practices at all: a large number of vulnerabilities, misconfigurations, hard-coded secrets.

And you will have to work with this image, and the project source files and Dockerfile are not available. This is sad! But we will make sure that the image is safe to use.

We will make changes at the level of the image itself, modifying layers with docker-squash, mint, and similar tools. We will tune the runtime at the operating system and Kubernetes level: AppArmor, capabilities, privilege management, and other “knobs”. We will also look at detecting anomalous container behavior at runtime with Falco and NeuVector.


GOSTBUSTERS. How to approach static analysis now, after GOST R 71207-2024

Anton Tretyakov

PVS-Studio

In the first half of the 21st century, it turns out that not only ordinary jobs live in pipelines, but also… ghosts. Loaded clusters cannot withstand the onslaught of the supernatural.

Setting aside the references to the famous film, the talk is about GOST R 71207-2024. It has a theoretical part and a practical part: we will look at what the document says, and then at how that plays out in practice.

The main topics are:

  • How static analysis is defined in GOST.

  • Examples of code with errors according to GOST.

  • How to implement static analysis according to GOST.

  • An example of implementing static analysis according to GOST.


Gentle migration and adaptation of the project in the cloud

Anton Chernousov

Yandex Cloud

In this talk we will look at several successful migrations to the cloud and discuss the stages of migrating and adapting IT infrastructure there.

We will cover preparation, auditing, and drawing up a migration plan, and discuss the roadmap. We will also touch on information security and on measures to ensure business continuity during migration.


Back to Basics. Certificates, TLS and mutual authentication of services

Anna Archer

Clearway Integration

Many people use certificates, whether by choice or because security requirements demand it, but not everyone understands how certificates actually work. In this talk, we will look at the basics of how certificates work and at the cryptographic algorithms and protocols that use them. We will discuss how to avoid basic mistakes when setting up mutual authentication (mTLS) between containers.


Is it possible to access services securely?

Georg Gaal

AEnix

Alexey Fedulaev

MTS Web Services

What is Privileged Access Management (PAM), and what does secure access to various services look like? Is it necessary? What solutions are on the market now, and how do they compare? Why should you use one of them rather than an Ansible playbook to configure servers and users?

The talk will show how to do this well without spending your whole life on it or selling your soul to the devil.


DevExp

MS-DOS Shells: Beyond Norton Commander

Dmitry Moiseev

Kontur

For many, MS-DOS is still associated with a black background, the command line, and incomprehensible commands, while the revolutionary macOS and Windows are associated with the advent of convenient user interfaces. But in reality, working under MS-DOS very quickly became convenient thanks to shells and file managers, the most famous of which is Norton Commander. The most famous, but not the only one! In this talk we will look at what else was interesting and unexpected among similar products on the market.


Platforms and other toys for adults

Vasily Kutsenko

Pochtatech

Building your own platform is a natural development of DevOps culture. In his talk, Vasily will explain how Pochtatech approached the development of its platform (spoiler: in two steps), what problems the platform should solve, and how achievable those goals are.


Decomposing GitOps. How to Upgrade Your CIOps to GitOps with Minimal Effort

Oleg Voznesensky

VK Tech

Let's discuss the essence of the GitOps approach, its pitfalls, and make our own GitOps implementation from scratch using available tools.


Back to Basics: OOM Killer. Survival Basics

Alexey Tsykunov

Hilbert Team

In this talk, we will examine how memory works in Linux and why out-of-memory (OOM) situations occur. You will learn how the OOM Killer selects processes to terminate, how to avoid its “visit”, and how to keep the system stable. We will also discuss how the OOM Killer is used in Kubernetes.


Our Never-Ending Journey of GitOps Transformation with Flux CD

Tung Nan Kwong

TalkHub

The talk covers how the speaker's company moved to GitOps over several years: the challenges it faced, the important lessons it learned, and its plans for the future. The transition did affect production workloads, but in the long run it was worth it.


How Much Is the Fish

Andrey Sukhorukov

Kaspersky

In pursuit of automation, we have stopped asking a number of questions that affect the business. This talk is a study that sets out to answer how much a DevOps “head” really costs.

During the presentation, the “toxic tech director” will present a plausible case of “destroying” a competitor company, complete with calculations and worked-through scenarios of attacks targeting its engineers.


K8s

Java, Spring Boot and Kubernetes: how to speed up application startup and save cluster resources

Alexey Ignatov

SberTech

Java is a convenient language for developing business applications, and the Spring Boot framework remains popular and widely used. But the nature of Spring Boot and the JVM creates some challenges in a Kubernetes environment: you have to choose between slow application startup and increased resource usage. The talk will show how to speed up the startup of Java applications in Kubernetes and save cluster resources.


Launching a cloud product in Kubernetes on the developer’s laptop, in production and on the client’s hardware

Alexander Shinkarev

Tourmaline Core

Without fear, we will launch locally… a microservice product that will be deployed to Kubernetes in production.

The talk will be useful for those who struggle with debugging and running microservices on their own computer. It presents a method that works for small and medium-sized products. We will discuss when it is appropriate, what restrictions and requirements apply, and what bumps the speaker's company took along the way. We will tie together deployment in production and deployment on a local machine.

All the approaches and examples shown will be publicly available in repositories on GitHub. You can simply take them and start new projects on these rails.


4 Ways to Detect Node Failures in Kubernetes: Current Workload Recovery Strategies

Dmitry Rybalka

Cooper (ex-SberMarket)

The failure of a worker node in a Kubernetes cluster is always an unpredictable event, with varying impacts on the workload.

Dmitry will tell you how to make such situations not just less stressful, but also as manageable as possible.

We will consider:

  • How Kubernetes detects node failures. What can you do to improve this process?

  • Node-problem-detector (NPD) and the possibilities of its customization.

  • Alternatives to NPD: Their Strengths and Weaknesses.

  • Failure-domain-aware workload placement strategies to minimize impact.


Governance as Code

Maxim Chudnovsky

SberTech

Alexander Kozlov

SberTech

We will consider the Governance as Code approach: what solutions already exist, and how can you manage the configurations of a large number of microservices in a multi-cluster environment?

The talk is intended for practicing engineers who are familiar with cloud infrastructure and the Service Mesh concept.


Reproducible bare metal environments using Talos Linux and Cozystack

Georg Gaal

AEnix

A fascinating story about how AEnix came to Talos Linux and what the company gained from it.

AEnix develops Cozystack, an open-source platform for cloud providers that runs virtual machines, Kubernetes on Kubernetes, and managed services. Its main target is bare metal. Although each server has its own peculiarities, the company strives to keep the platform and each of its components stable.

Georg will share his experience: he will tell you exactly how it works, the problems encountered during development, and the solutions found.


Cloud technologies

Infrastructure from Code: the next stage of IaC development using the example of Serverless

Victor Kuzenny

Yandex Cloud

In this talk we will take a closer look at what Infrastructure from Code (IfC) is, what its advantages and disadvantages are, how it differs from IaC, and how it complements it. Using one such framework and the Yandex Cloud serverless ecosystem as an example, we will see how IfC helps developers build serverless applications faster and more efficiently.


Expanding the capabilities of the Cluster API: how to write your own infra provider and not go crazy

Ivan Gulakov

MTS Web Services

Ivan will tell you how he collected the best results while writing his infra-provider for managing hybrid infrastructure.

During the report we will cover the following topics:

  • What is an infra provider from the inside?

  • Business issues and how bare metal turned into a hybrid.

  • Why dragging too much business logic into a provider is bad, or how to make your own small monolithic operator.

  • How hyped immutability hit us in the forehead like a rake.


Adventures with Envoy: how to build your Service Mesh and not step on a rake

Denis Zolotarev

Yandex Plus Fantech

Denis will explain how Yandex is building a Service Mesh on Envoy as the base layer for interservice communication.

The team has come a long way from a small startup within Plus to the infrastructure level of the entire company. We will briefly cover the theory and the standard architecture of a Service Mesh, but will devote most of the time to solving practical problems with Envoy and to the non-obvious problems that may lie in wait along the way. The speaker will show code examples, graphs, and fatal errors from production, and will explain how to protect your own projects from such errors.


Creation and management of infrastructure for developers. Terraform CDK

Anton Ermak

Independent expert

We will talk about practicing Infrastructure as Code with the Terraform CDK. The talk covers the general idea behind this approach, where it applies, and its pros and cons. Using examples, we will build entire architectural infrastructure patterns and discuss how neatly they can be expressed in programming languages through classes, objects, and variables.


Other

Enabling teams in DevOps

How are enabling teams structured, how do they interact with others, and how can you avoid mistakes when forming them? In this discussion, we will cover the first experience of launching enabling teams at well-known companies: the history of such teams, their composition, skills, and roles, how they differ from other teams, their activities and interactions, successful and unsuccessful cases, and development plans.


Lightning Talks

Try yourself as a speaker and talk about everything that worries you right at the conference.

Give a short presentation on any topic in any format. Each participant will have 20 minutes to share their story. Sign up to speak right on site!

Please note: only participants of the offline part of the conference can speak. There will be no video recording.


Conclusion

We've covered the talks, so let's finally cover the rest:

  • The conference has a non-standard format. The first day (November 6) is online only, while for November 12-13 the choice is yours: you can come to the conference in person in St. Petersburg or connect remotely.

  • Of course, the conference is not limited to presentations: there will be plenty of communication between participants offline. But that cannot be described in a Habr post; it is all in your hands.

The remaining information about the conference (such as the schedule) is on the official website, and tickets are available there as well.
