Why a 75-minute Habr post outrages me, or are you hiring DevOps engineers the wrong way?

Quite often I come across a significant mistake that many companies make, at the level of almost every department, when hiring DevOps engineers. It is multi-layered: it affects how the HR department and the hiring team interact and divide responsibility, and how general processes are built within the company, for example keeping the knowledge base up to date and keeping future plans open to leads.

My name is Andrey Sukhorukov, and I am a DevOps team lead at Kaspersky Lab. I have been in IT for more than 11 years, and during my career I have very often taken on the role of a crisis manager, resolving major problems in large projects in almost every business sector: metallurgy, banks, government agencies. I have also worked as an outsourced consultant. And I have hired people and, accordingly, interviewed them. So I know very well what I'm talking about.

Leave the terms to universities (or better yet, forget them altogether)

Supposedly, to check the competence of a future colleague it is enough to ask a few questions from this Habr article (a 75-minute read). That way, the argument goes, it is quite easy to determine the competence and professionalism of an engineer, because this is the foundation without which no deployment process, let alone architecture design, is possible. The candidate could say what SDN and ICMP are, which file systems exist and what the drawbacks of a monolithic architecture are: we send an offer. If not: “unfortunately, you are not a fit for us.” That's all.

In general, I have nothing against this hiring practice; it is great for selecting… trainees and juniors, to find out whether they attended classes at university at all. But the skills and experience of a mid-level specialist or above do not correlate in any way with knowledge of these terms: the terms carry neither practical nor semantic weight.

Learning terms like TCP/IP or DMA is definitely an important part of education. Without it, it is quite difficult to understand how networks, file storage, servers or operating systems work. However, academic definitions are unlikely to be useful in daily work, which means they will be forgotten as unnecessary. A senior may well fail to answer such questions: he learned the terminology a long time ago. Does this mean that he “bought” his grade? No, of course not.

Personally, I would be more interested to know whether the candidate can decode a traffic dump. That is far more practical: what is the point of a person knowing what TCP/IP is if he cannot say what a large piece of a dump contains?
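To make “reading a dump” concrete, here is a minimal sketch in Python of decoding a captured packet by hand. Everything specific in it is an assumption made for illustration: the sample bytes, the addresses and ports, and the `decode_packet` helper are hypothetical, and only a plain IPv4 header followed by a TCP header is handled.

```python
import socket
import struct

# Hypothetical sample: a 20-byte IPv4 header followed by a 20-byte TCP header.
# Addresses and ports are made up; checksums are left as zero for brevity.
SAMPLE_HEX = (
    "4500002800004000400600000a0000010a000002"  # IPv4 header
    "c35001bb00000001000000005002ffff00000000"  # TCP header
)

TCP_FLAGS = [(0x01, "FIN"), (0x02, "SYN"), (0x04, "RST"),
             (0x08, "PSH"), (0x10, "ACK"), (0x20, "URG")]


def decode_packet(raw: bytes) -> dict:
    """Decode the IPv4 and TCP headers at the start of a raw packet."""
    (ver_ihl, _tos, total_len, _ident, _frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    if ver_ihl >> 4 != 4 or proto != 6:
        raise ValueError("this sketch only handles TCP over IPv4")
    ihl = (ver_ihl & 0x0F) * 4  # IPv4 header length in bytes
    (src_port, dst_port, _seq, _ack, _offset,
     flags, _window, _tcp_csum, _urgent) = struct.unpack(
        "!HHIIBBHHH", raw[ihl:ihl + 20])
    return {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "ttl": ttl,
        "total_length": total_len,
        "src_port": src_port,
        "dst_port": dst_port,
        "flags": [name for bit, name in TCP_FLAGS if flags & bit],
    }


info = decode_packet(bytes.fromhex(SAMPLE_HEX))
print(info)
```

Read aloud, the result says that a client at 10.0.0.1 is opening a TCP connection (a lone SYN) to port 443 on 10.0.0.2, which is the kind of answer the question is after.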

And in general, it is wrong to start from knowledge of academic terms: the interview should be built around the needs and tasks of the team. What tools does it work with, what does it automate and how? A senior's practical experience barely overlaps with this list of terms. Besides, if the interview begins with such questions, mutual understanding between the company and an experienced candidate becomes difficult: the senior cannot assess the state of the company or the scale of the upcoming tasks, and the company is more likely to leave a negative impression.

Does this checklist make any sense, then? Yes, it does: when you need to hire someone at the start of their career, an intern or a junior. Here academic knowledge is meaningful: at a minimum, it shows the person's interest in growth and an understanding of how modern technologies are built. But this selection method does not suit established specialists: you simply will not understand each other.

Interview quality = employee quality

How do you know whether an interview went well? The usual answer is that all parties liked each other. But for engineering roles there is another question everyone should ask when evaluating an interview: was it technically sound? For example, the interview follows the checklist mentioned above, and the applicant, having memorized everything beforehand, easily receives an offer. On the one hand, since he answered these basic questions, he is supposedly suitable for the position and a great guy in general. On the other hand, having merely memorized these concepts, he can show up at work and break production. Then the obvious question arises: how did we conduct the interview if everything was fine during it, but in reality the person arrives, full of joy, and all the servers immediately go down?

This brings us to what a quality interview is. It is the joint work of a recruiter and the lead (or senior) whose team is hiring. The technical expert knows how to compose and ask technical questions correctly, come up with a practical case, and evaluate whether the candidate's hard skills are a fit. The HR specialist, taking into account the hiring team's wishes for a future colleague, must competently select candidates. A good interview should include the right questions to analyze how the person thinks and to assess whether he can cope with our practical work, as well as an assessment of professional and, if necessary, psychological traits to understand whether the candidate will fit a particular team.

Here it is worth taking a closer look at the recruitment process. My fairly extensive experience in searching for employees has given me my own view of what the interaction between HR and the lead should be. There is no single way to select DevOps engineers, because of the specifics of their work and the set of tools used: some projects focus on Windows and others on Linux; some use orchestration and others do not; some rely on cloud systems and others work with hardware.

Prescreen questions should be prepared separately for each stack. So the recruiter must, first, correctly understand the requirements for the position and select suitable candidates by resume, and second, be able to identify the candidate's real knowledge and skills, because anyone can write anything on a resume.
For the recruiter to do his part well, the same joint work has to happen. Before we start running around the market looking for people, we need to create:

  1. A competency matrix describing what we want to see in a person. This should be a written document on which the entire subsequent hiring process is based.
  2. A report template: both the hiring lead and the recruiter must understand how they will report to each other on a candidate.
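As an illustration of what these two documents might look like once they are written down, here is a minimal Python sketch. The class names, the 0 to 3 level scale, the field names and the sample skills are all assumptions made for the example, not a prescribed format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Competency:
    name: str
    required_level: int      # assumed scale: 0 = none ... 3 = expert
    must_have: bool = True   # False marks a "nice to have" skill


@dataclass
class CandidateReport:
    candidate: str
    author_role: str                                   # "recruiter" or "lead"
    levels: dict[str, int] = field(default_factory=dict)
    soft_skill_notes: list[str] = field(default_factory=list)


def gaps(matrix: list[Competency], report: CandidateReport) -> list[str]:
    """List must-have competencies observed below the required level."""
    return [c.name for c in matrix
            if c.must_have and report.levels.get(c.name, 0) < c.required_level]


# Hypothetical matrix for a Linux-heavy team that uses orchestration.
MATRIX = [
    Competency("Linux", 3),
    Competency("Kubernetes", 2),
    Competency("Windows", 1, must_have=False),
]

report = CandidateReport(
    candidate="Candidate A",
    author_role="lead",
    levels={"Linux": 3, "Kubernetes": 1},
    soft_skill_notes=["interrupted the recruiter during the prescreen"],
)

print(gaps(MATRIX, report))
```

Because the recruiter and the lead fill in the same structure, their reports can be compared field by field, and the soft-skill notes travel with the candidate from the prescreen to the technical interview.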

The last point is especially important: it is what guarantees the transparency of the process. Reasoning along the lines of “we took this one because he suited us, and we didn't take that one because he wasn't very good” looks strange from the outside, to put it mildly. Especially to the manager above, who allocates colossal amounts of money to set this process up.

When technical interviews are standardized, logging communications also improves the quality of hiring: different teams can exchange information about a candidate (which is much more convenient in report form), and if he does not fit one team, for example because of soft skills, another team can take him. By the way, the recruiter should describe these soft skills, in all their manifestations, in the report, especially if some aspects of the applicant's behavior seemed strange during their conversation.

What is all this for? Imagine a situation where a recruiter prescreened a candidate, his resume was quite good, and he was invited for an interview. There he answered all the questions correctly, because he had the knowledge. However, the lead received no feedback from the recruiter about the candidate's inappropriate behavior at the prescreen, and during the interview the candidate showed his knowledge in the areas where he is strong and which the project needs, while behaving quite differently. In the end, the candidate only had to “behave well and respectably” to receive an offer. This is where the problems begin for the whole company, because it has already spent a lot of money and time. The person showed his true colors only after a couple of weeks, and a month later the entire department was howling that nobody wanted to work with the newcomer. And it is not enough to let such a person go: everything he did during that time has to be checked, and by then a lot of time has passed.

What does this shifting of responsibility cost us? The team has worked for some time in an extremely unhealthy atmosphere. Instead of doing useful work, DevOps engineers spend time reviewing the “work” of the toxic colleague. HR and accounting have already sent all his documents to the relevant departments, the company has to start the search all over again, and plenty of other problems remain. Of course, I have described an extreme situation; the consequences will not always be so severe, but the working atmosphere in a team depends directly on how compatible colleagues are and directly affects its productivity. Therefore, it is very important to structure competently both the communication between the recruiter and the lead and the approach to recruitment in general, logging in particular.

On the other hand, there may be exceptions, which often depend on the lead and the situation. For example, I once had a candidate who was very hard to get along with, but he was a very strong senior technically. I understood that, as a team lead, I could take him on, but then I would have to isolate him from interaction with colleagues. And it worked: I fenced him off and kept him in a kind of information vacuum, did not let him show his toxicity towards others (because I handled all the contacts myself), and at the same time he worked much faster than an average senior. But this is still an exception; ideally, it is worth selecting people who are psychologically compatible with the team and the company, and who will be more stable.

How should a DevOps technical interview be conducted?

Live coding is not a panacea either. First, it is better suited to programmers. Second, it is a small task in which we want to see how a person writes some piece of an algorithm. Is this an objectively stressful situation? In general, no, because this piece of code is usually a simple hypothetical problem that just needs to be solved. As a rule, it is not followed by questions about the consequences of using this algorithm; we ask about the reasons for choosing the solution.

I won't even look at how fast the engineer's code rolls something out, because in automation this does not matter; in DevOps the main thing is that everything works. I may ask why the candidate chose this or that algorithm, why he thinks it is better, and whether the process can be sped up.

However, I find investigative thought exercises much more effective. I need to make sure that a person can connect an effect to its cause and investigate both. This way I can understand how well he will approach a problem in production, down to whether he simply knows how to use Google. With such a task I run his thinking algorithm: I evaluate how he follows it and whether he anticipates the possible consequences.

For clarity, here is an example from my own practice: there are two pods in Kubernetes, they communicate over the Internet, and suddenly one pod starts receiving some kind of nginx error. Question: why is this happening, and how will you localize it? In a single problem I set up an unusual situation: the two pods talk to each other not inside Kubernetes but over the Internet. This means I want to see whether he will consider the various ingresses and load balancers. I deliberately create a non-standard situation to see how the person investigates it and what he assumes. This is not an academic “tell me how it works” question; it is a task that requires thinking. But academic knowledge is tested here too: by thinking out loud, he shows what he knows about Kubernetes, traffic, ingress and so on.
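One way to picture the reasoning such a task is meant to draw out is as an ordered list of hops to check. The Python sketch below is only an illustration under assumptions: the hop list is a simplified guess at the request path in this scenario, and the mapping from nginx status codes to first suspects is a rough heuristic, not a rule and not the expected answer.

```python
from __future__ import annotations

# Hops a request crosses in this scenario. The path is an assumed simplification.
PATH = [
    "source pod",
    "cluster egress / NAT",
    "DNS and public Internet",
    "external load balancer",
    "ingress (nginx)",
    "Service and endpoints",
    "target pod",
]

# Rough first suspects per nginx status code (a heuristic, not a rule).
FIRST_SUSPECTS = {
    502: ["target pod", "Service and endpoints"],       # upstream refused or sent an invalid reply
    503: ["Service and endpoints", "ingress (nginx)"],  # no healthy backend to route to
    504: ["target pod", "external load balancer"],      # upstream did not answer in time
}


def investigation_order(status: int) -> list[str]:
    """Return every hop, with the likeliest suspects for this status first."""
    suspects = FIRST_SUSPECTS.get(status, [])
    return suspects + [hop for hop in PATH if hop not in suspects]


for code in (502, 503, 504):
    print(code, "->", investigation_order(code)[:2])
```

A strong candidate does much the same out loud: he names the hops that exist only because the traffic leaves the cluster, picks a starting point based on the error, and explains how he would confirm or rule out each one.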

Experience hurts the expert

Many people, after implementing something for the first time in their lives, begin to think that this is the only correct way. This happens, among other things, because the people who came after them do not say how bad and painful this decision looks from the outside. The biggest mistake an expert can make is relying on subjective experience.

In IT there is a certain paradox here: it seems that we all do the same thing, but in fact everything is new in every company. Even with an identical technology stack, solutions can be implemented so differently that you have to rebuild all your experience and knowledge for a different solution, because yours will not work here. If a person does not understand this and tries to implement his own solution (after all, it was successful!), and then something stops working for some reason, a colossal waste of time and money and a very big headache begins.

In general, every year you should ask yourself the same question, regardless of workplace and project: is my experience really applicable to this case in the same way as before? More often than not, it no longer is. Individual solutions, maybe, but not the whole. This is especially true at other companies that build their product differently and have different approaches to queues, event models and customer data, even though the technologies are most likely the same. Quite simply, a new employer may not have enough money to support your solution, for example if you decide to join a startup.

Another paradox arises: we need an experienced employee, but his experience will get in the way of both our work and his. There is a solution: even if he is a lead, you cannot give him free rein and, as the saying goes, hand him the flag. The company must have a system of checks and balances. A newly arrived senior should have the opportunity to prove that his solution is better, but to do so he has to immerse himself in the existing system and present his calculations to the architects. In general, he must understand on what ruins he is going to build something new and whether it is needed at all.

In my practice, it has often happened that some brave guys came to a company and said, “I'll sort everything out for you right now.” Then came the moment when everyone looked at the result, scratched their heads and realized that all of it was painful and expensive and would have to be redone, because those ideas had led to huge problems, non-stop outages and conflicts. It would have been simpler and cheaper not to start all these rearrangements at all.

Therefore, the next stage of competent hiring is the high-quality integration of a new colleague. He needs to be shown all the processes, told about the traditions, and told how and why things are customarily done this way and not otherwise. If this is not done, the person may go looking for information himself, and the guides, as a rule, are at the level of “written 10 years ago and forgotten”, while everything has changed twenty times since. Obviously, he will draw the wrong conclusions from such information. Then he will have to contact a huge number of people, each holding a scattered piece of the picture. This process takes a long time, and there is no guarantee that the information he gets will be correct.

Therefore, the person must be integrated properly. Such integration is useful for the company itself too: if integrating one person requires him to meet twenty others, something is clearly wrong with the knowledge base. In addition, if you are not in the habit of updating your knowledge base in a timely manner, at some point you will lose track of what was done, where and how, especially if the person who developed some feature has left. This does not mean that the knowledge base must always be perfectly up to date, but fundamental things, such as infrastructure descriptions, HLAs, deployment diagrams and network maps, must be updated when changes occur. There must also be a communication matrix: whom to contact on which issue. On top of this, you can and should ask questions about operational details, but this much is already enough to understand where and why the application may fail. The result is a double check: on the one hand the company checks the candidate, and on the other the hiring process checks the company itself: how communication is organized within it, whether the knowledge base is maintained, and whether there are integration processes and a system of checks and balances for “brave” seniors who are ready to turn the company upside down.

Understanding the company's skills needs

Why do we start expanding a team at all? The answer is always the same: we do not have enough people, because the amount of work has grown sharply. But work that has grown can also shrink: with the help of the new people, for example, everything gets automated. The question arises: what do we do with these people now that supporting the project takes 25 people instead of 50? Load them with tasks that neither the company nor they need? By the way, this is often what happens: the team completes its task and then sits idle and without prospects, because plans have not been decided or because management finds it more profitable to buy someone else's solution than to develop its own. We have dissected all this right here.

To avoid this situation, the lead needs to do some preliminary joint work with his manager (or someone even higher), that is, the person who is responsible for technical and product planning and who knows what will happen next in the company and what features the business has planned. For example, after the system is built and rolled out, the plan is to develop it towards MLOps or DataLake. So we hire people focused on automation, they successfully complete their task, but their qualifications are not enough for the subsequent plans, and there is no guarantee they will be able to acquire them. Meanwhile, the business wants to release something big, cool and scary next year.

One caveat: this approach should not always be used. For example, candidates hired for product teams do not need to be evaluated with a reserve for other projects, and the same goes for outsourcers and integrators. Startups have no use for this approach either.

However, for large corporations it is simply imperative to evaluate candidates with their additional experience in mind, or their ability to refocus from automation to, say, MLOps. Therefore, the company's future plans should be available to the leads who will draw up the competency matrix for candidates. Openness of plans (and their very existence) should be a mandatory principle for the company. Hiring people and then firing them is, well, a so-so practice. At Kaspersky Lab, by the way, they are sympathetic to employees who want to move from one project to another, so if you want to work with both Linux and our own microkernel KasperskyOS, welcome to our team. Moreover, you can get to us in just 2-3 days.

By the way, I may well be the one interviewing you 🙂

Conclusion

So what is the final algorithm?

  1. The lead goes “up” and learns about the company’s future plans.
  2. The recruiter and the expert together create a portrait (based on hard and soft skills) of the candidate and a report template.
  3. During the interview, the candidate is tested for his thinking and ability to solve atypical problems (and in no case is he asked for the definition of TCP/IP (!!!)).
  4. The hirer and HR exchange opinions about the interview in a report form.
  5. If everyone is happy with everything, the new colleague is integrated into the company, giving him access to the current knowledge base and introducing him to all the processes and their reasons.

In general, the recipe for competent DevOps hiring is actually simple: it is a “system of checks and balances”. When we keep placing responsibility on one person, forgetting that he is not an expert in everything, we ourselves violate our system, forcing people to do more and do it poorly. This applies both to the interaction between the recruiter and the team lead and to the building of processes within the company.

Most often there is no clear division of areas of responsibility, and when they are not spelled out, big problems and high costs begin. So, to sum up all of the above, it is necessary to create a “system of checks and balances”. Everyone must understand that not one person but a great many people are responsible for any process; therefore the process must be formalized and understandable, and it must spell out the boundaries of responsibility and how conflicts will be resolved.

The author can be found on Telegram: @happy_devops
