What is a staff engineer?
Hello! My name is Dima Salakhutdinov. I am a principal engineer at Cooper and the author of the Telegram channel “Staff Engineer”. In our company, principal engineer is one of the grades on the technical career track for engineers, which we collectively call “Staff Engineer”.
I believe the IT industry has a strong demand for formalizing career growth beyond the senior grade, especially in large companies. The heated round table “Is there life after senior?” at the Dump-2024 conference confirms this. Taking part in it as a viewer (and sometimes a commentator), along with the standardization of the staff engineer role at Cooper, led me to write this article. It is based on the generalized experience of a dozen of my colleagues, staff engineers from Cooper, whom I interviewed, inspired by the second part of Will Larson's book.
The purpose of the article is to give senior developers a general idea of the staff engineer role as one of the directions of career growth, and to offer practical advice on what to level up, in case the difficulties described do not deter you.
The article consists of two parts. In this part we will look at what staff engineers do and what awaits you in this role. Let's get started!
Terminology
In short, a staff engineer is a very senior senior who works as an individual contributor: a strong technical specialist with great influence on the technological direction of the company. In addition to deep technical expertise, he has and practices the skills needed to manage large projects. He may not have administrative authority, but he carries projects through experience, charisma and a combination of technical and managerial skills.
First, let's define the terminology; it varies from company to company:
Unit – a structural association of several (2-4) teams under the leadership of a unit lead.
Domain (development management) – an association of several units. Technical management of the domain is carried out by a manager in the position of EM (engineering manager). Domains are the largest structural division of the IT department in a large company.
Staff engineers have several gradations, depending on the scale of activity and sphere of influence:
staff engineer – usually works at the unit level (several teams) and reports structurally to the unit lead.
principal engineer – works at the domain level (several units) under the guidance of an EM (engineering manager).
fellow engineer – reports to the company's CTO; no one has ever seen him 🙂
For simplicity, I will refer to all grades of staff engineers (staff, principal, fellow) from the yellow rectangle simply as staff engineers, or just staff.
What do staff engineers do?
A staff engineer does not have a permanent team or people under him, although he may often work with one or more groups of engineers on projects. He can act as a technical lead or be a technical consultant on a project. There are also options where he is connected to the project at an early stage to give the team a kick-start.
Hierarchically, the staff engineer reports to the head of the department and sits at the same level as EMs, unit leads and team leads. In this position, he can build direct relationships with technical management and has a corresponding level of influence on the project or product. He participates in large projects or initiates them. And if there's a fire and the team can't pull a person out of the planning cycle, he can come in and fix thorny development problems with his own hands. As a last resort, he can stand in for a developer so that the team’s work does not sag at a critical moment.
Below we will look at the potential areas of work for staff engineers. Despite their diversity, they share a common goal: to solve large-scale technical problems in the interests of the business, current or future.
Architectural track
Developing the architecture of the existing and planned systems he is responsible for is one of the key activities of a staff engineer. Based on the needs or processes established in the domain, the staff engineer acts as a consultant while the development team draws up an architectural solution. He can be fully involved in this process if the task is large-scale and requires in-depth research.
The joint goal of the staff and the team is to reach a consensus on the functionality of the proposed solution, or point out its shortcomings in the long term. Very often this means finding a compromise between solving a business problem and the right architecture: somewhere you can cut corners and come back to it later, somewhere you can adapt the requirements to the technology.
In the early stages, staff helps highlight potential difficulties and implementation nuances, even before anyone goes to a specific development team with a plan. This can be seen as a very early consultation on the potential effort involved (time and resources).
The staff engineer conducts a final review of the architectural solution before the team goes to the architectural committee and accompanies the team in the process of passing it.
This track requires knowledge of architecture, patterns and antipatterns, as well as observation, mistakes and reflection on them.
Representative function
Staff has the best understanding of the technical structure of their domain and is the central point of competence and awareness on most technical issues. He is ready to advise personally or connect you with the right specialist.
Our system is already so large that no one can imagine it in its entirety. And even just providing advice and sharing knowledge about the system is a useful and important task. In this sense, staff can sometimes replace a systems analyst.
Imagine you need to interact with another department. In technical terms, it is a black box for you, with many teams inside, each holding its own fragmented knowledge. The staff engineer acts as a single person with a holistic view of the systems and can speak for the entire technical side of the domain (and there are dozens of services).
This also means that when the interests of domains intersect, staff engineers are needed on both sides. In a sense, staff acts as the glue between domains and teams. He holds knowledge and understanding of all the systems as a whole rather than in fragments, or knows where to obtain them quickly.
This implies that he has the skills of an architect: the ability to sketch out a solution, especially when trade-off options need to be compared.
Designing large cross-domain features
Major product changes affecting several domains are led on the technical side by a staff engineer, especially at the initial stages, when product and engineering have to be brought together to implement a new feature previously formulated by the business.
Usually at this stage nobody knows how to map the feature onto the existing systems. It is not clear how to do it, which teams and departments will be involved, or what kind of ripple effect the planned changes may cause.
Here the role of the staff engineer is to help the product manager and the product team draw up technical requirements, formalize the description, conduct research, and suggest which colleagues to turn to for advice, such as staff from another domain.
The goal is to work out a technical solution first, reach a conceptual agreement with the other departments, and decompose the solution to the level at which the project can be taken into development.
The value of this work lies in the systematic and coordinated implementation of changes, when the “puzzle comes together” (even if some of its pieces have to be hammered in).
Staff helps to “build bridges” in a coordinated manner, having a preliminary architectural plan and calculations. Otherwise, the implementation of one feature by unsynchronized teams may lead to unexpected results: instead of a functional bridge, a piece of road may appear on one side, a piece of metro on the other, and a raft connecting them.
Signs of a good study:
clear artifacts: the team read them, everyone understood them in the same way, and there was no ambiguity;
implementation proceeds with few clarifying questions or none at all;
the resulting product meets business expectations.
Work in this direction involves meetings and communication with colleagues. Developed communication skills are required: the ability to maintain a conversation (small-talk) and start a conversation, clearly formulate thoughts, facilitate meetings, and maintain a focus on results.
Stability of processes in the domain
In some departments, a staff engineer is involved in the process of improving the stability of systems:
participates in incident reviews with an eye to the architectural plan;
helps the team understand and fix the problem systematically;
ensures that SLI graphs and processes reflect reality and help identify incidents.
Stability metrics should correlate with real life. Developers work with technical indicators: latency, response statuses, or consumer group lag. But these indicators do not always reveal a problem in the real world. For example, you may stop receiving responses from a partner in some information exchange: according to the graphs everything is green, but in fact “everything is bad and we don’t know about it.”
The staff engineer’s task in this case is to build stability graphs that capture such situations for better control of the system’s operation, and to build a comprehensive stability management process on top of those graphs.
In this direction, the staff is greatly helped by the experience of developing and operating various systems.
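To make the "green graphs, broken exchange" situation more concrete, here is a minimal Python sketch of a check that combines a technical signal with a business-level one. Everything in it is an assumption made for illustration: the `ExchangeStatus` fields, the thresholds and the status labels are hypothetical and do not describe Cooper's actual monitoring.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ExchangeStatus:
    """Hypothetical snapshot of one integration with a partner."""
    http_error_rate: float        # share of failed responses (technical signal)
    last_partner_reply: datetime  # when the partner last answered (business signal)


def technical_ok(status: ExchangeStatus, max_error_rate: float = 0.01) -> bool:
    # The classic "graph is green" check: few errors on our side.
    return status.http_error_rate <= max_error_rate


def business_ok(status: ExchangeStatus, now: datetime,
                max_silence: timedelta = timedelta(minutes=15)) -> bool:
    # The exchange only counts as working if the partner answered recently.
    return now - status.last_partner_reply <= max_silence


def evaluate(status: ExchangeStatus, now: datetime) -> str:
    if not technical_ok(status):
        return "technical incident"
    if not business_ok(status, now):
        # Technical metrics are fine, but the partner has gone quiet.
        return "silent failure"
    return "healthy"


now = datetime(2024, 1, 1, 12, 0)
quiet_partner = ExchangeStatus(http_error_rate=0.0,
                               last_partner_reply=now - timedelta(hours=2))
print(evaluate(quiet_partner, now))  # silent failure
```

The point of the sketch is the second check: a freshness threshold on a business event catches the case that latency and status-code graphs miss.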
Hands-on work
Most staff engineers have strong technical expertise in one or more stacks (or some technology) and work with their hands quite often.
For example:
helps figure out a technical problem, writes code, lends a hand, or advises colleagues;
fixes some problems himself or hands them over to the development team in the form of artifacts;
conducts research or quickly drafts a technical report on an urgent problem;
creates a proof of concept (PoC) or a prototype of a solution;
explores open-source dependencies and the details of their implementation, or the code of a service from another domain;
goes to write code with the team if the situation requires it.
This requires deep expertise and development experience. Most of the staff engineers at Cooper are former senior developers with a serious technical background. Hands-on work keeps the staff engineer grounded in reality and keeps his skills sharp.
But there is a subtle point about hands-on work: staff cannot afford to take on long-term tasks that require constant support and attention, because his attention can easily be switched to something of higher priority that is burning more urgently.
Search for technology blockers
A staff engineer identifies and studies technological (or any other) blockers in his domain in advance. He looks for system features that, in the long term, worsen operational stability, increase lead time, or otherwise hinder business development. Such research is usually aligned with the IT strategy of the domain or company, in order to treat what really hurts the business.
Staff and his leader are in the same boat here. The manager is obliged to identify technological blockers in his department. But, in fact, he cannot always do this himself due to lack of time, deep knowledge of the system or expertise. The staff has time, knowledge of the system and expertise!
Limiting factors are not always technological and do not always have a technical solution; sometimes the fix is to introduce or improve a process.
In this kind of research, staff uses their experience, communicates with teams, managers, product and business, identifies problems, studies dashboards or implements metrics to measure the problem.
The primary task is to study and formulate the problem, assess possible risks and the effect expected from potential improvements.
In addition to the hard-to-formalize “problem solving skills,” framing a problem requires systems thinking, the ability to separate cause from effect and to find patterns and anomalies, and a product approach: developing hypotheses and testing them until the metric improves. The faster hypotheses are tested and the better their quality, the better.
Leading technological change
After identifying and posing the problem, the top-level staff engineer will work on its solution, coordinate the allocation of resources and take part in the project as a technical lead. Next, he will coordinate the work of connected teams, taking responsibility for the technical part of the project.
I will give two real examples of such technological initiatives.
Example 1: transferring a monolith to a platform
Cooper has a PaaS, a platform that speeds up the launch and simplifies the maintenance of services (my colleague wrote an article about how it works). On the other hand, we had two large non-platform monoliths whose separate maintenance consumed DevOps resources.
To free up the resources spent on supporting the monoliths, and to give the monoliths platform features out of the box (alerts, SLOs, automatic issuance of rights to topics, stages, etc.), they had to be moved to the platform. This was the first large-scale test of the versatility of the PaaS. Previously, it was believed that the monoliths were so large, complex and specific that they could not be placed on a platform.
The staff engineer led this move, working with a cross-functional team drawn from different departments: product, DevOps, QA, developers, etc.
The plan for the switchover on day X was worked out so thoroughly that it came down to “copy the command to the console.”
Some may perceive drawing up such a plan as a managerial task. Is assembling a rocket a management job or an engineering job? It seems to me that it is engineering, in which each of the engineers has his own understanding of part of the work in this entire assembly. Otherwise, something incomprehensible will come out of the rocket.
By the way, developing a platform is an equally complex technical task, which is led by one or more staff engineers.
Example 2: decomposition of a monolith to reduce load
At one point, the technological blocker of the entire system became the prohibitive load on the database of a large historical monolith.
In this case, the staff engineer acted proactively:
Independently investigated incidents, identified and summarized the problem.
Developed a system of metrics for analyzing the most loaded parts of the monolith (the talk and its slides show how to set up a system of database load metrics and look for effective optimization points).
Identified a bounded context that had to be moved from the monolith into a separate service, taking its load with it.
Developed a progressive migration plan with a focus on stability (I covered this project in more detail in an article and in a conference talk).
Presented the project to management, assessed the necessary resources, bargained for them, and launched the work.
Accompanied the project, providing assistance and support to all participants.
Brought the project to completion.
As a result, by the high-load season an important limiting factor had been removed. And many engineering practices first tested on this project have been replicated in the daily work processes of different teams.
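As an illustration of the metrics step in this example, the sketch below ranks parts of a monolith by their share of total database time. It is only an assumed approach, not the method used in the project: the `QuerySample` structure and the idea of tagging each query with the component that issued it are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class QuerySample(NamedTuple):
    component: str      # hypothetical tag: which part of the monolith issued the query
    duration_ms: float  # time the database spent executing it


def load_by_component(samples: Iterable[QuerySample]) -> list[tuple[str, float]]:
    """Return (component, share of total DB time) pairs, heaviest first."""
    totals: dict[str, float] = defaultdict(float)
    for sample in samples:
        totals[sample.component] += sample.duration_ms

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    shares = [(name, total / grand_total) for name, total in totals.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


samples = [
    QuerySample("orders", 300.0),
    QuerySample("catalog", 100.0),
    QuerySample("orders", 100.0),
]
for name, share in load_by_component(samples):
    print(f"{name}: {share:.0%}")
```

A ranking like this gives a first hint at which context is worth extracting: the component at the top is where moving load off the shared database pays off most.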
As you can see from the examples, along with technical skills, project management and teamwork skills are required (although it is stated that staff is an individual contributor).
But the most important thing is to be self-directed. If a staff engineer has a task, he researches it himself, arranges meetings with other domains, and studies the processes and the business area. The staff engineer can also come up with a problem and formulate the task himself (see the previous section).
Support for company-wide initiatives
Large-scale technological changes require support from various development departments. A large project thus turns into a company-wide initiative, and staff engineers are the first to be involved on the domain side.
One example is a working group that prepares for the high-load season, which involves:
analyzing load testing results (there is a talk on how load testing works at Cooper);
drawing up an optimization plan in your domain with an eye to architectural changes;
creating a backlog of decomposed tasks, which will then go to the development teams.
Experience in operating services under high load, a thorough understanding of domain systems, as well as the ability to formulate and decompose tasks will come in handy here.
Guardian of Knowledge
Because of the company's rapid growth, people who have worked here for more than three years are valuable experts who have seen things that 90% of the company has not. They hold undocumented context about how a particular feature was implemented.
Beyond technical expertise, staff may have deep business expertise. A person with such expertise cannot be hired from the market; it can only be cultivated. Staff can be deeply immersed in what exists now and why it was built this way, and understand what the business ultimately wants. He can also systematize and distribute this knowledge in the form of artifacts (ADRs, community guides, documentation), forming local, company-specific “best practices” about how to do things, how not to do them, and why.
Exchange of experience
Staff shares knowledge inside and outside the company, promotes engineering culture, and helps less experienced engineers advance to the next level of their careers.
Most staff engineers engage in one or more activities from the list below:
mentoring and participating in technical education programs;
talks at technical conferences, meetups, speeches, podcasts;
blogging (personal blog, stackoverflow, github, habr, etc.).
Writing this article is part of my job as a principal engineer.
Also, the staff may be tasked with developing architectural competencies within teams in the format of joint design:
support the team in the process and share experience;
help compare the pros and cons of different solution approaches;
gradually give teams more freedom and the opportunity to design solutions independently, arriving with ready-made expertise.
In this case, the team grows, and the staff engineer’s work becomes more consultative.
The leader's right hand
The staff engineer shares the goals of the domain or company manager, who is responsible for all the engineers in his department. The staff engineer's task is to make sure that the systems he is accountable for are in excellent shape and evolve in line with the business plan, staying ahead of the company's development strategy.
In this regard, the staff engineer sometimes acts as a deputy or right hand (in the book Will Larson uses the term “right hand”), providing management assistance to the manager, as well as consulting on key technical decisions.
Right Hand is a very common pattern of staff work; in a sense, staff and manager are parallel tracks with a partially overlapping skill set.
Good staff helps the manager achieve his goals. Management experience helps him in this (and most respondents had it in the position of team or unit lead).
But there is a subtle point: the manager has administrative leverage, but the staff engineer does not. There is a talk I recommend on how to work effectively as a technical lead in such circumstances.
Manager's Expectations
The head of a large department thinks broadly. He has a combined product and technology strategy for a year or two ahead. To support it, the staff’s work is mainly of a project nature: where it hurts, that's where we go.
In this case, the manager can completely delegate some problem to the staff for solution in order to relieve himself.
Here is a list of generalized requirements for staff engineers from technical management, which I collected as a result of interviews:
self-direction – the staff receives a direction or a problem and acts independently;
makes technical and architectural decisions and is responsible for them;
leads projects aimed at important metrics, for example, SLA, lead time, getting through the high-load season, etc.;
helps teams work through product changes;
keeps the documentation of his unit up to date;
participates in the elimination of incidents and their subsequent analysis;
assists with urgent tasks in his area;
forms a technical vision for the development of his unit or project;
flexibly participates in key projects in different roles (not only as a consultant).
As you can see, it is almost a mirror image of the previous section. I will only note that with a high probability one person will not be able to cover all these points, and a large division may have several staff engineers.
In the next article I will take a closer look at the senior's path to staff. It is also based on a series of interviews with Cooper staff engineers that I conducted this summer.
If the topic interests you and you don’t want to miss the next part, subscribe to my Telegram channel Staff Engineer, or follow Cooper.tech on Telegram and YouTube.