I could find very little on this content hub about IT management models and metrics, and that really is not enough, so I decided to share my experience.
Let’s start by answering the question of what the CTO is responsible for in our company:
1. The team. The CTO is responsible for hiring and retaining a strong team of engineers, and for keeping them engaged.
2. Architecture. The IT landscape should be flexible and modern, and the state of production should be transparent to everyone.
3. Reliability. Everyone understands that the IT landscape should not only be the most fashionable and up to date, it also must not go down.
4. The budget. In an IT company, IT costs are the main expense item, so not only the CFO but you too must manage these costs and compare them against market benchmarks.
5. Cybersecurity. You are also responsible for cybersecurity, whether or not you have a CISO (spoiler: we do). In the end, everything good and bad in the company’s IT landscape is done by your own hands.
Now let’s look at the metrics that help you see the picture within your area of responsibility. Some of the metrics are proxies, because the thing itself cannot be measured directly.
1. The team.
– staffing level by IT function. Why? Low staffing makes it impossible to deliver the planned work. Hiring priority is set strictly by this indicator: the lower a function’s staffing level, the higher the priority of its vacancies for HR. It is that simple.
– staff turnover by function. Why? High or growing turnover means you have a poor team climate or problems with processes or pay. Exit interviews with employees help cluster the problems.
– the share of hires who come through referrals from current employees (the referral indicator). A growing share suggests that your engineers like working for you and are ready to recommend the company to friends and acquaintances.
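As a rough illustration of the three team metrics above, here is a minimal Python sketch. The `FunctionStats` schema, its field names, and the sample numbers are my own assumptions for the example, not our real reporting format. Turnover is simplified to leavers divided by end-of-period headcount.

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    """Headcount figures for one IT function over a period (hypothetical schema)."""
    name: str
    planned: int         # approved positions
    filled: int          # positions occupied at the end of the period
    leavers: int         # employees who left during the period
    hires: int           # employees hired during the period
    referral_hires: int  # hires who came through an employee referral


def staffing_level(f: FunctionStats) -> float:
    """Share of approved positions that are actually filled."""
    return f.filled / f.planned if f.planned else 1.0


def turnover_rate(f: FunctionStats) -> float:
    """Leavers relative to end-of-period headcount (a simplification)."""
    return f.leavers / f.filled if f.filled else 0.0


def referral_share(f: FunctionStats) -> float:
    """Share of new hires who came through referrals."""
    return f.referral_hires / f.hires if f.hires else 0.0


def hiring_priority(functions: list[FunctionStats]) -> list[FunctionStats]:
    """Lowest staffing level first, i.e. highest hiring priority first."""
    return sorted(functions, key=staffing_level)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    stats = [
        FunctionStats("Web Core", planned=10, filled=9, leavers=1, hires=2, referral_hires=1),
        FunctionStats("Data Science", planned=12, filled=7, leavers=2, hires=3, referral_hires=0),
    ]
    for f in hiring_priority(stats):
        print(f"{f.name}: staffing {staffing_level(f):.0%}, "
              f"turnover {turnover_rate(f):.0%}, referrals {referral_share(f):.0%}")
```

Sorting by staffing level gives HR an ordered list: the function at the top is the one whose vacancies come first.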
2. Architecture. There is still room for experimentation here. But still:
– architectural deviations. What is it? An architectural deviation is a solution you find running in production that does not meet the approved architectural standard. Why? The IT landscape is a huge interconnected system that must work according to rules. If the rules are not followed, the system simply will not work.
– PageSpeed Insights metrics for web applications. The metric is relevant for the front end, of course. Why? Care for your users. Low scores mean you have a slow site that works poorly on mobile, you rank low in Google search results, and your front-end expertise is weak.
3. Reliability.
– availability by web application and service. No comment.
– top culprits of incidents for the period. Why? They show where it is time for you, as CTO, to step in and sort out the architecture and processes.
– top affected services for the period. Why? An experienced engineer designs a system so that it depends on external services as little as possible. If the same “victim” keeps showing up in your reports, you should go and sort out the architecture and processes there.
– crash-free rate of your application on mobile platforms. No comment.
4. The budget.
– what percentage of the IT budget goes to the data center, to developing systems, and to maintaining them. Why? Reputable firms such as Gartner publish annual IT spending benchmarks by expense item, and they make a useful template. For example, if you spend more on maintenance than the average among your industry peers, you may have problems with the architecture and the release process. If you spend more than your peers on hosting, then either your utilization is low or you are clearly overpaying for hardware and virtualization.
– top expense items broken down by cost driver. Why? This is your optimization backlog. Most likely there is fat here that can be cut and reinvested in cool things.
– % of budget execution. No comment.
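The comparison with benchmarks is simple arithmetic, and a minimal Python sketch makes it concrete. All numbers below, including the benchmark shares, are invented for the example and are not taken from any published benchmark. The item names are also just assumptions.

```python
def budget_shares(spend: dict[str, float]) -> dict[str, float]:
    """Share of the total IT budget taken by each expense item."""
    total = sum(spend.values())
    if total <= 0:
        raise ValueError("total spend must be positive")
    return {item: amount / total for item, amount in spend.items()}


def above_benchmark(shares: dict[str, float],
                    benchmark: dict[str, float]) -> dict[str, float]:
    """Items whose share exceeds the benchmark, with the size of the excess."""
    return {
        item: share - benchmark[item]
        for item, share in shares.items()
        if item in benchmark and share > benchmark[item]
    }


def budget_execution(actual: float, planned: float) -> float:
    """Share of the planned budget that was actually spent."""
    return actual / planned


if __name__ == "__main__":
    # Hypothetical spend and benchmark shares, for illustration only.
    spend = {"hosting": 30.0, "development": 50.0, "maintenance": 20.0}
    benchmark = {"hosting": 0.25, "development": 0.50, "maintenance": 0.25}

    shares = budget_shares(spend)
    for item, excess in above_benchmark(shares, benchmark).items():
        print(f"{item}: {shares[item]:.0%} of budget, {excess:.0%} above benchmark")
    print(f"budget execution: {budget_execution(actual=90.0, planned=100.0):.0%}")
```

An item that shows up in `above_benchmark` is not a verdict, only a reason to go and look at utilization, architecture, or the release process.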
5. Cybersecurity.
– vulnerabilities found, by function / team. Why? If the vulnerability trend is upward, you should step in as CTO and sort out the architecture and processes.
– missed SLAs for closing vulnerabilities. Why? Discipline. Remember, I wrote above that the IT landscape is a large interconnected system that must work according to rules.
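Here is a minimal Python sketch of the two cybersecurity metrics: vulnerabilities counted by team and missed SLAs for closing them. The `Vulnerability` record and the SLA limits in `SLA_DAYS` are hypothetical values chosen for the example; set the limits according to your own policy.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical SLA: maximum number of days allowed to close a vulnerability.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90}


@dataclass
class Vulnerability:
    """One vulnerability record (hypothetical format)."""
    team: str
    severity: str
    found: date
    closed: Optional[date] = None  # None means the vulnerability is still open


def breached_sla(v: Vulnerability, today: date) -> bool:
    """True if the vulnerability was open, or has been open, longer than the SLA allows."""
    end = v.closed or today
    return (end - v.found).days > SLA_DAYS[v.severity]


def found_by_team(vulns: list[Vulnerability]) -> Counter:
    """Number of vulnerabilities found per team."""
    return Counter(v.team for v in vulns)


def breaches_by_team(vulns: list[Vulnerability], today: date) -> Counter:
    """Number of SLA breaches per team."""
    return Counter(v.team for v in vulns if breached_sla(v, today))


if __name__ == "__main__":
    # Made-up records, for illustration only.
    today = date(2024, 3, 1)
    vulns = [
        Vulnerability("core", "critical", date(2024, 1, 1)),
        Vulnerability("core", "high", date(2024, 2, 1), date(2024, 2, 10)),
        Vulnerability("mobile", "medium", date(2023, 10, 1), date(2024, 2, 1)),
    ]
    print("found:", dict(found_by_team(vulns)))
    print("SLA breaches:", dict(breaches_by_team(vulns, today)))
```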
The reader may ask, “But what about such important metrics as time to market, the number of automatic versus manual deployments, or something about pull request density?” The answer: yes, we used to look at them, but they are no longer relevant for us. This is normal for metrics; each has its own life cycle.
Now let’s move on to the IT functions at DomClick. I will not quote the classics with their “development, implementation, maintenance, operation”. Instead, I will describe the functions at a different level of abstraction:
1. Product development. Teams that create the products whose sales bring the company its main income. These are mixed business and IT teams. POs and CJEs answer the questions “what to do” and “in what priority”, and engineers answer the question “how to do it”.
2. Development of platform services (Core). Teams that create “fundamental” services, the ones that everyone in the company uses: for example, authorization and authentication services, the API Gateway, file storage, product billing, etc.
3. Internal development. The team that builds systems for the company itself: the corporate portal, the accounting and finance systems, and the automation of all internal processes.
4. Web Standards (Web Core). The team that defines web front-end standards across the company, develops a shared library of UI components, and joins in to solve complex and borderline front-end problems.
5. Mobile platform. Mobile development always stands apart because of the specifics of mobile platforms, from development itself to a different testing process and the rollout of releases to production. We have our own “framework” for how mobile development works, which I think will be covered in a separate article. Spoiler: most of our mobile developers work in Product development teams side by side with front-end and back-end developers.
6. Development and maintenance of infrastructure. A team that is fully responsible for the network, hardware, OS, virtual machines, k8s, databases.
7. The DevOps platform. For medium-sized companies this topic is already important enough to warrant a separate function. We have a dedicated team building the best pipeline, with canary releases and other useful goodies.
8. Architecture. A function that prepares a view of the IT landscape at the highest level. It is especially important when the business has only an idea and it is not yet clear which teams should implement it or whether a separate team will be needed.
9. R&D. The team works on problems for which the applicable technology is not yet clear. For example, projects with VR / AR, which will definitely find a place in real estate, although so far there is no consumer-grade technology and the business effect is unclear.
10. Competitive intelligence. A function that collects information about how IT is organized at our industry peers. It is always interesting to know with what resources (stack, team, management model) other companies solve similar problems. You may believe that only your divine hands write the best code, but that is not so: your peers also know a thing or two about IT, and you need to learn from them.
11. IT contract management (not to be confused with procurement). This function is responsible for optimizing contracts and improving contract terms with all IT vendors. These are, first and foremost, negotiators.
12. Data Science. In modern IT you must have a DS team with at least RecSys and ML competencies. OCR and NLP depend on your needs; most likely you will simply buy a vendor’s product. We have the full range of competencies, including OCR: document recognition and image classification services are developed in-house at DomClick.
13. Data management. The main objective of the function is to make the company’s data easily accessible. Hence its tasks: standardizing the data models of different services and monitoring data availability, completeness, and freshness.
14. Integration team. The team coordinates integrations with external services in cases where there is a lot of communication, several teams are involved, contracts or hard deadlines appear, and scrum no longer helps.
The layout of functions at DomClick looks like this:
When you have time, I recommend reading the article by our head of internal and core services development, as well as the article by an engineer from the Development and maintenance of infrastructure unit. They will give you some idea of what these teams do.
A function is not necessarily a large team of people. Some functions may consist of one person, but they must exist: someone has to own the problem. At first, perhaps, that is you.
How you arrange these functions is up to you, because it depends on the competencies of your people. I will give just two tips from personal practice:
1. The Development and maintenance of infrastructure function must be equidistant from the other functions, which means it must always be independent. The company should not have conflicts or complaints that someone got higher limits in k8s, someone was given a more experienced admin, or someone always gets hardware and virtual machines first.
2. If your business is large and you have several Product development teams, do not merge one of the product teams with the Platform services development team. The reason is simple: the entire core backlog will fill up with the wishes of that particular product team rather than the needs of the whole company. Worst of all, the implementation will be tailored to that team’s stack.
We have some distinctive practices that greatly simplify the work and defuse conflicts over priorities. Remember that in IT, besides building and selling products, you also have to migrate to new API versions of related products, refactor, fix vulnerabilities, and analyze root causes:
1. Each team devotes 20% of its time to the engineering quota. Within this quota we try to solve all of the problems above and, if time remains, do research. Important: cybersecurity bugs and blockers are NOT included in the 20%; they are fixed out of the product quota.
2. Large product development areas (where several agile teams work on one product) have their own small platform team, which makes sure the product as a whole “holds together” and solves only the engineering problems of that product.
To conclude:
1. The management model is primary: it is stronger than the company’s culture.
2. Any system strives for equilibrium, so be persistent with changes.
3. Review your management model every year. For example, our IT contract management function appeared in 2020, and Web Standards (Web Core), R&D, and Competitive intelligence appeared in 2019.