MTS, like many other modern companies, has undergone the so-called digital transformation. In simple terms, launching digital processes and products has become our priority.
For me, as a techie, this means that the company's business direction depends entirely on the quality of its IT systems and their ability to evolve rapidly.
Of course, this is an oversimplified definition, and marketers could argue with me (and surely would!), but it is quite sufficient for everything you read below.
Less bureaucracy – easier development
What has changed: first of all, the company's management model. Previously, the centralized enterprise architecture team reviewed every project; now they publish a technical policy (a large and detailed document) and train architects on it. How to apply it is up to each product architect across more than a hundred teams.
On the one hand, this is good – less bureaucracy greatly simplifies development. On the other hand, all products interact with each other in one way or another, and an error in one of them can affect the others.
For example, in Software Systems Architecture: Working with Stakeholders Using Viewpoints and Perspectives, Nick Rozanski and Eoin Woods describe the basic security principle of securing the weakest link. It means that if there is even one weakly protected IT system in your IT landscape, the entire landscape is at risk, simply because a hypothetical attacker can act with impunity on behalf of that system.
There are many more examples where it is useful to have guaranteed quality and consistency in the design and development of IT systems.
What we came up with: creating a community to share knowledge and spread best practices. The idea is neither new nor particularly revolutionary, but it fits the requirements and specifics of digital product development.
- As the working format, we chose interviews with representatives of every role on the production teams, from analysts to DevOps and support engineers;
- Interviews are conducted by fellow product-team members who are respected within the company. It is very important that these are practicing specialists, not external consultants or auditors;
- We do not check work for compliance with any standard or regulation; our task is to identify risks. This has several advantages: first, risks can be assessed in terms of probability and impact on a specific team; second, they can be prioritized; third, a mitigation plan can be drawn up for each risk;
- Interviews are held as live conversations in which both sides share experience and discuss technical nuances;
- We rotate the team of "auditors" so that as many team representatives as possible get the opportunity to share knowledge and experience.
To start the process, we assembled a team of enthusiasts, developed a list of discussion topics for each role, and trained our team of impromptu auditors. Incidentally, training was the most difficult stage, because very good specialists in our field are often also very good introverts 🙂
What is the result?
- The process of studying product teams has been rather leisurely: on average, it takes about 31 days per team. During this time, we manage to talk with representatives of every area of the team's activity, compile a summary report, and walk the product owner through it so that they can turn it into an action plan;
- The result depends heavily on the expert, so it is important to have several for each role: two analysts, two architects, and so on, where one has already conducted a series of interviews and the other is just getting involved;
- The interview methodology also needs constant adaptation, as some topics lose their relevance and are replaced by questions that no one had thought of before.
For example, let’s look at the results of a study in the direction of “Architecture”.
What have we done:
- Communicated with 20 teams;
- Spent an average of 31 days on each. Given that we interacted with several teams simultaneously, the whole process took six months;
- Revealed 180 risks associated with architecture.
Inside our teams, the risks were divided as follows:
Risk 1: Design
It is important to understand that all the software systems we study already pass fairly strict outgoing quality control (for telecom systems, for example, the control period is longer than the development period), but there is always room for improvement in quality and efficiency.
To understand what we consider risks, let's look at the top three with examples.
For young product teams, it is quite normal for the software architecture to be developed as an afterthought. At first everything seems simple, and project deadlines rarely leave an opportunity to think seriously about the architecture. And then the bottom-up design method comes into play: we develop the individual components of the solution and afterwards assemble them into a single whole.
For example, we decided to make a digital product for telemedicine. What is needed for this?
- We probably need a component for video calls between the patient and the doctor – we make a component for calls;
- Sometimes you need a regular chat – that means we make a component for the chat;
- We need to take the medical history from automated medical systems – we create the appropriate component;
- We need to keep a schedule of doctors on duty – we make a component for this as well.
And so on.
Everything seems simple until we start putting it all together. That is when problems with duplicated functionality appear: for example, chat and video calling are very similar applications in themselves (at least in the context of doctor-patient interaction). So the risk is that we will have to rework our application quite significantly because of the large amount of duplicated code.
Or problems with the data model. By default, each component exposes interfaces in the data model that is convenient for storing and processing that particular component's data, not for the application as a whole.
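One way to reduce this risk is to pull the shared doctor-patient context into a common abstraction that both chat and video calls build on. The sketch below is purely hypothetical (the class and field names are my own, not from any real product) and only illustrates the idea:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

# A shared model of the doctor-patient interaction context,
# so chat and video calls do not each invent their own.
@dataclass
class ConsultationSession:
    doctor_id: str
    patient_id: str
    transcript: list = field(default_factory=list)

class CommunicationChannel(ABC):
    """Common contract for any doctor-patient channel."""
    def __init__(self, session: ConsultationSession):
        self.session = session

    @abstractmethod
    def open(self) -> None: ...

    def log_event(self, event: str) -> None:
        # Shared bookkeeping lives in one place instead of being duplicated.
        self.session.transcript.append(event)

class ChatChannel(CommunicationChannel):
    def open(self) -> None:
        self.log_event("chat opened")

class VideoCallChannel(CommunicationChannel):
    def open(self) -> None:
        self.log_event("video call opened")
```

Designed top-down like this, both channels share one data model and one place for common logic, which is exactly the duplication that bottom-up assembly tends to produce twice.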
Therefore, it is worth remembering a number of simple rules:
- The bottom-up design method is good for small projects with low technical complexity, small teams and volatile requirements;
- For large projects and teams, top-down design works better: first we design the picture as a whole, and only then proceed to coding.
Therefore, before plunging headlong into a new project, ask yourself the question: what type does it belong to?
Risk 2: Security
It would seem that security is taken very seriously these days. Everyone remembers such basics as the need to:
- authenticate users;
- authorize their actions;
- follow the principle of least privilege;
- maintain data confidentiality;
- keep an audit log of user actions.
But here is the surprise! For teams building services for internal automation, this is not as obvious as for everyone else. It seems that if the application already runs on the internal corporate network, why protect it further? In fact, it is necessary, especially if the data the application works with is classified as personal. Yes, the probability that an intruder has penetrated the internal network is very small, but there is not much protection in place either.
Nuances can also arise with external applications. Consider a simple, purely hypothetical example: a web application that authenticates users with a password. What problems can there be?
- The application may allow passwords that are too simple and therefore easy to guess;
- The application may not be protected against password brute-forcing (no CAPTCHA or anything similar);
- The application may generate a password at first registration without requiring a mandatory change, so the password ends up stored somewhere in email in clear text;
- The password may be transmitted in the request URL or in the HTTP request body in clear text;
- The application stores passwords as hashes but uses an insecure cryptographic algorithm; MD5, for example, is relatively easy to crack using rainbow tables;
- The web application does not let users change their password, or does not notify users when their password is changed;
- The web application uses a vulnerable password-recovery feature that can be used to gain unauthorized access to other accounts, for example by asking for information that half the organization knows besides you;
- The web application does not require re-authentication for important actions: changing the password, changing the delivery address, and so on;
- The web application handles HTTP sessions insecurely:
- the web application creates session tokens in a way that allows them to be guessed or predicted for other users;
- the web application is vulnerable to session fixation attacks (that is, it does not replace the session token when an anonymous session becomes authenticated);
- the web application does not set the HttpOnly and Secure flags on browser cookies containing session tokens;
- the web application does not destroy user sessions after a short period of inactivity, or does not provide a way to log out of an authenticated session.
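To address the weak-hashing point from the list above, passwords can be stored with a salted, deliberately slow key-derivation function instead of MD5. A minimal sketch using only the Python standard library (the storage-format string is my own convention, not a standard):

```python
import hashlib
import hmac
import os

def hash_password(password: str, *, iterations: int = 600_000) -> str:
    """Hash a password with a random per-user salt using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    _, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

The random salt defeats rainbow tables, and the high iteration count makes brute-forcing a leaked hash expensive.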
Thus, the risk here is that someone will gain access to data not intended for them, and this can lead to problems in the application.
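The session-token and cookie-flag points above can also be illustrated with the Python standard library: generate an unpredictable token and emit a Set-Cookie header with the HttpOnly and Secure flags set. The cookie name and lifetime here are arbitrary choices for the sketch:

```python
import secrets
from http import cookies

def build_session_cookie() -> str:
    """Produce a Set-Cookie header value for a new session."""
    token = secrets.token_urlsafe(32)  # unpredictable, not derived from user data
    jar = cookies.SimpleCookie()
    jar["session_id"] = token
    morsel = jar["session_id"]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # sent over HTTPS only
    morsel["samesite"] = "Strict"  # not sent on cross-site requests
    morsel["max-age"] = 900        # expire after 15 minutes
    return morsel.OutputString()
```

A real application would of course also rotate the token on login to prevent session fixation.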
These are just examples of what can be discussed in the security area. Of course, the ideal option would be to implement a Secure Development Lifecycle process, such as the one Microsoft recommends.
Risk 3: Performance
One of the problems of quickly assembled product teams is a three-letter word: MVP, the minimum viable product. Such teams strive to create, as soon as possible, an application that will start generating revenue for the company, and since at first the application has very few users, performance is usually considered at the last moment. But if the application suddenly becomes popular, you have to think about what to do next.
The recommendations here are simple: application performance is inversely proportional to the number of requests to slow resources. Accordingly, all tactics aim either to reduce the number of requests or to speed up the resources themselves. By resources we mean the processor, memory, network, and disks; it is also sometimes convenient to treat a database or an application server as a resource.
- First, we check whether a client-side cache is possible in the distributed application, so that we do not request or recalculate the data we need every time. If so, we save on network requests, server load, and everything the server does there;
- But we rarely get that lucky, so next we check whether a server-side cache is possible. The principle is the same as with the client cache, but the performance gain is slightly smaller, because network requests still occur;
- Then we remember that it would be nice to scale the server. Nowadays it is hard to imagine a microservice application that does not scale horizontally, that is, by installing another copy of the server and distributing requests, for example via a load balancer;
- Since our server is now scaled out, we need a distributed cluster cache. There are many useful systems for this, from the somewhat dated MySQL Cluster to the thoroughly hyped Apache Ignite (GridGain).
And of course, we must remember that while a cache solves the problem of data access, it creates a new problem: the algorithm for invalidating and preloading it. In some systems caching can be completely useless. For example, in CRM (Customer Relationship Management) systems it is very rarely possible to cache customer data effectively: a specialist working in the office moves quickly from one customer to another, so the cache is simply never used.
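As an illustration of the invalidation problem, here is a minimal server-side cache sketch with time-based expiry. A TTL is one common, if crude, invalidation strategy; the class name and interface are my own, for illustration only:

```python
import time

class TTLCache:
    """Minimal server-side cache with time-based invalidation.

    Entries expire after `ttl` seconds, after which the next read
    falls through to the slow resource again.
    """
    def __init__(self, ttl: float):
        self.ttl = ttl
        self._store = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]               # cache hit: skip the slow resource
        value = compute()                 # cache miss: hit the slow resource
        self._store[key] = (now + self.ttl, value)
        return value
```

For the CRM scenario above, a cache like this would mostly see misses, which is exactly why it is worth thinking about the access pattern before adding one.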
Thus, the risk here is that, without first thinking through a strategy for how we will "overclock" our application, we may end up paying very dearly to rewrite it in the future.
In this article I tried to describe how you can organize effective development in a distributed digital company through expert communication. In our era of remote development, such processes are becoming especially relevant: they allow you to counteract Conway's law, or at least minimize its effect.
If you decide to create your own checklists, I would recommend not doing everything from scratch, but drawing on existing literature. The Software Architect's Handbook by Joseph Ingeno (ISBN 9781788624060), for example, is very useful for architecture.
My report can be viewed here
Article author: Dmitry Dzyuba, Head of the R&D Center