Auditions for the Role of the Architect: The Offensive

In the last episode: the entrance to the harsh world of enterprise architecture lies through a panel interview. What is dead may never die, so the task asked me to reflect on a legacy diagram and offer it rebirth. And to make it more fun: a minimum of information and only four hours of time.

A little understatement is always present on first dates; it's a pity that this time it lies in the technical brief. When time is short, it is easier to start from the requirements and constraints than to wrestle with the paradox of choice. So let's talk about architecture right away. At interviews, all anyone talks about is architecture and design. They talk about how damn cool it is to watch a huge monolithic legacy melt into the waves.

Knockin' on Heaven's Door: On the Shore of Architecture

So as not to drown in the sea of possibilities, let's start by defining the conditions.

Assumptions and additions:

  1. The system does not manage or receive streams; it only processes successfully completed sessions. Streaming, waiting queues, and the correctness and availability of source data are no longer our problem. We can also filter out read problems: if a record has been accepted for processing, it is undamaged and accessible. (In real designs, subsystems are split exactly this way: there is no need to drag image format conversion and image quality enhancement into a text recognition module. You build a pipeline of individual modules plus a mediator, e.g. microservices over an ESB.)

  2. All sessions involve an operator, and the system never receives more simultaneous records than it has operators. This means there can be no unpredictable, explosive load from the flow of clients; everything follows the theory of constraints. We therefore need no more throughput than the operators can generate, and we can calculate the maximum possible load right away. Of the necessary data we still lack the length of an operator session. If the minimum meaningful session is 30 seconds and there are 100 operators, then our legacy system receives at most 200 records per minute, with a peak of 100 at a time (a back-of-the-envelope sketch follows this list). We should also account for the system's output, the outbound API calls, but more on that later.

  3. API calls do not block the system. Ideally, of course, we don't want to wait for an answer at all: just knock, get code 200 back (not to be confused with Cargo 200), and move on. But if I identified the integration point correctly, the API belongs to external systems, and we cannot control how they work. On the other end there may be a cheerful, well-rested REST, or a soapy SOAP. For heavier systems and hardware, be prepared for raw TCP/UDP. So I suggest being ready to discuss this in advance, but where possible, keep these problems away from the legacy and save your nerves for the next stage. A legitimate assumption is that the external system receives the data, returns an ACK (acknowledgment) in response, and processes the data later. And judging by the diagram, which shows no external players, we don't need any particular result from it at all. This unspoken assumption is what holds the entire fragile integration mechanism together.

  4. The categorizer processes every record with full metadata, passing it through all categories. The conditional ruleset behind the API calls skips nothing. Since we know it is triggered by a timer and processes everything available, we treat the legacy as a single pass over the whole list, with no filtering or search tricks (a sketch of this reading follows the list).

  5. A record that has passed all handlers goes to the archive immediately after categorization. That is, full categorization happens exactly once, and only after all the meta processors. Today this is apparently solved bluntly with a timer: first wait out all the processors, then mark the categories.

  6. Each Processing Engine works independently. They do not take other handlers' results into account and do not compete for the record itself; you could even say each gets its own copy. Again, we reduce the legacy to a walk over a list, with no dependency graph, no deadlocks, and no infinite loops.

  7. The repository serves only to store records and metadata and provide access to them. CRUD. We don't want the repository to carry business functions: stored procedures, triggers, and the like. The original diagram has no outgoing connection from the repository, but just in case, we will also assume the incoming ones perform no function calls.

  8. The client always works with the same system installation (instance). Most likely the system is installed and runs locally for each of the company's clients. But SaaS is no longer news; the idea was born on the web and hosting, and small clients may well work out of the company's data center. The diagram does not contradict simply raising everything on virtual machines in the cloud: Lift & Shift is expensive and inefficient, but fast. In that case we'd like the client not to be tossed from instance to instance like the granddaughter and Zhuchka in the turnip tale, and thus not to complicate our life with a requirement to fire the API exactly once.
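
To make assumption 2 tangible, here is a back-of-the-envelope sketch of the load ceiling. The operator count and the minimal session length are the illustrative numbers from the text above, not real requirements:

```python
# Capacity ceiling under assumption 2; all numbers are illustrative guesses.

OPERATORS = 100        # operator seats = maximum simultaneous sessions
MIN_SESSION_SEC = 30   # shortest session we consider meaningful

# Peak concurrency is bounded by the operators themselves.
peak_concurrent_records = OPERATORS

# Worst case: every operator finishes minimal sessions back to back.
max_records_per_minute = OPERATORS * (60 // MIN_SESSION_SEC)

print(f"peak concurrent records: {peak_concurrent_records}")  # -> 100
print(f"max records per minute:  {max_records_per_minute}")   # -> 200
```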

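And a minimal sketch of how I read assumptions 4 and 5: a timer fires, the whole backlog is read, every record passes through every rule, triggers its callout, and goes straight to the archive. All the names here (`repository`, `ruleset`, `call_api`) are stand-ins for illustration, not the actual legacy interfaces:

```python
import time

CATEGORIZER_TIMER_SEC = 360  # "timer Y" from the diagram; the value is a guess

def categorize_all(repository, ruleset, call_api):
    for record in repository.fetch_all_ready():    # full pass, no filtering tricks
        record.categories = [rule.name for rule in ruleset if rule.matches(record)]
        call_api(record)                           # the outbound integration callout
        repository.archive(record)                 # assumption 5: straight to the archive

def run_forever(repository, ruleset, call_api):
    while True:                                    # the "scheduler" is just a sleep loop
        categorize_all(repository, ruleset, call_api)
        time.sleep(CATEGORIZER_TIMER_SEC)
```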
Preliminary result:

It is not entirely clear what the blocks in the diagram represent: stand-alone services or modules inside a monolith. I assume it is a set of monoliths. The processing manager and handlers are one big stone, and the categorizer is another. Each of them has its own interface (UI/API/CLI), timer (scheduler), configurator, error handler, and database. All of this is hosted somewhere on the client's premises and possibly in their own data center; it clearly doesn't amount to cloud infrastructure. All the optimistic assumptions can be lubricated with freshly squeezed oil of flattery: "I am sure that since the product is popular, everything here is well designed, and I imagine this option…"

Legacy session processing: input stream -> repository -> processing engines -> categorization -> callout

What is the point of the current system:

  1. Easy deployment and monitoring. Installing and maintaining 4-5 processes is much easier than 200, and it counts on ordinary, affordable sysadmins rather than trendy, expensive devops.

  2. One-way communication. The monoliths are weakly coupled and in theory should not affect each other during development or operation. Parallelization allows scaling in large blocks, both horizontally (scale out) and vertically (scale up).

  3. If there is no multitenancy, import will be relatively easy. There may be snags with missing client identifiers and a potentially missing state reset. The basic components must work stateless, with no surprises around per-client configuration and access.

  4. Based on the previous points, a step-by-step migration to the new system is quite feasible. The steps will be Gulliver-sized, of course, but we can still hope for little bloodshed among the human resources crunching underfoot.

  5. High data consistency. One repository, one source of truth. Convenient to check and to back up, which eases both maintenance and disaster recovery.

  6. High data privacy. If everything sits with the client, the client owns both the data and the hardware. Perfect for government agencies and corporations: the dream of officers and of offices with tinfoil hats.

  7. House of a thousand customizations. Since the client lives not in an apartment in an anthill but in his own house, however typical, the level of perversions he can allow himself there is limited only by money and the blinders of bureaucratic fantasy.

Rake:

  1. With great control comes great responsibility (c) Human Admin. Yes, the client takes care of the entire infrastructure, hardware and software, himself. To the best of his ability, not to a requirements sheet, as with SaaS/PaaS.

  2. Resource management. Large components in a world of discrete load require resources to be constantly available for the potential maximum. If we have 100 records per second at peak and 10 on average, we still need resources (physical or virtual hardware) for 100.

  3. Dependencies. In theory and in the picture the monoliths are independent of each other, but in reality the entire system is a distributed monolith, stitched together by a single business process and contracts. Processors write the meta the categorizer relies on, so introducing new data will require changes in both. Even at the timer level we see a dependency: the categorizer runs much less often than the processing manager, since it must run later.

  4. Shared resource. As we said, categorization relies on the metadata written by the handlers, which means both modules read and write the same record. And since they have no common controller but run in parallel, we get a classic race condition (see the toy reproduction after this list).

  5. Redundant operations. Already at the design level we were told that the categorizer is forced to process a record several times, in the worst case many times (with returns, regardless of order). Judging by the single arrow between the handler manager and the handlers themselves, we have a blocking call where the manager waits for a full response. Another indirect clue that this is a monolith without messaging, or simply bad architecture.

  6. Low integrity of the process, unlike the data. The data has one source, but the process is split into parts. If one of the handlers crashes constantly, the categorizer does not know about it and may keep waiting, returning to the record forever. The overall process and its state are unknown. It is especially worth noting that a missing handler result can be either an error/crash or acceptable behavior. Schrödinger's processors, in a word.

  7. Many points of complete failure (fragility). No block has a replacement (each is a single point of failure), and a fall or stop affects the entire process. Judging by the diagram, there is no separation into queues. If the repository is unavailable, everything is down, and the system cannot even digest what it has already swallowed. If the repository is up but a processor has fallen, not only does the client get no integration calls, we also get a dam effect: we budgeted for 100 requests per second, but the repo keeps accumulating records, and as soon as the processor returns, it and the categorization face an endless flood. Thank God the processor manages its own tasks, which means it will not drown, but the SLA of the integration and of the services on the other side can be washed away. So we have a dam, and it can burst right onto the client.

  8. Scalability/performance problem. While processing can clearly be parallelized, with each engine carved out into a separate stateless service, the categorizer does not look like that: in the current architecture it is a single block that sees and does everything. Yes, the task says it cannot be touched, but that is a problem in itself, worth declaring and accounting for in the new design.

  9. Limited throughput. This is probably the biggest headache of all for the business. Considering that the metadata processors do not always return a response and can work for several minutes, we are most likely losing part of our time. That assumption rests on the fact that in the diagram, timer X < timer Y. And (yes, yes, a sentence shouldn't start with that word) if we start plugging in numbers and estimate the maximum processing time at 5 minutes with a processing timer of 5 minutes, then the logical categorization timer is 6 minutes. It turns out that if our session with an operator is shorter than 6 minutes, the system is always in catch-up mode (see the sketch after this list). That means it cannot produce useful output 24/7; in other words, a busy business needs more than one system. Say, work on system 1 during the day and let it finish processing at night while working on system 2, which in turn finishes its processing during the day. I doubt customers are happy with active-passive scaling on a Dioscuri twin system.
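
A toy reproduction of the race from Rake item 4: a processor and the categorizer touch the same record with no common controller. Everything here is illustrative; the point is only that the categorizer's snapshot may or may not include the processor's write, depending on scheduling:

```python
import threading

record = {"meta": {}, "categories": None}

def processor():
    record["meta"]["sentiment"] = "angry"     # in the real system this write takes minutes

def categorizer():
    snapshot = dict(record["meta"])           # may run before or after the processor's write
    record["categories"] = sorted(snapshot)   # categorizes on whatever it happened to see

threads = [threading.Thread(target=categorizer), threading.Thread(target=processor)]
for t in threads: t.start()                   # no common controller, no ordering guarantee
for t in threads: t.join()

print(record)  # "categories" sometimes reflects the meta, sometimes an empty snapshot
```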

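And the catch-up arithmetic from Rake item 9, with the same guessed numbers; `always_catching_up` is just my shorthand for the condition, not anything from the task:

```python
# Plugging numbers into Rake item 9; every value is an illustrative guess.

MAX_PROCESSING_MIN    = 5                       # slowest metadata processor
PROCESSING_TIMER_MIN  = 5                       # "timer X": how often processors pick up work
CATEGORIZER_TIMER_MIN = MAX_PROCESSING_MIN + 1  # "timer Y" must fire later -> 6

def always_catching_up(avg_session_min: float) -> bool:
    # Operators producing records faster than the categorizer period
    # keep the backlog permanently non-empty.
    return avg_session_min < CATEGORIZER_TIMER_MIN

print(always_catching_up(4))   # True:  4-minute sessions mean perpetual catch-up
print(always_catching_up(10))  # False: long sessions let the system breathe
```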
Conclusion:

On the one hand, there is little time in the presentation; on the other, it is very important to show how much you have taken into account and how much you know. Everything must be stated concisely and confidently, the way Carlson would do it, leaving no room for doubts or questions. You can go into more detail if time allows. After the panel you will be asked to send the presentation, and the real selection of candidates begins once everyone has spoken; at that point the content may get special attention. The selection committee may also include people who did not take part in this interview but really want to "see everyone". It is very important to mention the advantages of the legacy architecture: you were told the product is successful, so soften your "fie" by opening with compliments. Do not include among the disadvantages unsolvable problems or things you could not, or did not want to, fix in the new approach. Lay out the rakes so that they can be stepped around. That same exactly-once: to hell with it.

Carlson leaves no room for doubts and questions.

The answer to the first two questions is ready. In the overall presentation of 9 slides: 3 of them are tinsel (Title page, About the author, Thank You!), 2 are covered in this article, and 3 more will be the content of the next one.
