Auditions for the Role of the Architect: The Offensive
In the last episode: the entrance to the harsh world of enterprise architecture lies through a panel interview. What is dead cannot die, so for the task I was asked to reflect on a legacy diagram and offer it a rebirth. And to make it more fun: a minimum of information and only four hours of time.
Some understatement is always present on first dates. It's a pity that this time it falls on the technical brief. When time is short, it is easier to start from the requirements and constraints than to go confirm the paradox of choice firsthand. So let's talk about architecture right away. At interviews, all they talk about is architecture and design. They talk about how damn cool it is to watch a huge monolithic legacy melt into the waves.
So as not to drown in the sea of possibilities, let's start by defining the conditions.
Assumptions and additions:
The system does not manage or receive streams; it only processes successfully completed sessions. Streaming, waiting queues, and the correctness and availability of source data are no longer our problem. Read failures can be filtered out as well: if a record has been handed over for processing, it is intact and accessible. (In real designs subsystems are split exactly this way: there is no need to drag image format conversion and image quality enhancement into a text recognition module. You build a pipeline of individual modules plus a mediator, for example microservices with an ESB.)
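To make the pipeline-plus-mediator idea concrete, here is a minimal sketch under the assumptions above; the class and stage names are invented for illustration, not taken from the task.

```python
from dataclasses import dataclass, field

@dataclass
class SessionRecord:
    session_id: str
    payload: bytes
    metadata: dict = field(default_factory=dict)

class Mediator:
    """Routes a completed session record through independent modules, ESB-style."""

    def __init__(self, stages):
        # Each stage owns exactly one concern: format conversion,
        # quality enhancement, text recognition, and so on.
        self.stages = stages

    def handle(self, record: SessionRecord) -> SessionRecord:
        # Only intact, successfully completed sessions reach this point;
        # streaming, queues and broken reads were filtered out upstream.
        for stage in self.stages:
            record = stage(record)
        return record
```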
All sessions are carried out with the participation of an operator, and the system never receives more simultaneous records than there are operators. This means we cannot get an unpredictable explosive load from the flow of clients. Everything is according to the theory of constraints: we do not need more throughput than the operators can produce, and we can immediately calculate the maximum possible load on the system. Of the necessary data, we are still missing the length of a session with an operator. If the minimum significant session is 30 seconds and there are 100 operators, then our legacy system receives at most 200 records per minute, with a peak of 100 at any one time. The system's output, the API calls, is also worth accounting for, but more on that later.
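A back-of-envelope version of that estimate, with the same illustrative numbers (100 operators, 30-second minimum session), just to show where the 100 and 200 come from:

```python
operators = 100
min_session_seconds = 30

# One session per operator at any moment, so at most 100 records in flight.
peak_concurrent_records = operators

# Each operator can finish at most 60 / 30 = 2 sessions per minute.
max_records_per_minute = operators * (60 // min_session_seconds)

print(peak_concurrent_records, max_records_per_minute)  # 100 200
```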
API calls do not block the system. Ideally, of course, we don't want to wait for an answer at all; we'd like to just knock on something and get a code 200 back (not to be confused with Cargo 200). But if I have correctly identified this as an integration point, then the API belongs to external systems, and we cannot control how they work. On the other end there may be a cheerful, well-rested REST, or maybe a soapy SOAP. For heavier systems and hardware, be prepared for TCP/UDP. So I suggest being ready to discuss this in advance, but, where possible, taking legacy out of the problem and thereby saving your nerves for the next stage. A legitimate assumption would be that the external system receives the data, returns an ACK (confirmation), and processes it later. And judging by the diagram, which has no external players, we don't need any particular result from it at all. This unspoken assumption is what holds the entire fragile integration mechanism together.
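Here is a sketch of the integration contract I am assuming: fire the call, accept any 2xx as an ACK, and never wait for the external system to finish its own processing. The endpoint URL and payload shape are made up for illustration.

```python
import requests

def notify_external_system(record_id: str, categories: list[str]) -> bool:
    """Send the record and treat the ACK as success; their processing is their problem."""
    try:
        resp = requests.post(
            "https://external.example.com/records",  # hypothetical endpoint
            json={"record_id": record_id, "categories": categories},
            timeout=5,  # don't let a sleepy SOAP peer block the legacy side
        )
        return 200 <= resp.status_code < 300  # ACK received
    except requests.RequestException:
        return False  # no ACK; the retry policy lives elsewhere
```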
The categorizer processes every record with full metadata, passing it through all categories. The conditional ruleset for calling the API does not skip anything. Since we know it is triggered by a timer and processes everything that is available, we treat the legacy as a pass through the entire list, without any tricks with filtering or searching.
A record that has passed all handlers goes to the archive immediately after categorization. That is, full categorization happens only once and only after all the meta processors. This, apparently, is currently solved bluntly with a timer: first we wait for all the processors, then we mark the categories.
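Put together, that reading of the legacy behaviour looks roughly like this; the repository, rule, and archive interfaces are assumptions made for illustration.

```python
def categorization_pass(repository, rules, archive):
    # The timer fires and we take everything that is ready: every record
    # whose meta processors have all finished.
    for record in repository.fetch_fully_processed():
        # The ruleset skips nothing: every rule sees every record.
        for rule in rules:
            if rule.matches(record.metadata):
                record.categories.add(rule.category)
        # Archived exactly once, immediately after categorization.
        archive.store(record)
```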
Each Processing Engine works independently. That means they do not take other handlers' results into account and do not compete for the record itself; you can even say that each one gets its own copy. Again, we reduce the legacy to walking through a list, with no dependency graph, no deadlocks, and no infinite loops.
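The independence assumption as a sketch; the processor registry is invented, and the only point is that each engine gets its own copy and nobody reads anyone else's results.

```python
import copy

def run_processors(record, processors: dict):
    results = {}
    for name, processor in processors.items():
        # Own copy per engine: no shared state, no competition for the record.
        results[name] = processor(copy.deepcopy(record))
    # A flat pass over a list: no dependency graph, no deadlocks, no infinite loops.
    return results
```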
The repository serves only to store and provide access to records and metadata. CRUD. We don't want the repository to carry business functions: no stored procedures, triggers, and so on. The original diagram has no outgoing connection from the repository, but just in case, we won't assume that the incoming ones perform function calls either.
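The repository boundary we are hoping for, expressed as a sketch: plain CRUD over records and metadata, nothing clever inside. The in-memory dict here stands in for whatever storage the legacy actually uses.

```python
from typing import Optional

class RecordRepository:
    """Stores records and metadata. No business logic, triggers or procedures."""

    def __init__(self):
        self._records: dict[str, dict] = {}

    def create(self, record_id: str, data: dict) -> None:
        self._records[record_id] = data

    def read(self, record_id: str) -> Optional[dict]:
        return self._records.get(record_id)

    def update(self, record_id: str, data: dict) -> None:
        self._records[record_id] = data

    def delete(self, record_id: str) -> None:
        self._records.pop(record_id, None)
```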
The client always works with the same system installation (instance). Most likely the system is installed and runs locally for each of the company's clients. But SaaS is no longer news, and the idea was born on the web and hosting; it is quite possible that small clients work against the company's data center. The diagram does not contradict simply standing everything up on virtual machines in the cloud. Lift & Shift is expensive and inefficient, but fast. In that case we would like the client not to be tossed from instance to instance, like the turnip between the granddaughter and Zhuchka in the fairy tale. That way we don't complicate our life with a requirement to call the API exactly once.
Preliminary result:
It is not entirely clear what the blocks in the diagram represent: stand-alone services or modules inside a monolith. I assume it is a set of monoliths: the processing manager and the handlers are one big stone, the categorizer is another. Each of them has its own interface (UI/API/CLI), timer (scheduler), configurator, error handler, and database. All of this is hosted somewhere locally at the client's and, possibly, in the company's own data center; it clearly does not amount to a cloud infrastructure. Any optimistic assumption can be lubricated with freshly squeezed oil of flattery: "I am sure that since the product is popular, everything here is well designed, and this is how I imagine it…"