Audition for the role of Architect: performance

Interviews for the role of architect are never the same twice. Only politicians are elected the same way – on the principle of "anyone but that one"… With architects, the scenario is always different. In half the rooms sit aesthetes who invite you to sketch something and discuss it. A quarter are clearly chemists with measuring cups in a safe – asking about correct code or requesting a bubble sort. Rarer still are the circus performers, who want you to juggle glass balls on a tower or race trams. The rest just want a discussion in front of the audience. As if hinting that you can't simply show up at the Gnessin school with the surname Ivanov – someone from the guild must finally say that there is something in the boy.

We already have a chewed-up legacy, a set of rakes methodically laid out under our feet, and high beams of signal lights to aim for. So all that's left is to pave the way from a pile of legacy to those very lights across a minefield of rakes. That's architecture. Step forward, consult the map, and be a wizard.

The architect sees the goal, believes in himself and does not notice criticism!


Whatever paradigm you choose, the integrity of the picture matters – the suit should fit equally well in the shoulders and in the waist. If you are planning a hive of microservices in the cloud, then some PostgreSQL will most likely fit in well: it is affordable and does not tie you to a specific cloud provider. Everything centralized, automated, managed remotely. If you are planning an on-premises deployment, think instead of an MSSQL cluster, a reasonable amount of manual configuration, and monitoring once in a while, since uninterrupted operation is no longer guaranteed by the infrastructure and will depend on the average hand of the admin (the one he laid on your system). Naturally, pick the option the way you'd pick a Korean restaurant – the one where you've already eaten the dog.

The new is the well-forgotten old. Since I assumed the legacy is on-premises, I will fit nextgen into that reality too. Not every country has government agencies and corporations that can work with public cloud services. And on the timeline we are at the start of covid and the great shift to remote work. So I stay in the reality of large components and off-the-shelf solutions in a local server room. Per the assignment, the first and last blocks are fixed. There is neither point nor time to hunt for a brilliant new solution, so the conveyor stays. The dependency between the steps will not go anywhere, but you can play with its level. We need to make scaling possible. That is, instead of one long rope between the pontoons, we need two short ones with a mount in the middle, so that pontoons can be hooked and unhooked without destroying the crossing. Size matters less than quantity. I'm talking about connections. Intercomponent ones. At an interview, such allegories are as superfluous as a fax in a printer. You speak to professionals in the professional tongue: broken English. So, we are going to build something incredibly great and complex here… It is not yet clear what, but it is event-driven.

Cohesion & Coupling: Coupling and Dependency Expert on Quantity-Quality Approach to Microservices


The principle of quick construction:

  • arrows are one-directional only,

  • messaging goes through some message queue,

  • we do not disclose details.

Whether we will actually have a Message Queue, Pub-Sub with topics, or an Event Bus is exactly the kind of unimportant detail we can talk about but definitely have no time to think through and draw. Ideally, this affects the direction of the arrows; in practice, rarely does anyone look at it. Ideals matter to the young, and I am experienced. In my memory there is polling in pub-sub, multi-megabyte events carrying the full context, and queues with no ordering guarantees. The rule that every rule without exceptions has its exceptions remains exceptionally correct. In short: Carlson mode, and to all questions – "Implementation details, a matter of everyday life!"

Implementation details are not an architect's business!


The first sketch is the picture from the assignment, converted into a diagram of PowerPoint objects, with every arrow broken up by an exchange component:

Input -> MQ -> Repository -> MQ -> Processors -> MQ -> Categorization -> MQ -> Callout

The second iteration follows immediately, since there are two black boxes:

Input -> Repository -> MQ -> Processors -> MQ(?) -> Categorization -> Callout

Since the categorizer, like a child in the back seat, asks every five minutes "Are we there yet?" regardless of what is happening, the initiative always comes from it. In theory, its request could be redirected to a queue or a topic instead of the storage. In the worst case, you can always cook up an adapter over the queue that simulates the legacy storage contract. These thoughts immediately suggested that bare events alone will not work, vulgar and attractive as that sounds. The categorizer has no interface for registration and callback in pub-sub. Plus, it expects the entire context at once, not just a notification that an important event has occurred – "come get your treat." So it has to be a centralized MQ with persistence and confirmations (Guaranteed Delivery) after all.
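Guaranteed Delivery boils down to a simple contract: a message is not gone from the queue until the consumer explicitly confirms it, and an unconfirmed one comes back. A toy in-memory sketch – the class and method names here are mine, not from the assignment:

```python
import queue

class GuaranteedQueue:
    """Toy queue with at-least-once delivery: a message stays 'in flight'
    until the consumer confirms it, and can be re-queued on failure."""

    def __init__(self):
        self._q = queue.Queue()
        self._in_flight = {}
        self._next_id = 0

    def publish(self, payload):
        self._q.put((self._next_id, payload))
        self._next_id += 1

    def consume(self):
        msg_id, payload = self._q.get_nowait()
        self._in_flight[msg_id] = payload  # not gone until acknowledged
        return msg_id, payload

    def ack(self, msg_id):
        del self._in_flight[msg_id]  # delivery confirmed, safe to drop

    def nack(self, msg_id):
        # consumer failed: put the message back for redelivery
        self._q.put((msg_id, self._in_flight.pop(msg_id)))


q = GuaranteedQueue()
q.publish({"session": 42, "context": "full"})
msg_id, payload = q.consume()
q.nack(msg_id)                  # pretend the consumer crashed mid-processing
msg_id, payload = q.consume()   # the same message comes back
q.ack(msg_id)
```

A real broker adds persistence on top of this, but the ack/nack contract is the part the categorizer's pull model actually depends on.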

Event Driven: Naked Event Expert on Minimal Context


This is where we start to accelerate.

Now, about the integrity of the process: we introduce a rule that all processors (Processing Engines) return a result. Accordingly, we need a supervisor to monitor what has fallen over and restart it. A message queue seems like a good fit here: the manager takes messages, creates processors, and confirms removal from the queue afterwards.

Wait – that "afterwards": how will the manager know about it? Then we need two-way communication between the Processing Manager and the Processing Engines it creates (whether these are processes or nanoservices in containers is an implementation detail). I really don't want that; it won't go down. Let's do it this way: we apply a variant of the Null Object design pattern. We will have one special empty Processing Engine, which answers synchronously, right after writing to the Repository the metadata that processing has started. This is not in the diagram, but since I asked myself the question, others may too, so you need a weighty answer ready. The Processing Manager will only spawn those Engines whose result is not in the published metadata. What to do with fallen handlers is still unclear. At this stage I put down a note that re-publishing a session to the queue is the Repository's job and is triggered either by timer X (as they have now, so no worse) or by external monitoring. After all, the respected dons must have some kind of watchdog/healthcheck to keep an eye on their business. As doctors say: "sooner or later we all come to this need." With basic DR already in place, we can drop the persistence and guaranteed-delivery requirements from the queue.
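The trick can be shown in a few lines. This is a deliberately toy sketch – the engine names, the metadata layout, and the `word_count` result are all invented for illustration:

```python
class Repository:
    """Stand-in for the real metadata store."""
    def __init__(self):
        self.meta = {}


class NullEngine:
    """The 'empty' Processing Engine from the Null Object pattern: it does
    no real work, just synchronously records that processing has started."""
    name = "null"

    def run(self, repo, session_id):
        repo.meta.setdefault(session_id, {})[self.name] = "started"
        return "ok"


class WordCountEngine:
    """A toy 'real' engine publishing its result into the session metadata."""
    name = "word_count"

    def run(self, repo, session_id):
        repo.meta.setdefault(session_id, {})[self.name] = 3  # pretend result
        return "ok"


class ProcessingManager:
    """Spawns only the engines whose result is absent from published metadata,
    which makes redelivery of the same session message harmless."""
    def __init__(self, repo, engines):
        self.repo = repo
        self.engines = engines

    def handle(self, session_id):
        published = self.repo.meta.get(session_id, {})
        to_spawn = [e for e in self.engines if e.name not in published]
        for engine in to_spawn:
            engine.run(self.repo, session_id)
        return [e.name for e in to_spawn]


repo = Repository()
manager = ProcessingManager(repo, [NullEngine(), WordCountEngine()])
first = manager.handle("s1")    # first delivery: both engines run
second = manager.handle("s1")   # redelivery: results published, nothing spawned
```

The payoff is idempotence: the Repository can republish a session on a timer, and the manager will only redo the work that never produced a result.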

We have hung the publish/re-publish function on the Repository; let's add routing there too. In the cloud it would be normal to have a separate broker and queues, but for deployment on domestic servers I need a minimum of components. So there is one message server and a couple of queues/topics on it: a record with incomplete meta goes to the Processing Manager queue, one with complete meta goes to Categorization.
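The Repository's routing rule is then almost a one-liner. A sketch, where the set of required metadata keys is entirely my invention:

```python
def route(session_meta):
    """Repository-side routing: a session with incomplete metadata goes back
    to the Processing Manager queue; a complete one moves on to
    Categorization. The 'required' key set is an invented example."""
    required = {"transcript", "duration"}
    if required.issubset(session_meta):
        return "queue:categorization"
    return "queue:processing-manager"


incomplete = route({"duration": 120})
complete = route({"transcript": "...", "duration": 120})
```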

I am not a domain expert, so I have no idea what gets processed or how. Perhaps a session is a large file read sequentially; perhaps it is a set of chunks, like a torrent, that the handlers pull in random order. So we need a blank that fits all occasions. I will divide the data storage into three bases:

  • An operational relational one, where meta is written and read – so searches and indexes live here.

  • Object or file storage, where a session is published once and read many times.

  • Storage for processed sessions – blob and meta in one document. The assignment does not say what happens after the external API calls, but what is no longer needed must be removed from the operational data so as not to wreck performance. So this will be cold storage – a place to run reports and metrics without consequences for the live processes.

With such simple mechanics I got a state machine and a facade over the data. The first matters for scaling – we removed the need to store state in the other modules/services. The second is for stability and predictability – the abstraction lets you change the vendor and, if need be, implement CQRS.
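A minimal sketch of that facade over the three stores, with plain dicts standing in for the actual databases (all names here are illustrative):

```python
class DataFacade:
    """One facade over the three stores, so callers never learn which vendor
    sits behind each. Dicts stand in for the real databases."""

    def __init__(self, relational, object_store, cold):
        self.relational = relational      # meta: searches and indexes
        self.object_store = object_store  # session blobs: write once, read many
        self.cold = cold                  # processed sessions: blob + meta together

    def publish_session(self, session_id, blob, meta):
        self.object_store[session_id] = blob
        self.relational[session_id] = meta

    def archive(self, session_id):
        # Move a finished session out of the operational stores into cold
        # storage, keeping blob and meta in one document.
        self.cold[session_id] = {
            "blob": self.object_store.pop(session_id),
            "meta": self.relational.pop(session_id),
        }


facade = DataFacade({}, {}, {})
facade.publish_session("s1", b"audio-bytes", {"duration": 120})
facade.archive("s1")
```

Because every module talks to the facade and not to a concrete store, swapping a vendor – or later splitting reads from writes for CQRS – stays a local change.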

Now our process looks something like this:

Input -> Repository -> MQ -> Processor Manager -> Processors[] ->

Repository -> MQ -> Categorization -> Callout

The next iteration seems to hint that since we are breaking dependencies by exchanging through a queue, why not apply this approach to every pair of modules:

Input -> Repository -> MQ -> Processor Manager -> Processors[] ->

Repository -> MQ -> Categorization ->

MQ -> Callout

This way, we can increase the throughput of the closed Categorization module by simply removing its need to wait for a response. But with multiple services and an unstable connection (most likely there is no LAN or Cloud2Cloud there), calls are fraught with excessive timeouts. The magic is that calling an external service is no different from calling your own. That is, instead of posting directly to somebody's ABC service, we post to our own queue, thereby decoupling the categorization and integration processes. Then the newly created services pick messages up from the queue and execute the same request against ABC. That is, just as with processing, nothing stops some manager from creating a call worker.
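Decoupling the callout amounts to one indirection: Categorization posts to our queue and forgets, and a worker later replays the request against the external service. A sketch, with `post_abc` standing in for the real ABC call:

```python
import queue

outbox = queue.Queue()

def post_abc(payload):
    """Stand-in for the real HTTP call to the external ABC service."""
    return {"status": 200, "echo": payload}

def categorization_publish(result):
    # Categorization never waits on the external service: it drops the
    # result into the local queue and moves on to the next session.
    outbox.put(result)

def callout_worker():
    # A separate worker drains the queue and performs the real calls,
    # absorbing external timeouts without blocking Categorization.
    responses = []
    while not outbox.empty():
        responses.append(post_abc(outbox.get()))
    return responses

categorization_publish({"session": "s1", "category": "billing"})
replies = callout_worker()
```

If ABC is down, messages simply wait in the queue and the worker retries later – the categorization side never notices.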

We get a new picture of the world:

Input -> Repository -> MQ -> Processor Manager -> Processors[] ->

Repository -> MQ -> Categorization -> Repository ->

MQ -> Callout Manager -> Callouts[]

A pattern is already emerging here. When the same decision, MQ -> Manager -> Engine[], repeats itself over and over – it starts to look like Mandelbrot architecture, which means I'm on the right track!

The categorizer cannot be changed per the assignment, so I play with the idea that modernization is just stretching something fashionable over a working frame. Apparently it receives all of its context from outside and runs to completion – a kind of stateless gigaservice. That means that by creating additional instances and managing the context, you can scale horizontally. What was Categorization becomes the Category Engine, and we need a manager that subscribes to messages for communication and creates and kills Engines. The timer configuration should be set to the minimum (if the legacy waits out the interval before the first call at startup) or the maximum (if the call happens first at startup and the wait follows). Thus the dotted block should remove the minuses without touching the old code at all.

Somewhere around this point all the sand in the hourglass came to rest, and I went into a state of anxiety. Something had gone wrong with the admissions committee's schedule, and I was told I had one more hour. I spent it marking, coloring, and fixing mistakes.

In the end, this is what happened:

The final solution to the issue


An interview on Zoom during the corona quarantines was like The Voice. The four architects on the jury sit staring at their other screen, continuing to work. Better to pull them away right off. Periodically ask simple things: can they see you properly, how is the connection quality on their side. If you let everyone tune out, the impression will be blurred and the questions will come out of thin air. And questions without context are designed to make you guess the thoughts in the asker's head. I'm not good at that. For me, the whole point of a presentation is to make my thoughts clear. Some people go to a therapist for that; some go to interviews.

They gave little time for the presentation and didn't ask much. The presentation was in English (a requirement at many large companies), and there was no discussion. Judging by the questions, they clearly wanted to go to the cloud, but they liked the idea of a transitional architecture. I tactfully hinted that if in a couple of hours I had produced the same result their team had spent months on, hiring me would be like getting three extra architects.

In any case, remember the architects' main secret – the answer to any question begins with the phrase "It depends". Then, depending on your level of English, comes either the preposition "on" and a couple of example dependencies, or a lengthy discussion to buy time. Any unambiguous answer given on superficial acquaintance with the task will expose you as a craftsman – and they need creators here! As you have probably understood if you've endured this far, churning out blueprints is not hard. What is hard is seeing, in cubes and arrows, the future problems of development, operation, recovery, maintenance, and the confrontation with influencers pushing their silver bullets for gold coins.

It is worth remembering that the result might not have depended on the content of my report at all. Maybe they just urgently needed a person – the main thing being that he not keep them waiting amid troubles, of which they clearly had enough, and that he be ready to discuss the genius of their architecture with the developers. They did seem tired. Or maybe the not-entirely-competitive offer they made me was based on this presentation. I myself no longer remember why another queue suddenly appeared for Callout instead of reusing the existing one in the diagram. Perhaps because I decided the categorizer does not write its response to the repository, which means the life cycle and the process would differ. The slightly clumsy rendering of the diagram is also original – preserved as it was.

