You will pay for this! The Price of Clean Architecture

Hello everyone, my name is Artemy, and I am a senior Android developer on the RuStore core team. I have 8 years of industry experience, and during that time I have worked on various projects at different companies. I once worked on a project with over 300 modules and more than 60 Android developers. Conditions like these force you to think about scalability on a fundamentally different level.

Today I will talk about ways to keep a project scalable and how a misconception of Clean Architecture (hereinafter CA) can undermine that. Fair warning: this is a long read in two parts!

Price and value. A little theory

In talks and in real life, I have often heard the opinion that CA is expensive but worth it. Conversely, I once heard someone say it was not worth it. Do you think CA is expensive?

Let's recall the beginning of Robert Martin's book “Clean Architecture”. There the author describes a period of his life when he worked on a project with a team of “unprincipled” developers, meaning that the basic principles later described in the book had not yet been formulated. Long story short, the project reached a state where the number of developers kept growing just to keep releasing features at the same pace. The author even provided graphs that capture the project's situation well:

The graphs show that adding developers had no significant effect: the size of the codebase barely grew, while the cost of writing a line of code rose sharply. Later in the book, Uncle Bob describes the principles that are supposed to help us fight rising costs rather than add to them.

In articles and talks, the claim that CA is expensive is backed by various arguments:

  • CA must satisfy a specific set of criteria.

  • Following the principles from the book is expensive.

  • Layers are hard to maintain.

  • There are too many components.

But I think all the problems developers face have a common root.

Misunderstanding of terminology

The root of the cost problem is a lack of understanding of what CA is. When asked “What is CA?” at an interview, candidates most often talk not about the book and its contents, and not about the chapter of the same name in that book, but about the diagram of the same name. And interviewers, in turn, usually expect exactly that answer.

But that is not CA; it is just one way of representing its structure.

It turns out that many people confuse cause and effect. CA is not derived from this diagram; the diagram was derived by applying the principles of CA. Moreover, it was derived from the experience of a specific person on a specific project for a specific platform. We could, of course, try to infer the definition of CA from its implementations, but for that we would need more of them; otherwise our idea of it will be one-sided.

I could not find a single article describing other ways to implement CA, and I don't think it is right to define CA from a single result. We need to go back to basics and look for the answer in the book. The author does not give a clear definition, but I can suggest a way to derive one. To do so, we need to answer the question: “Why do we need SOLID?”

The topic of SOLID is even more worn out than CA, but how often do articles about SOLID answer the question “Why is it needed?” I think many mistakes could have been avoided if the authors of those articles had first asked themselves about its goals.

Why does SOLID matter for the definition?

The first half of Uncle Bob's book covers 11 principles: 6 principles of component organization (which, in turn, are divided into component cohesion and component coupling principles) and the 5 SOLID principles. It might seem that SOLID takes up less than half, but in reality things are a bit more complicated.

Here is a diagram of how the principles that shape our dependencies (in other words, the principles of CA) depend on one another:

Everyone is used to talking about only the 5 SOLID principles, but the remaining principles serve exactly the same goals.

To answer the question “Why do we need SOLID?”, we don't have to look far; let's turn to Wikipedia:

  1. To create a system that can be easily maintained and expanded over time.

  2. To improve the software.

If CA is based on principles with clear goals, these goals can be included in the definition of CA:

A Clean Architecture is an architecture that can be easily maintained, extended, and improved over time.

It might seem that people who like writing all their code in a single class can now safely call their architecture Clean, since making changes is so easy for them. But will it stay easy “over time”? For a team of a couple of people, making changes may seem easy enough even when the code is in one class. But if you leave the project, will your successors thank you? And what if you add a pinch of scaling and the company hires a dozen more developers to help you?

It is important to understand that “ease” is not an abstract feeling but the preservation of the key metric: TTM (time to market). I could also mention onboarding time for newcomers, project build time, and so on, but all of these feed into TTM.

Contrasting CA with other architectures

Great, the definition has been derived. But if the word “architecture” in it confuses you, you have not yet arrived at a full understanding. One of the problems in understanding CA is that it is mistakenly contrasted with other architectures. We have all heard that other architectures exist: onion (layered), hexagonal, and perhaps you can name others. But CA cannot be set against them.

When did we start treating CA as a full-fledged architecture? I think it all started with articles on the topic. When I first began studying it, I liked a version of the CA diagram adapted to mobile development, showing the components typical for us:


Although this diagram is adapted to mobile development, its structure is no different from the one in the book. Article after article shows us the same familiar structure, and between the lines there is always a big equals sign between the term and the diagram. As a result, the diagram is treated as the final form of a CA implementation. But the authors of those articles can't be blamed for this, because Uncle Bob himself labeled this diagram “The Clean Architecture”.

From the definition we derived, we can see that “cleanliness” is a property. This means that any of the architectures we tried to contrast CA with can be “clean”; it is enough to follow certain principles.

What makes CA expensive?

Extra interfaces

I will illustrate this problem with a monolithic structure whose modules are split by layer.

An important point: the diagram does not show dependencies on Entity, because any component can depend on it and use it. Showing all of those dependencies would make the diagram harder to read.

Now let me explain the components used, since their names may differ from project to project. I hope everyone can map them to components in their own projects.

For Domain:

  • Entity — entities characteristic of our project.

  • UseCase — business logic component.

  • Repository interface — a component necessary for applying dependency inversion and directing it from the data layer to the domain layer.

For Presentation:

  • Presenter — a component of presentation logic.

  • View interface – a component necessary to apply dependency inversion and direct it from the UI layer to the presentation layer.

For UI:

  • View – an implementation of the View interface (for example, an Activity or a Fragment).

For Data:

  • Model (or DTO) – entities of the data layer, received in raw form and not yet converted into domain entities.

  • Repository – a component that manages data sources and converters to produce domain-layer entities.

  • Converter — a component that converts models into domain layer entities.

  • DataSource — data source component.

  • API Interface – a component with which we receive models (DTO) via the network.

Maintaining such a structure is clearly expensive, especially for simple “fetch and display” features, where you just need to show a model received from a request. And mobile development has plenty of such features.
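To make that cost tangible, here is a minimal Kotlin sketch of a single “fetch and display” feature that includes every component from the list above. All names (Article, ArticleApi, and so on) are hypothetical. The calls are synchronous instead of using coroutines to keep it short, and the UI class that would implement ArticleView (an Activity or Fragment) is left out.

```kotlin
// ---- Domain ----
data class Article(val id: Long, val title: String)            // Entity

interface ArticleRepository {                                   // Repository interface
    fun getArticle(id: Long): Article
}

class GetArticleUseCase(private val repository: ArticleRepository) {  // UseCase
    operator fun invoke(id: Long): Article = repository.getArticle(id)
}

// ---- Presentation ----
interface ArticleView {                                         // View interface
    fun showTitle(title: String)
}

class ArticlePresenter(private val getArticle: GetArticleUseCase) {   // Presenter
    fun onOpen(view: ArticleView, id: Long) {
        view.showTitle(getArticle(id).title)
    }
}

// ---- Data ----
data class ArticleDto(val id: Long, val title: String?)         // Model (DTO)

interface ArticleApi {                                          // API interface
    fun fetchArticle(id: Long): ArticleDto
}

interface ArticleDataSource {                                   // DataSource interface
    fun load(id: Long): ArticleDto
}

class ArticleRemoteDataSource(private val api: ArticleApi) : ArticleDataSource {
    override fun load(id: Long): ArticleDto = api.fetchArticle(id)
}

interface ArticleConverter {                                    // Converter interface
    fun convert(dto: ArticleDto): Article
}

class ArticleConverterImpl : ArticleConverter {
    // Raw DTO fields may be missing, so a null title becomes an empty string.
    override fun convert(dto: ArticleDto): Article = Article(dto.id, dto.title.orEmpty())
}

class ArticleRepositoryImpl(                                    // Repository
    private val dataSource: ArticleDataSource,
    private val converter: ArticleConverter,
) : ArticleRepository {
    override fun getArticle(id: Long): Article = converter.convert(dataSource.load(id))
}
```

That is a dozen types, five of them interfaces, for one screen that shows a title.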

What can be done?

First, I suggest removing unnecessary interfaces:

We have removed the interfaces for DataSource and Converter.
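In code, the trimmed data layer might look like the sketch below. The names are hypothetical and the calls are synchronous for brevity. DataSource and Converter become plain final classes. The Repository interface stays because it inverts the dependency between data and domain. ArticleApi also stays an interface, as the boundary to a networking library (with Retrofit, for example, API definitions are declared as interfaces).

```kotlin
// Domain: entity and the Repository interface used for dependency inversion.
data class Article(val id: Long, val title: String)

interface ArticleRepository {
    fun getArticle(id: Long): Article
}

// Data: raw model and the network boundary.
data class ArticleDto(val id: Long, val title: String?)

interface ArticleApi {
    fun fetchArticle(id: Long): ArticleDto
}

// No interface: a plain final class with a single implementation.
class ArticleRemoteDataSource(private val api: ArticleApi) {
    fun load(id: Long): ArticleDto = api.fetchArticle(id)
}

// No interface: conversion is a pure function of the DTO.
class ArticleConverter {
    fun convert(dto: ArticleDto): Article = Article(dto.id, dto.title.orEmpty())
}

class ArticleRepositoryImpl(
    private val remote: ArticleRemoteDataSource,
    private val converter: ArticleConverter,
) : ArticleRepository {
    override fun getArticle(id: Long): Article = converter.convert(remote.load(id))
}
```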

Removing unnecessary interfaces from a project can be difficult, because any project may have its “interface advocates”. Let me note right away that Robert Martin was not one of them. If someone cites his book and claims that the Liskov Substitution Principle requires interfaces, know that in the book Uncle Bob uses or describes interfaces as necessary in only two cases:

  1. To protect against external implementations that we do not control. For us, these could be third-party libraries or the platform we are developing for.

  2. For dependency inversion.
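A sketch of the first case follows. The only class that touches a third-party SDK sits behind an interface the app owns, so replacing the vendor means changing a single adapter. VendorAnalytics is a hypothetical stand-in for a real SDK, not any actual library's API.

```kotlin
// Hypothetical stand-in for a third-party SDK we do not control.
object VendorAnalytics {
    val sentEvents = mutableListOf<String>()
    fun logEvent(name: String) {
        sentEvents += name
    }
}

// Our own boundary: the rest of the app depends only on this interface.
interface Analytics {
    fun track(event: String)
}

// The single place that knows about the vendor SDK.
class VendorAnalyticsAdapter : Analytics {
    override fun track(event: String) = VendorAnalytics.logEvent(event)
}

// Feature code does not know which SDK is behind Analytics.
class CheckoutTracker(private val analytics: Analytics) {
    fun onPurchase() = analytics.track("purchase")
}
```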

As for the Liskov Substitution Principle itself, the author applies it in a broader sense. It covers substituting entire services rather than implementations of specific classes (in mobile development, one could equally talk about substituting modules or technologies for one another).

But if we talk about substitution at the component level, what arguments could be made in favor of interfaces? I will use DataSource as an example.

Example 1: substitution of local and remote data sources

These sources are highly context-specific. They may differ in the number of methods or in their signatures. For example, a local source should be clearable, which means it needs a corresponding method that a remote source would not have. You can work around this by introducing more interfaces: one shared by both sources and a second one purely for clearing. This increases the number of components, and therefore the cost of your architecture. And all this work is useless if your code refers to each source by its context, that is, if the variable names say which source is local and which is remote.
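The original listing isn't reproduced here. The sketch below uses hypothetical names to show the kind of code meant. Both sources are forced under a shared ArticleDataSource interface, and the local one needs an extra ClearableDataSource interface for clear(). Yet the repository still names its fields localDataSource and remoteDataSource, because the lookup order depends on which source is which.

```kotlin
data class Article(val id: Long, val title: String)

// Shared interface both sources were forced under.
interface ArticleDataSource {
    fun get(id: Long): Article?
}

// Extra interface, needed only because the local source can be cleared.
interface ClearableDataSource {
    fun clear()
}

class ArticleLocalDataSource : ArticleDataSource, ClearableDataSource {
    private val storage = mutableMapOf<Long, Article>()
    override fun get(id: Long): Article? = storage[id]
    fun put(article: Article) {
        storage[article.id] = article
    }
    override fun clear() = storage.clear()
}

class ArticleRemoteDataSource : ArticleDataSource {
    // Stand-in for a network call.
    override fun get(id: Long): Article? = Article(id, "from network")
}

class ArticleRepository(
    // Typed by the shared interface, but named by context:
    // swapping the two arguments still compiles, yet the repository
    // would then hit the network before checking the local copy.
    private val localDataSource: ArticleDataSource,
    private val remoteDataSource: ArticleDataSource,
) {
    fun getArticle(id: Long): Article? =
        localDataSource.get(id) ?: remoteDataSource.get(id)
}
```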

You might insist that the substitution principle must be followed here no matter what. But it is violated the moment you use both sources at once and name the variables to show which source is which.

If you want to cache data, you will have to explicitly distinguish between these sources; otherwise, how would you put data from the remote source into the local one? Moreover, it is better to cache already converted entities locally, so you don't have to convert them repeatedly. That rules out a common interface. And if you try to work around this with generics, the code becomes much harder to read and more expensive to maintain.
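Here is a sketch of caching without a common interface. The names are hypothetical, and the network call is simulated with a function parameter. The remote source returns DTOs while the local one stores already converted entities. Their signatures differ, so the repository has to know which is which anyway.

```kotlin
data class Article(val id: Long, val title: String)          // domain entity
data class ArticleDto(val id: Long, val title: String?)      // raw network model

// Remote source speaks DTOs; the fetch function stands in for a real network call.
class ArticleRemoteDataSource(private val fetch: (Long) -> ArticleDto) {
    var calls = 0
        private set

    fun load(id: Long): ArticleDto {
        calls++
        return fetch(id)
    }
}

// Local source stores converted entities, so its types differ from the remote one.
class ArticleLocalDataSource {
    private val cache = mutableMapOf<Long, Article>()
    fun get(id: Long): Article? = cache[id]
    fun save(article: Article) {
        cache[article.id] = article
    }
    fun clear() = cache.clear()
}

class ArticleConverter {
    fun convert(dto: ArticleDto): Article = Article(dto.id, dto.title.orEmpty())
}

class ArticleRepository(
    private val local: ArticleLocalDataSource,
    private val remote: ArticleRemoteDataSource,
    private val converter: ArticleConverter,
) {
    // Read from cache first; on a miss, fetch, convert once, and cache the entity.
    fun getArticle(id: Long): Article =
        local.get(id) ?: converter.convert(remote.load(id)).also(local::save)
}
```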

Example 2: Different implementations of local sources

Substituting RAM-based and persistent-storage data sources

The obvious advantage of RAM is speed, so there is no need to access it asynchronously. By forcing it behind the asynchronous interface of persistent storage, we lose this advantage.
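A sketch of the difference, with hypothetical names: persistent storage exposes a suspend API because it does I/O, while the RAM cache is a plain synchronous class. Wrapping the cache in the suspend interface, as SettingsMemoryAsAsync does, compiles fine. However, every caller then needs a coroutine just to read a field.

```kotlin
data class Settings(val darkTheme: Boolean)

// Persistent storage does I/O, so its API is asynchronous (suspend).
interface SettingsDiskStorage {
    suspend fun read(): Settings?
}

// RAM access is cheap: plain synchronous calls, usable from any code.
class SettingsMemoryCache {
    @Volatile
    private var value: Settings? = null

    fun get(): Settings? = value
    fun put(settings: Settings) {
        value = settings
    }
}

// Forcing the cache under the disk interface: still correct,
// but reading now requires a coroutine, and the speed advantage is hidden.
class SettingsMemoryAsAsync(private val cache: SettingsMemoryCache) : SettingsDiskStorage {
    override suspend fun read(): Settings? = cache.get()
}
```

Calling read() requires a coroutine (for example, launched through kotlinx.coroutines), while get() can be called from anywhere.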

We could, of course, discuss cases where synchronization and concurrent access to a field from different threads cause a significant delay. But I can't imagine how bringing in persistent storage in such a situation would avoid making things worse. And in mobile development such a case would most likely be artificial. If your project runs into problems like this, something is probably wrong at the conceptual level.

And when RAM and persistent storage are used together in the same process, we end up in the situation from the previous example: the class that uses both sources knows which one is which.

We could also discuss cases where a file stores little enough data to be read synchronously. But all these sources serve different purposes, which is why they cannot be interchangeable. RAM can be replaced with persistent storage, but not vice versa, because the lifetimes of the stored data differ significantly.

Substituting file-based and database data sources

These two sources differ fundamentally in their capabilities. To bring them under one interface, you have two options. You can forget about what the database offers and give up its advantages. Or you can make file handling behave like a database by hand, which amounts to writing your own database. Whatever you say, that is a thankless task and a sign that something is wrong in the project.

Substituting different database implementations or database tools

It would seem that here, at last, is a case with no obstacles to using interfaces. But such substitution will most likely be artificial. In reality, it is more likely to be not a substitution but a complete replacement of one implementation with another. And when it comes to complete replacement, interfaces no longer help and may even get in the way.

Example 3: Unit Tests

I think this is a genuinely important reason to use interfaces if you have no other way to mock final classes and define the behavior of the dependencies held by the class under test. But on Android we do have such a way, and I can only hope things are just as good on other platforms. If not, congratulations: interfaces are justified for you, and I sympathize with that inevitability.

Example 4: dynamic substitution

If we are talking about dynamic substitution, interfaces may be justified but are not mandatory. Depending on the conditions, you can implement the substitution through interfaces or in other ways; that is up to you, since situations vary.
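Here is a sketch of both options. The names are hypothetical, and a boolean stands in for a real runtime condition such as a feature flag. Option A chooses between two implementations of an interface at runtime. Option B expresses the same switch with a Kotlin function type and no dedicated interface, as one example of the “other ways”.

```kotlin
data class Banner(val text: String)

// Option A: substitution through an interface, chosen at runtime.
interface BannerSource {
    fun load(): Banner
}

class RemoteBannerSource : BannerSource {
    override fun load() = Banner("remote")
}

class StubBannerSource : BannerSource {
    override fun load() = Banner("stub")
}

fun bannerSource(useStub: Boolean): BannerSource =
    if (useStub) StubBannerSource() else RemoteBannerSource()

// Option B: no dedicated interface; the repository takes a function,
// and the runtime condition decides which function to pass.
val remoteLoader: () -> Banner = { Banner("remote") }
val stubLoader: () -> Banner = { Banner("stub") }

class BannerRepository(private val load: () -> Banner) {
    fun banner(): Banner = load()
}

fun bannerRepository(useStub: Boolean): BannerRepository =
    BannerRepository(if (useStub) stubLoader else remoteLoader)
```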

But even if interfaces are justified in this case, how many such cases can a project have? If there are a couple hundred DataSources, is it worth creating an interface for each of them?

Extra inversions

I will illustrate this problem with a multi-module structure in which each feature is split into a main module, feature, and its shared part, shared, which other feature modules can use.

Let's look at a few examples of such a structure:

The Repository interface is often kept out of habit when moving from a monolithic project structure, where it was needed for dependency inversion. But, as we can see, we cannot preserve the purpose of this interface even if the shared part contains only the data layer, let alone in the other variants.

Let's figure out what the Dependency Inversion Principle (DIP) is and what it is really needed for. The Clean Architecture book says that DIP rests on SAP (Stable Abstractions Principle) and SDP (Stable Dependencies Principle).

SDP, or the Stable Dependencies Principle

This principle says that dependencies should point in the direction of stability. It is important to understand that stability is not the same as changing rarely. A stable component is one that you cannot change without also causing changes in other components.
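The book also gives a way to measure this: instability I = Fan-out / (Fan-in + Fan-out), where Fan-in counts incoming dependencies and Fan-out counts outgoing ones. I = 0 means maximally stable. SDP says that a component's I should be larger than the I of the components it depends on. The sketch below applies the metric at module granularity, which is a simplification (the book counts classes). It runs on a hypothetical dependency graph and lists the edges that point toward a less stable module.

```kotlin
// The graph maps each module to the set of modules it depends on.

// Fan-out: outgoing dependencies of the module.
fun fanOut(graph: Map<String, Set<String>>, module: String): Int =
    graph[module].orEmpty().size

// Fan-in: how many modules depend on this one.
fun fanIn(graph: Map<String, Set<String>>, module: String): Int =
    graph.count { (_, deps) -> module in deps }

// I = Fan-out / (Fan-in + Fan-out): 0.0 is maximally stable, 1.0 maximally unstable.
fun instability(graph: Map<String, Set<String>>, module: String): Double {
    val out = fanOut(graph, module)
    val total = fanIn(graph, module) + out
    // Assumption: an isolated module (no dependencies either way) is reported as 0.0.
    return if (total == 0) 0.0 else out.toDouble() / total
}

// SDP check: an edge violates the principle when it points to a module
// that is less stable (higher I) than the module depending on it.
fun sdpViolations(graph: Map<String, Set<String>>): List<Pair<String, String>> =
    graph.flatMap { (from, deps) -> deps.map { dep -> from to dep } }
        .filter { (from, dep) -> instability(graph, dep) > instability(graph, from) }
```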

In his book, Uncle Bob identifies Entities as the most stable layer, followed by the business logic layer, and so on:

In practice, in a feature/shared structure, any change in the data layer always triggers a rebuild of all dependent modules.

So in our scheme, stability is not really ensured and exists only in our heads. And the book doesn't talk about stability in our heads; it is always about practical application.

How could we ensure this stability in practice? For the shared part of the data layer, we could extract an additional module:

But then we get an unhealthy tendency toward an excessive number of modules in the project. The overhead would be so large that hardly anyone would agree to maintain such a structure.

To complete the picture, all that remains is to move UseCase into a new module and create a separate module for the UI. Then we get the same monolithic structure, only finely chopped: our project becomes not so much multi-module as simply made of many modules.

SAP, or the Stable Abstractions Principle

The Russian translation of the book says that a component's stability is proportional to its abstractness, while the original says that a component's abstractness is proportional to its stability. But in the original, the author ultimately reduces everything to the first formulation, the one the translator used, so let's not blame the translator.

What the author says about SAP in his book can be interpreted in different ways. Personally, I prefer the interpretation in which abstractions give us enough flexibility to manage dependencies.
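The book makes SAP measurable too. Abstractness is A = Na / Nc: interfaces and abstract classes divided by all classes in the component. The distance from the “main sequence” is D = |A + I − 1|, where I is the instability metric used for SDP. A value near 0 means abstractness matches stability. Here is a small sketch with hypothetical counts.

```kotlin
data class ComponentMetrics(
    val abstractTypes: Int,  // Na: interfaces and abstract classes in the component
    val totalTypes: Int,     // Nc: all classes in the component
    val fanIn: Int,          // incoming dependencies
    val fanOut: Int,         // outgoing dependencies
) {
    // A = Na / Nc (an empty component is treated as fully concrete, an assumption)
    val abstractness: Double
        get() = if (totalTypes == 0) 0.0 else abstractTypes.toDouble() / totalTypes

    // I = Fan-out / (Fan-in + Fan-out) (an isolated component is treated as stable, an assumption)
    val instability: Double
        get() = if (fanIn + fanOut == 0) 0.0 else fanOut.toDouble() / (fanIn + fanOut)

    // D = |A + I - 1|: 0.0 lies on the main sequence, 1.0 is as far from it as possible.
    val distance: Double
        get() = kotlin.math.abs(abstractness + instability - 1)
}
```

A very stable module with almost no abstractions (for example, 1 abstract type out of 10, with five dependents and no outgoing dependencies) gets D close to 0.9, which places it near what the book calls the Zone of Pain.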

And only now can we talk about DIP

This principle says that the most flexible systems are those in which source code dependencies point to abstractions rather than to concrete implementations.

DIP literally echoes the definitions of SDP and SAP, and the author himself says that DIP is impossible without them. How do these principles fit together? SDP is the goal, SAP is the tool, and DIP is the result. Taken together, DIP is the use of SAP to achieve SDP.

To put it simply, we use abstractions to turn dependencies toward stability. Understood this way, DIP stops being a principle at all. The real principle is what underlies it and what we are striving for: SDP.
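Here is a minimal sketch of DIP in a layer-split project. The module and class names are hypothetical, and everything is shown in one file with comments marking the modules. The interface lives in the stable :domain module next to the code that uses it. As a result, :data depends on :domain even though, at runtime, the use case calls into :data.

```kotlin
// ---- :domain module (stable; depends on nothing below) ----
data class Article(val id: Long, val title: String)

// The abstraction is owned by its consumer, in the stable module.
interface ArticleRepository {
    fun getArticle(id: Long): Article
}

class GetArticleUseCase(private val repository: ArticleRepository) {
    operator fun invoke(id: Long): Article = repository.getArticle(id)
}

// ---- :data module (depends on :domain, never the other way round) ----
class ArticleRepositoryImpl(private val titles: Map<Long, String>) : ArticleRepository {
    override fun getArticle(id: Long): Article = Article(id, titles[id].orEmpty())
}
```

If ArticleRepository and ArticleRepositoryImpl end up in the same module, as in the feature/shared split discussed earlier, the interface no longer changes the direction of any module dependency.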

As a result, we see that with the current structure we cannot properly apply DIP, which means there is no point in applying SAP mindlessly. We will return to SDP in the second part. In the meantime, you can safely remove the Repository interface in all of these schemes.

This is where the first part of the article ends. Next time I'll be back to talk about even more radical ways to save.

Stay in touch and see you in the next part 🙂
