How to design and not get lost

Hi! My name is Dmitry. I am a solutions architect at a large Russian company, and I have been designing systems, writing code and managing teams for over 15 years. I collaborate with Praktikum as a reviewer of the Java course and as the author of the “Software Architecture” course.

Let's say you find yourself doing System Design, perhaps not entirely voluntarily, in an interview. If the company does not bother to share its working context, the task may sound like “design Twitter”. A more candidate-oriented company N might ask you to “design search for service N”.

Although there are plenty of articles along the lines of “how to build Twitter”, not all of them will help you navigate a real interview. In this article, I suggest digging deeper and putting together a checklist, a kind of algorithm. It is somewhat broader than the usual “Twitter” walkthrough, although it cannot be made universal. This scheme has helped me, and still helps me, both to conduct interviews and to pass them myself, although everyone has their own tricks and preferences.


I'll say right away that nothing below is an ideal architecture, but no one really needs one in an interview. It is usually more important to understand how a person thinks and whether they will be able to produce a good design in a reasonable amount of time, using the information and resources the company has available.

Beginning of the interview and introduction

Vasily has come to the interview. Most likely, he has an hour for everything: about 40 minutes for the design itself and the rest for questions about it. Sometimes questions come up along the way, which eats into the time, and there may not be enough of it. Ideally, he gets to a finished version with a diagram and answers most of the questions the interviewers wanted to ask. So, the countdown has begun.

Task: design a gift purchasing service. This service should help with a situation familiar to many: before a colleague’s birthday, a person in charge is chosen, money is sent to him, he buys something, and then you have to decide what to do with the change.

Conditions: the service will live on a large marketplace where you can find everything from flowers to a Tesla. Any registered user can put money into a piggy bank, and the piggy bank will be available to everyone who chipped in. Once the required amount is collected, it is spent on the selected goods. After the purchase, when the piggy bank is closed, the change should be returned as marketplace cashback to the users' accounts.

All of these are business requirements. Requirements can be functional and non-functional, and they need to be clarified. Usually, the person who assigns the task puts on the “product hat” and is ready to answer questions.

Functional requirements

Vasily needs to clarify the functional requirements: how the business sees the system's operation and how the current architecture works. He should record what is there now and, ideally, sketch out a high-level architecture, that is, a general outline of the future system.

Rushing to draw “how it will be” right away is not a very good option. Most likely, it will not turn out to be what the “product” wanted, and some of the requirements will be lost.

He should ask questions, and if the interviewer cannot answer something, make assumptions out loud. When designing, it is very important to lead the meeting and ask questions rather than wait for the interviewer to do it. Questions that can be asked:

  • What other services are there?

  • What DAU (daily active users) is expected?

  • What are your development plans?

  • What are the deadlines for implementation?

And any others that come to mind in the business context.

Updated data: The project has billing, user service, product catalog, search, logistics service, integration with partners, shopping cart, cashback and promotions service, analytics, and the website itself.

Currently, the daily audience averages 200,000 unique users, and this number will only grow. Total users — 1 million. There are no forecasts for how many gifts will be purchased. User surveys have shown that the feature will be in demand. We would like to release it in production for some users in the next quarter.

Now we need to make a diagram and put this data on it. This will make it easier for Vasily to navigate the landscape and ask further questions. There is no need to make it pretty: a sketch is enough to synchronize and confirm that he and the “product manager” have understood each other. Some candidates spend so long drawing arrows that they run out of time for everything else; avoid that.

At the same time, no one expects the diagram to capture the entire context correctly down to the last detail. For example, the product manager deliberately left out the bank and the warehouses as not particularly important for the current task, and added information the task does not need, such as partner integrations and search.

High-level service architecture


Non-functional requirements

Now it is necessary to find out the details of the non-functional requirements for the service. The interviewer puts on the “CTO/Architect/Manager's hat” and is ready to answer questions. Vasily may ask:

  • Is there any data on the current RPS?

  • What are the response time requirements?

  • What are the reliability requirements?

Updated data: the current RPS for the service is 1000, the response time of the new service should not exceed 200 ms, and the reliability should be 99.9%.
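To make the reliability figure tangible, it helps to convert it into a downtime budget. The sketch below assumes that 99.9 is an availability percentage measured over a 365-day year; the function name is made up for illustration.

```python
def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = 365 * 24) -> float:
    """Return the downtime budget, in hours, for the given availability."""
    return period_hours * (1 - availability_percent / 100)


yearly_budget = allowed_downtime_hours(99.9)
print(f"99.9% availability allows about {yearly_budget:.2f} hours of downtime per year")
```

Under these assumptions the budget is a little under nine hours per year, which is a useful number to keep in mind when the conversation later turns to data centers and failover.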

All this needs to be recorded.

Time for getting acquainted, receiving the task and archeology for both types of requirements is ≈15 minutes.

API and Integrations

At this stage, some candidates start assembling the architecture diagram “as it will be”. It is too early. The drawing itself will not take long, but it is better to go over the integrations and the API first.

Based on the task, the user should be able to perform the following actions with the piggy bank:

  • create,

  • close,

  • edit,

  • get link,

  • get one piggy bank or list.

To create a piggy bank you will need: a name, description, deadline, picture.

To edit, use the same data as when creating, plus an identifier.

To close a piggy bank, get a link, or fetch a specific piggy bank or a list, an identifier is enough.

Vasily has worked out the data for the API. It seems sufficient for now, and nothing contradicts the requirements.

An equally important point is the choice of interaction protocol (the interaction here is obviously synchronous, although it can be called asynchronously, for example from JS). The choice is between text and binary protocols.

Here Vasily should ask the split personality of the “product-CTO” what the marketplace currently uses. It would be strange to bring in gRPC without a strict need if the usual REST style with JSON is already used everywhere.

Once he has all the data, he weighs the pros and cons and explains his choice out loud. In 99% of cases, HTTP in REST style with JSON for data exchange will do, but it is better to state this explicitly. An example of an API description:

/code
GET /giftmoneybox - get the list of piggy banks for the authorized user
GET /giftmoneybox/{giftmoneybox_id} - get a specific piggy bank
POST /giftmoneybox - create a piggy bank
Content-Type: application/json
{
      "title": "string",
      "description": "string",
      "edge_date": "timestamp",
      "image_link": "string"
}
/code
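As a minimal sketch of how the body of the create request might be checked on the server side, the class below mirrors the four fields from the API description. The class name, the `validate` method and the choice of a Unix timestamp for `edge_date` are assumptions for illustration; the length limits are the ones listed with the service data in this article.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CreatePiggyBankRequest:
    """Body of POST /giftmoneybox; field names follow the API description."""
    title: str
    description: str
    edge_date: int   # assumed: Unix timestamp in seconds
    image_link: str

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the body is valid."""
        errors = []
        if not self.title or len(self.title) > 150:
            errors.append("title must be 1..150 characters")
        if len(self.description) > 1024:
            errors.append("description must be at most 1024 characters")
        if len(self.image_link) > 256:
            errors.append("image_link must be at most 256 characters")
        if self.edge_date <= 0:
            errors.append("edge_date must be a positive timestamp")
        return errors


request = CreatePiggyBankRequest(
    title="Birthday gift",
    description="Collecting for a colleague",
    edge_date=1767225600,
    image_link="https://example.com/images/gift.png",  # hypothetical URL
)
print(request.validate())  # an empty list for this valid example
```

In an interview a sketch like this is rarely written out, but saying the limits aloud shows that the contract has been thought through.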

Vasily can explain, for example, why he has a link to an image instead of base64 (most likely, he will be asked anyway, but it is better to take the role of “presenter”). The thing is that the service is unlikely to store many images in the database, and in general it is not very good to store them there. Most likely, there will still be external storage, so it is more logical to use a link. The limit on the image can be set to 1 MB.

Will there be any more integrations here? None are in sight yet.

API time: 5-10 minutes.

Data life cycle

It is still too early to draw, but it makes sense to talk about data. It is best to start by fixing the list of the service's data. In the context of the task, we are not particularly interested in the user service, the shopping cart and the like: they simply exist, or they will be adjusted during the launch. A good architect should think about them too, but for now Vasily is at an interview 🙂

Service data:

  • id — identifier, for example numeric;

  • title — name, just text, 150 characters;

  • description — description, 1024 characters;

  • edge_date — edge date, timestamp;

  • image_link — link to the image, 256 characters;

  • status — closed or open, bool.

Connections: since a user can join other users' piggy banks or create several of their own, a standard “many-to-many” relationship will do here. What to choose for storage? It seems that a regular relational database with a couple of tables will solve the problem. Here again comes the moment of (well, you get the idea) choice.
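The “couple of tables” could look roughly like the sketch below. It uses in-memory SQLite only so that it runs anywhere; it is not the target DBMS. The table names follow the ones used in the resource calculation in this article, while the column names of the link table and the 1/0 encoding of the status are assumptions.

```python
import sqlite3

# SQLite stands in for a real relational DBMS purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gitbox (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    edge_date   INTEGER,                     -- timestamp
    image_link  TEXT,
    status      INTEGER NOT NULL DEFAULT 1   -- assumed: 1 = open, 0 = closed
);
CREATE TABLE gitbox_users (
    gitbox_id INTEGER NOT NULL REFERENCES gitbox(id),
    user_id   INTEGER NOT NULL,              -- users live in the user service
    PRIMARY KEY (gitbox_id, user_id)
);
""")

conn.execute("INSERT INTO gitbox (id, title) VALUES (1, 'Birthday gift')")
conn.executemany(
    "INSERT INTO gitbox_users (gitbox_id, user_id) VALUES (?, ?)",
    [(1, 100), (1, 101)],
)

# All piggy banks that user 100 takes part in.
rows = conn.execute(
    "SELECT g.id, g.title FROM gitbox g "
    "JOIN gitbox_users gu ON gu.gitbox_id = g.id WHERE gu.user_id = ?",
    (100,),
).fetchall()
print(rows)
```

The composite primary key on the link table is what makes the relationship “many-to-many”: one user can appear in many piggy banks, and one piggy bank can have many users, but each pair only once.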

Vasily chose PostgreSQL as a database management system. To do this, he recalled various products and evaluated them according to the criteria:

  • how it behaves under load,

  • license,

  • resources,

  • scalability,

  • technical policy of the company.

He explained that PostgreSQL can be replicated quite easily, but sharding is a bit more difficult. It is an open-source product, and it is not particularly demanding on resources (well, it depends), unless you fill one server with terabytes.

The interviewer will tell you how well the product fits the company's policy, if he hasn't said so before, and may ask you to choose something else. After that, he may ask several questions. For example:

  • What will happen to the indexes for the data?

  • What indexing algorithm will be used?

  • When and by what key will you shard?

Here, everything depends on Vasily's knowledge and on how deep the interviewer digs. For example, if there is a lot of data, it makes sense to shard by identifier. The name is not unique, sharding by date may skew the distribution, and the remaining fields are too specific.
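Sharding by identifier can be illustrated with the simplest possible scheme, a modulo over a numeric id. The shard count of 4 and the function name are assumptions for the example; real setups often prefer consistent hashing or a lookup table, because with a plain modulo most keys move when the number of shards changes.

```python
from collections import Counter

SHARD_COUNT = 4  # assumed for illustration


def shard_for(piggy_bank_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Pick a shard by taking the numeric identifier modulo the shard count."""
    return piggy_bank_id % shard_count


# Sequential identifiers spread evenly across the shards.
distribution = Counter(shard_for(i) for i in range(1, 1001))
print(sorted(distribution.items()))
```

With a date as the key, by contrast, all fresh piggy banks would land on the same shard, which is the skew mentioned above.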

Tables in the database


Time for data with questions is ≈10 minutes.

Architecture diagram

The moment of glory has come: it is time to draw the architecture diagram “as it will be”. Vasily draws the service and everything around it: image storage, the DB and whatever else he deems necessary. As he goes, he explains what exactly he is doing and why.

For example, he draws an S3-like storage and says: “Great for storing images, has an API, …”. And he says that he chose MinIO because it has an open license, it can be easily deployed internally and scaled. It is very good if he talks about fault tolerance right in the process or right after.

Vasily learned the reliability targets at the very beginning, so he can confidently spread the services across three or more data centers. Along the way, he can explain why exactly three data centers and how failover will happen in the event of an outage. If there is a lot of data and the clients are spread geographically, he can also talk about geosharding and how best to choose a CDN.

Gradually, a complete scheme is being assembled, about which the interviewer will have more questions. Now it has a service (several copies), a cache cluster for “hot” piggy banks, a DB cluster, and a MinIO cluster.
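One common way to use a cache for “hot” piggy banks is the cache-aside pattern: read from the cache first and go to the database only on a miss. The sketch below is an assumption about how that could look, not a description of the marketplace; a plain dictionary stands in for the cache cluster, and the class and method names are made up.

```python
from typing import Callable, Dict, Optional


class PiggyBankCache:
    """Cache-aside: look in the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[int], Optional[dict]]):
        self._load_from_db = load_from_db
        self._cache: Dict[int, dict] = {}  # stands in for the cache cluster
        self.db_reads = 0

    def get(self, piggy_bank_id: int) -> Optional[dict]:
        if piggy_bank_id in self._cache:
            return self._cache[piggy_bank_id]
        self.db_reads += 1
        record = self._load_from_db(piggy_bank_id)
        if record is not None:
            self._cache[piggy_bank_id] = record
        return record

    def invalidate(self, piggy_bank_id: int) -> None:
        """Call after an edit or close so readers do not see stale data."""
        self._cache.pop(piggy_bank_id, None)


fake_db = {1: {"id": 1, "title": "Birthday gift"}}  # stands in for the DB cluster
cache = PiggyBankCache(fake_db.get)
cache.get(1)
cache.get(1)
print(cache.db_reads)  # the second read is served from the cache
```

The invalidation call is the part interviewers tend to ask about, since the edit and close operations change data that may already be cached.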

Authentication can also be safely added, with a note that the user's ID gives access to their piggy banks and the piggy bank ID gives all the information about it. All of this can be passed to the shopping cart when a payment method is chosen. And the service will, of course, run in Kubernetes, if the company has it 🙂


Architecture diagram “as it will be”

Time for the application architecture is ≈15 minutes.

Resource calculation and finishing touches

At the very end, Vasily should calculate how much space will be needed, with a reserve for 3-5 years. If the growth potential and the number of users are not clearly stated, it is fine to make assumptions: let the audience grow by 20% per year, and let 30% of it use the piggy bank.

Let's roughly count the number of piggy banks:

Year 1: 1,000,000 (users) * 30% (how much will be used) = 300,000

Year 2: 1,200,000 (users) * 30% (how much will be used) = 360,000

Year 3: 1,440,000 (users) * 30% (how much will be used) = 432,000

Total piggy banks for 3 years: 1,092,000

Total images: 1,092,000 MB (1 MB per image, 1 image per piggy bank)
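The same estimate can be reproduced in a few lines, which makes it easy to play with the assumptions. The constants below are the figures from the text (1 million users, 20% yearly growth, 30% adoption, 1 MB per image); integer arithmetic is used to avoid rounding noise.

```python
INITIAL_USERS = 1_000_000
GROWTH_PERCENT = 20      # audience growth per year
ADOPTION_PERCENT = 30    # share of users who create a piggy bank
YEARS = 3
IMAGE_SIZE_MB = 1        # one image per piggy bank

users = INITIAL_USERS
total_piggy_banks = 0
for year in range(1, YEARS + 1):
    piggy_banks = users * ADOPTION_PERCENT // 100
    print(f"Year {year}: {users:,} users -> {piggy_banks:,} piggy banks")
    total_piggy_banks += piggy_banks
    users = users * (100 + GROWTH_PERCENT) // 100

total_image_mb = total_piggy_banks * IMAGE_SIZE_MB
print(f"Total piggy banks: {total_piggy_banks:,}")
print(f"Total images: {total_image_mb:,} MB")
```

The image total is roughly one terabyte, which is the main consumer of space here and a reason to keep images in external storage.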

To calculate the data, Vasily will assume that there are 10 users per piggy bank on average. The data volume in gitbox:

  • id — 8 bytes,

  • title — 150 bytes,

  • description — 1024 bytes,

  • edge_date — 8 bytes,

  • image_link — 256 bytes,

  • status —1 byte.

Total: 8 bytes + 150 bytes + 1024 bytes + 8 bytes + 256 bytes + 1 byte = 1447 bytes

In gitbox_users, given an average of 10 users: (8 bytes + 8 bytes) * 10 = 160 bytes

Total for one piggy bank: 1447 bytes + 160 bytes = 1607 bytes ≈ 1.57 KB

Final calculation for three years: 1,092,000 * 1.57 KB ≈ 1.63 GB. That is how much space the future piggy banks will need. Not much, so shards are unnecessary for now.
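The storage estimate can be checked with a short script. The field sizes, the 10 users per piggy bank and the 1,092,000 piggy banks are taken from the text; binary units (1 KB = 1024 bytes) are an assumption that matches the figures above.

```python
FIELD_SIZES_BYTES = {
    "id": 8,
    "title": 150,
    "description": 1024,
    "edge_date": 8,
    "image_link": 256,
    "status": 1,
}
USERS_PER_PIGGY_BANK = 10
LINK_ROW_BYTES = 8 + 8           # two 8-byte identifiers per gitbox_users row
TOTAL_PIGGY_BANKS = 1_092_000    # three-year estimate

row_bytes = sum(FIELD_SIZES_BYTES.values())
links_bytes = LINK_ROW_BYTES * USERS_PER_PIGGY_BANK
per_piggy_bank_bytes = row_bytes + links_bytes
total_gb = per_piggy_bank_bytes * TOTAL_PIGGY_BANKS / 1024 ** 3

print(f"One piggy bank: {per_piggy_bank_bytes} bytes")
print(f"Three years: {total_gb:.2f} GB")
```

This ignores indexes and per-row overhead of the DBMS, so the real figure will be somewhat higher, but the order of magnitude is what matters for the sharding decision.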


As an exotic idea, Vasily can think about how many processor cores will be needed. For example, he can make a very rough assumption that out of the 1000 RPS of the main service, 300 RPS (30%) will go to the piggy banks. Each request to the main service takes up to 200 ms; for the piggy banks, let it be no more than 30 ms.

Formula for calculation: RPS = X * (1 / (TD / 1000))

Where X is the number of cores, TD is the request processing time (ms), 1000 is the number of milliseconds in a second.

300 = X cores * (1 / (30 ms / 1000))

Final calculation: X = 300/(1/(30ms/1000)) = 9 cores
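The formula can be rearranged to X = RPS * TD / 1000 and wrapped in a small helper. The sketch assumes, as the formula implicitly does, that one core processes one request at a time for the whole TD; the function name is made up.

```python
import math


def cores_needed(rps: float, request_time_ms: float) -> int:
    """Solve RPS = X * (1000 / TD) for X and round up to whole cores."""
    return math.ceil(rps * request_time_ms / 1000)


print(cores_needed(300, 30))  # the piggy bank estimate from the text
```

Rounding up matters as soon as the inputs stop being this convenient: a fractional result still needs a whole core.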

The final touches remain: monitoring, logs and traces, so that everything looks complete. Vasily adds them to the diagram and names the technologies: for example, Prometheus and Grafana for monitoring and graphs, the ELK Stack for logs and Jaeger for tracing.

Time: whatever is left.

Now the scheme looks complete. Vasily shakes hands with all the “hats” and waits for feedback.

Final checklist

▢ Get the task: a mixture of incomplete functional and non-functional requirements.

▢ Ask “business” questions that will help you understand the functional requirements.

▢ Create a high-level architecture, a sketch of the current state.

▢ Clarify non-functional requirements.

▢ Work out contracts and integrations.

▢ Process the data.

▢ Work out application architecture and fault tolerance.

▢ Consider infrastructure resources.
