Proper data architecture from the first sprints

There is probably not a single large IT company that has not suffered from a huge, clumsy legacy of tightly intertwined solutions. Each solution is connected to several others, and adding one new, seemingly insignificant feature requires changing something in each of them. Some solutions are so outdated that they cannot, in principle, provide the newly required functionality. They have to be completely rewritten or replaced with a modern solution, which in turn has to be integrated into the entire architecture. All of this makes new launches slow and difficult.

Sometimes the technical debt grows so large that separate projects are launched to eliminate it. These projects are planned to run for months or even years, which is comparable to the time-to-market of 5-10 new solutions. Accordingly, all new launches are delayed until the Augean stables are cleaned, painted and rebuilt. A spectacle in its own right is the CTO convincing the CEO, a good half of the board and the shareholders to spend several thousand person-hours of precious developer time on something that, upon completion, will not help the business immediately, but will only speed up, by an unknown amount, the delivery of new products in the future. Meanwhile, the launch of each of these products will be delayed by several months.

At this point, company leaders have to go through many difficult discussions, time estimates and prioritization exercises. Strategically, neglecting technical debt and lacking vision when creating the IT architecture can allow a smaller and more flexible competitor to overtake the company, because it will be able to bring a promising product to market faster.

Digging into the details, you can find a large set of specific problems that stem from short-sighted development of the IT architecture. Today, I would like to focus on two of them, which can be largely avoided by spending literally two weeks at the dawn of a new IT architecture.

  1. Problem 1. Your architecture is built on microservices that communicate with each other. The same data can be stored in multiple places, but none of these places is designated “the single source of truth”. As various processes in the company run, the data changes in different places, is not synchronized, and the copies begin to contradict each other. Without a hierarchy of data sources, it is not clear which copy is the most up to date. A striking example is metadata about counterparties (address, decision maker, company details, TIN, bank account number, etc.), which does not change very often but is used in several places: in accounting, in operations, in sales. Once the counterparty’s contacts have been changed separately in each place, it is no longer clear how to actually reach the counterparty in order, for example, to sell them something.

  2. Problem 2. Your architecture consists of microservices communicating via APIs, and many of them are connected to many others. You need to add a new feature, but the existing APIs and services have no fields for the new data. As a result, you have to rework most of the IT landscape in order to launch a new product.
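
To make Problem 1 concrete, here is a minimal Python sketch of how copies of the same counterparty record drift apart when no service is designated as the source of truth. The service names (`accounting`, `operations`, `sales`), the field names, the sample values and the `find_conflicts` helper are all illustrative assumptions, not part of any real system.

```python
from collections import defaultdict

# Hypothetical copies of one counterparty's record, each held by a different service.
copies = {
    "accounting": {"tin": "7701234567", "phone": "+1-555-0100", "address": "1 Main St"},
    "operations": {"tin": "7701234567", "phone": "+1-555-0199", "address": "1 Main St"},
    "sales":      {"tin": "7701234567", "phone": "+1-555-0142", "address": "5 Oak Ave"},
}


def find_conflicts(copies):
    """Return {field: {value: [services holding that value]}} for fields whose copies disagree."""
    values_by_field = defaultdict(lambda: defaultdict(list))
    for service, record in copies.items():
        for field, value in record.items():
            values_by_field[field][value].append(service)
    # Keep only the fields for which more than one distinct value exists.
    return {
        field: dict(values)
        for field, values in values_by_field.items()
        if len(values) > 1
    }


conflicts = find_conflicts(copies)
for field, values in conflicts.items():
    print(field, "->", values)
```

With three equally authoritative copies, nothing in the data itself says which phone number is correct. That decision can only come from a declared hierarchy of sources.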

Recently, I had the opportunity to lead the launch of a new vertical in the business, for which I had to create several new products. Having learned the hard way from the legacy in the old verticals, we did an exercise that I hope will make the two problems above smaller than they could have been. Below, I share the approach so that astute product managers and developers (and, in fact, business leaders) can protect themselves as well.

The starting situation looks like this:


  • You are about to launch several new products that will rely on the same set of services.

  • You don’t know exactly what each product will be – at the current stage, there are only assumptions about what products will need to be able to do and which ones will be in demand.

  • However, you do have a rough idea of what the new products should be able to do: for example, collect an event history for analytics, calculate the cost of services and store its breakdown (billing), generate documents for mutual settlements and reconciliations, enrich the CRM for the commercial team, and so on.

  • You may also have a set of “big” business processes that you are likely to build the new products into. That is, you will not write all the modules from scratch; you will try to reuse existing processes and teams with their usual software. It is also in your interest that the old and new processes have a lot in common in places.

Let’s imagine that you started building features and products sequentially, and a few months later you were faced with the two problems above. What could help you avoid these problems?


  1. Determine the circle of people who will actually be responsible for building the new IT landscape. As a rule, this is a product manager or CPO and a tech lead or CTO. If, as in my case, some of the existing modules of the “big” business are reused, then this circle should also include the leads of those modules. They must understand, on the one hand, how the modules work now and, on the other hand, where the modules are heading given the goals of the “big” business (at least approximately).

  2. Determine the circle of people who will use the solutions being created, or who have a good idea of the potential needs of customers and of which needs are most likely to matter (for example: “Competitors have feature X; it is the market norm and customers actively use it. We have no better ideas, so we will have to build X ourselves, otherwise we will not convince customers to choose us over competitors.”). This could be sales, marketing, operations, accounting, analytics, etc.

  3. Sketch out a set of products that are likely to be launched and/or developed. Not all of them will take root or reach the market exactly in the form you now imagine. But that’s okay: at some point you will probably need to put together an MVP anyway to test the viability of your idea, and for now there is no better information available.

  4. Conduct a round of conversations with representatives of each function from step 2, trying to answer the following questions:

    1. What processes, activities or other work are we likely to do on the side of this function to operate or support the products (A, B, C) that we currently have in mind?

    2. What data will be needed for these processes / activities / work?

    3. How detailed should this data be?

    4. How often should this data be updated? Every second / minute / hour / day / month?

    5. How often will you need to access this data in the course of work? That is, what load should we plan for?

  5. Once you have collected the answers to the questions above, you will have some idea of the future processes and the data they need. The next step is for the product and technical team to draw an approximate architecture that answers the following questions:

    1. What modules should the future architecture consist of?

    2. What data should be available to each module?

    3. Where should this data be stored and, most importantly, which place should be the “place of truth”, that is, the one containing the most correct and up-to-date information?

    4. How should the modules be linked (e.g. via an API)? That is, what information should they exchange, and along which routes?

  6. The architecture drawn in the previous step is the first draft. Before you commit to it, you need to validate it with the “customers”. To do this, show them the architecture and walk them through it, for example in this format:

    1. Here are the activities you will need to perform; we wrote them down during our conversations in step 4. And here is the data those processes need.

    2. Here is how each of these activities will play out in the architecture we have sketched. This is where the data will “live”, this is how and where it will be passed on, and this is how we will keep it up to date and correct. Here is how we will provide the required throughput (if this is an important issue).

    3. Questions for customers:

      1. Does everything seem reasonable and appropriate to you?

      2. Are there any additional requirements we should take into account? Perhaps you did not think of them the first time, or you learned something new between steps 4 and 6. If there are, repeat steps 5 and 6 for the affected parts.

  7. Once you have collected the feedback, you can be reasonably sure that you have not missed anything critical and have taken into account everything important for the development of the business in the near future. You now have, for example, a diagram in Miro, which should be used like this:

    1. During development, when planning the next sprint, check the upcoming tasks against the architecture and build modules, data stores, APIs and interfaces in accordance with it. This way you make sure you have not forgotten to add the very endpoint that a new product suddenly turns out to need.

    2. At the same step, check that you are respecting the “single places of truth”. For example, if you are writing a support interface that allows customer data to be updated, remember to include in the sprint the piece of work that makes sure the update reaches the corresponding “place of truth”.

    3. As the project team gains experience, its ideas about the market and its needs, and therefore about future products, will change. This means the target architecture will change too. Therefore, steps 3-6 should be repeated once every few weeks or months to keep the target picture up to date. The CPO or product manager seems to me a reasonable owner of this process. The good news is that the second and subsequent iterations of steps 3-6 will require only a fraction of the effort: a day, or even a few hours, may suffice.
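
The outcome of steps 4 to 7 can also be kept in a machine-checkable form next to the diagram. The following is a minimal Python sketch under illustrative assumptions: the `DataNeed` and `Entity` classes, the `check_architecture` function and all module, function and entity names are hypothetical. It records the answers from step 4 and the storage decisions from step 5, then flags entities without exactly one “place of truth” and data needs that no module can serve.

```python
from dataclasses import dataclass, field


@dataclass
class DataNeed:
    """One answer from the step 4 conversations (all values here are made up)."""
    function: str        # who asked: sales, accounting, ...
    entity: str          # which data they need
    refresh: str         # how often it must be updated: "minute", "day", ...
    reads_per_day: int   # rough access load


@dataclass
class Entity:
    """One piece of data on the step 5 diagram."""
    name: str
    stored_in: list                                  # modules that keep a copy
    truth: list = field(default_factory=list)        # modules marked as "place of truth"
    readable_by: list = field(default_factory=list)  # modules with access via an API


def check_architecture(entities, needs, module_of_function):
    """Return a list of human-readable problems; an empty list means the draft passes."""
    problems = []
    by_name = {e.name: e for e in entities}
    for e in entities:
        if len(e.truth) != 1:
            problems.append(f"{e.name}: expected exactly one place of truth, got {e.truth}")
        elif e.truth[0] not in e.stored_in:
            problems.append(f"{e.name}: place of truth {e.truth[0]} does not store it")
    for n in needs:
        module = module_of_function[n.function]
        e = by_name.get(n.entity)
        if e is None:
            problems.append(f"{n.function}: needs {n.entity}, which is not on the diagram")
        elif module not in e.stored_in and module not in e.readable_by:
            problems.append(f"{n.function}: {module} has no access to {n.entity}")
    return problems


entities = [
    Entity("counterparty", stored_in=["crm", "billing"], truth=["crm"], readable_by=["docs"]),
    Entity("event_history", stored_in=["analytics", "billing"]),  # no place of truth yet
]
needs = [
    DataNeed("sales", "counterparty", refresh="day", reads_per_day=500),
    DataNeed("accounting", "invoice_details", refresh="day", reads_per_day=50),
    DataNeed("operations", "event_history", refresh="minute", reads_per_day=10_000),
]
module_of_function = {"sales": "crm", "accounting": "billing", "operations": "ops_console"}

problems = check_architecture(entities, needs, module_of_function)
for problem in problems:
    print(problem)
```

Rerunning a check like this during sprint planning is one possible way to notice a missing endpoint or a bypassed “place of truth” early. The diagram agreed with the “customers” remains the primary artifact.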
