How We Visualized 150+ Pages of DS Solution Documentation

Why did we decide to visualize DS solutions?

Our team at Kontur has been automating technical support processes using Data Science for 6 years. During this time, we have created more than 10 large-scale DS solutions, which are described on 150+ pages of documentation.

It is very difficult to keep this much information in your head, and it is unrealistic to bring new team members up to speed on it quickly. So we looked for a more convenient way to present it.

The solution was to visualize DS solutions on a diagram, where everything is collected in one place:

  • technical details of the DS solution

  • scenarios in which the solution works

  • interaction with the backend

  • relationships between models and data

  • interaction between different solutions

What we did

1. Collection of information

It seemed like a simple task: documentation for each DS solution already exists, so we only needed to present it in a different form. But we quickly found that documentation of the DS part alone is not enough to understand the scenario in which a model is used or how the backend interacts with our DS solution.

So we talked to people in other roles: analysts, developers, and managers. Discussing scenarios with them is useful because each role sees the solution differently.

2. Visualization

We started with drawings on a piece of paper.

Once we understood what the diagram should look like, we moved on to building it in a convenient tool.

We have infrastructure services that let a data scientist take a typical task from research to production delivery without involving a developer. For example, we have services for:

  • ML model hosting

  • storing vectors and quickly searching them

  • combining models, vector spaces and other entities into a single algorithm

  • data storage for each project

Therefore, it is enough to indicate the name and version of the model in our hosting service on the diagram – and any data scientist understands what we are talking about and where to look for the deployed model.

3. Review

Data scientists were not the only ones who reviewed the diagram. Developers validated that it accurately shows how the backend interacts with the DS solutions. The development manager and analysts checked that the application scenarios of the DS solutions were recorded correctly and matched the clients' business scenarios.

The number of comments on the diagram shows how useful such a comprehensive review was. An added benefit is that people in other roles also came to understand the structure of our solutions better.

This is what the comment board looked like during the review. We'll look at what's what in more detail later.

What is on the diagram?

For each solution, the diagram shows:

  • diagram and text explanation of the user scenario with links to documentation

  • paths to data used in the solution

  • models and vector spaces, with links to the hosting service and to the code in git that produced them

  • pre-trained models that are used in the solution

  • logic of interaction between parts of the solution
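The list above can be read as a record with one field per bullet. The following is only a minimal sketch of such a record, not the format the diagram actually uses: the class names, the field names, and the model name `call-topic-classifier` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRef:
    """A model or vector space referenced on the diagram (hypothetical shape)."""
    name: str
    version: str
    hosting_url: str = ""        # link to the hosting service, left empty here
    training_code_url: str = ""  # link to the training code in git, left empty here


@dataclass
class SolutionCard:
    """One DS solution as it might be described on the diagram (hypothetical shape)."""
    project: str
    scenario: str
    doc_links: list[str] = field(default_factory=list)
    data_paths: list[str] = field(default_factory=list)
    models: list[ModelRef] = field(default_factory=list)
    pretrained_models: list[str] = field(default_factory=list)
    interaction_notes: str = ""


# Illustrative entry; the model name and version are made up.
card = SolutionCard(
    project="Merlin",
    scenario="Call routing by topics",
    models=[ModelRef(name="call-topic-classifier", version="3")],
)
```

Keeping the model name and version as explicit fields mirrors the idea that a name and version are enough for a data scientist to find the deployed model.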

The solutions in the diagram are grouped by project, and each project builds solutions for a different area of technical support. The project names appear in the illustrations, so here is a brief introduction: Sirena works with chats, Naiad with emails, Merlin with calls, and Turbo simplifies the life of Kontur's internal technical support. The DS solutions of these projects route messages to the right employees, determine customer intentions, and help consultants answer customer questions.

The entry points are the scenarios at the center of the diagram, for example “Call routing by topics”. From there you can go left, toward the business side (a visualization of the client's communication), or right, toward the DS description of the solution.

The place of the DS solution in the user scenario

Mapping the business side is an important part of the diagram, because data scientists need to know which scenarios a particular solution covers. Without this, you can end up with a task that is solved but a solution that does not fit the business scenario.

Interaction between DS solution and backend

A schematic description of the backend is needed to display the model contracts (what is fed into the model, what comes out of it), as well as the pre-processing and post-processing of data before and after the model runs.

This “cheat sheet” helps data scientists check that their work fits into the existing system when refining existing solutions and creating new ones. With this information on the diagram, there is no need to check with the developers every time, study the code, or keep everything in your head.
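To make the idea of a model contract concrete, here is a minimal sketch of what such a contract could look like in code. Everything in it is an assumption made for illustration: the field names, the whitespace and lowercase normalization, the confidence threshold, and the `general` fallback queue are not taken from the actual solutions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInput:
    """What the backend feeds into the model (hypothetical fields)."""
    text: str


@dataclass(frozen=True)
class ModelOutput:
    """What the model returns to the backend (hypothetical fields)."""
    topic: str
    confidence: float


def preprocess(raw_message: str) -> ModelInput:
    """Pre-processing before the model runs: collapse whitespace and lowercase."""
    return ModelInput(text=" ".join(raw_message.split()).lower())


def postprocess(output: ModelOutput, threshold: float = 0.5) -> str:
    """Post-processing after the model runs: use a fallback queue on low confidence."""
    return output.topic if output.confidence >= threshold else "general"
```

With the contract written down like this, a data scientist can check a new model against the expected input and output shapes without asking the developers.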

Relationships between models and data

An important and interesting aspect was to show the relationships between machine learning models and training data in different tasks. For example, several tasks use the same base model as a starting point for training, but the data in them is different.

Recording such things is useful. For example, it can show that a model initially trained on chat data also works well on emails. It can also reveal that one model is used in several tasks, which carries a risk of performance loss because several services access the model at once. And when such a model is retrained for one task, it is important to check that quality on the other tasks has not dropped.
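Once the relationships are recorded, spotting shared models becomes a simple lookup. The sketch below assumes the usages are available as (task, model) pairs. The task and model names are invented for illustration and do not describe the real solutions.

```python
from collections import defaultdict

# Hypothetical (task, model) pairs as they might be read off the diagram.
USAGES = [
    ("chat intent detection", "base-encoder:1"),
    ("email routing", "base-encoder:1"),
    ("call routing by topics", "call-topic-classifier:3"),
]


def shared_models(usages):
    """Return models used by more than one task, with the tasks sorted by name."""
    tasks_by_model = defaultdict(set)
    for task, model in usages:
        tasks_by_model[model].add(task)
    return {
        model: sorted(tasks)
        for model, tasks in tasks_by_model.items()
        if len(tasks) > 1
    }


# Models listed here need a quality check on every dependent task after retraining.
to_recheck = shared_models(USAGES)
```

Each model in the result lists the tasks whose quality should be re-checked after that model is retrained.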

Technical details of the DS solution

It is often useful for a data scientist to see how a task is solved from a technical point of view: which model architectures were used and what data was used for training. This builds up knowledge about where different methods apply. Such information also helps with new problems, because it gives the data scientist hints for choosing the right solution architecture.

We record information about all our experiments, but it is stored in separate places: the experiment code is spread across several of our team's repositories in GitLab, and the results are described in different sections of the wiki. The diagram let us show a complete picture of all the team's solutions in one place. And thanks to the links to the model training code, you can quickly go to git and study the decisions that were made.

Results

The implementation took about a week of work, spread over a couple of months because creating the diagram had lower priority than business tasks. A few more days went into preliminary consultations with other roles and into reviews. But it was time well spent.

Visualizing all solutions on a single diagram will help:

  1. See the details of interactions between solutions to better manage risks.

  2. Spend less effort to remember what we did before.

  3. Pass the context on to future team members so that it doesn't leave the team along with people. Although the information is recorded in extensive documentation on the wiki, reading it is not the easiest way for a newcomer to grasp how all the scenarios and tasks relate to each other.

If you have your own methods for storing large amounts of information about solutions, then write them in the comments – we'll discuss them.
