Receive seven times, cache once

In this article, Alexey Aksyanov, head of the analytics department, uses a real case to examine how to organize data caching in applications. What follows is his author's column:

I have been working as an analyst for more than 10 years, and it so happens that developers often expect purely “business” solutions from me, preferring to work out the technical side of a solution on their own. In this article I want to show that analysts can and should dig into technology, using data caching as the example. As the story progresses, I will try to show the areas of responsibility of the business analyst, the systems analyst and the developer.

A bit of boring theory

Caching is a mechanism for the temporary storage of data between the client and the server. It allows us to keep needed data in a buffer for use in certain situations. The technical implementation can vary; we’ll get to that a little further on.

Benefits of using caching:

  • Faster data retrieval. We do not need to go to the server or database every time for repetitive or rarely changing data.

  • Reduced load on the master system. This is a useful corollary of the first point. Fewer requests to the server mean fewer database calls and less traffic overall. This is especially valuable if our hardware is not the most powerful or if the system has strict non-functional performance requirements.

  • The possibility of additional enrichment and transformation of data. This is a very useful property that often helps to work around some technical limitations of the current implementation. I’ll cover this in more detail later during the case analysis.
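The first two benefits are easy to see in a few lines of code. Below is a minimal read-through cache sketch in Python; the `fetch_from_db` function, its data and the call counter are hypothetical stand-ins for a real database call, not anything from the project described here:

```python
# Hypothetical backend call we want to shield from repeated requests;
# the function name and the returned data are illustrative only.
backend_calls = 0

def fetch_from_db(card_id):
    global backend_calls
    backend_calls += 1                      # count real hits on the master system
    return {"id": card_id, "balance": 100}

cache = {}

def get_card(card_id):
    # Read-through cache: go to the backend only on a miss.
    if card_id not in cache:
        cache[card_id] = fetch_from_db(card_id)
    return cache[card_id]

get_card(1)   # miss: hits the backend
get_card(1)   # hit: served from the cache
get_card(1)   # hit
print(backend_calls)  # → 1 (one backend request instead of three)
```

Three identical reads cost a single backend request, which is exactly the “fewer calls, faster answers” effect described above.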

Caching also improves the user experience; let me dwell on this in a little more detail.

Firstly, we gain ample opportunities for working offline or over a slow connection. For Russia, where fast mobile Internet is available only in large cities, this is very important.

Secondly, we can perform background data refresh for some scenarios. Users may not even notice these processes, and slow Internet will no longer be such a serious problem.

Finally, you can implement instant data retrieval in the most common use cases. We store data on the device, so we don’t go to the server and we delight users with response speed.

The technology also has disadvantages:

  • Staleness of cached data. This is a fundamental flaw. Out-of-date data can cause a lot of trouble, so it will not be possible to cache everything, and it is especially important for the analyst to determine clearly, based on the business requirements, what we cache and what we do not.

  • More complex request-processing logic. The analyst has to describe more detailed use cases, the developer has to write more lines of code, and then all of this has to be tested and rolled out. However, if we want to make a really good product, we cannot do without it.

  • Additional memory usage on the device or server. Cached data needs to be stored somewhere, but nowadays, with various fast and inexpensive memory technologies available, this drawback is not very critical.

For ease of understanding, I would highlight several options for cache classification.

First, by storage location:

  • on the client (local database, RAM, built-in mechanisms)

  • on the server (Redis)

  • combined (server+client)

Second, by storage duration. To keep things intuitive, I will use more “human” terms:

  • “Short” – relevant within a user’s screen or session

  • “Long” – persists beyond the user session or even after the application is deleted
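The duration-based classification can be sketched with a simple in-memory cache whose entries expire after a time-to-live. The class, its parameters and the toy values below are illustrative assumptions, not a description of any particular platform's built-in mechanism:

```python
import time

class TTLCache:
    """Entries expire after ttl seconds. A "short" cache would use a
    screen- or session-length ttl; a "long" one would instead persist
    its entries to disk so they survive across sessions."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}          # key -> (value, stored_at)

    def put(self, key, value):
        self.store[key] = (value, time.monotonic())

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self.store[key]  # expired: behave like a cache miss
            return None
        return value

short_cache = TTLCache(ttl=0.05)       # "short": lives within a screen/session
short_cache.put("cards", ["card-1"])
print(short_cache.get("cards"))        # still fresh → ['card-1']
time.sleep(0.1)
print(short_cache.get("cards"))        # expired → None
```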

Caching and analytics

We have dealt with the theoretical part, but the main question remains: what is required from the analyst to describe the requirements and set tasks for this technology? The answer is simple: nothing special, just do your job well 🙂

The business analyst collects and formalizes the requirements for the feature:

  • determine the need for an offline mode and the list of functions it should cover

  • find out the need to cache some data

  • agree on the format of the article (one article for all scenarios or a section about caching in certain articles)

  • describe the conditions for caching, invalidating and deleting the cache for specific scenarios of working with the system

The systems analyst highlights the technical aspects:

  • supplement the requirements from a technical point of view and indicate which scenarios and requests will be cached

  • select cache parameters for each scenario

  • extend the BA’s original article

Developer:

  • choose the optimal technical method for implementing the cache

  • implement caching according to requirements

Also, at the intersection of the responsibilities of the business and systems analysts, one can single out the joint development of non-functional requirements for the system. This makes it possible to understand, for example, whether the use of caching allows us to tighten the system's performance requirements (reduce the acceptable response time).

Let's analyze the case

In this part of the article, I will share a case of using caching from my practical project experience.

Let's start with the initial conditions.

We are developing a mobile application for the clients of a bank. We joined the project in the active development phase and received the following inputs:

  • The application must work in slow Internet conditions

  • The bank's middleware has already been implemented; we can influence it only to a limited extent

  • The interface has already been designed; we can influence it only to a limited extent

  • A primary implementation of caching already exists; no analysis was done for it, the developers built it at their own discretion

Now let's talk more about the current implementation. The interface looks something like this:

The main screen displays all the client’s banking products, divided into categories (“Cards”, “Accounts”, “Deposits”, “Loans”). In general, a fairly standard picture for any banking application. When you tap on a product, a detail screen with extended information opens. The main “trick” from the designers on this screen is a carousel that lets you scroll through all of the client’s products with horizontal swipes.

API for receiving client products (using cards as an example):

GET /cards – returns a list of all cards with minimal information, sufficient for display on the main screen.

GET /cards/{cardId}/details – returns detailed information on a card with an expanded set of attributes (limits, requisites, etc.).
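The original article illustrated these responses with screenshots. As a substitute, here is a hypothetical sketch of what the two payloads might look like; every field name and value below is an assumption for illustration, not the bank's actual contract:

```python
# Hypothetical /cards response: minimal attributes for the main screen.
cards_response = [
    {"cardId": "c1", "name": "Salary card", "maskedPan": "**** 1234", "balance": 1500.0},
    {"cardId": "c2", "name": "Travel card", "maskedPan": "**** 5678", "balance": 320.5},
]

# Hypothetical /cards/{cardId}/details response: expanded attribute set
# (limits, requisites, etc.) for a single card.
card_details_response = {
    "cardId": "c1",
    "maskedPan": "**** 1234",
    "balance": 1500.0,
    "limits": {"daily": 3000.0, "monthly": 50000.0},   # not present in /cards
    "requisites": {"account": "...", "bank": "..."},   # placeholder values
}
```

The key point for what follows is the asymmetry: the list endpoint is cheap but shallow, while the detail endpoint is rich but must be called per card.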

Taking a look at the current caching implementation

Separate caches are implemented for the /cards and /details requests. Individual triggers are configured to invalidate each cache.

First, let’s coordinate with the team and record in the analytics the triggers on which the request cache becomes stale:

  • termination of the user session (including by inactivity timeout)

  • logout or force-quitting the application

  • successful completion of a card transaction (payment, transfer)

  • successful change of card settings (renaming, changing limits)

Returning to the classification at the beginning of the article, the cache of these requests can be considered “short”, since the data is tied to a specific user and can change frequently.
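The trigger list above maps naturally onto event handlers that clear the cache. A minimal sketch, assuming a plain dictionary-backed cache (the class and handler names are illustrative, not the project's actual code):

```python
class RequestCache:
    """Simplest per-request cache with full invalidation."""
    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value

    def get(self, key):
        return self.entries.get(key)

    def invalidate(self):
        self.entries.clear()

cards_cache = RequestCache()
cards_cache.put("/cards", [{"cardId": "c1"}])

# Each agreed trigger simply invalidates the cache, so the next
# read falls through to the server and fetches fresh data.
def on_session_end():            cards_cache.invalidate()
def on_logout():                 cards_cache.invalidate()
def on_transaction_success():    cards_cache.invalidate()
def on_card_settings_changed():  cards_cache.invalidate()

on_transaction_success()
print(cards_cache.get("/cards"))  # → None: the next read goes to the server
```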

Next, we will analyze the pros and cons of such an implementation.

Advantages:

  • fast data loading on the main screen (a lightweight request plus the cache)

  • the simplest possible logic – the problem was solved essentially head-on

Flaws:

  • when the detail screen is opened for the first time, /details always has to be requested

  • the product carousel in the detail screen works slowly right after entering the application, since every swipe triggers a new request for the detail data

  • data inconsistency when one of the requests fails. This is an extremely undesirable situation that badly spoils the user’s impression of the product

Let's try another option

We speed up the loading of the detail screen: for both screens we take data from an enriched /cards request, and /details is used only for deeper screens (settings, requisites). Whenever either cache is refreshed, we force the other one to refresh as well.

To implement this idea, it took a long time to persuade the middle team to rework the service layer. And after rolling it out to the test environment, the solution seemed quite good: we got an instant display of data on the detail screen and a perfectly working carousel.

However, not everything turned out to be so perfect: the volume of data in the /cards response grew, and this began to hurt the loading speed of the main screen. While the vast majority of clients (who have at most 2–3 banking products) would not feel the difference, for VIP users (who may have dozens of products) it would definitely become a critical problem.

Therefore, the proposed solution never made it beyond the test environment, and I went off to devise the next implementation option.

The final solution is a merged cache

A conversation with a developer about the technical implementation of caching on mobile devices nudged me toward a new solution. It was from him that I heard the remark that “in essence, the cache is just a regular JSON file that is stored on the device and updated by certain triggers.”

What if we design the structure of this cache in the format we need and teach it to be filled from two different requests according to a given logic? On the device, the cached list of cards will follow the overall structure of the /cards response, but the set of attributes inside each array element will match the /details response. This results in a combined structure.
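The original article showed this merged structure in a screenshot. A hypothetical sketch of what it might look like, with every field name assumed for illustration:

```python
# Merged cache: the outer list follows the /cards response, while each
# element carries the richer /details attributes once they are fetched.
merged_cache = [
    {
        # attributes available immediately from /cards
        "cardId": "c1",
        "name": "Salary card",
        "balance": 1500.0,
        # attributes filled in later from /cards/{cardId}/details
        "limits": {"daily": 3000.0},
        "requisites": None,   # not loaded yet → show a skeleton in the UI
    },
]
```

A `None` (or missing) detail attribute is what tells the UI to render a skeleton placeholder instead of a value.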

Then all that was left was to describe the logic for filling and updating the new data structure and submit the task for development.

The logic turned out like this:

  • Data can be updated from either request in any order. When updating, we try to overwrite the old data in the model, although special conditions can be specified for some of its elements

  • On the main screen we call /cards; on the detail screen we call /details in the background. To make the carousel feel fast, we proactively request the detailed information for the previous and next products in the background

  • We add skeleton placeholders to the blocks on the detail screen whose data loads in the background. Here we had to ask the designers to update the layouts a little so that the analytics and the implementation matched the design

  • The cache-invalidation triggers do not change
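The fill-and-update logic above can be sketched roughly as follows. The endpoint names match the article; the class, the `api` object and the carousel prefetch details are my assumptions, not the project's actual implementation:

```python
class MergedCardCache:
    """One merged structure fed by two requests, in any order."""
    def __init__(self, api):
        self.api = api     # object exposing get_cards() and get_details(id)
        self.cards = {}    # cardId -> merged attribute dict

    def refresh_list(self):
        # /cards overwrites the shallow attributes of every card.
        for card in self.api.get_cards():
            self.cards.setdefault(card["cardId"], {}).update(card)

    def refresh_details(self, card_id):
        # /details enriches one element with the expanded attributes.
        self.cards.setdefault(card_id, {}).update(self.api.get_details(card_id))

    def open_details(self, card_id, order):
        # Proactively warm the neighbours so carousel swipes feel instant.
        i = order.index(card_id)
        for neighbour in {order[i - 1], order[(i + 1) % len(order)]}:
            self.refresh_details(neighbour)
        self.refresh_details(card_id)
        return self.cards[card_id]

class FakeApi:
    """Stand-in for the middleware, returning toy payloads."""
    def get_cards(self):
        return [{"cardId": "c1", "balance": 100},
                {"cardId": "c2", "balance": 200},
                {"cardId": "c3", "balance": 300}]
    def get_details(self, card_id):
        return {"cardId": card_id, "limits": {"daily": 3000}}

cache = MergedCardCache(FakeApi())
cache.refresh_list()                               # main screen: /cards
card = cache.open_details("c2", ["c1", "c2", "c3"])
print(sorted(card))  # → ['balance', 'cardId', 'limits'] (merged attributes)
```

Because both requests write into the same structure, either one can arrive first, and a failed /details call still leaves enough data to render the screen.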

Implementation and testing showed excellent performance of this model. Users got fast loading of the main screen, an instant display of basic product information on the detail screen, and a fast carousel. At the same time, we did not need to rework the service layer, and the solution itself turned out to be more stable, since information for rendering the main screen and the detail view was almost always available.

Results

In conclusion, I would like to draw one simple lesson – do not hesitate to go beyond the current task, and do not think in formulaic patterns! It is a creative approach and insight that distinguish high-level specialists from everyone else.

Author: Alexey Aksyanov, head of the analytics department at Technocracy


Also subscribe to our telegram channel “Voice of Technocracy”. Every morning we publish a news digest from the world of IT, and in the evenings we share interesting and useful articles.
