If you can't win, automate: simplifying the routine in analytical tasks

We run a comparison and selection service for educational courses; in other words, we act as an aggregator in EdTech. For example, if a user does not know which data analytics course to choose, they can go to our website, look through the list of such courses and pick the most suitable one.

We work with partners on the CPA model: a user lands on our website, performs some target action, for example a click, and we receive a commission from the partner for it.

To track target actions accurately, you need a tracking system that records them. We use TUNE (formerly HasOffers). The system has to be configured both on our side and on the partner's side. Setting it up on our side is easy, but alas, not always for partners: events are registered on their backend, and the partners themselves have to pass them to the tracking system. Not every partner has the technical capacity for this – it takes a certain amount of developer time. So there is a risk that some conversions never make it into the tracking system.
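To make the partner-side requirement concrete, here is a minimal sketch of what such a server-to-server call (a "postback") from the partner's backend might look like. The URL and parameter names are placeholders for illustration, not TUNE's actual postback format:

import requests

# A hypothetical server-to-server postback: the partner's backend reports
# a confirmed conversion to the tracking system. The endpoint and parameter
# names below are placeholders, not TUNE's real postback format.
def send_postback(transaction_id, revenue):
    resp = requests.get(
        "https://tracking.example.com/postback",  # placeholder postback URL
        params={
            "transaction_id": transaction_id,  # id issued when the user clicked
            "amount": revenue,
        },
        timeout=10,
    )
    return resp.ok

# Called from the partner's backend once a sale is confirmed:
# send_postback("abc123", 4990.0)

If a partner has no developers to wire up a call like this, their conversions simply never reach the tracking system.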

To keep the company from losing money to miscounted conversions, we had to organize a reconciliation process – as a first approximation, a manual and therefore quite labor-intensive one. Every month a partner sent us an export with all transactions from their system (as a rule, a CRM); our account managers asked analysts to download all transactions from our system. The managers compared the two exports and looked for differences. When they found any, they went to the partner to discuss the disputed transactions, and asked us analysts to upload the missing ones to our database.

We have dozens of partners. As you might guess, this routine consumed a lot of time and nerves of our account managers and analysts.

We decided to change this: using the tools at hand, we put together a simple automation. The monthly task that used to take managers a week of working time now takes a few hours.

Automation

The main difficulty here was not even technical but managerial: we had to agree with every partner to regularly send us a correct table with all operations. Initially I made a fatal mistake by saying partners could send tables in any format – "it will be less work for the partner, and I will process them anyway." This slowed my work down significantly: the partner tables differed wildly, and all of them had to be brought to a single format.

From a technical point of view, I structured the process like this: we receive a Google sheet from each partner, load it into our database, and then run a check that highlights differences in transactions and hands the manager a ready-made file.

Let's take a closer look at each stage.

First, we needed to load dozens of Google Sheets from our partners into our Snowflake database. As the tool for this large-scale ETL process we chose the low-code platform n8n, which was already in use in the company. Personally, I liked it for its ease of use, friendly UI and an impressive list of integrations with other tools.

The platform lets you load data incrementally, that is, append only new data without fully rescanning and rewriting the table, and it lets you transform the data right at load time.
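To illustrate the idea of incremental loading, here is a simplified sketch; the rows, column names and dates are made up for the example:

from datetime import date

# A simplified sketch of incremental loading: only rows newer than the
# high-water mark already stored in Snowflake get appended.
rows = [
    {"TRANSACTION_ID": "t1", "LEAD_DT": "2024-08-30"},
    {"TRANSACTION_ID": "t2", "LEAD_DT": "2024-09-10"},
]

# In practice the high-water mark would come from the target table,
# e.g. SELECT MAX(LEAD_DT) over the already-loaded partner transactions.
last_loaded = date(2024, 9, 1)

new_rows = [r for r in rows if date.fromisoformat(r["LEAD_DT"]) > last_loaded]
print(new_rows)  # only t2 gets appended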

Overall, n8n's functionality fully matched our task and covered our basic needs.

This is what the data loading flow looks like. Unfortunately, with 117 nodes, it was not possible to fit everything into one picture

We started with basic data standardization – the most insidious part of the whole process. I held a meeting and wrote documentation for our managers so they could clearly explain the table formatting rules to their partners. While drafting this spec, I tried to account for every corner case that could appear in the tables. But you cannot cancel the human factor, so I still occasionally see scary things in the uploads. For example, a date in the corresponding column written out literally in words: instead of 09/10/2024, the analyst sees "the tenth of September" – imagine their face at that moment!

When this happens, Snowflake raises an error because the data type is not what it expects. This is where the alert system you can configure in n8n really helps. When an error occurs, I receive a message like this in the messenger:

The standardization itself, i.e. the data processing in n8n, is easily done with Python code. Here is an example of such code:

from datetime import datetime

# Rename columns to the unified schema
for item in _input.all():
    item.json['COURSE_NAME'] = item.json.pop('Название курса')

for item in _input.all():
    item.json['LEAD_DT'] = item.json.pop('Дата заявки')

# Function that replaces missing revenue values with 0
def revenue(data):
    if not data:
        return '0'
    else:
        return data

# Apply the function
for item in _input.all():
    item.json['SCHOOL_REVENUE_VAT'] = revenue(item.json['SCHOOL_REVENUE_VAT'])

for item in _input.all():
    item.json['SRAVNI_REVENUE_VAT'] = revenue(item.json['SRAVNI_REVENUE_VAT'])

# Function for normalizing dates
def remove_time_from_date(date_time_str):
    if not date_time_str:
        return date_time_str

    current_year = datetime.now().year

    # Date formats we have encountered in partner uploads
    formats = ['%d.%m.%Y %H:%M:%S', '%Y-%m-%d %H:%M:%S', '%d.%m.%Y %H:%M', '%d.%m.%Y', '%d.%m', '%d-%m-%Y', '%d.%m.%Y, %H:%M']

    for fmt in formats:
        try:
            # Try to parse the date with the current format
            date_time_obj = datetime.strptime(date_time_str, fmt)

            # If the format has no year, substitute the current year
            if fmt == '%d.%m':
                date_time_obj = date_time_obj.replace(year=current_year)

            # Return the date as 'YYYY-MM-DD'
            return date_time_obj.strftime('%Y-%m-%d')
        except ValueError:
            # This format did not match – try the next one
            continue

    # No format matched – return the original string
    return date_time_str

# Apply the date formatting
for item in _input.all():
    item.json['LEAD_DT'] = remove_time_from_date(item.json['LEAD_DT'])

for item in _input.all():
    item.json['SALE_DT'] = remove_time_from_date(item.json['SALE_DT'])

# Set the course type ('Платный' means 'paid')
for item in _input.all():
    item.json['POSTBACK_TYPE'] = 'Платный'

# Set the payment model
for item in _input.all():
    item.json['MODEL'] = 'CPS'

return _input.all()

n8n has a peculiarity: when it reads a table, it converts it into JSON. So in the code we work with JSON objects, even though the data is actually tabular.
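Schematically, each spreadsheet row becomes one item with the row's cells stored under the json key, and column headers turn into keys. A mocked illustration (in a real node the list comes from _input.all(), where json is accessed as an attribute):

# Mocked items showing the shape of the data an n8n Python node receives:
# one item per spreadsheet row, column headers as keys.
items = [
    {"json": {"Название курса": "Data Analytics", "Дата заявки": "10.09.2024"}},
    {"json": {"Название курса": "Python Basics", "Дата заявки": "11.09.2024"}},
]

for item in items:
    print(item["json"]["Название курса"])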

After all the data has been loaded, we can begin reconciling the two sources.

The ETL process from n8n into Snowflake runs daily so that incoming data is analyzed as quickly as possible. When new data arrives, we record the date of the last check in a dedicated log table so that the Python script does not re-check the same data every day.
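Schematically, the "don't check the same data twice" guard can look like this. The RECONCILIATION_LOG table and its columns are hypothetical names for this sketch, not our actual schema:

import snowflake.connector  # the official Snowflake connector for Python

# A sketch of the guard: run the reconciliation only if data newer than
# the last logged check has arrived. Table and column names are made up.
def needs_check(conn, partner, latest_data_dt):
    cur = conn.cursor()
    cur.execute(
        "SELECT MAX(CHECKED_DT) FROM RECONCILIATION_LOG WHERE PARTNER = %s",
        (partner,),
    )
    (last_checked,) = cur.fetchone()
    return last_checked is None or latest_data_dt > last_checked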

The check itself takes the transaction ids from our database and compares them with the partner's transaction ids; any differences are written to an Excel file, which a bot sends to the messenger via the API. Moreover, if a transaction id is missing from our database, the script automatically records it in our tracking system. This lets us see all the lost transactions in our reports, for example on the charts in Superset.
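In simplified form, the core of the comparison is an ordinary set difference. A sketch, with ids, file name and statuses made up for the example:

import pandas as pd

# A simplified sketch of the comparison: ids present on one side but not
# the other go into an Excel file for the account manager.
our_ids = {"t1", "t2", "t3"}             # transactions in our tracking system
partner_ids = {"t1", "t2", "t3", "t4"}   # transactions in the partner's upload

missing_in_ours = sorted(partner_ids - our_ids)    # lost conversions
missing_in_partner = sorted(our_ids - partner_ids)

report = pd.DataFrame({
    "TRANSACTION_ID": missing_in_ours,
    "STATUS": "missing in our DB",
})
report.to_excel("reconciliation_diff.xlsx", index=False)
# A bot then sends this file to the messenger, and the missing transactions
# are written back into our tracking system so they appear in Superset.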

This is what the verification logs look like:

As a result, the verification message looks like this:

After receiving such a file, all the managers have to do is forward it to the partner with the words: "Where are the transactions, Lebowski?"

Building this simple flow saved a lot of time and nerves for our account managers, freeing up their resources for more useful business tasks.

The whole process can be represented as follows:

We plan to improve the process: receive data more often (weekly rather than monthly) and visualize it in BI tools. This will let our salespeople and product owners make decisions based on even fresher data.

The automation we ended up with is based on the specifics of how the "Education" stream works. In other verticals (insurance, loans, etc.) the technical and organizational details of the implementation may differ, given the specifics of partner interaction. But two key principles from this case can be useful anywhere, not only in the task of comparing data from internal and external sources.

First, instead of solving a single problem (“interaction with a specific partner”), try to solve a class of such problems at once (“interaction with all partners”).

Second, get more out of the tools you already have.

***

We will be happy to answer questions about our case in the comments. And share your own examples of automating routine analytics processes: we are sure that almost every company has similar approaches and life hacks of its own, and it would be interesting to hear about them!
