How we built end-to-end analytics and tamed business requirements

Hello! This is Daniil, a data engineer at iSpring. We have been building tools for corporate training for 23 years. In this article, I will tell you how and why we decided to implement end-to-end analytics in the company, what difficulties we ran into, and how we tamed the business requirements.

End-to-end analytics touches engineering, marketing, and analytics, so I've put together a small glossary of the terms I use in the article. It should save you from having to look them up while reading.

Glossary

End-to-end analytics is a method of analyzing marketing effectiveness by collecting data on the customer's interactions with the business. It lets you build the chain of events the customer went through from first contact to purchase, so you can understand what path the customer took and what prompted them to buy.
An attribution model is a method of distributing conversion weights across a set of events according to a given algorithm. A conversion weight shows how valuable a specific event was for achieving a goal: for example, how much a visit to a website page influenced the decision to purchase a product.
A site visit (Visit) is one of the events in end-to-end analytics; it reflects a visit to a web page.
The visitor log is the entire history of a user's visits to website pages.
A lead is a potential client who has left their contact information.
A lead event is an action that a potential client performs and that results in a lead: for example, writing to the online chat, requesting a trial version of a product, or calling a sales manager.
A trial is a trial version of a product with limited functionality and a limited period of use.
A data warehouse (DWH) is a centralized data storage system that collects information from various source systems.
A data mart is a subset of the information in a data warehouse that describes a specific subject area.

What is included in end-to-end analytics

End-to-end analytics is a system that collects and combines data from different sources, such as CRM, advertising accounts, website visit trackers, and other internal and external systems. It helps not only track the customer's path to purchase, but also understand which channels and touchpoints work most effectively in order to better manage the budget and increase conversion.

The key to end-to-end analytics is to correctly build a chain of events and calculate attribution. In addition to online events, such as clicks on a website, offline activities (such as calls or meetings) must be taken into account. To do this, data from these activities must be digitized to reflect events along the customer’s journey.

To store and process such data, a high-quality data warehouse (DWH) is required, which collects information from various data sources and ensures high speed of query processing. Our company already had such a data warehouse.

How iSpring Analyzed Performance Before End-to-End Analytics

Before the DWH and end-to-end analytics, marketing analysis was based on Google Analytics, Yandex Metrica, and the history of website visits collected by our own in-house tracker. This variety had one big problem: we had three sources of truth about the same thing, and they showed only approximately the same dynamics. The data from our own tracker could in theory be linked to other types of events, since almost all the necessary information was stored in our CRM system. But a direct SQL query against the CRM turned out to be very heavy: the database struggled to execute it or did not execute it at all. So we abandoned the idea of building end-to-end analytics without a DWH.

At this stage, we limited ourselves to first-touch analytics (the First Click attribution model). This model evaluates the events through which a customer first gets acquainted with a product and ignores all other events. Its disadvantage is that events in the middle and at the end of the customer's journey are not evaluated, so we cannot say anything about their effectiveness. On top of that, we did not split the customer's full chain of events into separate conversions, which means that the entire conversion credit went to the very first event that started the customer's journey. Let's look at an example of how this worked:

The picture shows the customer's path, which included several site visits, lead events, and three payments. The customer's first visit came through the direct channel: the person pasted the page link straight into the browser's address bar. The second time, they came to the site from an ad (cpc), requested a trial version of the product (trial), and after some time made a purchase (payment 1). The company received the money, but the entire contribution to attracting the client (100%) was attributed to the direct channel. Marketers are unhappy, since the statistics ignored the contribution of the advertising channel, CPC (Cost Per Click).

After some time, the client came back to the site by clicking on an ad (cpc) and responded to a lead magnet (for example, left their contact information in exchange for a promo code, a gift, or access to closed materials). After this event, the second payment occurred (payment 2). And once again we attributed the entire conversion to the first direct visit. Marketers are unhappy again.

Now the third payment: the client came to the site from an email newsletter (email). After some time, they visited the site again via a direct link (direct). Then they attended a conference, talked to our managers (an offline lead event), and made a purchase (payment 3). And again, the entire conversion went to the direct channel. This time it is not only the marketers who are unhappy, but also the sales managers who talked to the client at the conference and whose contribution we did not take into account.

The First Click model does not reflect the real contribution of marketing to customer acquisition, so it is impossible to evaluate the effectiveness of marketing activities and manage them based on numbers.

How we started building end-to-end analytics

Once we decided to build end-to-end analytics (within the team we started calling it the “draft”), we also decided to implement a new attribution model. The first-touch model no longer suited us, and we now had a data warehouse in which we could implement fairly complex business logic and process large volumes of data. So instead of First Click we chose a U-shaped model. Unlike First Click, the U-shaped model distributes the conversion across all events, according to the following rules:

The first and last events in a conversion are each assigned 40% of the conversion credit.
The remaining 20% is distributed evenly across all other events.
If there are only two events, then we assign 50% conversion to each.
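
To make the rules concrete, here is a minimal Python sketch of the weight calculation. The function name is hypothetical, and the handling of a chain with a single event is my assumption, since the rules above do not cover that case.

```python
def u_shaped_weights(n_events: int) -> list[float]:
    """Return U-shaped conversion weights for a chain of n_events events."""
    if n_events <= 0:
        return []
    if n_events == 1:
        # Assumption: a lone event receives the whole conversion.
        return [1.0]
    if n_events == 2:
        # Two events: 50% each.
        return [0.5, 0.5]
    # First and last events get 40% each; the remaining 20% is split
    # evenly across the events in between.
    middle = 0.2 / (n_events - 2)
    return [0.4] + [middle] * (n_events - 2) + [0.4]


if __name__ == "__main__":
    chain = ["direct", "cpc", "trial"]
    for event, weight in zip(chain, u_shaped_weights(len(chain))):
        print(f"{event}: {weight:.0%}")
```

Whatever the length of the chain, the weights always sum to 100% of the conversion.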

If we take the same chain of events that was considered in the example of the First Click model, then here is how the conversion weights change by model:

Now we assign a contribution to customer acquisition to each event. Everyone is happy – both marketers and sales managers, although the degree of satisfaction depends on how the weights are distributed.

At the design stage of the first version of end-to-end analytics, we decided to limit ourselves to three types of events: site visit, lead event, and payment. We assumed the following sequence:

A person arrives at the website and gets acquainted with the product. At this stage, the visitor log is collected: which pages they visit, which advertisement they came from, which country the visit was made from, and so on.
After getting acquainted with the product, the person requests a trial version or, for example, leaves contact information so that we can get in touch. At this point a lead event is created, i.e. the site visitor is converted into a potential buyer.
After studying the trial version or talking to a sales manager, the potential buyer makes a purchase. We record this moment as a completed customer journey and count it as a successful conversion.

The chain of events for the customer’s path was built as follows: “website visit → lead event → payment”, although in reality everything turned out to be not quite like that, but more on that later.

We took a payment as the unit of conversion and distributed its weight as follows:

If this is the first payment, we distribute the weight across events from the very first in the customer’s journey to the last event preceding the payment.
If this is a subsequent payment, then we distribute the weight across all events between the previous payment and the current one, considering the previous payment to be the first event in the conversion. It was important for us to understand what contribution the purchase of one product makes to the purchase of another product.
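
As an illustration, the two rules could be sketched in Python roughly as follows. The `Event` class, its fields, and the function name are hypothetical and are not taken from our data mart; the sketch also assumes that the events are already sorted by time.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str   # "visit", "lead" or "payment" (hypothetical labels)
    label: str  # e.g. a channel name or a payment id


def split_into_conversions(events):
    """Group a time-ordered chain into (payment, contributing events) pairs."""
    conversions = []
    window = []
    for event in events:
        if event.kind == "payment":
            conversions.append((event, window))
            # The payment opens the next conversion as its first event.
            window = [event]
        else:
            window.append(event)
    # Events after the last payment have not converted yet and are left out.
    return conversions


if __name__ == "__main__":
    journey = [
        Event("visit", "direct"), Event("visit", "cpc"), Event("lead", "trial"),
        Event("payment", "payment 1"),
        Event("visit", "cpc"), Event("lead", "lead magnet"),
        Event("payment", "payment 2"),
    ]
    for payment, contributing in split_into_conversions(journey):
        print(payment.label, "<-", [e.label for e in contributing])
```

Each list of contributing events is what the attribution weights are then distributed over.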

As a result, the attribution model looked like this:

It took us about a month to implement the first version of the data mart for end-to-end analytics. During this time, we:

migrated the data that was missing from the DWH;
created the data mart itself and wrote a pipeline for updating it;
wrote the business logic and covered it with tests;
handed the finished data mart over to the analysts, who in turn built a dashboard for our internal customers.

When the data mart and the logic for building the customer journey were ready, we started testing to see what we had. On real data, we quickly discovered that our model had not accounted for several things.

The first thing we noticed when checking the model was that there were conversions that had no events other than a payment, or that only had one of the two event types (no information about a site visit or lead event). Diving into the data, we dug up a few cases.

Case 1 – No data

The very first and most obvious case is that there really is no data. In the source systems, we found no information about the client and his actions before the purchase (here we took a deep breath and said: “Pu-pu-pu”).

Since there is no information, we decided to generate dummy events. We added a check to the conversion calculation logic that counts the site visits and lead events in each conversion. If one of these event types is missing from the conversion, a dummy event is added in its place: No Weblog for a missing visit and No Lead Event for a missing lead event. These dummy events receive conversion weight just like real ones. We distributed weight to fictitious events on purpose, to make the lack of information visible in end-to-end analytics, so that everyone would know that we know nothing.
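
A minimal sketch of that check in Python might look like this. The tuple layout and the function name are hypothetical, and the position of the dummy events in the chain (dummy visit first, dummy lead event last) is my assumption rather than a description of the production logic.

```python
def add_dummy_events(contributing):
    """Fill in dummy events for event types missing from a conversion.

    `contributing` is a time-ordered list of (kind, label) tuples, where
    kind is "visit", "lead" or "payment" (hypothetical labels).
    """
    kinds = {kind for kind, _ in contributing}
    result = list(contributing)
    if "visit" not in kinds:
        # Assumption: the dummy visit goes to the start of the chain.
        result.insert(0, ("visit", "No Weblog"))
    if "lead" not in kinds:
        # Assumption: the dummy lead event goes to the end of the chain.
        result.append(("lead", "No Lead Event"))
    return result


if __name__ == "__main__":
    print(add_dummy_events([]))
    print(add_dummy_events([("lead", "trial")]))
    print(add_dummy_events([("visit", "cpc")]))
```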

Case 2 – Offline Lead Events

We regularly participate in various conferences and exhibitions on the topic of corporate training and LMS (Learning Management System, corporate training system), plus we organize and conduct similar events ourselves (for example, the conference iSpring Days). Lead events from such offline events are loaded into the CRM as a list from an Excel file. In the data, this looks like a multitude of lead events created at one point in time. But there are no visits that indicate where they came from. Although in reality, there was both a visit (in this case, a visit to an event) and a lead event. For offline events, we considered it incorrect to create fictitious No Weblog visits, because there was an event where the client came and where we contacted him. So for offline lead events, we added another visit channel – Offline Visit. This visit type is added for each offline lead event and is associated with that lead. Hooray, one less uncertainty.
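
In code, the idea is roughly the following. This is only a sketch: the dictionary keys and the `"offline"` source label are hypothetical, and giving the synthetic visit the same timestamp as the lead is my assumption.

```python
def add_offline_visits(leads):
    """Put an Offline Visit in front of every offline lead event.

    `leads` is a list of dicts with hypothetical keys: id, source, time.
    """
    events = []
    for lead in leads:
        if lead["source"] == "offline":
            events.append({
                "kind": "visit",
                "channel": "Offline Visit",
                "lead_id": lead["id"],  # the visit is tied to this lead
                "time": lead["time"],   # assumption: same timestamp as the lead
            })
        events.append({"kind": "lead", **lead})
    return events


if __name__ == "__main__":
    leads = [
        {"id": 1, "source": "offline", "time": "2024-05-20T10:00"},
        {"id": 2, "source": "chat", "time": "2024-05-21T12:30"},
    ]
    for event in add_offline_visits(leads):
        print(event)
```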

When a sales manager forgot to upload offline leads

Case 3 – old visits and fresh leads

A lead event is linked to the last visit before it and inherits that visit's acquisition channel. When we examined this link, we found leads whose linked visit had happened long before the lead itself. For example, the last visit to the site came from an ad a month ago, and the lead event appeared only today. It is unlikely that the ad click led to this lead. Most likely, other activities took place during that month that the current model does not take into account, or there was a failure and the events for this period were not saved. Marking such a lead as No Weblog would be incorrect, because the client did visit the site. After collecting statistics and discussing them with our internal customers, we settled on a 24-hour window. If less than 24 hours passed between the visit and the lead event, we consider that the visit led to the lead and link the two. If more than 24 hours passed, we add a new fictitious visit, No Visits, meaning that the lead event was caused not by a visit but by something else.
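
The rule can be sketched in Python like this. The function name and argument layout are hypothetical. Treating a gap of exactly 24 hours as "too old" is my assumption, since the rule only speaks of less than and more than 24 hours.

```python
from datetime import datetime, timedelta

LINK_WINDOW = timedelta(hours=24)


def channel_for_lead(lead_time, last_visit):
    """Pick the acquisition channel that a lead event inherits.

    `last_visit` is a (visit_time, channel) tuple for the last visit
    before the lead, or None if no visit is known at all.
    """
    if last_visit is None:
        # No visit data at all: the dummy visit from Case 1.
        return "No Weblog"
    visit_time, channel = last_visit
    gap = lead_time - visit_time
    if timedelta(0) <= gap < LINK_WINDOW:
        # The visit is fresh enough to be credited with the lead.
        return channel
    # The visit is too old (assumption: exactly 24 hours also counts as old).
    return "No Visits"


if __name__ == "__main__":
    visit = (datetime(2024, 5, 1, 9, 0), "cpc")
    print(channel_for_lead(datetime(2024, 5, 1, 15, 0), visit))
    print(channel_for_lead(datetime(2024, 6, 1, 9, 0), visit))
```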

Case 4 – the client is not the only one who pays

Most often, our clients are other businesses. That means the decision to buy involves not only the buyer named in the payment but possibly their colleagues as well. Here is the picture: we look at the buyer of the product and find no leads and no visitor log. We go up to the company level, look at the events for all of its contacts, and the visitor log and the leads are there. We concluded that one person from the company could visit the site, study the information, and talk to the managers, while another person made the payment. So we decided to improve end-to-end analytics and look at all events at the company level. However, after estimating how much effort it would take to move the model to the company level, we postponed this to the next release and left the first version of end-to-end analytics at the client level.

How end-to-end analytics helps in solving business problems

To assess how end-to-end analytics affected marketing and sales, we collected feedback from users and customers. Below is a short list of what we got from its implementation:

We can now see how much real revenue each acquisition channel brings in.
We were able to evaluate the effectiveness of advertising campaigns and adjust the advertising budget. Some campaigns were turned off completely, while others got a bigger budget and ran more often.
We identified which activities and events warm the client up to the point of purchase, and were able to analyze them.
Adding dummy events made the problem of missing data visible to the business. This led to a large project to improve the tracking of site visits and to link site clicks to the customer. As a result, our No Weblog share dropped from 20-25% to 10%.
Marketing and sales teams now see their real contribution to the success of the business. Here I would like to quote directly from the collected feedback: “This has become a pleasant emotional reinforcement and motivator for the guys – to see that their work brings real profit.”

How we will develop end-to-end analytics

We will continue to improve and develop the end-to-end analytics project. We have outlined the following development paths for the project:

Add new types of events to end-to-end analytics. Currently, there are requests to add events such as “call”, “meeting”, “actions in the product”, etc.
Analyze not only successful conversion cases, when the chain led to a purchase, but also look at unsuccessful chains, when the client got stuck somewhere along the way and did not reach the conversion.
We are also considering the possibility of moving from using the U-shaped model to a data-driven attribution model. In this model, weights are also distributed across all events in the chain, but the weight value is calculated based on a machine learning algorithm.
