How to cook call tracking: Cyan’s experience
My name is Slava, and I lead one of the product teams at Cyan, the one developing call tracking. Today I want to tell you how a Habr reader can add call tracking to their own project, what challenges they will face, and what is worth thinking about in advance.
So, pick up the phone – you have an incoming call on the line!
Let’s deal with the concepts
Call tracking is a feature that is gaining more and more popularity on large ad platforms. Instead of your real number, the ad shows a different one, but the calls still come to you. It is very similar to call forwarding, which any phone can do nowadays.
The advantages of a substitution number are obvious: the real number is protected from dialers’ spam databases, and once the ad is taken down, the phone stops ringing with calls that are no longer relevant. In other words, you no longer need a second “work” phone that you switch off once the deal is closed.
Besides protecting numbers, call tracking lets you show customers how effective your platform is. By playing a welcome audio clip with the company name on the substitution numbers, you additionally show customers that your resource is working for them and that they are paying for good reason.
There are other business opportunities as well. The call log lets a manager spot-check the work of their employees. Having a recording of every conversation at hand can change familiar patterns of client behavior: for example, you no longer need to scribble notes in a notebook during a call so as not to forget the details. In addition, ML speech analytics tools can extract valuable business metrics that were simply unavailable before.
Where to begin
If I could go back in time, the first thing I would do is explicitly agree on a service level agreement (SLA) with the provider and spell it out in the legal contract.
To build such an agreement, you need to carefully choose technical indicators (service level indicators, SLIs) that most accurately describe the health of the provider. Indicators may include request latency, the number of requests per second, the number of errors, or service availability.
For each indicator, objectives (service level objectives, SLOs) are defined: the values the partner agrees to guarantee us. For example, if we chose query latency as one of the indicators, the objective could be that average query latency stays below 200 ms. Or like this: the 90th percentile of query latency should lie between 200 and 300 ms.
With a new external partner, you need to agree openly on the consequences they face for missing an SLO. For example, an increase in average query latency to 500 ms during prime time triggers a penalty payment.
By agreeing on everything transparently up front, you can greatly simplify all further interaction with any partner. The sun will shine brighter, the grass will be greener, and your hair will become smooth and silky.
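To make the two example SLOs concrete, here is a minimal sketch that computes a latency SLI from a batch of measured request latencies and checks it against both objective variants. The function names and the sample data are hypothetical, and the percentile uses the simple nearest-rank method; monitoring systems may interpolate differently.

```python
import math
from statistics import mean


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def check_slos(latencies_ms: list[float]) -> dict[str, bool]:
    """Evaluate the two alternative example SLOs for a batch of request latencies."""
    avg = mean(latencies_ms)
    p90 = percentile(latencies_ms, 90)
    return {
        # SLO variant 1: average latency below 200 ms
        "avg_below_200ms": avg < 200,
        # SLO variant 2: 90th-percentile latency between 200 and 300 ms
        "p90_within_200_300ms": 200 <= p90 <= 300,
    }


if __name__ == "__main__":
    sample = [120, 150, 180, 90, 210, 160, 140, 250, 130, 170]
    print(check_slos(sample))  # {'avg_below_200ms': True, 'p90_within_200_300ms': True}
```

The percentile variant is usually more informative than the average: a handful of very slow requests can hide behind a healthy mean.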
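A penalty clause is easier to enforce when you can check it on your own side. The sketch below assumes a hypothetical prime-time window of 10:00 to 18:59 (the real window would be fixed in the contract) and flags a breach when the average prime-time latency reaches the 500 ms threshold from the example.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical prime-time window; the real one would be defined in the contract.
PRIME_TIME_HOURS = range(10, 19)   # 10:00 to 18:59
PENALTY_THRESHOLD_MS = 500         # threshold from the example in the text


@dataclass
class LatencySample:
    hour: int            # hour of day, 0..23
    latency_ms: float


def penalty_due(samples: list[LatencySample]) -> bool:
    """True if average prime-time latency reached the contractual threshold."""
    prime = [s.latency_ms for s in samples if s.hour in PRIME_TIME_HOURS]
    if not prime:
        return False  # no prime-time measurements, nothing to judge
    return mean(prime) >= PENALTY_THRESHOLD_MS
```

Off-peak measurements are deliberately ignored, so a slow night does not trigger the clause.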
What’s inside a provider
A provider is a telecom company that takes on all the hard work of providing communications. Providers usually have their own cell towers (cells, hence the name “cellular” communication), which provide coverage through a radio access network (RAN). The RAN is the first large logical component of any provider’s network. Without going into detail, the high-level picture looks like this.
Mobile users are, well, mobile: they constantly move from tower to tower. Providing uninterrupted voice transmission under those conditions is obviously not a trivial task. This dynamic routing topology is what you pay the provider for, and in return you get a simple integration API.
Knowing the provider’s RAN, that is, the regions where it operates, is very important. You cannot simply buy numbers in the Moscow region and use them, say, in Sochi. First, locals will be reluctant to call a number with an unfamiliar area code. Second, you may end the month with large roaming bills. You need to take your users’ geography into account. If you plan to operate in Crimea, this issue requires even more thorough study.
Integration
Providers differ a lot. Although they implement similar functionality, each offers its own palette of API methods, and each one is unique. For convenience, at Cyan we use a microservice version of the well-known Facade design pattern.
In front of each external provider API we put a small microservice of our own that smooths out all its rough edges and brings every provider to a common form: the minimum necessary set of methods with identical interfaces.
All providers, in turn, are hidden behind a single facade, which offers a clear API to internal microservice clients:
- setting and removing call forwarding;
- receiving calls;
- receiving call recordings;
- working with the blacklist.
Any team inside Cyan that wants to use call tracking for its business goals simply asks the facade to substitute a number. The facade does the rest: it selects the appropriate provider and determines which pool to use in the region.
Call tracking as a component is maintained by a single team. This frees other product teams from having to make sure there is a supply of available numbers in the pool, that numbers get substituted, and that the system as a whole works as it should.
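The facade arrangement described above can be sketched roughly as follows. All class and method names are hypothetical, not Cyan’s real API: each provider adapter implements one common interface covering the four operations from the list, and the facade maps a region to the adapter and pool that serve it. The in-memory adapter only stands in for a microservice that would call a real provider API.

```python
from abc import ABC, abstractmethod


class ProviderAdapter(ABC):
    """Common interface every provider microservice exposes (method names are hypothetical)."""

    @abstractmethod
    def set_forwarding(self, substitute: str, target: str) -> None: ...

    @abstractmethod
    def remove_forwarding(self, substitute: str) -> None: ...

    @abstractmethod
    def get_calls(self, substitute: str) -> list[dict]: ...

    @abstractmethod
    def get_recording(self, call_id: str) -> bytes: ...

    @abstractmethod
    def add_to_blacklist(self, caller: str) -> None: ...


class InMemoryProvider(ProviderAdapter):
    """Stand-in adapter that stores state locally instead of calling a real provider API."""

    def __init__(self) -> None:
        self.forwarding: dict[str, str] = {}
        self.blacklist: set[str] = set()

    def set_forwarding(self, substitute: str, target: str) -> None:
        self.forwarding[substitute] = target

    def remove_forwarding(self, substitute: str) -> None:
        self.forwarding.pop(substitute, None)

    def get_calls(self, substitute: str) -> list[dict]:
        return []  # a real adapter would map the provider's call log here

    def get_recording(self, call_id: str) -> bytes:
        return b""  # a real adapter would download the recording here

    def add_to_blacklist(self, caller: str) -> None:
        self.blacklist.add(caller)


class CallTrackingFacade:
    """Single entry point for internal teams: picks the provider and pool by region."""

    def __init__(self) -> None:
        # region -> (provider adapter serving that region, free substitute numbers)
        self._pools: dict[str, tuple[ProviderAdapter, list[str]]] = {}

    def register_pool(self, region: str, provider: ProviderAdapter, numbers: list[str]) -> None:
        self._pools[region] = (provider, list(numbers))

    def substitute(self, region: str, real_number: str) -> str:
        """Take a free number from the region's pool and forward it to real_number."""
        if region not in self._pools:
            raise LookupError(f"no pool configured for region {region!r}")
        provider, free = self._pools[region]
        if not free:
            raise RuntimeError(f"pool for region {region!r} is exhausted")
        number = free.pop()
        provider.set_forwarding(number, real_number)
        return number
```

Because callers only ever see `CallTrackingFacade`, adding a new provider means writing one more adapter and registering its pools; client teams do not change anything.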
Throughout the development of call tracking at Cyan, we have run into various problems that directly affect our end product. We anticipated some of them; others came as a complete surprise. In each case, I had to decide what to do from both the product and the technical point of view.
You post a car for sale and look forward to the first calls. And they soon come. A bank calls offering you a loan. Debt collectors call asking you to repay a loan. A microfinance company wants to lend you money. Someone asks you about buying metal. For some reason, nobody is in a hurry to buy your car.
This sometimes happens because numbers are reused. Quite recently, the same number may have appeared in an ad on a completely different subject, unrelated to car sales. Tracking such calls automatically is not easy. If a client reports such a problem, we immediately remove the number from all ads on Cyan and replace it with a new one. We do not use the problematic number anywhere else; we return it to the provider.
Users may complain that they receive no calls at all, that only some calls get through, or that calls drop at a certain point.
As a rule, we forward such complaints straight to the provider: someone needs to analyze the call flow, figure out which other telecom operators the problematic call passed through, and check whether the user has their own software or hardware telephone exchange (a PBX, which is used, for example, to run a call center).
It is also worth checking the most obvious case: can you get through to the user’s original number at all? If not, no miracle will happen, and calls via the substitution number will not get through either. We do this as the first step: Cyan customer service staff make a test call to the original number to check that it is reachable.
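The replace-and-retire rule above can be sketched as simple bookkeeping. `NumberRegistry` and its fields are hypothetical, not Cyan’s schema, and the provider calls that would re-point forwarding are left out. The key property is that a retired number is never put back into the free pool.

```python
from dataclasses import dataclass, field


@dataclass
class NumberRegistry:
    """Hypothetical bookkeeping for substitute numbers, not Cyan's actual schema."""

    free: list[str]                                     # numbers ready to be assigned
    ads: dict[str, str] = field(default_factory=dict)   # ad_id -> substitute number
    returned: set[str] = field(default_factory=set)     # numbers handed back to the provider

    def retire(self, bad_number: str) -> str:
        """Replace bad_number in every ad with one fresh number and retire it for good."""
        if not self.free:
            raise RuntimeError("no free number to replace the problematic one")
        replacement = self.free.pop()
        for ad_id, number in self.ads.items():
            if number == bad_number:
                self.ads[ad_id] = replacement  # updating existing keys is safe while iterating
        # The bad number never goes back to `free`; it is returned to the provider.
        # (Re-pointing call forwarding via the provider API is omitted here.)
        self.returned.add(bad_number)
        return replacement
```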
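The order of checks described above can be captured in a tiny triage helper. This is a hedged sketch with hypothetical names: the outcome of the support team’s test call is passed in, and if the original number is reachable, a ticket with the details the provider needs (including whether a PBX sits in the call path) is produced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Complaint:
    original_number: str
    substitute_number: str
    symptom: str          # e.g. "no calls", "some calls missing", "calls drop"
    user_has_pbx: bool    # the user runs their own software or hardware PBX


def triage(complaint: Complaint, original_reachable: bool) -> Optional[dict]:
    """Return a ticket for the provider, or None if the original number itself is down.

    `original_reachable` is the outcome of the support team's test call to the
    user's original number.
    """
    if not original_reachable:
        # No miracle: calls via the substitute number cannot get through either.
        return None
    return {
        "substitute_number": complaint.substitute_number,
        "original_number": complaint.original_number,
        "symptom": complaint.symptom,
        # Tells the provider that the user's PBX is part of the call path.
        "user_has_pbx": complaint.user_has_pbx,
    }
```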
Mass spam dialing
Suddenly, robots start calling everyone in volleys, offering something. And they do it quite intrusively, for example every 5 minutes. Your clients cannot do anything about it themselves; they can only report the problem to you.
First, try to localize the problem. At Cyan, we ran into massive spam calling in one specific region. The reasons and goals behind such calls can be figured out later; right now you need to act quickly. Add all identified dialer numbers to the blacklist and block their calls at the provider’s network level.
How quickly you can solve the problem will depend, among other things, on the SLA signed with the partner. A formally signed contract may require filing a ticket in the provider’s bug tracker. But a separate chat in any messenger with the provider’s technical team, where you can report a problem in literally one line, can help resolve it faster. Ultimately, coordinated joint action helps solve problems faster, be more customer-oriented, and show care for the user.
The whole region seems to be down
It’s not your imagination. A telecom operator can have incidents on its platform too, and you need to be prepared for them. If you find that some numbers cannot be reached and the fix is taking a while, turn off those substitution numbers. At Cyan, we decided that it is better to show the user’s original number on the site for a while than to let the client lose targeted calls.
Of course, you need to be prepared for such a turn of events in advance. The system should be able to switch quickly between showing the substitution number and the original number, including in mobile apps.
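The text does not say how dialer numbers are identified, so the following is only one plausible heuristic, labeled as an assumption: flag any caller who reaches many distinct substitution numbers within a short sliding time window. The 10-minute window and the threshold of 5 distinct numbers are hypothetical defaults.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_dialers(calls, window=timedelta(minutes=10), min_distinct_targets=5):
    """Flag callers that reach many distinct substitute numbers within `window`.

    calls: iterable of (timestamp, caller, substitute_number) tuples.
    The window and threshold are hypothetical defaults, not Cyan's values.
    """
    by_caller = defaultdict(list)
    for ts, caller, target in calls:
        by_caller[caller].append((ts, target))

    flagged = set()
    for caller, events in by_caller.items():
        events.sort()  # order this caller's calls by time
        start = 0
        # Sliding window: shrink from the left until it spans at most `window`.
        for end in range(len(events)):
            while events[end][0] - events[start][0] > window:
                start += 1
            targets = {t for _, t in events[start:end + 1]}
            if len(targets) >= min_distinct_targets:
                flagged.add(caller)
                break
    return flagged
```

The resulting set is what you would hand to the provider for network-level blocking, for example through the blacklist method of the facade.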
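One way to make that switch fast is to decide the displayed number in one place, driven by runtime settings that web and mobile clients share. The sketch below assumes two hypothetical switch lists: regions where substitution is off, and individual substitution numbers that are off. Either one causes a fallback to the original number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ad:
    region: str
    original_number: str
    substitute_number: Optional[str]  # None if no substitute was assigned


def number_to_display(ad: Ad, disabled_regions: set[str], disabled_numbers: set[str]) -> str:
    """Choose which number web and mobile clients should render for an ad.

    Falls back to the original number when substitution is switched off for
    the ad's region or for this particular substitute number.
    """
    if (
        ad.substitute_number is None
        or ad.region in disabled_regions
        or ad.substitute_number in disabled_numbers
    ):
        return ad.original_number
    return ad.substitute_number
```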
How to monitor
Monitoring is a big topic in its own right; it determines how quickly you detect and react to problems. Below is a brief look at what we focus on at Cyan.
The first important metric is the size of the pool of numbers currently available for substitution. Because tariffs differ from region to region, each region has its own pool and therefore its own metric.
There are two extremes, and both are bad: too large a stock of available numbers and too small a stock. The only advantage of a large reserve is that it will not run out quickly. But you still have to pay for those numbers. In terms of Eliyahu M. Goldratt’s theory of constraints, this is Inventory, and it needs to be reduced for the system to be effective.
While getting rid of excess Inventory, you can overdo it and leave too little stock. Then an alert may arrive, perhaps at night, saying that no numbers are left at all. What happens next depends on the business process. If number substitution is a mandatory step, running out of numbers will have a big impact. And the shortage cannot be fixed quickly. First, at night the provider’s support may respond more slowly (this should be clearly stated in the contract). Second, provisioning and delivering a new batch of numbers is not a fast process; it can take up to two weeks.
The answer lies somewhere in between. In general, deciding when and how much to order is a classic inventory management problem. We have adopted flexible pool management that depends on the specific region, and we order additional numbers when a pool reaches 80% utilization.
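The 80% reorder rule can be expressed as a small check run per region. This is a minimal sketch; the input format (numbers in use and pool size per region) is an assumption.

```python
REORDER_THRESHOLD = 0.8  # order more numbers at 80% pool utilization (from the text)


def regions_to_restock(pools: dict[str, tuple[int, int]]) -> list[str]:
    """Return regions whose pool utilization reached the reorder threshold.

    pools maps region -> (numbers_in_use, pool_size).
    """
    result = []
    for region, (in_use, size) in sorted(pools.items()):
        # An empty pool always needs restocking; otherwise compare utilization.
        if size == 0 or in_use / size >= REORDER_THRESHOLD:
            result.append(region)
    return result
```

Since a new batch can take up to two weeks to arrive, the remaining 20% has to cover demand for that whole lead time; regions that consume numbers faster may need a lower threshold.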
Number of calls
The number of calls made is the next important metric that shows the health of the service by region.
Anomalies, both up and down, do not always indicate a problem. A sharp drop in the number of calls compared to the usual level may be explained by weekends or public holidays, and an unexpected surge by a marketing campaign run by colleagues from another department. But if the change happens for no apparent reason, that is a reason to dig in and raise it with the provider.
For each hour, we compute how far the number of calls in the region deviates from the median for the same hour over the previous 7 weeks. If the deviation is too large (in either direction), we investigate it together with the provider.
It is also useful to look at the ratio of successful calls to all calls. On a chart, it shows momentary anomalies on the one hand and the slope of the overall trend on the other. A sharp drop in the share of successful calls indicates a widespread problem and is a call to action.
At Cyan, number substitution is built into the ad publishing process. It is important for us not to show original numbers in ads, to protect them from dialers. But it is just as important to publish ads quickly.
If calls to the provider’s external APIs start to noticeably increase the average publishing time, you need to respond quickly. In the worst case, disable substitution altogether for the time being, and then promptly resolve the problem with the provider.
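The hourly check above amounts to comparing one number with a median of seven. Here is a minimal sketch; the 50% relative-deviation threshold is a hypothetical value, since the text only says “too high”.

```python
from statistics import median


def is_call_volume_anomaly(current: int, same_hour_history: list[int],
                           max_rel_deviation: float = 0.5) -> bool:
    """Compare one hour's call count in a region with its historical median.

    same_hour_history: call counts for the same hour in the same region over
    the previous 7 weeks. max_rel_deviation is a hypothetical threshold
    (0.5 means "more than 50% above or below the median").
    """
    if not same_hour_history:
        raise ValueError("need at least one historical value")
    baseline = median(same_hour_history)
    if baseline == 0:
        return current > 0  # any calls where there are usually none is unusual
    return abs(current - baseline) / baseline > max_rel_deviation
```

Using the median rather than the mean keeps a single unusual week (a holiday or a marketing campaign) from skewing the baseline.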
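Both readings of that chart, the momentary anomaly and the trend slope, can be computed directly. The sketch below is an assumption about how one might do it: the share is computed per hour, the trend is an ordinary least-squares slope over the hour index, and a “sharp drop” is a hypothetical rule (latest share more than 0.2 below the average of the earlier hours).

```python
def share_series(hourly: list[tuple[int, int]]) -> list[float]:
    """(successful, total) per hour -> share of successful calls; empty hours are skipped."""
    return [ok / total for ok, total in hourly if total > 0]


def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (change per hour)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def sharp_drop(values: list[float], max_drop: float = 0.2) -> bool:
    """True if the latest share fell more than max_drop below the earlier average."""
    if len(values) < 2:
        return False
    baseline = sum(values[:-1]) / (len(values) - 1)
    return baseline - values[-1] > max_drop
```

A negative slope over days points to slow degradation worth discussing with the provider, while `sharp_drop` is the kind of signal that should page someone.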
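The “disable substitution in the moment” reaction can be automated with a simple guard, sketched here under stated assumptions: the publishing code records how long each provider call took, and once the moving average over a window exceeds a latency budget, substitution is switched off until someone re-enables it after the provider fixes the issue. The budget, window size, and minimum sample count are hypothetical.

```python
from collections import deque


class SubstitutionGuard:
    """Switches substitution off when provider calls slow down publishing."""

    def __init__(self, budget_ms: float = 300.0, window: int = 50, min_samples: int = 10) -> None:
        self.budget_ms = budget_ms
        self.min_samples = min_samples  # avoid reacting to a single slow call
        self.latencies: deque[float] = deque(maxlen=window)
        self.enabled = True

    def record(self, latency_ms: float) -> None:
        """Record one provider-call latency and disable substitution if too slow."""
        self.latencies.append(latency_ms)
        if len(self.latencies) < self.min_samples:
            return
        avg = sum(self.latencies) / len(self.latencies)
        if avg > self.budget_ms:
            self.enabled = False  # publish with original numbers for now

    def enable(self) -> None:
        """Manually re-enable once the provider has resolved the issue."""
        self.latencies.clear()
        self.enabled = True
```

Re-enabling is left manual on purpose: the text suggests resolving the problem with the provider first rather than flapping back automatically.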
Finally, I want to share a few funny stories from various stages of development.
How we covered call tracking with autotests
We decided to cover the whole substitution process with autotests. To avoid wasting expensive substitution numbers, we set up a separate fake provider, Qa Provider. We wrote the tests and started running them every hour.
After a while, a problem arose: in some regions, numbers began to run out. We solved it the standard way: we ordered additional pools and added them to our number capacity. But the numbers quickly ran out again.
After the third extra purchase, the product managers sounded the alarm: the number of listings in those regions was barely growing, yet numbers were being burned through like gasoline in an SUV. We assembled a working group and started investigating.
It turned out that, due to a bug in the code, the autotests were not using the fake Qa Provider but were connecting real substitution numbers. Since the tests ran frequently, they could easily exhaust a pool of any size.
The bug was fixed and the numbers were released. Testing on production numbers is good, of course, but expensive!
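A cheap defense against this class of bug is to enforce the fake provider with an explicit check rather than relying on test code to pick it. The sketch below is hypothetical: the `APP_ENV` variable and the provider names are assumptions, not Cyan’s configuration.

```python
import os
from typing import Optional


class ProviderConfigError(RuntimeError):
    """Raised when a test environment tries to use a real provider."""


def provider_for_environment(requested: str, env: Optional[str] = None) -> str:
    """Refuse to hand a real provider to autotests.

    `APP_ENV` and the provider names are hypothetical; the point is that the
    choice of the fake provider is enforced by a check, not left to test code.
    """
    env = env if env is not None else os.environ.get("APP_ENV", "production")
    if env == "test" and requested != "qa_provider":
        raise ProviderConfigError(
            f"autotests must use the fake Qa Provider, got {requested!r}"
        )
    return requested
```

With such a guard, a misconfigured test fails loudly on its first run instead of quietly consuming real numbers every hour.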
How we went on tour to customer service
At a high level, all our call tracking problems fall into two categories: problems in our integration (for example, APIs called incorrectly) and problems on the provider’s side (calls do not get through).
At some point, we realized that our customer service was sending absolutely every bug to the call tracking product team, which passed it on to the provider and relayed the response back along the chain. We decided to stop this game of broken telephone.
We built a separate call tracking admin panel, wrote explanatory articles with the providers’ contact persons, prepared a colorful presentation, and took all of this to customer service.
The effect was stunning. The visual demo was a particular hit: we used an old red Soviet rotary phone to demonstrate how call forwarding works.
After that, the flow of bugs to the product team dropped significantly, and technical problems began to be discussed directly with the provider’s support service.
Call tracking certainly provides new tools and opens up great opportunities. Keeping it running is not easy, but if you do everything carefully and know in advance what problems await you along the way, you can deliver the highest quality service to your users, solving both their problems and the business’s goals.