How students tried out ML on telecom operator services

What is the future of the IT industry? Identifying the main trend is not difficult at all: artificial intelligence and machine learning.

Nexign always looks boldly to the future, so in one project we decided to go beyond our usual approaches: we took a new technology – ML; new specialists – students from specialized universities; and a new internship format – an independent team of theoretically well-prepared interns. Then we mixed it all together to see what would come of it.

In this article we will talk about what this small ML project achieved: the team not only solved several business cases, but also built a full-fledged engine for developing, testing, and analyzing new machine learning models.

Introduction

The industry has long held the hypothesis that machine learning can help telecom systems predict subscriber behavior, combat churn, and recommend the best new services. Nexign had already worked in this direction, but without machine learning. The logical next step was to build the process on ML technologies.

The interns were brought together as an independent group, assigned a mentor, provided with data, and given as much freedom in their work as possible. This is how a small project called CARP was born: classification – analysis – recommendation – prediction. Its goal is to simplify and speed up the development of machine-learning cases.

At the start, the team was given three clearly defined cases, discussed in detail below:

  • build a model for predicting the probability of subscriber churn (case No. 1);

  • develop a scoring system for deferred payments (case No. 2);

  • create a recommendation system for product offers (case No. 3).

A brief excursion: how machine learning works

Many people treat the terms “artificial intelligence”, “machine learning” and “neural networks” as one and the same concept. But neural networks are only one subtype of machine learning – just one of hundreds of algorithms that can be used.

What is machine learning? A collaboration between linear algebra and magic? In reality, things are a little more complicated.

Machine learning (ML) is an approach that allows a model to learn automatically and make decisions based on data and the patterns in it. It draws on statistics, computer science, mathematics, and artificial intelligence, and is well suited to data analysis, pattern recognition, behavior prediction, and decision-making in a variety of fields.

The implementation of any machine learning case consists of five stages:

Stage 1: data analytics. First, we determine what we want to predict and which data it could be predicted from.

Stage 2: collecting the necessary data. We assemble the array of input data on which the model will later be trained. The data may live in one, two, or ten different systems, so merely gathering it into a spreadsheet can take days or weeks of work.

Stage 3: data processing and transformation. We prepare the data so the model correctly understands what is in front of it and what the range of possible values is. For example, there is a significant difference between training on values from 0 to 1 and on values from 0 to infinity.

Stage 4: training and building a model. At this stage we feed in the training data, check the metrics on held-out data, and retrain the model until it achieves the desired quality.

Stage 5: predicting the result – the climax. We come with a specific request, the magic happens (the predictive model analyzes the data), and we get the answer we were after.
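The five stages above can be sketched in a few lines of Python. This is a minimal illustration with scikit-learn, in which a synthetic dataset stands in for real billing records:

```python
# Minimal sketch of the five ML stages. The synthetic dataset stands in
# for real billing data pulled from production systems.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Stages 1-2: decide what to predict and collect the data.
# Here we simply generate a toy dataset instead.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Stage 3: transform features into a comparable range (e.g. 0 to 1).
X = MinMaxScaler().fit_transform(X)

# Stage 4: train the model and check metrics on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LogisticRegression().fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))

# Stage 5: predict for a new, unseen observation.
prediction = model.predict(X_test[:1])
```

The exact algorithm and scaler here are placeholders; in a real case each stage is chosen to fit the data at hand.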

ML solution concept

So, we had three cases involving prediction of telecom subscriber behavior. We used an array of data from the billing system: payment information, user details, and connected tariffs and options. To collect it, we implemented separate connectors that pulled in the necessary information.

All of this data was loaded into ClickHouse, our database. This makes it simple and relatively inexpensive to train a model on it later.

Under the solution concept, we use this data only when the model is created. The model needs retraining only when user behavior patterns change: once every six months, once a year, or when business needs change significantly.

Then, when we need to predict a user's behavior, we select their current data by ID, feed it into the model, and get our prediction.

We chose a stateless approach: we do not store subscriber data inside our solution. Our backend services are completely isolated from each other, which allows horizontal scaling. Thanks to this, large volumes of data can be processed with minimal hardware costs.
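The stateless flow can be sketched in plain Python. Here `fetch_features` is a hypothetical stand-in for a ClickHouse query by subscriber ID, and `score` is a placeholder for a real trained model:

```python
# Sketch of the stateless prediction flow. fetch_features and score are
# hypothetical stand-ins: the first for a ClickHouse lookup by subscriber
# ID, the second for a real trained model.

FAKE_STORAGE = {  # stands in for billing data kept in ClickHouse
    "subscriber-42": {"monthly_sms": 120, "monthly_minutes": 340},
}

def fetch_features(subscriber_id: str) -> dict:
    """Pull current subscriber features by ID (placeholder for a DB query)."""
    return FAKE_STORAGE[subscriber_id]

def score(features: dict) -> float:
    """Placeholder model: returns a churn-like score in [0, 1]."""
    return min(1.0, features["monthly_sms"] / 1000)

def predict(subscriber_id: str) -> float:
    # Nothing is cached between calls: the service holds no subscriber
    # state of its own, so it can be scaled horizontally.
    return score(fetch_features(subscriber_id))

print(predict("subscriber-42"))  # 0.12
```

The key point is that each `predict` call is self-contained: data lives in the database, the model lives in the service, and no per-subscriber state is kept in between.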

Let's look at each case in more detail and walk through what the task was, what the team did, and what result they got, moving from the simplest to the most complex.

Case No. 1 “Subscriber churn”

Goal – develop a model that predicts the probability of subscriber churn. With this forecast in hand, the telecom operator can attempt retention, for example by offering a new tariff plan or new services.

Implementation. The dataset for the case consisted of 32,000 subscribers; 80% of them became the training sample and 20% the test sample. The array contained historical subscriber data and service consumption data. On this data we trained the model, identified key features, and uncovered relationships. For example, a large number of sent SMS messages can turn out to be one of the key features, indicating the subscriber's loyalty.

Result. Overall accuracy on the test set, that is, on data the model did not see during training, was 94%: in 94% of cases the model correctly predicts user behavior.
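The shape of case No. 1 can be sketched as follows. The article does not name the exact algorithm, so a random forest is used here purely for illustration, and synthetic features stand in for the real subscriber history:

```python
# Hedged sketch of case No. 1: an 80/20 train/test split and a churn
# classifier. Synthetic data stands in for the 32,000 real subscribers;
# the random forest is an illustrative choice, not the team's actual model.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80% train, 20% test

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))

# Feature importances hint at the key churn signals
# (e.g. SMS volume in the real case).
top_feature = int(clf.feature_importances_.argmax())
```

Accuracy on the held-out 20% is exactly the metric the article quotes: the share of unseen subscribers whose behavior the model predicts correctly.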

Case No. 2 “Scoring system”

Goal – develop a system that predicts what portion of a deferred payment the subscriber will repay on time.

Implementation. We collected a large amount of subscriber data, including personal details and historical data on how they had previously used similar services. On this data we trained the model, identified key features, and uncovered relationships. In this case, we realized that to forecast debt repayment it is important to take the payment history of previous periods into account.

Result. The trained model errs by 70 rubles on average, or 10% of the payment amount. The graph showed the dependence of the prediction error on the payment amount: the closer a point lies to 0, the smaller the error.

We were not satisfied with this result and decided to analyze the model's performance in order to increase accuracy. In the process we noticed an interesting behavioral pattern that can be called sudden non-payment. There are deviant cases when a subscriber uses the deferred payment service for a long time and regularly pays off the debt, but at some point suddenly does not pay at all. Such cases make up roughly 13% of the entire sample. On the remaining 87% we predict quite accurately, erring on average by 20 rubles, or 2.5% of the payment amount.
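The error analysis described above boils down to computing the mean absolute error per segment. The amounts below are made up; only the analysis pattern mirrors the article:

```python
# Illustrative error analysis for case No. 2: split prediction errors into
# "regular" payers and "sudden non-payers". All ruble amounts are invented
# for the sake of the example.
from statistics import mean

# (predicted, actual) deferred-payment repayments in rubles
regular = [(800, 790), (650, 670), (1200, 1185)]
sudden_nonpayment = [(900, 0), (500, 0)]  # paid nothing despite good history

def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return mean(abs(p - a) for p, a in pairs)

print(mae(regular))                        # small error on the regular segment
print(mae(sudden_nonpayment))              # the deviant segment dominates
print(mae(regular + sudden_nonpayment))    # overall error is dragged up
```

Splitting the metric this way is what revealed that a small deviant segment (13% in the real sample) was responsible for most of the overall error.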

Plans. Our next step for this case is to take more data, train a model on it, and find the risk indicator that could predict the likelihood of a subscriber suddenly failing to pay – and make sure such cases are caught in 100% of instances.

Case No. 3 “Recommender system”

Goal – select the optimal tariff and options in order to prevent a decline in subscriber loyalty.

Implementation. We developed a model that predicts a user's consumption in subsequent periods and, on that basis, selects the optimal services and tariff plan.

For preprocessing we used logarithmization, aggregation of data over time periods, and combination of highly correlated tariff indicators. All of this helped the model predict the same quantities far more accurately.
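The three preprocessing steps can be sketched with the standard library alone. The traffic numbers are invented; only the transformations mirror the article:

```python
# Sketch of the three preprocessing steps from case No. 3.
import math

# Raw daily internet usage in MB for one subscriber (made-up numbers)
daily_mb = [10, 0, 2500, 40, 15, 0, 3000]

# 1. Logarithmization: log1p compresses heavy-tailed usage values
#    while keeping zero usage at exactly zero.
log_mb = [math.log1p(v) for v in daily_mb]

# 2. Aggregation over time periods: a weekly total instead of noisy
#    daily points.
weekly_total = sum(daily_mb)

# 3. Combining highly correlated tariff indicators into one feature,
#    e.g. included minutes and included SMS that always move together.
included_minutes, included_sms = 300, 300
bundle_size = (included_minutes + included_sms) / 2
```

Which indicators are correlated enough to merge is an empirical question; the averaging above is just one simple way to combine them.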

We combined two approaches – decision trees and the gradient optimization method – to get gradient boosting, one of the most powerful model architectures for prediction on tabular data. Specifically, we took CatBoost, an open-source gradient boosting library from Yandex.
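The team used CatBoost itself; as a self-contained stand-in, this sketch trains scikit-learn's `GradientBoostingRegressor`, which implements the same trees-plus-gradient-optimization idea, on synthetic consumption data:

```python
# Gradient boosting on tabular data: scikit-learn's implementation used
# here as a stand-in for CatBoost, with synthetic data in place of real
# subscriber consumption history.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=1000, n_features=8, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# An ensemble of shallow trees, each fitted to the gradient of the loss
# on the previous ensemble's residuals.
model = GradientBoostingRegressor(random_state=1).fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
```

CatBoost adds its own refinements (notably native categorical-feature handling), but the overall training loop is the same boosting scheme.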

Result. The model was chosen correctly: when predicting consumption, the average error was less than 1.5 rubles, 3 SMS, and 5.5 MB of internet traffic. At the same time, the resulting data lets us select optimal service packages.

Since our operator only offers a fixed list of tariffs, we decided to display the TOP 5 options closest to the optimal one. To do this, we represented the predicted tariff as a point in a feature space and placed the available tariffs in the same space. In the figure, the red dot is the tariff we predicted, and the blue dots are the tariffs available to the user. We offer the user the tariff closest to the red dot, comparing tariffs simply as points in space. Ultimately we ended up with 17 CatBoost models, and training used more than 90 different indicators about the user and services.
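The tariff-matching step reduces to a nearest-neighbor search. In this sketch the feature order (minutes, SMS, MB) and all the tariff numbers are assumptions for illustration:

```python
# Sketch of tariff matching: the predicted "ideal" tariff and the
# available tariffs are points in one feature space, and we offer the
# closest available ones. Feature order (minutes, SMS, MB) and all
# numbers are illustrative assumptions.
from math import dist

predicted = (350, 120, 8000)           # the "red dot": model's ideal tariff
available = {                          # the "blue dots": real tariffs
    "Basic":   (200, 50, 3000),
    "Comfort": (400, 100, 8000),
    "Max":     (1000, 300, 30000),
    "Social":  (100, 200, 1000),
}

# Rank available tariffs by Euclidean distance to the prediction and
# keep the TOP 5 (here fewer than 5 tariffs exist, so all are ranked).
top5 = sorted(available, key=lambda name: dist(predicted, available[name]))[:5]
print(top5[0])  # Comfort - the closest point to the prediction
```

In practice the features would be normalized first, since otherwise the MB axis dominates the distance; that detail is omitted here for brevity.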

Project results and prospects

To evaluate the resulting solutions and their scalability, we ran a small load test, generating samples of 1 million, 10 million, and 80 million subscribers. For 80 million subscribers we needed a 16 GB SSD. The average training time was about 30 hours per model. That is quite acceptable, considering the model needs retraining only once a quarter, once every six months, or when the behavior pattern changes. It is worth noting that the model was trained on a CPU; on a GPU the time would be significantly lower.

While implementing the cases, we concluded that CARP is a platform on which various machine-learning cases can be implemented quickly and efficiently: you just need to supply training data and pick the optimal architecture from those proposed. Possible applications include load/traffic prediction, detecting suspicious activity, finding the optimal time for notifications, and much more.

Our next step is a slight rework of the architecture and the addition of a no-code interface, in which an administrator, business analyst, or operations specialist will be able to use existing cases as templates or build entirely new ones from standard blocks with a few mouse clicks.

Let us summarize the results of this double experiment. First, the interns did an excellent job, showed independence, and developed a full-fledged ML engine. Second, we received confirmation that machine learning is not only applicable but also in demand in the telecom sector. We thank all the project participants and wish them success: Andrey Efremov, SQL guru; Pavel Avramenko, backend professional; Mikhail Stepanovsky, ML specialist; Maria Antonova, coordination master.
