RFM analysis of your behavior in the bank


We analyze customer actions to understand what is relevant to them across all contact channels. This informs various kinds of recommendations and the ability to offer certain promotions and terms. But above all, it helps increase loyalty and, therefore, reduce customer churn.

Our approach is based on a modified RFM segmentation method. What distinguishes RFM from other segmentation methods is that it looks only at customer behavior: it does not use demographics, hobbies, views, or anything else beyond, in our case, transactional activity.

My name is Irina Skorynina. I develop new models for analyzing customer behavior in the Retail Modeling Department of Gazprombank and launch campaigns based on them. In this post I'll explain how mathematics helps us better understand customer needs.

RFM segmentation is a marketing analysis method that ranks customers and predicts their future buying behavior based on their past actions. For our RFM analysis, we treated customer behavior as transactional activity.

RFM is an acronym for the key quantitative metrics analyzed: Recency (how recently the customer made a purchase), Frequency (how often they make purchases), and Monetary (how much they spend).

How we modified the RFM analysis

We took this approach as a starting point and improved it by changing the metrics. Our method, called FMCD (also an acronym), uses four parameters to characterize customers' transactional behavior: how often a client transacts, how much they spend in total, in how many unique categories, and over what time periods.

The categories are determined using standard MCC codes (Merchant Category Codes), which classify an organization's type of activity in bank card payment transactions.

This code indicates what the company does and what goods or services it sells: food, entertainment, fuel, building materials, and so on. It is assigned to an organization when it starts accepting bank cards for payment. Our model uses 14 unique categories in total.

By the way, we monitor the emergence of new MCC codes so that they can be taken into account in the model in the future. In particular, Yandex recently launched its own ecosystem with the MCC code range 3990–3999.
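For illustration, here is a minimal sketch of how MCC codes might be mapped to spending categories in Python. The category names and code ranges below are hypothetical examples, not the bank's actual 14-category dictionary.

# Hypothetical mapping from MCC code ranges to spending categories.
MCC_CATEGORIES = {
    range(5411, 5500): "Food",             # grocery and food stores
    range(5812, 5815): "Restaurants",      # eating places, bars, fast food
    range(7832, 7842): "Entertainment",    # cinemas, rentals
    range(5541, 5543): "Fuel",             # service stations
    range(5200, 5212): "Building materials",
    range(3990, 4000): "Yandex ecosystem",  # the new 3990-3999 range mentioned above
}

def mcc_to_category(mcc: int) -> str:
    """Return the spending category for an MCC code, or 'Other' if it is unmapped."""
    for code_range, category in MCC_CATEGORIES.items():
        if mcc in code_range:
            return category
    return "Other"

print(mcc_to_category(5411))  # Food
print(mcc_to_category(3995))  # Yandex ecosystem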

The existing base of customers' transactional activity allows us to analyze their needs, segment them into groups, and develop personalized offers based on that segmentation.

Segmentation helps target the specific part of the customer base that is most likely to respond to marketing efforts.

Of course, we can only analyze bank card transactions; we cannot track cash purchases.

We recently supplemented the FMCD calculation with an analysis of the client's propensity to purchase online or offline. An online purchase is one made through an online store, while an offline purchase is simply a personal trip to the store with payment by card on the spot.

Using these metrics, we can position a customer by how they transact. For a more accurate picture, each metric is sorted by value across all customers and divided into groups, the so-called buckets, each containing 10% of the sample. The lowest buckets hold the smallest metric values and the highest buckets the largest.

To make this clearer: clients who buy a lot in many different places end up in one bucket, clients who spend large amounts end up in another, and so on.

As a result, each client receives four scores. Based on these scores, each client falls into one sample or another. You can also see how many transactions a client makes, both in count and in total amount relative to other clients, as well as in how many unique categories and over what time periods.

In addition, by adding up these four scores you can determine how engaged each customer is in transactional activity. The maximum total is 40 points.
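As a rough sketch of this scoring step (with hypothetical column names and synthetic data), each of the four per-client metrics can be cut into deciles scored 1 to 10 and the scores summed to a maximum of 40:

import numpy as np
import pandas as pd

# Synthetic per-client metrics; in reality these come from transaction data.
rng = np.random.default_rng(0)
clients = pd.DataFrame({
    "client_id": range(1000),
    "n_transactions": rng.poisson(30, 1000),      # how many transactions
    "total_amount": rng.gamma(2.0, 5000, 1000),   # how much was spent
    "n_categories": rng.integers(1, 15, 1000),    # unique categories used
    "active_days": rng.integers(1, 180, 1000),    # time-period activity
})

metrics = ["n_transactions", "total_amount", "n_categories", "active_days"]
for col in metrics:
    # Rank first so ties do not break the ten equal-sized buckets,
    # then cut into deciles scored 1 (lowest values) to 10 (highest).
    clients[f"{col}_score"] = pd.qcut(
        clients[col].rank(method="first"), 10, labels=range(1, 11)
    ).astype(int)

# Total engagement score: four metrics, at most 10 points each, 40 in total.
clients["total_score"] = clients[[f"{c}_score" for c in metrics]].sum(axis=1)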

We divided all clients into groups differing in the degree of POS activity, that is, in how actively they use their bank cards. In percentage terms these groups are 20% (TOP), 40% (MIDDLE) and 40% (BOTTOM). The first group includes people with the highest total score: they show a high degree of POS activity and transact in many categories. The middle group includes customers with moderate transactional activity, and the third group includes people with weak POS activity, those who make purchases rarely, irregularly, and in a small number of unique categories.
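Continuing the sketch above, the 20/40/40 split by total score could look like this; the exact boundary handling is an assumption, since the article gives only the proportions:

# TOP = top 20% by total score, MIDDLE = next 40%, BOTTOM = remaining 40%.
clients["group"] = pd.qcut(
    clients["total_score"].rank(method="first"),
    q=[0.0, 0.4, 0.8, 1.0],
    labels=["BOTTOM", "MIDDLE", "TOP"],
)
print(clients["group"].value_counts(normalize=True))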

All four metrics are taken into account. For example, even a client who has spent a lot of money in a single transaction can fall into the bottom group: a person who has made one or two very large purchases, but whose other metrics sag because they use their bank card very rarely.


This way you can see which clients make many transactions and do so frequently, and which rarely.

Also, using these scores, we can see what is happening to our client base over a certain period, for example over six months: which clients have improved their score, which have worsened it, and which have kept their status.
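A minimal sketch of this kind of tracking, assuming two hypothetical snapshots of the total FMCD score taken some months apart:

import pandas as pd

# Two snapshots of the total score for the same clients (hypothetical values).
scores_before = pd.DataFrame({"client_id": [1, 2, 3, 4], "total_score": [35, 22, 14, 28]})
scores_after = pd.DataFrame({"client_id": [1, 2, 3, 4], "total_score": [38, 22, 9, 31]})

movement = scores_before.merge(scores_after, on="client_id", suffixes=("_before", "_after"))
movement["delta"] = movement["total_score_after"] - movement["total_score_before"]
movement["status"] = pd.cut(
    movement["delta"], bins=[-40, -1, 0, 40], labels=["worsened", "maintained", "improved"]
)
print(movement[["client_id", "delta", "status"]])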

By analyzing the score data, we can understand how best to work with different groups of clients: for example, sending discounts or promotions to our longest-standing customers, to customers who transact in only one or two categories, or to customers who haven't purchased in a while (a customer win-back campaign).

Using FMCD analysis, you can reduce the volume of discounts and bonuses issued. There is no need to offer additional discounts to those who buy constantly, while discounts can help dormant customers return to shopping.

This information helps us understand which mailings specific customers should receive. If some clients reduce their POS activity and their metrics slide down, they need to be included in a marketing campaign so that we can start communicating with them.

To encourage a client to move to a higher score group (the next bucket), we determine the value of their potential. It is calculated from a discrete difference and a linear coefficient describing the trend across the buckets. The target is then calculated as:

Target value = Current value + Potential.

That is, when the client realizes this potential, they move from a lower bucket to a higher one.
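The article does not spell out the exact formula for the potential, so the following is only a rough sketch under explicit assumptions: the discrete difference is read as the gap to the next bucket's lower bound, and the linear coefficient as a per-client trend slope. It illustrates the relation Target value = Current value + Potential, not the production formula.

# Rough illustration only; the real definition of "potential" is not given here.
def potential(current_value: float, next_bucket_lower_bound: float, trend_slope: float) -> float:
    gap = next_bucket_lower_bound - current_value   # discrete difference to the next bucket
    return gap * max(trend_slope, 0.0)              # scaled by a non-negative trend coefficient

def target_value(current_value: float, next_bucket_lower_bound: float, trend_slope: float) -> float:
    # Target value = Current value + Potential
    return current_value + potential(current_value, next_bucket_lower_bound, trend_slope)

print(target_value(current_value=12_000, next_bucket_lower_bound=15_000, trend_slope=0.8))  # 14400.0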


Moving to a group with a higher score means that in the future we can work with such a client differently; for example, they will receive a different type of mailing.

For customer segmentation and FMCD analysis to be valuable to the business as a whole, they must help make decisions that drive results. Grouping indicators is worthless in itself; what matters are the decisions made when building campaigns.

Our model also allows us to predict a non-trivial category in which a customer is likely to transact. A non-trivial category is one in which the client has not made any transactions in the last three months.

Here we use the Market Basket Analysis methodology, which is widely used for analyzing transaction data and is designed to identify strong rules in such data using interestingness metrics.

To find frequent itemsets and association rules, we use the Apriori algorithm from the arules library. The algorithm performs a level-by-level search for frequent itemsets. Association rules have the form "if this happens, then that will happen", for example, "if a client bought a warm jacket, they will also buy winter boots." Having analyzed all of a client's transactions with this algorithm, we can predict likely POS activity and determine the category in which they are most likely to make a purchase.
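The model itself relies on the Apriori implementation from the arules package; purely as an illustration, here is what mining such rules between spending categories can look like in Python with the mlxtend library, on made-up baskets:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Made-up category baskets, one list per receipt.
baskets = [
    ["Food", "Fuel"],
    ["Food", "Entertainment", "Restaurants"],
    ["Food", "Entertainment"],
    ["Fuel", "Restaurants"],
    ["Food", "Entertainment", "Fuel"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(baskets).transform(baskets), columns=encoder.columns_)

# Level-by-level search for frequent itemsets, then "if X then Y" rules.
frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])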

The calculation is carried out according to the support formula that underlies association rules:

Supp(X) = (number of transactions containing X) / T

where:
Supp (support) is the frequency indicator, that is, how often a particular item (in our case, a category) appears in customers' receipts;
T is the total number of transactions.

When we evaluate customers' baskets, customers with similar behavior end up in the same group. And if a client in such a group has no purchases yet in a certain category, but other people with similar behavior do, then that client is very likely to transact there soon.
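A minimal sketch of that grouping idea on hypothetical data: for each client, find the categories that are common in their behavioral group but missing from their own history.

import pandas as pd

# Hypothetical transactions with a behavioral group label per client.
tx = pd.DataFrame({
    "client_id": [1, 1, 2, 2, 3, 3, 3],
    "group":     ["A", "A", "A", "A", "A", "A", "A"],
    "category":  ["Food", "Fuel", "Food", "Entertainment", "Food", "Fuel", "Entertainment"],
})

# Share of clients in each group with at least one purchase in each category.
penetration = (
    tx.drop_duplicates(["client_id", "category"])
      .groupby(["group", "category"])["client_id"].nunique()
      .div(tx.groupby("group")["client_id"].nunique(), level="group")
)

def missing_categories(client_id: int) -> pd.Series:
    """Categories the client lacks, ranked by how common they are in their group."""
    group = tx.loc[tx.client_id == client_id, "group"].iloc[0]
    owned = list(set(tx.loc[tx.client_id == client_id, "category"]))
    return penetration.loc[group].drop(labels=owned, errors="ignore").sort_values(ascending=False)

print(missing_categories(1))  # Entertainment is common in group A but missing for client 1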

[Image: a table of five example receipts showing which categories appear in each]

Let's take two unique categories, Food and Entertainment, as a simple example.

To determine the support, we put the number of receipts in which both categories are present in the numerator. There are two such receipts (3 and 5) out of five in total. As a result, these two categories appear together in 40% of receipts.

With this data in hand, we can now calculate Confidence, a measure of how often the rule holds across the entire data set:

Confidence(X → Y) = Supp(X ∪ Y) / Supp(X)

For the rule "whoever spends in the Food category also buys in the Entertainment category," we put 2/5 in the numerator and the support of the Food category alone in the denominator. The latter equals 3/5, since purchases in this category appear in three of the five receipts.

Dividing one by the other, we get:

Confidence(Food → Entertainment) = (2/5) / (3/5) = 2/3 ≈ 0.67

We see that of the three receipts in which clients transacted in the Food category, two also contain transactions in Entertainment. In other words, customers who buy in the Food category are likely to also make transactions in the Entertainment category, and this can be taken into account in the next mailing.
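The same arithmetic in code, on a hypothetical set of five receipts consistent with the counts given above (Food in three receipts, Food and Entertainment together in receipts 3 and 5):

# Hypothetical receipts consistent with the example above.
receipts = [
    {"Food"},                   # receipt 1
    {"Fuel"},                   # receipt 2
    {"Food", "Entertainment"},  # receipt 3
    {"Fuel", "Restaurants"},    # receipt 4
    {"Food", "Entertainment"},  # receipt 5
]

T = len(receipts)
supp_food = sum("Food" in r for r in receipts) / T                     # 3/5
supp_both = sum({"Food", "Entertainment"} <= r for r in receipts) / T  # 2/5
confidence = supp_both / supp_food                                     # 2/3 ≈ 0.67
print(supp_both, confidence)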

In total, we predict three non-trivial transaction categories, three levels, and when launching a campaign we take all three levels into account.

Difficulties in implementing the model

Along the way we encountered a number of difficulties and dealt with them fairly quickly. For example, while programming the module that predicts non-trivial categories, we found that our initial code took three days to complete the task. We then optimized it, replacing string values with binary ones and using simpler operations instead of Python set comparisons. As a result, the code now finishes within a day.
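As an illustration of that optimization (a reconstruction of the idea, not the production code): if each receipt's categories are encoded as an integer bitmask instead of a set of strings, checking whether a receipt contains an itemset becomes a single bitwise AND.

# Assign each category one bit; a receipt becomes an integer bitmask.
categories = ["Food", "Entertainment", "Fuel", "Restaurants"]
bit = {name: 1 << i for i, name in enumerate(categories)}

def to_mask(names):
    mask = 0
    for name in names:
        mask |= bit[name]
    return mask

receipts = [to_mask(r) for r in (["Food"], ["Food", "Entertainment"], ["Fuel"])]
itemset = to_mask(["Food", "Entertainment"])

# A receipt contains the itemset iff all of the itemset's bits are set in it.
support_count = sum((r & itemset) == itemset for r in receipts)
print(support_count)  # 1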

There were also technical problems with the Python code, since it processes a very large database and the system only allows parallelization by processes, not threads. We also had to take into account various nuances such as receipts, refund transactions, and the like.

As a result, the total running time of the entire algorithm came to approximately 24 hours.

There were also business-related difficulties, because of which the algorithm underwent a number of changes over the year. This is quite understandable, since business colleagues always want to improve the system, for example by changing the customer segmentation. Some of the segmentation difficulties were also related to the lack of bank-wide product dictionaries, but this problem will soon be solved.

When putting our model into commercial operation, we encountered a problem: the ticket used to connect to the database expired after 24 hours, and we had to amend the code to keep it working longer. In addition, developing and operating the model required a lot of computing power, which raised a number of questions from our IT specialists: to run the model and complete the calculation within a day, we needed 16 cores and 150 GB of RAM. Using the Dask library, we parallelized the process and sped up the calculation.
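A minimal sketch of this kind of process-based parallelization with Dask; the file path, column names and aggregation are hypothetical, and the worker count mirrors the 16 cores mentioned above:

import dask.dataframe as dd
from dask.distributed import Client

# One single-threaded worker process per core, roughly matching 16 cores and ~150 GB RAM.
client = Client(n_workers=16, threads_per_worker=1, memory_limit="9GB")

transactions = dd.read_parquet("transactions.parquet")  # hypothetical input path
per_client = (
    transactions.groupby("client_id")
    .agg({"amount": ["count", "sum"]})
    .compute()  # triggers the computation across the 16 worker processes
)
print(per_client.head())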

As a result, the algorithm spent about a year in development as an MVP (minimum viable product), after which it was recently put into commercial operation. Launching campaigns based on the FMCD model is now routine, and new marketing campaigns are launched every month.

Results and near future

Overall, we can say that the model we developed is already helping to reduce customer churn and increase customer loyalty. But we continue to work on further improvements.

In the future, we also plan to introduce clustering of MCC codes to identify groups of clients with complex spending patterns, and to replace the Apriori algorithm with a faster one, in particular ECLAT. Unlike Apriori, this algorithm does not scan the entire gigantic data set level by level (breadth-first) but performs a depth-first search.
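For reference, a compact sketch of the ECLAT idea on made-up baskets: each item keeps the set of transaction ids it occurs in (a tidset), and itemsets are grown depth-first by intersecting tidsets.

def eclat(tidsets, min_support, prefix=(), results=None):
    """tidsets maps an item to the set of transaction ids containing it."""
    if results is None:
        results = {}
    items = sorted(tidsets)
    for i, item in enumerate(items):
        tids = tidsets[item]
        if len(tids) < min_support:
            continue
        itemset = prefix + (item,)
        results[itemset] = len(tids)
        # Depth-first: extend the current itemset with the remaining items,
        # intersecting tidsets to obtain the support of the larger itemset.
        suffix = {other: tids & tidsets[other] for other in items[i + 1:]}
        eclat(suffix, min_support, itemset, results)
    return results

baskets = [{"Food"}, {"Food", "Entertainment"}, {"Food", "Entertainment"}, {"Fuel"}, {"Food", "Fuel"}]
tidsets = {}
for tid, basket in enumerate(baskets):
    for item in basket:
        tidsets.setdefault(item, set()).add(tid)

print(eclat(tidsets, min_support=2))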

Currently FMCD is calculated monthly, but we want to switch to weekly calculations. Up-to-date information means up-to-date forecasting.

In addition to grouping the categories of customer transactions, we plan to introduce categories of the stores where transactions were made, graded from least to most popular, as well as to calculate the average check and the time since the last purchase. We also plan to develop mechanics for tracking credit card debt.

The better we understand how a client behaves, the more effective our interaction with them. And despite the obviously commercial approach, one can say this is about reciprocity: after all, both sides benefit.
