Marketing optimization in the bank

Hello, Habr.

Marketing optimization, setting limits on a portfolio of loan products, logistics and product analytics, optimization of production processes: the list of applications of mathematical optimization methods is far longer than these examples, and optimization methods were solving business problems long before data science got its name.

As ML / DS technologies are adopted more widely, optimization methods can be expected to grow in popularity, primarily because solutions to business problems are becoming more complex. Instead of building one or two models that produce almost final decisions, the decision-making process is decomposed into separate components. Predictive models are some of those components, while the decision itself, which takes all the components and constraints into account, is made by an optimization model.

In this article we discuss one possible formulation of an optimization problem in the banking sector and methods for solving it.

At GlowByte Advanced Analytics we actively promote the approach in which ML projects are formulated from the start as optimization problems, that is, as decision support systems with measurable business metrics.

There are many open source frameworks for solving optimization problems, such as Gekko, Pyomo and Python-MIP, as well as proprietary software such as IBM ILOG CPLEX Optimization Studio.

Article outline

  • Optimization problem
  • Marketing optimization in the bank
  • Some code

Optimization problem

An optimization problem consists in finding the extremum of an objective function over a region of space bounded by a set of conditions (equalities and / or inequalities). Depending on the type of objective function and the type of constraints, optimization problems are divided into linear and nonlinear programming problems.
In general, the mathematical formulation of the problem may look like this:

$$
\begin{aligned}
& f(\vec{x}) \rightarrow \min_{\vec{x} \in X}, \\
& X = \left\{ \vec{x} \; | \; g_{i}(\vec{x}) \leq 0, \; h_{k}(\vec{x}) = 0, \; i = 1, \ldots, m, \; k = 1, \ldots, p \right\} \subset R^n.
\end{aligned} \qquad (1)
$$

That is, among all vectors of the set $ X $, which is defined by the conditions $ g_{i}(\vec{x}) \leq 0 $ and $ h_{k}(\vec{x}) = 0 $, we need to find a vector $ \vec{x}^* $ at which the value $ f(\vec{x}^*) $ is minimal over the whole set $ X $.
When the constraints and the objective function are linear, problem (1) belongs to the class of linear programming problems and can be solved by the simplex method.
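To make the linear case concrete: when the feasible region is bounded, a linear objective reaches its minimum at one of the region's vertices, and the simplex method moves from vertex to vertex instead of checking all of them. The sketch below is not the simplex method itself. It simply enumerates the vertices of a two-variable toy problem, and all coefficients in it are illustrative assumptions.

```python
from itertools import combinations

# Toy problem; all numbers are illustrative assumptions:
#   minimize   f(x, y) = -x - 2y
#   subject to x + y <= 4,  x + 3y <= 6,  x >= 0,  y >= 0
# Each constraint (a, b, c) means a*x + b*y <= c.
CONSTRAINTS = [
    (1.0, 1.0, 4.0),
    (1.0, 3.0, 6.0),
    (-1.0, 0.0, 0.0),  # x >= 0
    (0.0, -1.0, 0.0),  # y >= 0
]
OBJECTIVE = (-1.0, -2.0)


def objective(point):
    """Value of the linear objective at a point (x, y)."""
    return OBJECTIVE[0] * point[0] + OBJECTIVE[1] * point[1]


def intersect(c1, c2):
    """Point where the boundary lines of two constraints cross, or None if parallel."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)


def is_feasible(point, eps=1e-9):
    """True if the point satisfies every constraint (with a small tolerance)."""
    x, y = point
    return all(a * x + b * y <= c + eps for a, b, c in CONSTRAINTS)


def solve_by_vertices():
    """Return the feasible vertex with the smallest objective value."""
    vertices = []
    for c1, c2 in combinations(CONSTRAINTS, 2):
        point = intersect(c1, c2)
        if point is not None and is_feasible(point):
            vertices.append(point)
    return min(vertices, key=objective)


best = solve_by_vertices()
print(best, objective(best))
```

For this toy problem the minimum is at the vertex (3, 1) with f = -5. Enumeration only works for tiny problems, which is exactly why dedicated solvers are used in practice.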

Marketing optimization in the bank

Imagine a modern bank that works with individuals and sells them its main products: credit cards, cash loans, etc. The bank can offer each client one of the products $ P_i, \; i = 1, \ldots, n $ in one of the channels available for communication $ C_{k}, \; k = 1, \ldots, m $ (call, SMS, etc.). At the same time, the number of communications that can be sent per week / month in each channel (the channel volume) is limited:

$ \begin{aligned} & \sum \text{Calls} \leq N_1 \\ & \sum \text{Sms} \leq N_2 \\ & \sum \text{Email} \leq N_3 \\ & \sum \text{Push} \leq N_4 \end{aligned} \qquad (2) $

Suppose that the bank has a model (or several models) that predicts with good accuracy the probability that a customer responds to a specific banking product in a specific channel. How well the estimates of such a model match the real probabilities can be assessed on a hold-out sample by plotting the predicted probability and the actual response rate across buckets of the model score.

Fig. 1. Predicted probability and actual response rate by model score bucket on a hold-out sample.
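The bucket comparison described above can be sketched in a few lines. This is a minimal illustration, not the code used to build the figure: the function name `bucket_report`, the number of buckets and the synthetic hold-out data are all assumptions made for the example.

```python
import random


def bucket_report(scores, responses, n_buckets=10):
    """Split observations into equal-size buckets by score and return, for each
    bucket, (mean predicted probability, observed response rate, bucket size)."""
    pairs = sorted(zip(scores, responses))
    n = len(pairs)
    report = []
    for b in range(n_buckets):
        chunk = pairs[b * n // n_buckets:(b + 1) * n // n_buckets]
        if not chunk:
            continue
        mean_score = sum(s for s, _ in chunk) / len(chunk)
        response_rate = sum(r for _, r in chunk) / len(chunk)
        report.append((mean_score, response_rate, len(chunk)))
    return report


# Synthetic hold-out sample (assumption): responses are drawn with exactly the
# predicted probability, so the two columns should be close in every bucket.
rng = random.Random(0)
scores = [rng.random() * 0.3 for _ in range(20000)]
responses = [1 if rng.random() < s else 0 for s in scores]

report = bucket_report(scores, responses)
for mean_score, response_rate, size in report:
    print(f"{mean_score:.3f}  {response_rate:.3f}  {size}")
```

If the two columns diverge systematically on real data, the scores are not well calibrated and should not be used directly as probabilities in the objective function.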

Now suppose we have a database table with each customer's response probabilities for individual products in specific channels, the limits on communication volume (2), and the rule that only one product can be offered to a client per communication. The question is then: which product, and in which channel, is it best to offer each of the clients available for communication?

When formulating the problem, it is important to understand which metric we want to maximize. For example, if we want communications to produce the maximum response, the corresponding optimization problem can be posed as follows:

$ \begin{cases}
p_{11} x_{11} + p_{12} x_{12} + p_{13} x_{13} + p_{21} x_{21} + p_{22} x_{22} + p_{31} x_{31} + p_{32} x_{32} + p_{33} x_{33} \rightarrow \max \\
\text{At most one product per client:} \\
x_{11} + x_{12} + x_{13} \leq 1 \\
x_{21} + x_{22} \leq 1 \\
x_{31} + x_{32} + x_{33} \leq 1 \\
\text{Limit on the number of calls:} \\
x_{12} + x_{21} + x_{31} \leq N_1 \\
\text{Limit on the number of SMS:} \\
x_{13} + x_{22} + x_{33} \leq N_2 \\
\text{Limit on the number of emails:} \\
x_{11} + x_{32} \leq N_3 \\
x_{ik} \in \left\{0, 1\right\}
\end{cases} \qquad (3) $

This is a classic linear programming problem with binary variables (an integer linear program), and it can be easily solved using the open source frameworks mentioned above.
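With only eight binary variables, problem (3) can also be checked by brute force, which is a convenient way to see what the constraints do. The sketch below enumerates all 256 assignments. The probabilities and the limits N1 = N2 = N3 = 1 are made-up values, while the mapping of each variable to a channel follows the constraints in (3).

```python
from itertools import product

# (client i, option k) -> (channel, response probability p_ik).
# Channels follow the constraints of problem (3); probabilities are assumptions.
OFFERS = {
    (1, 1): ("email", 0.05),
    (1, 2): ("call", 0.12),
    (1, 3): ("sms", 0.07),
    (2, 1): ("call", 0.10),
    (2, 2): ("sms", 0.09),
    (3, 1): ("call", 0.15),
    (3, 2): ("email", 0.04),
    (3, 3): ("sms", 0.08),
}
# Channel volumes N1, N2, N3 (assumed values).
LIMITS = {"call": 1, "sms": 1, "email": 1}


def solve_brute_force(offers, limits):
    """Try every 0/1 assignment and keep the feasible one with the largest
    total response probability."""
    keys = list(offers)
    best_value, best_choice = -1.0, ()
    for bits in product((0, 1), repeat=len(keys)):
        chosen = [key for key, bit in zip(keys, bits) if bit]
        # at most one offer per client
        clients = [client for client, _ in chosen]
        if len(clients) != len(set(clients)):
            continue
        # channel volume limits
        used = {}
        for key in chosen:
            channel = offers[key][0]
            used[channel] = used.get(channel, 0) + 1
        if any(count > limits[channel] for channel, count in used.items()):
            continue
        value = sum(offers[key][1] for key in chosen)
        if value > best_value:
            best_value, best_choice = value, tuple(chosen)
    return best_choice, best_value


choice, value = solve_brute_force(OFFERS, LIMITS)
print(choice, round(value, 2))
```

With these numbers the best plan is a call to client 3, an SMS to client 2 and an email to client 1. The search space doubles with every added variable, so for realistic volumes a MIP solver is needed.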

If the goal of communications is to maximize future profitability, then the objective function in problem (3) can be written as

$ \begin{aligned}
D_{11} p_{11} x_{11} + D_{12} p_{12} x_{12} + D_{13} p_{13} x_{13} &+ D_{21} p_{21} x_{21} + D_{22} p_{22} x_{22} + \\
+ D_{31} p_{31} x_{31} + D_{32} p_{32} x_{32} &+ D_{33} p_{33} x_{33} \rightarrow \max,
\end{aligned} \qquad (4) $

where $ D_{ik} $ is the profit from the k-th product for the i-th client. The values $ D_{ik} $ can be obtained using predictive models or estimated in some other way.
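The choice of objective can change the decision even for a single client. In the small illustration below, the offer with the highest response probability is not the one with the highest expected profit $ D_{ik} p_{ik} $. Product names and all numbers are hypothetical.

```python
# Response probabilities p_ik and profits D_ik for one client (assumed values).
p = {"credit_card": 0.12, "cash_loan": 0.07, "other_product": 0.05}
D = {"credit_card": 100.0, "cash_loan": 400.0, "other_product": 50.0}

# Objective coefficients of (4): expected profit of each offer.
expected_profit = {name: D[name] * p[name] for name in p}

best_by_response = max(p, key=p.get)
best_by_profit = max(expected_profit, key=expected_profit.get)

print("best by response:", best_by_response)
print("best by expected profit:", best_by_profit)
```

Here the credit card wins on response probability, while the cash loan wins on expected profit, so objectives (3) and (4) would recommend different offers for this client.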

Remarks

  1. The approaches described above assume that we have reasonably good predictions / estimates for $ p_{ik} $ and $ D_{ik} $.
  2. If the scores $ p_{ik} $ for different products come from different models and do not agree well with the actual response probability (this can be seen, for example, from a plot like Fig. 1), they must be calibrated before optimization. Various calibration methods are described at the link.
  3. It is also assumed that the number of communications the bank is ready to pay for is smaller than the number of customers to whom the bank is ready to offer its products. Otherwise, there is nothing to optimize.
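As an illustration of remark 2, here is a minimal sketch of one simple calibration technique, histogram binning: each raw score is replaced by the response rate observed in its score bin on a hold-out sample. It is not necessarily one of the methods the article links to. The function names, the number of bins and the synthetic data are assumptions, and scores are assumed to lie in [0, 1].

```python
import random


def fit_histogram_binning(scores, responses, n_bins=10):
    """Return the observed response rate for each equal-width score bin on [0, 1]."""
    totals = [0] * n_bins
    hits = [0] * n_bins
    for score, response in zip(scores, responses):
        b = min(int(score * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += response
    # an empty bin falls back to its midpoint
    return [
        hits[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(score, bin_rates):
    """Map a raw score to the response rate of the bin it falls into."""
    n_bins = len(bin_rates)
    return bin_rates[min(int(score * n_bins), n_bins - 1)]


# Synthetic overconfident model (assumption): the real response probability
# is only half of the raw score.
rng = random.Random(1)
raw_scores = [rng.random() for _ in range(50000)]
observed = [1 if rng.random() < s / 2 else 0 for s in raw_scores]

bin_rates = fit_histogram_binning(raw_scores, observed)
print(round(calibrate(0.85, bin_rates), 3))
```

After calibration, scores produced by different models live on the same probability scale, which is what makes them comparable inside a single objective function.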

Some code

Let's try to solve the marketing optimization problem posed in the form (3) using the Python-MIP library mentioned above. We take a randomly generated dataset of 6,000 rows, which contains 1,000 customers, each of whom can be offered one of 3 products in two channels: SMS and a call.
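If the original CSV file is not at hand, a table of the same shape could be generated roughly as follows. The column names (`client_id`, `product`, `channel`, `score`) and the channel values are taken from the code in this article. The product names, the score distribution and the function name `make_offers` are assumptions.

```python
import random


def make_offers(n_clients=1000,
                products=("product_1", "product_2", "product_3"),
                channels=("sms", "call"),
                seed=42):
    """Build one row per (client, product, channel) with a random score."""
    rng = random.Random(seed)
    rows = []
    for client_id in range(n_clients):
        for product_name in products:
            for channel in channels:
                rows.append({
                    "client_id": client_id,
                    "product": product_name,
                    "channel": channel,
                    # assumed: response probabilities uniform on [0, 0.3]
                    "score": rng.uniform(0.0, 0.3),
                })
    return rows


rows = make_offers()
print(len(rows), rows[0])
```

Passing `rows` to `pd.DataFrame(rows)` gives a frame with the same columns that the code below expects.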

The code

import pandas as pd
from mip import Model, MAXIMIZE, CBC, BINARY, OptimizationStatus

frame = pd.read_csv('table_for_optimization.csv')
frame.head()

Suppose we have a limit on the volume of communications: 500 SMS and 200 calls. Let’s write a function to solve the optimization problem.

The code


def optimize(frame: pd.DataFrame, channel_limits: dict) -> list:
    """
    Returns the list of optimal decisions (0 or 1), one per row of `frame`,
    or None if the solver did not find a feasible solution.
    """

    df = frame.copy()

    # create the model
    model = Model(sense=MAXIMIZE, solver_name=CBC)

    # vector of binary decision variables, one per row
    x = [model.add_var(var_type=BINARY) for i in range(df.shape[0])]
    df['x'] = x

    # objective function: total expected response
    model.objective = sum(df.score * df.x)

    # limits on the number of communications in each channel
    for channel in df.channel.unique():
        model += (sum(df[df.channel==channel]['x']) <= channel_limits[channel])

    # at most one product per client
    for client in df.client_id.unique():
        model += (sum(df[df['client_id']==client]['x']) <= 1)

    status = model.optimize(max_seconds=300)

    if status == OptimizationStatus.OPTIMAL or status == OptimizationStatus.FEASIBLE:
        # round to guard against values such as 0.9999999 returned by the solver
        return [int(round(var.x)) for var in model.vars]

    print(f'No feasible solution found (status: {status.name})')
    return None

Let’s set limits on the volume of communications in channels, launch the solution to the optimization problem and see how the optimal offers are ultimately distributed across channels and products.

The code


# available communication volume in each channel
CHANNELS_LIMITS = {
    'call': 200,
    'sms': 500
}

optimal_decisions = optimize(frame=frame, channel_limits=CHANNELS_LIMITS)
frame['optimal_decision'] = optimal_decisions

# distribution of offered products across channels
(
    frame[frame['optimal_decision'] == 1]
    .groupby(['channel', 'product'])
    .agg({'client_id': 'count'})
    .rename(columns={'client_id': 'client_cnt'})
)

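Before using the result, it is worth checking that the returned decisions really satisfy the constraints of problem (3). A minimal sketch of such a check is shown below. It works on plain row dictionaries, and the helper name `check_solution` and the toy rows are assumptions made for the example.

```python
from collections import Counter


def check_solution(rows, channel_limits, decision_key="optimal_decision"):
    """Verify channel limits and the one-offer-per-client rule for the selected
    rows, and compute the total expected response of the plan."""
    chosen = [row for row in rows if row[decision_key] == 1]
    per_channel = Counter(row["channel"] for row in chosen)
    per_client = Counter(row["client_id"] for row in chosen)
    return {
        # a channel without a configured limit counts as a violation
        "channels_ok": all(
            count <= channel_limits.get(channel, 0)
            for channel, count in per_channel.items()
        ),
        "clients_ok": all(count <= 1 for count in per_client.values()),
        "per_channel": dict(per_channel),
        "expected_response": sum(row["score"] for row in chosen),
    }


# Toy rows in the same layout as the article's table (values are made up).
toy_rows = [
    {"client_id": 1, "channel": "sms", "product": "a", "score": 0.10, "optimal_decision": 1},
    {"client_id": 1, "channel": "call", "product": "a", "score": 0.20, "optimal_decision": 0},
    {"client_id": 2, "channel": "call", "product": "b", "score": 0.30, "optimal_decision": 1},
]
result = check_solution(toy_rows, {"sms": 1, "call": 1})
print(result)

# With the article's data the rows could be obtained as
# frame.to_dict('records') and checked against CHANNELS_LIMITS.
```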

All code and data are available at the link.

PS

Depending on the type of predictive models, we may have not only a point estimate of the response probability $ p_{ik} $ but also a distribution of this value for each customer and product. In this case, the optimization problem (3) can be supplemented with the condition

$ \sum (\text{response with probability} \geq K) \geq N. \qquad (5) $

Moreover, if we have a distribution for each probability $ p_ {ik} $, then we can also solve the inverse problem: to minimize the number of communications under conditions like (5), taking into account certain restrictions set by the business.

Thanks to my colleagues from the GlowByte Advanced Analytics team for their help and advice in preparing this article.
