A specialist is like a gumboil: his fullness is one-sided
Banks and predictions.
Consider the simplest behavior-prediction task. It couldn’t be easier, yet it is very common and in high demand in some circles.
This is credit scoring in a bank: take an array of parameters (a vector, a matrix, a cube, whatever the developers prefer), process it, and output the probability that a loan will be repaid if it is issued to the person the array describes.
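As a minimal sketch of what "array in, probability out" means, here is a logistic scoring function. The three features, the weights and the 0.5 cut-off are all made up for illustration; a real scoring model would be fitted to the bank's data and could just as well be boosting or a neural network.

```python
import math

def score(features, weights, bias):
    """Return the estimated repayment probability for one applicant."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)

# Hypothetical applicant: [monthly income in thousands, years at current job, open loans]
applicant = [60.0, 4.0, 1.0]
weights = [0.03, 0.2, -0.5]  # illustrative numbers, not fitted to any data
bias = -1.0

p_repay = score(applicant, weights, bias)
decision = "give" if p_repay >= 0.5 else "not give"
print(f"repayment probability {p_repay:.2f}: {decision}")
```

Everything the model "knows" sits in the weights, and the weights come only from past applicants.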
Let’s simplify and imagine that the city has only three factories and one bank. And, of course, the residents.
And we will use pure AI, without impurities: decisions will be based only on the network’s prediction, with no other intelligence or common sense. It’s like 96% C2H5OH + 4% H2O and no snacks.
We need to build an algorithm that predicts the repayment probability using AI alone.
For a data scientist (DS) this is a routine task. I mean the “predict” part. And not only with regression and boosting, but just as easily with a neural network.
I will not go into the different kinds of networks now. In some ways they are all the same: you feed them data and they answer “give”, “not give”, “send away”.
So, all the factories paid their wages this month. Everything is fine, the work is being done, both the owners and the employees have income.
And the bank provides loans to all employees of these factories several times.
Any AI will learn to determine to whom, what and how much.
But then day H comes: the first plant loses its orders and stops paying wages.
But the bank (the banking AI) keeps issuing loans, because there is not a single delay among the plant’s employees yet.
Two or three months after day H the delays begin, and 4-5 months after day H almost all the workers of the first plant have stopped servicing their loans.
Having collected a dataset sufficient for training, the bank’s AI changes its weights. Now loan applications from employees of the first plant are not approved and the bank issues them nothing. That’s it, no money!
Another AI, this one belonging to the shareholders of the plant, decides to hand the debts of the first plant’s employees over to collectors.
It would seem that everything is correct, but a solution built on DS alone is flawed, and here’s why.
Then a new day H comes: the first plant is sold, and the new owners load it to full capacity and even raise wages. But the bank’s AI still does not approve loan applications from employees of the first plant: it has no sufficient dataset, and nothing tells it that these employees are solvent again! And the last loans issued to the employees of the first plant are still not being serviced.
And so on, like an iterator over the factories: the bank is ruined, while business is doing fine!
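The story of the first plant can be replayed as a toy simulation. This is only a sketch under loud assumptions: the "AI" is replaced by a rolling repayment rate over the last few known outcomes, a loan is serviced only if the plant pays wages in the month it is issued, and all the month numbers and the threshold are invented.

```python
LAG = 3          # months before a non-payment shows up as a delay (assumption)
DAY_H = 3        # month when the plant stops paying wages (assumption)
SALE = 10        # month when the new owners restore wages (assumption)
MONTHS = 18
THRESHOLD = 0.5  # approve while the observed repayment rate is at least this
WINDOW = 3       # how many of the latest known outcomes the "AI" looks at

def wages_paid(month):
    return not (DAY_H <= month < SALE)

def simulate():
    issued = {}  # month of issue -> True if those loans end up being serviced
    log = []
    for month in range(MONTHS):
        # The bank sees outcomes only for loans that are at least LAG months old.
        known = [ok for m, ok in sorted(issued.items()) if m <= month - LAG]
        recent = known[-WINDOW:]
        rate = sum(recent) / len(recent) if recent else 1.0  # no data yet: optimistic
        approve = rate >= THRESHOLD
        if approve:
            issued[month] = wages_paid(month)
        log.append((month, wages_paid(month), approve))
    return log

for month, paid, approve in simulate():
    print(month, "wages paid" if paid else "no wages", "give" if approve else "not give")
```

With these made-up numbers the bank keeps lending for four months after day H (months 3 to 6), and then refuses every month after the sale. Once it stops lending it stops receiving new outcomes, so the estimate never recovers on its own.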
At first glance, the solution is obvious – we throw out the “place of work” from the parameters.
We also throw out the place of residence (district, region, etc.), height, weight, hair color, etc.
(If nine redheads came and bought an apartment, then the tenth one will buy an apartment with a probability of 9/10!)
This is not nonsense: if there is a redheads’ club in the city and they discussed the purchase there, then where nine have bought, the tenth will too. But what if there is no club? No AI knows this. Statistics has no tool for learning about a dependence from the newspapers. With mathematical statistics we gain knowledge only from accumulated errors!
Do not forget to check that the parameters do not include nationality, religion, skin color, etc.
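A check like this is easy to automate. The sketch below assumes an application arrives as a dictionary; the field names are hypothetical and would have to be replaced with the ones from the actual application form.

```python
# Hypothetical field names; a real application form will use its own.
EXCLUDED = {
    "place_of_work", "district", "region", "height", "weight", "hair_color",
    "nationality", "religion", "skin_color",
}

def strip_excluded(application):
    """Return a copy of the application without the excluded parameters."""
    return {k: v for k, v in application.items() if k not in EXCLUDED}

def leaked(feature_names):
    """List excluded parameters that still appear among the model inputs."""
    return sorted(set(feature_names) & EXCLUDED)

application = {"income": 60, "hair_color": "red", "place_of_work": "plant 1"}
print(strip_excluded(application))
print(leaked(["income", "religion", "age"]))
```

Note that this only removes the named columns. It does nothing about other parameters that may carry the same information indirectly.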
Experienced bankers may not understand everything about DS, but they will most likely not let such a situation happen.
But an inexperienced DS will suggest looking at the borrower’s credit history: if a potential borrower has been servicing his debt successfully for a long time, you can give him more (here the bankers, if they are reading this, are of course grinning). I must say right away that this is a very bad criterion.
If you lend only to those who have a credit history, the bank will go under. Natural attrition alone will shrink the customer base.
To grow and capture markets, you need to be able to lend to those who have no history. That is where the bank’s future profits are!
And those whom their own bank has let go to you, to your bank, probably carry a flaw that their credit history does not show. A bank will never let a solvent borrower go, and it knows everything about him: all his problems, strengths and weaknesses.
And if this borrower came to you, then his own bank turned him down. Your scoring looks at the bureau and happily reports: a great borrower, not a single delay. But that bank knows for sure that raw materials for this borrower’s business have gone up in price, that his customers have dried up, and so on. So they let him go.
And a customer with a great credit history from another bank is not a very good customer. Either he is doing badly, or he is arbitraging rates and squeezing the best possible rate out of you, right at the edge of your profitability. And then he will show zero turnover)) (This, of course, is not about DS, but that’s the way it is.)
A bureau credit history is a poor criterion for DS. And if the client has a good history in his own bank, the banker does not need DS at all.
And a client with a bad history is about the same as a client with no history. Better, but not by much.
So credit history is not the best evaluation criterion, just like any other history: it contains no information about the future. It is the same with physicists: you measure a trajectory, study it carefully and thoughtfully, and now you know and understand how the planets move. But even then, only for the time being. Then it suddenly turns out that without “dark matter” and “dark knowledge” it is inaccurate too.
A solution based only on mathematical statistics is fundamentally flawed.
Of course, you can come up with crutches: keep approving applications from some employees of the first plant and wait for them to start servicing their loans. Just don’t tell the bank’s owners about this method: they know how things are going at the first plant without any AI and can decide perfectly well outside the framework of DS.
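In DS terms this crutch resembles what is usually called epsilon-greedy exploration: approve a small random share of the applications the model would reject, just to keep fresh data coming in. A sketch, where the 10% share, the 0.5 threshold and the decision labels are assumptions chosen for illustration:

```python
import random

def decide(p_repay, threshold=0.5, explore_rate=0.1, rng=random):
    """Approve confident cases; approve a random share of the rest as probes."""
    if p_repay >= threshold:
        return "give"
    if rng.random() < explore_rate:
        return "give (probe)"  # loan issued only to learn whether things changed
    return "not give"

# 1000 applicants whom the model scores at 0.1, e.g. employees of the first plant.
rng = random.Random(0)  # fixed seed so the run is repeatable
decisions = [decide(0.1, rng=rng) for _ in range(1000)]
probes = decisions.count("give (probe)")
print(f"{probes} of {len(decisions)} rejected applications approved as probes")
```

The probes cost money whenever the plant really is in trouble, which is exactly why this remains a crutch and not a solution.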
An attempt to break the basic rule, “applied mathematics is the axioms of the subject area plus the axioms of logic”, will lead to unhappy consequences.
In this case, in credit scoring, the problem statement should be changed in principle. You need to select and add the axioms of the subject area. And such systems exist.
I know, for example, that to solve predictive problems serious grown-ups began to monitor all available and, as far as possible, known sources of information and tried to evaluate their contribution to the objective function. Or they built a system describing the movement of information, almost like thermodynamics.
But there are pitfalls here too; more on that in another article.