How we optimized defect management for a top-5 bank in Russia
We are MM.SUP, Glowbyte’s analytical CRM support team, and we want to talk about how to avoid overcomplicating things and arrive at simple, correct solutions when building support processes. The story is based on my own experience of normalizing defect management processes in the production environment of one of Russia’s top-5 banks.
How it all began
A bank implemented a marketing management system. Life became easier for it: nobody has to manually update rates in hundreds of tables when the Central Bank rate changes, and customers no longer wait 3 or more working days while a series of specialists checks their loan applications. Everything happens automatically: the bank can easily tailor thousands of offers to each client’s needs and reach each consumer through different communication channels, and many other business “goodies” have increased the bank’s overall income from marketing campaigns. But, as in any system, bugs happen in an automation product: someone clicked the wrong thing, the development team’s technical debt surfaced, users put a heavy load on the system and it slowed down, and so on. Cases like these come up every day, and someone has to solve the problems and teach users to do things right.
At that time, many departments were involved in system support in the bank: the database department, the hardware support department, the retail business segment analytics department, the automation development department for small and medium-sized businesses, etc. Each department was guided by the rule that “everything that is not written in the job description does not belong to my area of responsibility.”
As a result, many defects bounced from one assignee to another without being solved, everyone replied “this is not mine”, and the bank lost money on system downtime.
The bank held regular meetings between departments, and the many possible combinations of participants multiplied the number of those meetings. It was never entirely clear whom to invite and from which team: to discuss a certain bug, Vasya from the database management department would invite Petya and Kolya from functional support but forget Valera, who knew the problem best, so the meeting produced no result. Each employee’s calendar grew so full that it was simply impossible to fit in another meeting. Employees began to ignore invitations, which made the situation even sadder.
Bottom line: we talk a lot with the wrong people, at the wrong time, and we don’t come to agreements. Defects remain unresolved.
Another problem with this setup was that users had no way to submit consultation requests. If someone needed to ask something, they were brushed off with a reference to the existing volume of problems: unresolved defects are piling up, and you came here just to ask a question.
Users did not know how to achieve their goals with the marketing system, so they invented their own workarounds, which created even more problems and slowed down all their colleagues.
Finally, there was the problem of testing and deploying defect fixes. Each fix means editing configs, code, parameters, database objects and data, and so on. Each fix must pass a series of tests (functional, then load) before it reaches the production environment. The whole process is accompanied by a package of mandatory regulatory documents and annotations, which the bank cannot drop because of its internal standards. If something went wrong at any stage (even an error in the accompanying documentation), the delivery was rolled back, returned for testing and passed through many validating hands again. Deliveries were rolled out at night, since during the day everyone had meetings, which reduced the quality of releases to PROD.
If a defect was not an incident that blocked everything, its fix was postponed, and the tail of unresolved incidents grew.
All this was reminiscent of a Rube Goldberg machine, which performs simple actions through complex, non-trivial mechanisms. By doing simple things in complicated ways, the bank breached the SLA on almost every defect, and its employees regularly worked overtime.
We audited the bank’s processes in terms of support management and identified the following issues:
no specific person is responsible for a defect;
there are many calls that bring no value;
defects are fixed through deliveries with no established deployment flow;
there is no entry point for “I just want to ask” questions.
All these problems led to a significant slowdown in incident resolution, SLA breaches and, in extreme cases, no solution at all.
How we optimized everything
Before us, there had been two attempts to make candy out of this chaos: teams came to the project, looked at the tangle of processes and left. We took a different route: we appointed a responsible team leader, driven by constant proactivity and initiative, who could not only look at a problem but also propose corrective measures and even fix it. So, amid all the confusion, there appeared one person who was the entry point for all problems and was responsible for the quality and speed of defect resolution and for timely status updates. The team entrusted with this work took responsibility for literally everything in supporting the bank’s marketing systems. It handled communications, convened defect meetings with the right participants and answered the bank’s divisions in every chat. It acted as a reminder timer, synchronized deadlines and statuses for defect resolution across all teams, and took the blame if someone missed a deadline or did something wrong, then looked for ways to keep it from happening again. In short, on this project we did not follow the rule “everything that is not written down is not our responsibility”.
“Real responsibility is only personal. A person blushes alone.” Fazil Iskander
When a defect has a thousand assignees, responsibility is blurred. There is always a neighbor about whom you can say: “He should have done this.” When one person is responsible for all defects, that person will find whom to turn to for a solution, go through every circle of communication and find whom to hold to account if the solution came late or did not come at all. While our employee acted as this person in charge, the bank could exhale and attend to other important things.
“Why talk? Let’s just do it”
The bank followed this principle before we came to support its system. In fact, conversations can be productive, and having them is better than not having them, provided the meeting is organized properly. We introduced a weekly defect meeting that everyone working on defects is required to attend. Anyone who cannot come or declines the calendar invitation must write a message explaining the reason or send a replacement. The meetings have an agenda:
– discuss defects only;
– set deadlines for each assignee;
– check whether assignees have met their deadlines;
– agree on the next steps for each defect;
– present incident statistics for the period so that the efficiency of work on defects is transparent to everyone.
Productive meetings thus made it clear who had to complete which work and by what date.
Status meetings for exchanging news were also introduced: if developers are implementing something, they must announce the upcoming delivery at the relevant meeting.
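To make the agenda more concrete, here is a minimal Python sketch of the two checks such a meeting relies on: which open defects have missed their deadline, and how many defects are in each status. This is only an illustration under our own assumptions, not the bank’s actual tooling: the `Defect` fields, the `DEF-…` keys, the dates and the two statuses are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Defect:
    key: str               # hypothetical tracker key, e.g. "DEF-1"
    owner: str             # the single person responsible for the defect
    deadline: date         # deadline agreed at the weekly meeting
    status: str = "open"   # assumed statuses: "open" or "resolved"


def overdue(defects, today):
    """Return open defects whose deadline has already passed."""
    return [d for d in defects if d.status == "open" and d.deadline < today]


def status_summary(defects):
    """Count defects per status for the period's statistics."""
    return Counter(d.status for d in defects)


# Illustrative data only.
defects = [
    Defect("DEF-1", "Vasya", date(2024, 3, 1)),
    Defect("DEF-2", "Petya", date(2024, 3, 10)),
    Defect("DEF-3", "Kolya", date(2024, 3, 2), status="resolved"),
]
today = date(2024, 3, 5)

late = overdue(defects, today)
summary = status_summary(defects)
print([d.key for d in late])   # defects to raise at the meeting
print(dict(summary))           # statistics shown to all participants
```

Keeping exactly one `owner` per defect mirrors the idea that responsibility is personal: the overdue list always points at a specific person rather than at a department.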
“I just want to ask”
We set up a separate space in the bank’s Jira intended only for consultations. Users got the opportunity to ask questions, those questions were no longer confused with defects, and the bank’s wish not to mix consultations with defects was respected. Users began to ask questions, and the trend showed that the number of defects decreased: it became clearer to people what they needed to do to solve their problem and how not to break everything with their actions.
As mentioned above, rolling out a delivery that passes all the rules, regulations and tests, going through fire and water along the way, is not an easy task. To solve this, one employee was assigned to deal exclusively with rolling out deliveries. Releases to the production environment (PROD) began to be planned through a calendar: every fix must have rollback timing and be completed at the scheduled time. In addition, a “moderation committee” was formed from Glowbyte support staff. It checked that deliveries with defect fixes were prepared correctly and that the bank had scheduled the release of the fix in the PROD calendar. As a result, the number of rejected deliveries decreased.
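The kind of check a moderation committee performs against a release calendar can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the actual procedure: the `Release` fields, the release names and the rule that a slot covers the rollout plus a possible rollback are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Release:
    name: str
    start: datetime
    duration: timedelta
    rollback: Optional[timedelta]  # None means no rollback timing was given

    @property
    def end(self):
        # Assumption: the slot covers the rollout plus a possible rollback.
        return self.start + self.duration + (self.rollback or timedelta())


def review(release, calendar):
    """Return the reasons to reject a release (an empty list if there are none)."""
    problems = []
    if release.rollback is None:
        problems.append("no rollback timing")
    for other in calendar:
        # Two slots overlap if each starts before the other one ends.
        if release.start < other.end and other.start < release.end:
            problems.append(f"overlaps with {other.name}")
    return problems


# Illustrative data only.
calendar = [
    Release("fix-A", datetime(2024, 3, 5, 22, 0),
            timedelta(minutes=30), timedelta(minutes=15)),
]
ok = Release("fix-B", datetime(2024, 3, 5, 23, 0),
             timedelta(minutes=20), timedelta(minutes=10))
bad = Release("fix-C", datetime(2024, 3, 5, 22, 30),
              timedelta(minutes=20), None)

print(review(ok, calendar))
print(review(bad, calendar))
```

A delivery that comes back with an empty list can go into the calendar. Anything else is returned to its author before it reaches PROD, which is where rejected deliveries are cheapest to catch.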
In addition to everything described above, we offered the bank our support expertise: we began to investigate and fix defects ourselves and to advise users on their questions. All this simplified the processes even further and helped turn the complex into the simple.
Now the support processes for this bank’s marketing systems are streamlined, tidy and well established. About 9 months ago we joined the project to provide support services, and in parallel we built a process that now suits everyone. Over that time, the number of open defects per day has dropped from a regular 50+ to 10-15. Now:
The status of each defect is transparent, and its deadline and assignee are known.
The bank has time to build business logic instead of dealing with support.
The effectiveness of marketing campaigns has grown, and the number of production defects has decreased.
It has become easier for new employees (developers, analysts, business users, etc.) to dive into the features of the project and start working on their tasks.
Today, we keep the established process up to date and help the bank explore new horizons by offering our best ideas, implementation methods and expertise.