Nuances of process management using the example of the IT Incident Management process

The pharmaceutical distributor FoxMeyer Drug, a company with about 5 billion dollars in annual sales, went bankrupt after a botched ERP implementation and was sold to a competitor for $80 million. The bankruptcy happened because the company's warehouse logistics were not measured, not monitored, and not covered by metrics, indicators, or KPIs. While implementing the ERP system, the company did not notice that key business processes were being destroyed: warehouses were overcrowded and customers did not receive their products. The company tracked profits but did not formalize its logistics business processes, which collapsed within a few days. Unmanaged areas are an unpleasant nuance present in every enterprise. This has to be taken as a given; the critical question is which areas of the enterprise are left unmanaged.

Management these days is based on KPIs. A good, balanced system of KPIs, indicators and metrics is a digital twin of the enterprise. Indicators and metrics must be based on processes, otherwise things turn out the way they did in England.

In England, the Department of Health, in an effort to reduce waiting times in emergency departments, decided to penalize hospitals where waits were longer than four hours. On the surface the program looked successful, but in fact some hospitals began keeping incoming patients outside their walls in ambulances in order to meet the allotted four hours (Bevan and Hood, “What's Measured Is What Matters”).

This is a typical “race for KPIs”. Data distortion is another unpleasant nuance that is also present in all enterprises. But there is another curious point: the KPI “waiting time for treatment” turned out to be “hanging in the air”, because it was not based on a clear and transparent process. The process defines the objectives of an indicator, its context, and the factors that influence its actual value. In my opinion, a metric without a process creates the illusion of control rather than benefit. Indicators without managed processes are not a source of good management decisions.

Process management is a relatively new type of management activity. It is fascinating and interesting in its own way, and it is still poorly described. Many managers are needlessly afraid of managing through processes and call in certified itinerant shaman consultants, which is also needless.

Let's start with modeling.

Process management organizes management with the contradictions that exist inside the organization in mind. Process modeling is part of process management. Let's develop a simplified management model along these lines in order to lift the veil, show how it is done, and answer the question: “Can an ordinary manager manage processes?”

A system that fails is like a sick person. A failure is an incident. An incident is an unscheduled termination of a service or a decrease in its quality. Let's talk about incidents, and note that in IT, as elsewhere, you need to take care of the quality of data.

In IT there is an indicator called MTTA (Mean Time To Acknowledge): the average time it takes to acknowledge and register an incident after the first symptoms of a failure appear. The parallels between MTTA and the “waiting time for treatment” from the “English” example are obvious. There are many ways to manipulate MTTA (I know at least ten). The distortion here relies on hiding the first symptoms of a failure by misclassifying them: user reports that something does not work can be recorded as requests for consultation. Monitoring system warnings about slow page loading can be dismissed as false alarms. Information about symptoms can be “lost” from reporting…
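The effect of such reclassification on MTTA can be illustrated with a minimal sketch. The record layout, the category names, and all timings below are assumptions made up for the illustration; they do not come from any real service desk.

```python
from datetime import datetime, timedelta
from statistics import mean


def mtta_minutes(records, counted=("incident",)):
    """Mean minutes from first symptom to acknowledgement.

    Only records whose category is listed in `counted` take part,
    which is exactly the loophole that misclassification exploits.
    """
    delays = [
        (acknowledged - first_symptom).total_seconds() / 60
        for first_symptom, acknowledged, category in records
        if category in counted
    ]
    return mean(delays)


# Hypothetical data: three failures acknowledged after 5, 10 and 45 minutes.
start = datetime(2024, 1, 1, 9, 0)
honest = [
    (start, start + timedelta(minutes=5), "incident"),
    (start, start + timedelta(minutes=10), "incident"),
    (start, start + timedelta(minutes=45), "incident"),
]
# The same events, but the slowest one is recorded as a "consultation".
distorted = honest[:2] + [(start, start + timedelta(minutes=45), "consultation")]

honest_mtta = mtta_minutes(honest)
distorted_mtta = mtta_minutes(distorted)
print(f"honest MTTA: {honest_mtta} min, distorted MTTA: {distorted_mtta} min")
```

Nothing about the actual reaction speed changes between the two data sets; only the label on one record does.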

There is another indicator in IT: MTTR (Mean Time To Repair). The R here can stand for four words with different meanings and different methods of measurement: repair, respond, resolve, or recover. Don't ask why this is so, I'm “in shock” myself. “Repair” will suit us. I will write a few lines about manipulations with this indicator.

Let the target MTTR be 58 minutes and the actual MTTR be 65 minutes after five failures. Let's put ourselves in the place of the head of the system administration department, who is responsible for the operation of IT resources. His arsenal holds a whole set of “nuances” that allow him to “explain away” a violation of the target value. Here are a few simple techniques:

  • Fake crashes. If your incident resolution figures are poor, all you have to do is reboot some unimportant server and bring it back quickly, say in 10 minutes. That quick restart improves your score, and you receive a bonus instead of a reprimand. Let's check: the five real failures took 5×65=325 minutes in total, so the new value is (325+10)/6≈55.8 minutes, which is below the 58-minute target. You are really great!

  • Splitting one long failure into several short ones. Suppose system A supplies information to system B, so a failure of A causes a failure of B. Restoring B requires restoring system A first (42 minutes) and then system B (47 minutes), 89 minutes in total. Counted as one incident, the 58-minute MTTR target is violated (89/1=89). But if the recoveries of the two systems are counted as separate incidents, the figure looks fine (89/2=44.5). Once again, well done, at least on paper.

  • “Early” resolution of the incident. In pursuit of KPIs, you can take a risk and declare the incident resolved without checking that everything works and without even waiting for the IT service to be fully restored. Suppose the server has been rebooted and is up again, but the system on it has not yet started or the accumulated message queues have not been processed. In this case you will not be lying if you say that the server is working again, even though the service is still unavailable to users.
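The arithmetic behind the first two techniques can be checked with a short sketch. It assumes that MTTR is the plain arithmetic mean of repair times in minutes and that each of the five real failures took exactly 65 minutes; the individual durations are not given in the text, only their average.

```python
def mttr(durations):
    """Mean time to repair: arithmetic mean of repair times in minutes."""
    return sum(durations) / len(durations)


TARGET = 58  # target MTTR in minutes, from the example above

# Assumption: five failures of 65 minutes each (only the mean of 65 is known).
real_failures = [65] * 5
honest = mttr(real_failures)

# Technique 1: add one fake 10-minute "failure" of an unimportant server.
with_fake_crash = mttr(real_failures + [10])

# Technique 2: a chained outage of A (42 min) and B (47 min),
# counted as one incident versus split into two.
chain_as_one = mttr([42 + 47])
chain_split = mttr([42, 47])

print(f"honest: {honest}, with fake crash: {with_fake_crash:.1f}")
print(f"chain as one incident: {chain_as_one}, split in two: {chain_split}")
```

In both cases the users waited exactly as long as before; only the denominator grew.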

So, we have two indicators that govern how efficiently incidents are eliminated, and both may turn out to be unreliable. “Hanging in the air”, without a process, they look ridiculous and useless. How can you manage anything this way when the share of low-quality software has grown exponentially as part of import substitution?

Let's overcome the described unpleasant nuances in a separate process. We will develop a primary process model taking into account the shortcomings.

The primary model is built on the developer's experience, taking the opinions of experts into account. It is a rough sketch of a future painting.

Let's consider the Incident Management process, which describes how IT failures are eliminated. The goal of the process is to minimize the negative impact of incidents on the customer's business. Most often, minimizing the negative impact comes down to minimizing the time it takes to eliminate the incident. Schematically, and at a very high level, the process looks something like this:

Management modeling begins with a sketch of the process life cycle.

I recommend drawing the process for yourself. You can do it on a piece of paper, in pencil, even crookedly: it helps both in managing and in explaining. Detail it down to the level at which control is possible and justified. After that, you can set KPIs, indicators and metrics. Here we see that incident resolution time (MTTR) can be minimized in the following ways:

  1. Reducing the time it takes to handle an incident

  2. Reducing the time it takes to find a way to resolve an incident

  3. Applying an optimal, pre-designed “workaround” solution
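The three items above can be read as three time intervals inside a single incident, which makes them measurable separately. The sketch below assumes that four timestamps are recorded per incident; the class and field names are hypothetical and not part of any standard.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentTimeline:
    """Hypothetical record of the key moments of one incident."""
    first_symptom: datetime
    acknowledged: datetime
    solution_found: datetime
    service_restored: datetime

    def phase_minutes(self):
        """Split the total repair time into the three controllable phases."""
        def minutes(begin, end):
            return (end - begin).total_seconds() / 60

        return {
            "acknowledge": minutes(self.first_symptom, self.acknowledged),
            "find_solution": minutes(self.acknowledged, self.solution_found),
            "apply_solution": minutes(self.solution_found, self.service_restored),
        }


# Made-up example: symptoms at 09:00, service back at 09:55.
incident = IncidentTimeline(
    first_symptom=datetime(2024, 1, 1, 9, 0),
    acknowledged=datetime(2024, 1, 1, 9, 10),
    solution_found=datetime(2024, 1, 1, 9, 40),
    service_restored=datetime(2024, 1, 1, 9, 55),
)
phases = incident.phase_minutes()
total = sum(phases.values())
print(phases, "total:", total)
```

With a breakdown like this it becomes visible which phase dominates the total, and a ready-made workaround shows up as a short "find_solution" interval.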

Someone needs to control these three points. Since incidents belong to the operational level, the process should be controlled by the IT Operations Director or an employee in that role. Let's introduce the MTTA metric, a “solution search efficiency” metric, and the MTTR indicator. We are well aware that indicators can be distorted, and we understand that abnormal deviations in them require investigation, analysis, and a search for solutions. Let's give this role to an analyst, a position that almost every enterprise now has. I recommend assigning data quality control to the analyst as well. With that said, the Incident Management process will schematically look like this:

Workaround – Reducing or eliminating the impact of an incident or problem for which full resolution is not currently available.

The use of workarounds is the most important tool for increasing the reliability of the IT infrastructure. But it is also the most difficult one.

Please note that workarounds are developed in another process called Problem Management, where their effectiveness is also assessed. But in the Incident Management process it is useful to introduce a metric that measures the percentage of incidents resolved using workarounds.
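A minimal sketch of such a metric follows. It assumes that each closed incident carries a boolean flag saying whether a workaround was used; the field name `resolved_by_workaround` and the sample data are invented for the illustration.

```python
def workaround_share(incidents):
    """Percentage of incidents resolved using a workaround."""
    if not incidents:
        return 0.0  # no incidents in the period, nothing to measure
    used = sum(1 for incident in incidents if incident["resolved_by_workaround"])
    return 100 * used / len(incidents)


# Hypothetical month: four incidents, three closed with a workaround.
closed_incidents = [
    {"id": 1, "resolved_by_workaround": True},
    {"id": 2, "resolved_by_workaround": True},
    {"id": 3, "resolved_by_workaround": False},
    {"id": 4, "resolved_by_workaround": True},
]
share = workaround_share(closed_incidents)
print(f"resolved by workaround: {share}%")
```

Like any other indicator here, the flag itself can be filled in carelessly, so it falls under the same data quality control.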

By the way, a good (fast) resolution of an incident most often does not eliminate its cause. For example, a reboot frees memory but does not eliminate the coding error. Addressing the root cause is done as part of resolving the IT problem.

The model has been developed.

What does this process give us, and why spend money on it? The process provides the following:

  • Ability to manage incident resolution planning using workarounds.

  • The ability to control the time of diagnosis and search for solutions, which should tend to zero.

  • Ability to control response time to incidents.

  • The ability to tune the SLA for troubleshooting more accurately.

  • Ability to manage data quality, including the actual values of metrics and indicators.

  • You can dominate the IT people and yell at them once a week, for example, that they shouldn’t spend hours rebooting the servers when the network is down.

We've developed an incident management model. Hooray! The next stage of process management is to organize the execution of this process.

However, before driving employees into the framework of the process, you need to know about one significant drawback: with process management, the problem of stereotyped thinking becomes relevant, when an employee thinks only through the prism of rules. For example, in our reasoning an incident is understood as some kind of inoperability. An inconvenient interface is not an incident in our paradigm. Meanwhile, Avon lost more than one hundred million dollars on an SAP implementation because of an inconvenient interface and terminated the project. The company's salespeople began to quit by the hundreds, unable to cope with the complexity of the interface. In this case, if you look at it broadly, the inconvenient interface turned out to be an incident that disrupted the business process.

When introducing process methods in management, it is important to understand that they regulate not only operational activities, but sometimes also thinking.

P.S. Process management is a simple science that any good manager can master. You might even fall in love with it. If you do, and you spend a month immersed in what the Internet has on this topic, you will be able to outdo any consultant, even one with the most impressive foreign certificate.
