Reliability in Processes. Part 1

Before we unite, we must first decisively draw the lines between us (Business Continuity Management vs Business Process Continuity vs Dependability in technology)

Synonyms: reliability in processes = reliability of processes = reliability of operations = operational reliability (given the synonymy of “noun + noun” phrases [Морф23]).

En: dependability, reliability, resilience (availability, stability) of a business process. Continuity of processes is used in the context of “business continuity” (Business Process Continuity, BPC), etc.

Methodological introduction: the text is driven by three typical threats (common risks):

a) we make simple things complex, i.e. we formalize the simple through complex constructions (excessive clutter), which is often either unjustified (“the game is not worth the candle”) or a diversion, as the definition of operational reliability (operational resilience) in paragraph 1.4 of 716P apparently is;

b) we decompose complex things poorly: we incorrectly break them down into simple components;

c) we call the same thing by different words, and different things by one term.

1 Process and reliability

Process and reliability are two very simple terms. From here on, “process” means a “business process”, which, unlike a natural process (chemical, physical), is carried out not by nature but by a “man-machine”, i.e. in the general case it is man-made (artificial, an artifact).

Process

In general, the synonyms are: doing, process, operation, function, action, activity. See proper “Business Process Management” sources, for example the ARIS books or [BPM23], where everything is shown in detail.

“When I take a word, it means what I choose it to mean, no more and no less,” said Humpty Dumpty contemptuously.

For example, the term “operational process” has meaning only if it is defined explicitly. Without such a definition it is equivalent to “operational operation” (a tautology, like “oily oil”).

Reliability

This is also a simple term, but it (like “process”) has long been overcomplicated. The history is shown in “Table 1. Definitions of the term “reliability” in Soviet documents” [Depend1].

“Reliability in processes” (reliability of processes) and “Reliability in technology” [27.002] have the same nature; in the first case the term “object, product, system” simply means “process”. For example, in Appendix 2 of [27.003] the reliability requirements include:

– The probability of producing a given quantity of products of a certain quality per shift;

– The probability of completing a typical task within a given time.

These are all indicators that determine the reliability of the process.

“Objects, products, systems” can be static or dynamic, the latter being the process.

What is reliability? Let's not look at modern textbooks and GOSTs: as a rule, they have taken path “a” (see the methodological introduction). Let's open Ushakov's “Handbook of Reliability Calculation” [Ушаков66]:

Reliability is a property of a device that ensures that it performs the required task in the volume established for it under certain operating conditions.

Let's replace “device” (in subsequent GOSTs it was changed to “object, product, system”) with “process” and we get: process reliability is the property of the process to perform the required operation (function, task). Even simpler: process (operation) reliability is its ability to function (operate, be performed). This is how it originally read in 1962: reliability is the property of a system to perform a task, see the history of the term “reliability” in [Depend1]. See also one of the definition options: “the ability (of an object, read: process) to function as and when required”, i.e. the ability of a process to be executed.

It is important to make two clarifications [Depend1]:

1) we are talking about using the device/process for its intended purpose (to light a dark entrance hall you use an electric light bulb, not a bright computer monitor), i.e. the modes defined in the TU (technical specifications) for the product/process, including “non-standard” scenarios that are nevertheless formalized within the process;

2) “the consumer, user, customer is interested only in the final result, regardless of the reasons why it may not be achieved (where the “own” reliability of the object is only one of them), such an interpretation of the MS may seem entirely justified from the point of view of the end consumer, who does not deeply understand the problems of reliability.” That is, the entire spectrum of negative factors affecting the process is considered (determined).

However, such a definition is not enough, because a definition must answer the question “how much is that in grams?”, i.e. the ability must be measurable. Probability theory will help us with this. Everyone can see how it works: toss a coin many, many times and, on average, heads and tails come up approximately equally often, i.e. the two events have equal probabilities.
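The coin experiment is easy to reproduce. Below is a minimal sketch, assuming a fair coin modelled with Python's standard pseudo-random generator; the function name and the seed are illustrative choices, not part of any standard.

```python
import random

def heads_frequency(tosses: int, seed: int = 42) -> float:
    """Toss a simulated fair coin `tosses` times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed only to make the run repeatable
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

# The observed frequency drifts toward 0.5 as the number of tosses grows.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

The more tosses, the closer the observed frequency stays to the probability 0.5; this is exactly the sense in which a probability makes an "ability" measurable.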

However, the applied theory of reliability was “crushed” by risk management and, in turn, by quality management (ISO 9000), including the “total” kind (TQM). It began in 1993 with the dual-numbered standard IEC 300-1 / ISO 9000-4, under the name of harmonizing the IEC standards on reliability management (300 series) with the ISO standards on quality management (9000 series) [Depend3]. The intro to the article plays on the reverse path (a “return to the roots”): from quality management to reliability management.

Mysterious risk

Risk is the probability of an event weighted by the severity of the consequence (possible danger). We do not say “risk of winning”, only “risk of losing”. If, when tossing a coin, heads is defined as a loss, then the term “risk” can be used: risk of losing = risk of heads.

Thus, risk is just the probability of something bad (a negative consequence, “at your own risk and peril”). If we do not like winter, then numerically “the risk of winter coming after autumn” will be equal to 1. The probability of an inevitable event = 1, and of an impossible event = 0.

Let's rewrite the definition of “process reliability” taking into account risk orientation:

Reliability of a process is its ability to withstand negative factors acting on it. Quantitatively, the reliability of a process is an integral assessment of its risks. By the way, if a risk is already a probability of something, then it is meaningless to say “risk probability” (a probability of a probability).

Reliability is assessed through reliability indicators, which usually characterize the positive side (i.e. the opposite of risk), for example the probability of failure-free operation of a process versus the “risk of failure”. The process availability factor (readiness function) is the probability of finding the process in a working state. In 1960, J. Hosford introduced the term dependability [Depend1]: the probability that a system will be able to function when required.
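To make these indicators tangible, here is a small sketch. It assumes that availability is estimated from observed data as the share of time the process was in a working state, and that the "risk of failure" is simply the complement of the probability of failure-free operation. The function names and the sample hours are illustrative.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Estimate the availability factor as the share of time in a working state."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_hours / total

def failure_risk(p_failure_free: float) -> float:
    """Risk of failure as the complement of the probability of failure-free operation."""
    if not 0.0 <= p_failure_free <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p_failure_free

# Hypothetical year of observation: 8750 working hours, 10 hours of downtime.
print(availability(8750, 10))
print(failure_risk(0.95))
```

The same number can therefore be read "positively" (availability, failure-free operation) or "negatively" (risk), depending on which side of 1 you look from.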

Failure Criteria

This is a key concept in reliability theory, since by and large everything (the assessment of the entire model) depends on it. There are three methods for increasing the reliability of a system (process):

– use of more reliable system elements;

– redundancy of elements (fault tolerance), both structural and temporal (time redundancy);

– relaxing the failure criterion.

However absurd the last method (deliberately lowering the failure criterion) may seem, in practice it is often effective, especially in GRC, which in its ultimate form comes down to reports, plans and purely formal regulations serving as evidence of reliability in reports to the regulator.
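The first two methods can be compared numerically. The sketch below assumes independent elements and the textbook formula for parallel (structural) redundancy, where the group fails only if every element fails; the function name and the figures 0.9 and 0.99 are illustrative.

```python
from math import prod

def parallel_reliability(reliabilities: list[float]) -> float:
    """Reliability of redundant elements: the group fails only if all of them fail.

    Assumes the elements fail independently of each other.
    """
    return 1.0 - prod(1.0 - r for r in reliabilities)

single_weak = 0.9                                   # one ordinary element
single_strong = 0.99                                # method 1: a more reliable element
duplicated_weak = parallel_reliability([0.9, 0.9])  # method 2: redundancy

print(single_weak, single_strong, duplicated_weak)
```

Under these assumptions, duplicating a 0.9 element gives the same 0.99 as buying a ten times "better" element, which is why redundancy is so popular. The third method changes no probabilities at all, only what is counted as a failure.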

Typically, a categorization of criticality levels is carried out and for each level a failure criterion is given or a value is assigned to the reliability indicator, for example, the degree of service degradation.

It is important to understand that some failures stop the process, while others affect its effectiveness (defective output). A failure can increase the time needed to produce a unit of product (service), either still meeting the deadline or causing an unacceptable delay (SLA violation), and can at the same time move the result into the “defective” category according to the technical parameters (TU).

Quality of the algorithm

Let's remove the cost of resources from the “efficiency” parameter and evaluate the efficiency (quality) of the process algorithm. In the “Coin Toss” business process, the efficiency of the algorithm is 1 and the effectiveness is 0.5, since on average half of the results are defective (heads or tails, whichever is declared a defect), and this does not depend on the qualifications of the performer or the quality of the material. In other processes we can increase the efficiency of the algorithm through redundancy, for example by increasing the number of control procedures.

A high algorithm efficiency coefficient should raise effectiveness to its maximum even with low quality of the input (blanks), the tool and the performer's skills. The quality of the performer and the tool can also be expressed as a probability (reliability), where:

1 – this is an ideal performer (works without errors);

0 – absolute hack.

2 Reliability in processes

2.1 Environment “Reliability in Processes”

The introduction of the concept of “Reliability in Processes” (as an element of BPM, Business Process Management) is an attempt to carve out the area between the “big / immense” BCM (Business Continuity Management, see [BCOR23]) and “Reliability in Engineering”. For such “process mechanics” (process engineering as a tool / framework for corporate architects), in terms of quantitative assessment of business process reliability, one could compile something like a “Handbook for calculating business process reliability”. Other possible names: “Reliability of processes and operations”, “Operational reliability”, “Continuity of business processes”, “Process BCM” (engineering BCM) and other “Mgmt” (although I don't like the word “management” / “control” in the names of anything).

“Reliability in processes” includes a reliability (probabilistic) assessment of the level of safety and security from external influences on the process (attacks, accidents and disasters), and an assessment of the human factor.

Below the domain “Reliability in processes” lies another floor: the domain of systems, automated and non-automated (technical and mechanical means, buildings, workplaces). In reliability calculations they play a key role and at the same time provide the “transit” of the equipment's reliability characteristics to the domain “Reliability in technology” [27.002]. For example, a fault-tolerant cluster belongs to “Reliability in technology”. There is also the domain “information” with its area of “information reliability”.

This distinction (cutting off “comprehensive BCM” and “Reliability in IT systems”), which follows from the epigraph to the article, emphasizes that the holistic picture of ensuring the continuity of business / processes / systems must be viewed through the prism formed by isolating the corresponding containers.

Let's consider the difference between the failure of an IT system and the failure of a process using the example of an incident and the RTO (Recovery Time Objective, target recovery time) of the IT system / process.

A client comes to a bank office to deposit cash into an account (current or deposit) or to make a transfer paid in cash. At this moment the bank's automated banking system (ABS) breaks down. From the standpoint of the bank's IT systems, an “ABS failure” incident has occurred, and all IT specialists rush to fix the ABS and try to meet the RTO. From the standpoint of the customer service process, there was no failure (only service degradation): money is accepted, receipts are issued, the client's orders are queued, and the received funds will be posted to the account after the ABS is repaired. The bank may have even more time to transfer the funds, since the terms of banking services sometimes include the phrase “no later than the next business day”. Thus, at the level of “processes” (Reliability in processes), for the specified nomenclature of processes, there was no failure at all (all conditions of the banking service agreement are met), while at the level of “systems” (Reliability in technology) a critical failure occurred, since the bank's “heart” stopped.
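The bank example boils down to one incident judged against two different failure criteria. A minimal sketch, assuming purely hypothetical limits (a 2-hour RTO for the ABS and a 24-hour posting deadline standing in for “no later than the next business day”); the class and level names are illustrative as well.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureCriterion:
    level: str              # which layer the criterion belongs to
    max_delay_hours: float  # longest delay that is still not a failure

    def is_failure(self, delay_hours: float) -> bool:
        return delay_hours > self.max_delay_hours

# Hypothetical limits, chosen only for illustration.
SYSTEM_RTO = FailureCriterion("system (ABS)", max_delay_hours=2.0)
PROCESS_SLA = FailureCriterion("process (cash deposit)", max_delay_hours=24.0)

def assess(outage_hours: float) -> dict[str, bool]:
    """Judge the same outage at the system level and at the process level."""
    return {c.level: c.is_failure(outage_hours) for c in (SYSTEM_RTO, PROCESS_SLA)}

print(assess(5.0))   # a 5-hour ABS outage
print(assess(30.0))  # an outage that also breaks the client-facing deadline
```

The incident data is identical in both assessments; only the container (layer) and its criterion differ.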

This was an example of temporal (time) redundancy of a process. Other examples of time redundancy in processes are control procedures, re-entry of data and comparison of results.

Sometimes this is declared as follows: the client does not need to see (know) the “ins and outs” of the process, only the result of the process is important to him.

Let's consider “Reliability in processes” using the example of structural redundancy in processes. Let's say you need to pay interest on a loan. When trying to log in to the site via a home computer, a cable break to the provider (“last mile”) was detected. An attempt to log in via a mobile application was also unsuccessful, for example, due to the need to update the application, and the application itself was removed from the corresponding store due to sanctions. As a result, we take a suitcase with money and go to the nearest bank.

The example shows two automated processes backed up by a non-automated one (in general, triple redundancy): in the end the task was completed, because the money reached the bank.

The layers of continuity (reliability) provision are isolated by creating a container for each layer, nested like a matryoshka doll (as with the layers of the OSI model, Open Systems Interconnection). As an example of the demarcation between “Reliability in processes” and “Reliability in technology”, consider Fig. …, which shows a common matryoshka of a process and its supporting IT components.

An example is modeling the end-to-end chain of components: process (subprocess) – application system – infrastructure elements. Even when a financial company does not have a full-fledged CMDB, the regulator obliges it to build one manually, for example when preparing form 0409072, the operational reliability report under 6406-U (for financial credit and non-credit organizations). The most valuable thing in the form is a set of long and obscure lines carrying the meaning shown in Fig. 2.1 (slightly corrected from [TAB24]).

Fig. 2.1 Business process and its IT infrastructure


Below is a hard-to-read line from form 0409072 (it could have been made clearer and human-readable): a coded (18-MR, Appendix 1) sequence of the systems and infrastructure elements required to perform business process “XXX” (it does not correspond to Fig. 2.1):

TprKO8|Lekton Classic|OU|BfKO22|Front-office accounting system using plastic cards|lekton|Urv1|AC1|AC2|Windows 10|Prin1|Ri2|microsoft|21H2|Klso99|Kls1|840|Not applicable|Lic9|Arch3.1|Not applicable|Zakup5|Podder4|Avt1|Obnv4|Uyazv1|Upr1|FSTEK9|Not applicable|FSB9|Not applicable|Absent|Not applicable|Absent

Improvement proposal: create an online or on-premises (open source) calculator that generates both a diagram (Fig. 2.1) and the fancy string for form 0409072 from a clear table. The general implementation mechanism is shown in [SmartDesign24]: a diagram is generated from a table via an intermediate dot / Graphviz representation. Thus BPM / EA would get practical benefit from GRC (risk management in accordance with the regulatory requirements 787P / 779P / 6406-U): since financial institutions have to collect and submit form 0409072 anyway, they can at the same time, at minimal cost, give corporate architects (Enterprise Architecture) an architecture visualization tool directly linked to the company's business processes.
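The “table -> dot -> diagram” idea fits in a few lines. This is only a sketch of the general mechanism, not the tool from [SmartDesign24]: the three-column table layout (process, application system, infrastructure element), the row values and the function name are assumptions made for illustration, and names are assumed to contain no double quotes.

```python
def table_to_dot(rows: list[tuple[str, str, str]], graph_name: str = "business_process") -> str:
    """Turn (process, system, infrastructure) rows into Graphviz DOT text."""
    lines = [f'digraph "{graph_name}" {{', "  rankdir=TB;", "  node [shape=box];"]
    seen: set[tuple[str, str]] = set()
    for process, system, infra in rows:
        # Each row contributes two dependency edges; repeated edges are emitted once.
        for src, dst in ((process, system), (system, infra)):
            if (src, dst) not in seen:
                seen.add((src, dst))
                lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical table: which systems and servers a process depends on.
ROWS = [
    ("Cash deposit", "ABS", "DB server"),
    ("Cash deposit", "ABS", "App server"),
    ("Cash deposit", "Card front office", "App server"),
]

dot_text = table_to_dot(ROWS)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz command line, for example `dot -Tpng process.dot -o process.png`. The same table could feed a generator of the form 0409072 string, but its exact field layout is defined by the regulator and is deliberately not guessed at here.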

2.2 Process composition

Let's define “Process” as a function with arguments:

– incoming supply (input) – input elements, including materials, blanks, external services;

– resource provision, including human resources (labor force) in the form of the process performer and his tools: mechanisms, automation tools, workplaces.

For simplicity, we will assume that the function itself is defined only by the algorithmic support expressed through workflow & docflow. The function with arguments, see Fig. 2.2:

function(input, hr, tool) = Y

Fig. 2.2 Process composition and its reliability components
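One possible reading of function(input, hr, tool) = Y in reliability terms is sketched below. It assumes, purely for illustration, that each argument is described by the probability of being “good” and that all three are independent and all required, so the probabilities multiply. Neither the function name nor this series model is taken from Fig. 2.2.

```python
def process_reliability(input_ok: float, hr_ok: float, tool_ok: float) -> float:
    """Probability of a good result Y when input, performer and tool must all be good.

    Assumes the three components are independent (a simple series model).
    """
    for p in (input_ok, hr_ok, tool_ok):
        if not 0.0 <= p <= 1.0:
            raise ValueError("each component must be a probability between 0 and 1")
    return input_ok * hr_ok * tool_ok

# Hypothetical values: good blanks, an average performer, a decent tool.
print(process_reliability(0.99, 0.90, 0.95))
```

With an ideal performer (1) the result is limited only by the input and the tool; with an "absolute hack" (0) no input quality will save the process.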


The tables in Fig. 2.2 play on the quality standard expressed in nines. The table on the left is the availability factor (an indicator of the BPC class, Business Process Continuity), i.e. the probability of finding the system (process) in a working state: how well the process works, regardless of its result (its effectiveness). For example, “five nines” of continuity (readiness) means about 5.26 minutes of downtime per year (525,600 minutes), which is equivalent to 10 minutes of downtime per million minutes (slightly less than two years). Compare this indicator with the “sigma number” (“six sigma”, Motorola, LSS), which characterizes the quality of products, output and process results, in the right table of Fig. 2.2: the same “five nines”, but of quality, i.e. effectiveness.
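The arithmetic behind the nines is simple enough to check by hand or in code. A minimal sketch, assuming a non-leap year of 525,600 minutes; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines (e.g. 5 -> 0.99999)."""
    unavailability = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailability

for n in range(1, 7):
    print(f"{n} nines: {downtime_minutes_per_year(n):.3f} minutes per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why the step from four to five nines usually costs far more than the step from two to three.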

3 Process effectiveness

Reliability in processes largely repeats “Reliability in technology”, but is implemented on a higher domain and introduces specifics, for example, on the one hand, it is difficult to talk about indicators of process transportability (except perhaps replicability), and on the other hand, it is convenient to evaluate the effectiveness of the process using statistical analysis methods.

As shown above, process reliability is characterized by two “definition zones”: readiness to execute a process instance (readiness to work, to perform a task) and quality – the result of the process (process effectiveness). Let us dwell on the latter in more detail.

Effectiveness is described by the parameters of the “useful” output of the process: it must be of high quality (no worse than the limit value characterizing product quality) and on time. For example, if the deadline promised to the client for reviewing a (loan) application is exceeded, it may no longer matter whether the review process and its result were of high quality, since the client will go to another service provider.

Effectiveness is measured by various values: % of good output and rolled throughput yield, defective parts per million (PPM) and defects per million opportunities (DPMO), the number of sigmas (short-term sigma) and the Z-score (Z table), as well as Cp/Cpk, Pp/Ppk and others. The number of defects per million units (DPU × 10^6) is always greater than or equal to PPM, since PPM counts defective units and ignores multiple defects in one unit / process instance; DPMO additionally divides by the number of defect opportunities per unit.
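A few of these measures are sketched below using their common Six Sigma definitions. The sample figures (1000 units, 20 defective units, 30 defects, 5 defect opportunities per unit) and the function names are hypothetical. Note that the comparison with PPM concerns defects counted per unit; once the count is divided by several opportunities per unit, DPMO can fall below PPM.

```python
from math import prod

def ppm_defective(defective_units: int, units: int) -> float:
    """Defective units per million units produced."""
    return defective_units * 1_000_000 / units

def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities (several defects may sit in one unit)."""
    return defects * 1_000_000 / (units * opportunities_per_unit)

def rolled_throughput_yield(step_yields: list[float]) -> float:
    """Share of units that pass every step without a defect (product of step yields)."""
    return prod(step_yields)

units, defective_units, defects = 1000, 20, 30    # hypothetical batch
print(ppm_defective(defective_units, units))      # defective units per million
print(dpmo(defects, units, 1))                    # defects per million units
print(dpmo(defects, units, 5))                    # with 5 opportunities per unit
print(rolled_throughput_yield([0.99, 0.98, 0.95]))
```

Which of these indicators to report depends on whether the TU counts a unit as simply good / bad or lists several separately checked characteristics.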

Business process “Tossing a fair coin”. Let's say that according to the TU, tails meets the conditions, and heads is a defect.

The effectiveness of the process in terms of “percentage of good output” will be on average 0.5. We do not consider the cost of resources, so nothing can be said about the efficiency.

DPMO = 500,000. The “Number of sigmas” from Appendix A (informative) of GOST R ISO 13053-1-2013 will give a value of exactly 1.5 sigmas, and Z.bench = 0 (1.5 sigma difference).
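The coin figures can be reproduced with the standard normal distribution. The sketch assumes the usual convention that the benchmark Z is the normal quantile of the yield and that the short-term “number of sigmas” adds the conventional 1.5 sigma shift; the function names are illustrative.

```python
from statistics import NormalDist

def z_bench(dpmo: float) -> float:
    """Benchmark Z: standard normal quantile of the yield 1 - DPMO / 1,000,000."""
    return NormalDist().inv_cdf(1.0 - dpmo / 1_000_000)

def sigma_level(dpmo: float, shift: float = 1.5) -> float:
    """Short-term number of sigmas: benchmark Z plus the conventional 1.5 sigma shift."""
    return z_bench(dpmo) + shift

print(z_bench(500_000), sigma_level(500_000))  # the fair-coin "process"
print(sigma_level(3.4))                        # the classic "six sigma" DPMO figure
```

For the coin the yield is 0.5, its normal quantile is 0, and the whole “1.5 sigma” is nothing but the shift itself.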

For more details on the calculation, see lean-group TG, including a meticulous discussion of the notorious 1.5 sigma delta.

For an assessment of process reproducibility and suitability indicators, see the article of the same name and the corresponding GOST. For calculating the percentage yield of good products from the data of each operation, see the formula.

In order to maintain a process in a stable statistically controlled state, statistical process control methods are used, for example, control charts, which reflect the current state of the process, provide an assessment of the degree of process variability, and determine the presence of statistical controllability, see GOST R 50779.42 “Statistical methods. Shewhart control charts”.
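As an illustration of a control chart, here is a sketch of a p-chart (the share of defective items per sample), one of the Shewhart charts for attribute data. It assumes equal sample sizes and the usual three-sigma limits, and for brevity it computes the limits from the same samples it then checks; the data and function names are hypothetical.

```python
from math import sqrt

def p_chart_limits(defectives: list[int], sample_size: int) -> tuple[float, float, float]:
    """Return (LCL, center line, UCL) for the share of defectives in equal-size samples."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    sigma = sqrt(p_bar * (1.0 - p_bar) / sample_size)
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)

def out_of_control(defectives: list[int], sample_size: int) -> list[int]:
    """Indices of samples whose share of defectives falls outside the control limits."""
    lcl, _, ucl = p_chart_limits(defectives, sample_size)
    return [i for i, d in enumerate(defectives) if not lcl <= d / sample_size <= ucl]

# Hypothetical data: defective process instances found in ten samples of 100 each.
SAMPLES = [4, 6, 5, 3, 7, 5, 20, 4, 6, 5]
print(p_chart_limits(SAMPLES, 100))
print(out_of_control(SAMPLES, 100))
```

A point outside the limits signals that the process has left its statistically controlled state and that a special cause should be looked for, which is a different question from whether the output meets the TU.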

The concept of result is sometimes defined as “output effect” (GOST 27.003).

The “process domain” includes effectiveness (the degree to which the result is achieved) and efficiency (the cost of achieving the result), but the latter is already outside the “process reliability” domain. Market demand for the result (demand for the company's products) already belongs to the “organization” domain (not the “processes” domain) and to “organization reliability”. At this level (the “process domain”) it only matters that the process complies with the TU and SLA, and its reliability is determined solely from their requirements (by comparison with the specified values).

4 Questions

4.1 Question on the subject: please provide a link if something similar has been discussed, i.e. “crossing” BPM and reliability theory, identifying the concept of “Process Reliability” and systematizing approaches.

4.2 Question (mercenary, but practical):

Please share examples of documents required of financial companies that cover (or disclose) the implementation of requirements 787P/779P, something like an “Operational Reliability Policy” (anonymized versions, templates, boilerplate drafts, etc.).

Please recommend forums / TG channels (preferably not vendor ones) where these “masterpieces” of the “bloody enterprise” ruGRC (787P/779P) are discussed.

4.3 Which of the Western best practices most closely and in most detail describes (resembles) the GOST R 57580.1/2/3/4 series?

Fig. 4.1 Information protection of financial organizations / ensuring operational resilience

What is meant is not the various general concepts of “comprehensive” BCM (BS 25999, ISO, etc.); rather, where does this specific “copy-paste” (the text fragments) come from, or are there no analogues?

Instead of a conclusion. Above is a general approach, the concept of “Reliability in processes” with examples. Next time we hope to see the “process reliability” indicators of the “Business Process Continuity” class and their calculation.

Some links

[BPM23] In the explanatory dictionary of Business Process Management: Business function vs. Business process

[SmartDesign24] VRM. Smart tools “Table -> Scheme” for formalizing business processes. Restyling ARIS SmartDesign

[27.002] GOST 27.002—89 Reliability in technology. Basic concepts. Terms and definitions

GOST 27.002—2015

[27.003] GOST 27.003—90 Composition and general rules for setting reliability requirements

[BCOR23] Business Continuity & Operational Resilience: Yesterday, Today, Tomorrow. Where did it come from and what's next?

[Ушаков66] I. Ushakov, B. Kozlov “Handbook of Reliability Calculation” 1966

[Морф23] Morphology of the modern Russian language: basics of theory, exercises, tasks. T.E. Nikolskaya and others.

item 1.1.2. Relative adjectives (p. 71):

Synonymy is possible with the phrases 'noun + noun', including prepositional-case constructions (institute library – library of the institute, mountain climate – climate in the mountains).

[Depend1] V. Netes et al. How do we define what “reliability” is?

[Depend2] S. Alpeev, Reliability Terminology

[Depend3] L. Aleksandrovskaya et al. Modern methods of ensuring the reliability of complex technical systems

[sapland] Approximate calculation of the degree of system readiness

The degree of readiness of the system is described through the readiness coefficient, while it is a dimensionless quantity and cannot be greater than 1

[ISO9000] ISO 9000-1-94
GOST 27.015-2019 (IEC 60300-3-15: 2009) Reliability Management: A Guide to Designing for System Reliability

ISO 9000 Quality management systems – Fundamentals and vocabulary. Unofficial translation “Russian Register”

GOST R ISO 9000-2015 (html)

[sigma] Sigma-Level-Table (Six Sigma Online Certification)

https://www.sixsigmaonline.org/Flash_Videos/Supplemental/Printable/Guides/Sigma-Level-Table.xls

Basics of 6 Sigma:

Introduction

https://www.lean-consult.ru/blog/kak-rasschitat-uroven-sigma-processa-v-metodologii-6-sigm/

https://www.six-sigma-material.com/Protected-Pages.html

https://westgard.com/resources/29-resources/416-sixsigtable.html

https://www.isixsigma.com/sigma-level/yield-to-sigma-conversion-table/

https://sixsigma.ru/glossary_term/yield/

[EasyEA23] Simple Enterprise Architecture. Architecture of a gardening company

[TAB24] 16.07.2024. Webinar. Operational Reliability Report Answers to questions. Technologies. Automation. Business.

Basel: https://www.bis.org/bcbs/publ/d515.htm https://www.bis.org/bcbs/publ/d516.htm
