Statistical analysis of the results of load testing of DBMS in cloud infrastructure conditions

Median smoothing with a period of 1 hour is applied to the data.
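As an illustration only, median smoothing over a 1-hour period could be sketched as below. The source does not say whether the window is trailing or centered, so a trailing window is assumed here; the function name `median_smooth` and the `(timestamp, value)` input layout are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def median_smooth(points, window=timedelta(hours=1)):
    """Return (timestamp, median) pairs computed over a trailing time window.

    points: list of (datetime, value) tuples sorted by timestamp.
    Assumption: the window covers the interval (ts - window, ts].
    """
    smoothed = []
    start = 0  # index of the oldest point still inside the window
    for i, (ts, _) in enumerate(points):
        # Drop points that are a full window (or more) older than ts.
        while points[start][0] <= ts - window:
            start += 1
        window_values = [value for _, value in points[start:i + 1]]
        smoothed.append((ts, median(window_values)))
    return smoothed
```

A single outlier inside the window shifts the median far less than it would shift a mean, which is the usual reason for choosing median smoothing on noisy performance data.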

A sample is considered suitable for further analysis if its median and mode are equal.

Median of a random variable: here it is defined as the number that divides the distribution in half. In other words, the median of a random variable is the number such that the probability of a value falling to the right of it equals the probability of a value falling to the left of it (both are equal to 1/2). Equivalently, the median is the 50th percentile, the 0.5-quantile, or the second quartile of a sample or distribution.

Median (statistics) – Wikipedia (wikipedia.org)

Mode: the value or values in a set of observations that occur most frequently.

Mode (statistics) – Wikipedia (wikipedia.org)
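The suitability rule (median equals mode) can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `is_suitable` is hypothetical, and treating a sample with several modes as unsuitable is an assumption, since the source does not say how ties are handled.

```python
from statistics import median, multimode


def is_suitable(sample):
    """Return True when the sample has exactly one mode and it equals the median.

    Assumption: samples with more than one mode are rejected.
    """
    modes = multimode(sample)  # all most-frequent values, in first-seen order
    return len(modes) == 1 and modes[0] == median(sample)
```

Exact equality is only meaningful for discrete or rounded measurements; for continuous values some rounding step would be needed before the comparison.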

Analysis of samples for normal distribution

There are quite a few statistical tests available to determine whether a sample follows a normal distribution:

Shapiro-Wilk test,
skewness and kurtosis test,
Durbin's test,
D'Agostino test,
kurtosis test,
Vasicek test,
David-Hartley-Pearson test,
chi-square test,
Anderson-Darling test,
Filliben test.

The problem is that PostgreSQL does not yet provide implementations of normality tests, and its built-in statistical functions are limited in general.

Therefore, as a practical solution, it was decided to greatly simplify the process of testing a sample for normality.

The proposed technique for checking how closely a sample approximates a normal distribution uses the PostgreSQL function normal_rand (provided by the tablefunc extension):

normal_rand(int numvals, float8 mean, float8 stddev) returns setof float8

The normal_rand function returns a set of normally distributed random values (Gaussian distribution).

The numvals parameter sets the number of values the function returns. The mean parameter sets the mean of the normal distribution (which for a normal distribution coincides with the median), and stddev sets the standard deviation.

Given the number of observations, the median, and the standard deviation of the original sample, a test sample is generated with normal_rand.
How closely the original sample approximates a normal distribution is assessed by the variance of the differences between the values of the original sample and the sorted values obtained with normal_rand.
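The idea can be sketched in Python for illustration. Here `normal_rand` is a stand-in that mimics the PostgreSQL function with `random.gauss`; it is not the PostgreSQL implementation. The `seed` parameter, the use of the population standard deviation, and the sorting of the original sample before pairing are assumptions that the source does not specify.

```python
import random
from statistics import median, pstdev, pvariance


def normal_rand(numvals, mean, stddev, seed=None):
    """Python stand-in for PostgreSQL's normal_rand: numvals Gaussian draws."""
    rng = random.Random(seed)  # seed is a hypothetical extra, for repeatability
    return [rng.gauss(mean, stddev) for _ in range(numvals)]


def difference_variance(sample, seed=None):
    """Variance of the pairwise differences between the sorted sample and a
    sorted normal test sample of the same size, median, and standard deviation.
    """
    test = normal_rand(len(sample), median(sample), pstdev(sample), seed)
    # Assumption: both samples are sorted so that values are paired by rank.
    diffs = [a - b for a, b in zip(sorted(sample), sorted(test))]
    return pvariance(diffs)
```

Under these assumptions a smaller result suggests that the sample is closer to a normal distribution. Because the test sample is random, the value changes from run to run unless the seed is fixed, so it serves as a rough comparison score rather than a formal test.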

Two approaches can be used to select the sample:

Sort the samples by median performance, because we are looking for the sample in which performance is highest.
Sort the samples by variance, that is, look for the sample with the minimum variance of performance values.
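Both selection approaches can be sketched as follows. The function name `pick_candidates` and the dictionary layout (a window label mapped to its performance values) are hypothetical, and the population variance is an assumption.

```python
from statistics import median, pvariance


def pick_candidates(samples):
    """Return the labels of the sample with the highest median and of the
    sample with the lowest variance.

    samples: dict mapping a window label to a list of performance values.
    """
    by_median = max(samples, key=lambda label: median(samples[label]))
    by_variance = min(samples, key=lambda label: pvariance(samples[label]))
    return by_median, by_variance
```

The two criteria may pick different samples, which is why each candidate is then checked separately for closeness to a normal distribution.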

Practical implementation

Scenario

Trial load testing was run over 4 days:

19.08.2024 14:00 – 16:00
20.09.2024 11:00 – 17:00
21.09.2024 10:00 – 17:00
22.09.2024 10:00 – 17:00

During the trial load testing and formation of the initial set, 235 samples satisfying the condition Median = Mode were obtained.

Preparing samples for normal distribution fit analysis

Maximum performance value

Fig. 1. Maximum performance value

Fig. 3. Probability distribution for the period 12:01-13:01 08/21/2024

Fig. 4. Probability distribution for the period 12:01 – 13:01 08/21/2024 (graph)

Minimum variance

Fig. 5. Minimum variance of performance

Fig. 6. Probability distribution for the period 10:32 – 11:32 08/21/2024

Fig. 7. Probability distribution for the period 10:32 – 11:32 (graph)

Testing for normal distribution, generating test sample using normal_rand

Maximum performance value

Fig. 11. Difference table between the original and test samples

Fig. 12. Variance according to the difference table

Minimum variance

Fig. 17. Variance

Result of the experiment

As the comparison shows, and quite as expected, the original sample with the minimum variance is closest to the normal distribution and is therefore taken as the solution to the problem.

Thus, based on the results of the load testing conducted during the period:

19.08.2024 14:00 – 16:00
20.09.2024 11:00 – 17:00
21.09.2024 10:00 – 17:00
22.09.2024 10:00 – 17:00

The following DBMS performance results can be recorded for this load and this infrastructure:

Performance value: 2028
Lower bound for performance degradation: 2022
Performance degradation: -1.28%
