On the issue of the stochastic nature of DBMS and problems with load testing in a cloud environment
Background to the study
Hypothesis under study: a DBMS is by its nature a stochastic system rather than a deterministic one.
To test this hypothesis, and as part of preparing a methodology for statistical analysis of DBMS behavior in a cloud environment, a series of experiments was started to determine the impact of external, random infrastructure factors on DBMS performance.
Testing tool and scenario
Testing uses a standard tool: the pgbench utility.
Test scenario and parameters
pgbench_init_param= --no-vacuum --quiet --foreign-keys --scale=100 -i test_pgbench
pgbench_param= --progress=60 --protocol=extended --report-per-command --jobs=1 --client=100 --time=14400 test_pgbench
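As a rough illustration, the two parameter sets can be assembled into pgbench command lines as follows. This is only a sketch: the variable names are made up for the example, and how the original experiments actually launched pgbench is not stated.

```python
import shlex

# Database name taken from the parameter lines above.
DB_NAME = "test_pgbench"

# Initialization: scale factor 100, foreign keys, no vacuum after loading.
init_cmd = [
    "pgbench", "-i", "--no-vacuum", "--quiet",
    "--foreign-keys", "--scale=100", DB_NAME,
]

# Load run: 100 clients, 1 worker thread, extended query protocol,
# 14400 seconds (4 hours), progress report every 60 seconds.
run_cmd = [
    "pgbench", "--progress=60", "--protocol=extended",
    "--report-per-command", "--jobs=1", "--client=100",
    "--time=14400", DB_NAME,
]

# Print the commands instead of running them; pass the lists to
# subprocess.run(...) to execute them against a real server.
print(shlex.join(init_cmd))
print(shlex.join(run_cmd))
```

Note that the flags use two ASCII hyphens; a typographic dash pasted from formatted text will be rejected by pgbench.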
The initial series of experiments consists of four consecutive one-hour measurements of statistical indicators of DBMS state and performance.
To reduce the impact of outliers in the performance indicators, median smoothing with a 10-minute window is applied.
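A minimal sketch of such smoothing, assuming one sample per minute (matching the 60-second progress interval) and a trailing window. The function name and the choice of a trailing rather than centered window are assumptions of this example, not details from the study.

```python
from statistics import median

def median_smooth(samples, window=10):
    """Trailing running median.

    Each output value is the median of the last `window` samples,
    or of all samples seen so far while fewer than `window` exist.
    """
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        smoothed.append(median(samples[start:i + 1]))
    return smoothed

# Made-up per-minute performance values with two outliers (5 and 300).
per_minute = [100, 102, 98, 5, 101, 99, 300, 100, 103, 97, 101, 100]
print(median_smooth(per_minute, window=10))
```

Unlike a moving average, the running median is barely affected by isolated spikes, which is why it suits data with occasional outliers.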
DBMS performance is calculated using the methodology described in "Correlation Analysis for Resolving DBMS Performance Incidents".
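The per-hour statistics reported in the following sections can be computed along these lines. This is a sketch under assumptions: the performance metric is taken as an already computed list of per-minute values, and the set of indicators (mean, median, variance, coefficient of variation) is chosen for illustration rather than taken from the cited methodology.

```python
from statistics import mean, median, variance, stdev

def hourly_summary(samples, samples_per_hour=60):
    """Split a series into consecutive hours and summarize each one."""
    summaries = []
    for start in range(0, len(samples), samples_per_hour):
        chunk = samples[start:start + samples_per_hour]
        m = mean(chunk)
        enough = len(chunk) > 1  # sample variance needs at least 2 points
        summaries.append({
            "hour": start // samples_per_hour + 1,
            "mean": m,
            "median": median(chunk),
            "variance": variance(chunk) if enough else 0.0,
            # Coefficient of variation: spread relative to the mean.
            "cv": stdev(chunk) / m if enough and m else 0.0,
        })
    return summaries

# Tiny made-up series, two samples per "hour" for readability.
for row in hourly_summary([10, 20, 30, 30], samples_per_hour=2):
    print(row)
```

Comparing the variance or coefficient of variation between hours gives a simple numeric view of how the spread of performance changes under a constant load.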
Observation results 1st hour
DBMS performance statistics
Probability distribution
Correlation between wait events and DBMS performance
For simplicity, only events with a correlation coefficient > 0.5 and a percentage of observations > 50% are shown.
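The filter can be sketched as follows. Two points are assumptions of this example: the threshold is applied to the absolute value of the Pearson coefficient, and "percentage of observations" is read as the share of samples in which the event occurred at least once. The data is invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def select_events(performance, wait_events, min_abs_corr=0.5, min_share=0.5):
    """Keep events that are both strongly correlated and frequently observed."""
    selected = {}
    for name, counts in wait_events.items():
        share = sum(1 for c in counts if c > 0) / len(counts)
        r = pearson(counts, performance)
        if abs(r) > min_abs_corr and share > min_share:
            selected[name] = (r, share)
    return selected

# Invented sample data: performance per interval and wait event counts.
performance = [100, 90, 80, 70, 60, 50]
wait_events = {
    "IO/DataFileRead": [1, 2, 3, 4, 5, 6],       # strong negative correlation
    "Lock/transactionid": [0, 0, 0, 0, 0, 9],    # seen in too few samples
    "LWLock/WALWrite": [3, 1, 4, 1, 5, 2],       # weak correlation
}
selected = select_events(performance, wait_events)
print(selected)
```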
Observation results 2nd hour
DBMS performance statistics
Probability distribution
Correlation between wait events and DBMS performance
For simplicity, only events with a correlation coefficient > 0.5 and a percentage of observations > 50% are shown.
Comparison with the result of the previous hour
DBMS performance – decreased
Statistical indicators – changed slightly
The wait events with the largest absolute correlation coefficient remained virtually unchanged
Observation results 3rd hour
DBMS performance statistics
Probability distribution
Correlation between wait events and DBMS performance
For simplicity, only events with a correlation coefficient > 0.5 and a percentage of observations > 50% are shown.
Comparison with the result of the previous hour
DBMS performance – decreased
The dispersion of DBMS performance indicators has increased
The event with the largest absolute correlation coefficient is IO/DataFileImmediateSync, which was absent in previous observations.
This event means waiting for immediate synchronization of a relation data file to durable storage.
Apparently, this event, which has a significant impact on the performance of the DBMS, was caused by a change in the state of the infrastructure.
Observation results 4th hour
DBMS performance statistics
Probability distribution
Correlation between wait events and DBMS performance
For simplicity, only events with a correlation coefficient > 0.5 and a percentage of observations > 50% are shown.
Comparison with the result of the previous hour
DBMS performance – increased
Dispersion of DBMS performance indicators has decreased
IO/DataFileImmediateSync – has no significant correlation with performance
IPC/BufferIO – correlated with DBMS performance
This event means waiting for buffer I/O to complete.
Preliminary results
During the observations, a significant spread of performance indicators was observed under the same load on the DBMS.
The dispersion of DBMS performance varies over a fairly wide range.
The wait events correlated with DBMS performance are generally not constant from hour to hour.
A single load test cannot reliably show the impact of changes in DBMS configuration parameters, because the infrastructure affects DBMS performance unpredictably.
Load testing and analysis of the impact of configuration changes on DBMS performance in a cloud infrastructure therefore require a series of tests and statistical analysis of the results.
The results of load testing are probabilistic in nature.
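One possible way to compare two series of test runs statistically is a permutation test on the difference of medians. The sketch below is an illustration, not the methodology of this study: the function name, the choice of test, and the sample numbers are all assumptions.

```python
import random
from statistics import median

def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in medians.

    Returns an estimated p-value: the share of random regroupings of the
    pooled results whose median difference is at least as large as the
    observed one.
    """
    rng = random.Random(seed)  # fixed seed for reproducible output
    observed = abs(median(a) - median(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:len(a)]) - median(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Invented results of 8 runs before and 8 runs after a configuration change.
baseline = [100, 104, 97, 101, 99, 103, 98, 102]
tuned = [121, 118, 125, 119, 123, 120, 117, 122]
print(permutation_test(baseline, tuned, n_perm=2000))
```

A small p-value suggests that the difference between the two series is unlikely to be explained by run-to-run variability alone, which a single pair of runs cannot show.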