Results you don't expect

Hi all! My name is Konstantin Maloletov, and I am a cloud services architect at Arenadata. Today I want to tell you how we solve the problem of efficiently hosting resource-intensive systems, such as Arenadata DB, in the cloud.

Arenadata DB (ADB) is…

a massively parallel analytical DBMS based on the open-source Greenplum project. Following the archiving of the Greenplum project in May 2024, ADB is moving to the new open-source Greengage DB project as its basis. As an enterprise data warehouse solution built on an MPP architecture, ADB requires high performance from computing resources, the network, and the disk subsystem, as well as stability of every cluster node.

In this article, we will look at several computing-resource usage scenarios and their impact on ADB operation, and share the results of the tests we performed.

Purpose of the study

The purpose of the study was to determine how the density of virtual machines on a physical server (here meaning a physical server running a hypervisor that hosts virtual machines), and accordingly the level of competition for computing resources, affects the performance and stability of an Arenadata DB cluster.

To move smoothly from the density of virtual machines (hereinafter VM) to the scenarios under study, a small theoretical digression is required.

A bit of boring theory

Here we need to recall how the Linux scheduler distributes computing resources in the presence of SMT, in particular Intel Hyper-Threading. I will not dwell on the technology in detail; I will only remind you that each physical processor core is represented in the operating system by two logical processors, to each of which the scheduler can assign its own thread of execution.

As long as the number of execution threads requiring CPU time is less than or equal to the number of physical cores, the scheduler distributes these threads across logical processors belonging to different physical cores (the scheduler knows which physical core each logical processor belongs to). In effect, each execution thread gets an entire physical core to itself.

When the number of threads requiring CPU time exceeds the available number of physical cores, the scheduler begins to use those logical processors whose physical core neighbors are already occupied by another thread. Thus, it begins to distribute two threads of execution per physical core.

This scheduler behavior also applies to the distribution of VM execution threads when we are talking about the physical server's scheduler. The operating system installed inside the VM has its own scheduler, which distributes execution threads inside the virtual machine across the virtual processors (vCPUs) allocated to it.
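As a side note, on Linux you can check which logical processors are SMT siblings of the same physical core directly from sysfs. A minimal sketch (assuming a Linux host with the standard sysfs topology files):

```python
from pathlib import Path

def core_siblings():
    """Map each physical core (package, core_id) to its logical CPUs (SMT siblings)."""
    topology = {}
    for cpu_dir in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
        topo = cpu_dir / "topology"
        if not topo.exists():
            continue
        package = int((topo / "physical_package_id").read_text())
        core = int((topo / "core_id").read_text())
        cpu_id = int(cpu_dir.name[3:])
        topology.setdefault((package, core), []).append(cpu_id)
    return topology

if __name__ == "__main__":
    for (package, core), cpus in sorted(core_siblings().items()):
        print(f"package {package} core {core}: logical CPUs {sorted(cpus)}")
```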

It is worth noting that everything described above is standard scheduler behavior. When hosting Arenadata DB in the cloud, we recommend using CPU pinning (processor affinity). It modifies the standard scheduler behavior, makes a VM independent of the number and load of neighboring virtual machines on the physical server, and yields consistently equal performance for each cluster node, which is important for MPP systems. Unfortunately, among public Russian cloud providers this technology is in most cases not available.
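At the hypervisor level this is usually configured in the VM definition (for example, libvirt's `<cputune>`/`vcpupin`); under the hood it is ordinary Linux processor affinity. Purely to illustrate the mechanism (this is not our cloud tooling), a minimal sketch that pins the current process to the SMT siblings of physical core 0:

```python
import os

def parse_cpu_list(text: str) -> set[int]:
    """Parse a sysfs CPU list like '0,48' or '0-1' into a set of CPU ids."""
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if "-" in chunk:
            lo, hi = map(int, chunk.split("-"))
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(int(chunk))
    return cpus

# Logical CPUs sharing physical core 0 (e.g. {0, 48} on a 48-core/96-thread
# host; the exact numbering is host-specific).
with open("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list") as f:
    siblings = parse_cpu_list(f.read())

os.sched_setaffinity(0, siblings)  # 0 = the calling process
print("allowed CPUs:", sorted(os.sched_getaffinity(0)))
```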

VM Density Scenarios Investigated

Scenario #1: “Ideal”.

In an ideal scenario, the total number of vCPUs of all VMs located on a physical server should not exceed the number of physical cores on this server. With this scenario, we wanted to emulate a situation where each VM execution thread is allocated the entire physical core for use.

Legend:

  • VM – virtual machine;

  • vCPU – virtual processor;

  • Scheduler – physical server scheduler;

  • T – logical processor;

  • pCore – physical core;

  • CPU – physical processor;

  • Server – physical server.

In practice, the ideal scenario can be achieved, for example, by renting dedicated servers or server clusters and meeting, by technical or administrative means, the condition that the sum of all vCPUs is less than the sum of all physical cores. You can then expect performance close to that of a bare-metal installation, which is why we conventionally call this scenario ideal.

Scenario #2: “Basic”.

In the basic scenario, the total number of vCPUs of all VMs located on a physical server must be greater than the number of physical cores on that server, but less than the number of logical processors (that is, twice the number of physical cores). In this scenario, we emulated a situation where some VM execution threads share the same physical core.

The vast majority of cloud providers have hyper-threading enabled, and one vCPU equals one logical processor. We did not consider scheduling two execution threads on one physical core via hyper-threading to be oversubscription.

In practice, then, this is a cloud without oversubscription. We conventionally called this scenario basic because it satisfies our recommendation to avoid oversubscription and can be easily implemented in any cloud.

Scenario #3: “Oversubscription.”

In the “Oversubscription” scenario, the total number of vCPUs of all VMs located on a physical server must exceed the number of logical processors (twice the number of physical cores). This is how we emulated oversubscription, when the number of VM execution threads waiting for processor time is greater than the number of logical processors in the physical server.

In practice, this situation may arise when you purchase computing resources that are oversubscribed.

For clarity, I will summarize the conditions of all scenarios in a table:

| | Scenario #1 | Scenario #2 | Scenario #3 |
|---|---|---|---|
| Oversubscription | No | No | Yes |
| Target VM density | Σ vCPU < Σ pCore | Σ pCore < Σ vCPU < Σ T | Σ T < Σ vCPU |

A caveat: the diagrams are deliberately simplified for clarity of VM placement. Of course, we did not actually use a single-processor three-core server and VMs with two vCPUs in our tests. A description of the real test bench is given below.

Test environment

The public cloud of one of the Russian cloud providers, built on the Linux + QEMU + KVM stack, served as the research site. Each physical server had 48 physical cores (pCore) with Hyper-Threading support, giving 96 logical processors (T). In terms of VM placement density, 96 vCPUs in total is the limit beyond which oversubscription begins. In fact, the real limit is even lower once you account for the processor resources the hypervisor itself consumes.

The Arenadata DB cluster was deployed in the following configuration:

| Parameter | Value |
|---|---|
| Arenadata DB version | 6.25.1_arenadata49_b2-1 enterprise |
| Number of segment server VMs | 8 |
| Segment server VM parameters | 16 vCPU / 128 GB RAM |
| Number of logical segments per segment server | 6 |
| Amount of compressed data in the database, GB | 406 |

This is a typical configuration for basic testing of cloud provider infrastructure.

For clarity, I will supplement the earlier scenario table with the actual VM placement in the cloud:

| | Scenario #1 | Scenario #2 | Scenario #3 |
|---|---|---|---|
| Oversubscription | No | No | Yes (1:1.33) |
| Target VM density | Σ vCPU < Σ pCore | Σ pCore < Σ vCPU < Σ T | Σ T < Σ vCPU |
| Actual VM density | 1 VM per physical server (16 < 48) | 4 VMs per physical server (48 < 64 < 96) | 8 VMs per physical server (96 < 128) |
| Number of physical servers | 8 | 2 | 1 |

The oversubscription ratio for scenario #3 is 1:1.33 (128 vCPU / 96 T). A fair question: why 1:1.33 and not 1:2 or 1:3? The answer is simple: this is the value obtained after placing the entire Arenadata DB cluster on a single physical server.
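To make the scenario boundaries concrete, here is a small illustrative sketch (the function and the printed checks are mine, used only to restate the conditions from the tables above):

```python
def classify_placement(total_vcpu: int, physical_cores: int, smt: int = 2) -> str:
    """Classify VM density on a host using the scenario definitions above."""
    logical_cpus = physical_cores * smt  # this is Σ T
    if total_vcpu <= physical_cores:
        return "Scenario #1: each vCPU can get a whole physical core"
    if total_vcpu <= logical_cpus:
        return "Scenario #2: SMT siblings are shared, but no oversubscription"
    return f"Scenario #3: oversubscription 1:{total_vcpu / logical_cpus:.2f}"

# The three placements from the study (16-vCPU VMs on 48-core / 96-thread hosts):
print(classify_placement(1 * 16, 48))  # Scenario #1
print(classify_placement(4 * 16, 48))  # Scenario #2
print(classify_placement(8 * 16, 48))  # Scenario #3 -> 1:1.33
```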

Testing tool

As the main tool for testing Arenadata DB performance and stability, we use our fork of the industry-standard TPC-DS benchmark. The comparison metric is query execution time in multi-user mode (the CONCURRENCY parameter sets the number of concurrent users). The original test consists of 99 queries executed sequentially within a single user session. In our fork, we selected the 51 fastest queries and called this test “lite”. It allows evaluating cluster performance under high parallel load (up to 100 user sessions inclusive) within an acceptable time frame; you could call it a stress test for the physical server's computing power. We called the original set of 99 queries “full” and use it to fully load the physical server's I/O subsystem. Such a test does not need high parallelism (we test up to 8 user sessions).
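Our fork itself is not covered here, but conceptually the multi-user mode boils down to launching CONCURRENCY parallel sessions, each executing the query set, and timing the whole run. A rough sketch of that idea (the `run_query_set` placeholder and the loop values are illustrative, not our actual harness):

```python
import time
from concurrent.futures import ThreadPoolExecutor

QUERIES = [f"query_{i:02d}.sql" for i in range(1, 52)]  # "lite" = 51 queries

def run_query_set(session_id: int) -> None:
    # Hypothetical placeholder: a real harness opens a DB connection per
    # session and executes every query in QUERIES sequentially.
    for query in QUERIES:
        pass

def run_series(concurrency: int) -> float:
    """Run `concurrency` parallel user sessions and return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(run_query_set, range(concurrency)))
    return time.monotonic() - start

for concurrency in (1, 4, 8, 12, 16):  # stepwise increase, as in the tests
    print(concurrency, run_series(concurrency))
```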

Results

For each scenario, we ran three measurements of the lite test series and three of the full test series. A test series is one measurement with a sequential stepwise increase of the CONCURRENCY parameter. The table below shows the average execution time of all queries at a given CONCURRENCY. The values are easy to interpret: the faster the test completes, the better.

An additional metric was used to assess the shortage of computing resources: the number of datagram-delivery-failure warnings in the Arenadata DB logs. By default, ADB uses UDP as its network transport and implements delivery control itself. If the sending segment receives no delivery confirmation from the receiving segment, it resends the datagram, and once 100 consecutive unsuccessful retries accumulate, it records one warning in the event log. In a virtual environment, we quite often see such warnings at high CONCURRENCY values. They are usually associated with high competition for computing resources, which become insufficient to process network traffic within the allotted timeouts. The later (at higher CONCURRENCY values) these warnings appear and the fewer of them there are, the better. Ideally, there should be none at all.
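For those who want to track the same metric, counting these warnings comes down to scanning the segment logs. A sketch of the idea (the regular expression and log path below are hypothetical placeholders; check the exact warning text and log location in your ADB/Greenplum installation):

```python
import glob
import re

# Hypothetical pattern -- verify the actual warning wording in your logs.
PATTERN = re.compile(r"resending packet .* retries", re.IGNORECASE)

def count_warnings(log_glob: str) -> int:
    """Count log lines matching the delivery-failure warning pattern."""
    total = 0
    for path in glob.glob(log_glob):
        with open(path, errors="replace") as f:
            total += sum(1 for line in f if PATTERN.search(line))
    return total

print(count_warnings("/data/*/pg_log/*.csv"))  # hypothetical log path
```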

The ideal scenario #1 was taken as the reference, and the change in execution time in scenarios #2 and #3 was calculated relative to it. Positive percentages mean an increase in user query execution time.

| Test type | Concurrency | Scenario #1, time | Scenario #2, time | Scenario #3, time | Scenario #2 vs #1 | Scenario #3 vs #1 | Scenario #1, network warnings | Scenario #2, network warnings | Scenario #3, network warnings |
|---|---|---|---|---|---|---|---|---|---|
| Lite | 1 | 0:03:08 | 0:03:07 | 0:04:07 | 0% | 32% | 0 | 0 | 0 |
| Lite | 4 | 0:04:26 | 0:07:26 | 0:12:27 | 68% | 181% | 0 | 0 | 0 |
| Lite | 8 | 0:08:51 | 0:13:53 | 0:23:53 | 57% | 170% | 0 | 0 | 0 |
| Lite | 12 | 0:13:18 | 0:20:19 | 0:51:40 | 53% | 288% | 0 | 0 | 0 |
| Lite | 16 | 0:18:04 | 0:26:46 | 1:22:46 | 48% | 358% | 0 | 0 | 0 |
| Lite | 20 | 0:22:27 | 0:33:52 | 1:00:14 | 51% | 168% | 0 | 0 | 80 |
| Lite | 24 | 0:27:33 | 0:40:38 | 1:11:38 | 47% | 160% | 0 | 16 | 312 |
| Lite | 28 | 0:32:00 | 0:47:22 | x | 48% | x | 0 | 64 | x |
| Lite | 32 | 0:36:23 | 0:54:27 | x | 50% | x | 0 | 48 | x |
| Lite | 64 | 1:12:04 | 1:48:17 | x | 50% | x | 0 | 80 | x |
| Lite | 100 | 1:52:42 | 2:50:45 | x | 52% | x | 0 | 64 | x |
| Full | 1 | 0:45:08 | 0:56:08 | 1:17:40 | 24% | 72% | 0 | 0 | 0 |
| Full | 2 | 0:56:14 | 1:19:15 | 2:06:50 | 41% | 126% | 0 | 0 | 48 |
| Full | 4 | 1:35:27 | 2:25:58 | 3:59:38 | 53% | 151% | 0 | 0 | 0 |
| Full | 8 | 3:13:18 | 4:58:25 | 8:12:16 | 54% | 155% | 0 | 0 | 217 |
| Median % | | | | | 50% | 160% | | | |

x – in scenario #3 at CONCURRENCY = 24, only one measurement out of three completed; starting from CONCURRENCY = 28, no measurements completed successfully.

The performance degradation in scenario #2 is explained by the Linux scheduler logic described above. In this scenario, the 64 vCPUs did not fit into the 48 physical cores, so hyper-threading was actively used. It is reasonable to expect that placing 96 vCPUs instead of 64 (still without oversubscription) would reduce performance even further.

In scenario #3, the situation gets significantly worse. With only 96 logical processors available, 128 are requested simultaneously (not counting the hypervisor's own needs). Execution threads are forced to wait in queues for processor resources, which increases query execution time two- to three-fold! Moreover, in scenario #3 the cluster virtual machines became unavailable while running the lite tests, and a manual reboot followed by segment rebalancing was required to restore the cluster. During the second measurement, a full copy of the data was required to recover the segments. On the third, all VMs in the cluster went into an unavailable state.

Additionally, it is worth noting that as VM density on a physical server grows, network warnings begin to appear earlier: in scenario #1 they were completely absent, in scenario #2 they appeared starting from CONCURRENCY = 24, and in scenario #3 from CONCURRENCY = 20.

Below are graphs of CPU usage and CPU steal, both in percent. While CPU usage needs no explanation, let me briefly recall what steal time is: it is the time during which a virtual machine is ready to execute code but does not receive processor resources, because they are currently occupied. The hypervisor reports this metric inside the VM, and we recorded it from within the VM, just like CPU usage.
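Steal time is easy to sample from inside a VM. A minimal sketch that reads the aggregate `cpu` line of `/proc/stat` (a standard Linux interface whose fields are user, nice, system, idle, iowait, irq, softirq, steal, ...) and reports steal as a percentage over an interval:

```python
import time

def cpu_times() -> list[int]:
    """Return the aggregate CPU time counters from /proc/stat (in jiffies)."""
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]  # first line is 'cpu'

def steal_percent(interval: float = 1.0) -> float:
    """Sample /proc/stat twice and return CPU steal as % of total CPU time."""
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    steal = deltas[7]  # 8th field of the 'cpu' line is steal
    return 100.0 * steal / total if total else 0.0

print(f"CPU steal: {steal_percent():.1f}%")
```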

Scenario #1 lite

Scenario #1 full

Scenario #2 lite

Scenario #2 full

Scenario #3 lite

Scenario #3 full

The graphs show that CPU steal in scenarios #1 and #2 stays near zero, while in scenario #3 peaks of up to 40% are observed, confirming the shortage of processor resources.

Finally, here is the scenario table one last time, now supplemented with the study results:

| | Scenario #1 | Scenario #2 | Scenario #3 |
|---|---|---|---|
| Oversubscription | No | No | Yes (1:1.33) |
| Target VM density | Σ vCPU < Σ pCore | Σ pCore < Σ vCPU < Σ T | Σ T < Σ vCPU |
| Actual VM density | 1 VM per physical server (16 < 48) | 4 VMs per physical server (48 < 64 < 96) | 8 VMs per physical server (96 < 128) |
| Number of physical servers | 8 | 2 | 1 |
| Query execution time increase vs. scenario #1 | Reference | +50% | +160% |
| Cluster crash recorded | No | No | Yes |

Conclusions

Scenario #1 delivers the best performance and the most stable operation of the Arenadata DB cluster, as well as identical performance for every cluster node. By these measures it is closest to a bare-metal installation. Implementing this scenario requires a dedicated (effectively private) placement in the cloud.

VM placement per scenario #2 is what you can expect in a cloud without oversubscription. Cluster performance is lower than in scenario #1, but the cluster works stably, without crashes. Node performance is not constant: it depends on the presence and “noisiness” of neighbors on the physical server. Consistent node performance can be ensured by additional technical means, such as CPU pinning.

Scenario #3, with oversubscribed computing resources, significantly slows down the Arenadata DB cluster, and once a certain load threshold is reached the cluster may crash. Node performance is likewise inconsistent.

The study showed that an Arenadata DB cluster can be launched in an oversubscribed cloud environment and is capable of operating there. However, such placement significantly degrades cluster performance and stability. Possible system crashes, and the resulting downtime for users, make this approach extremely risky, and such behavior does not allow us to provide high-quality technical support. We therefore strongly recommend against oversubscribing computing resources in production environments.
