Elbrus VS Intel. Comparing the performance of storage systems Aerodisk Vostok and Engine

Hello. We continue our introduction to the Aerodisk VOSTOK data storage system, which is based on the Russian Elbrus 8C processor.

In this article, as promised, we take a detailed look at one of the most popular and interesting topics related to Elbrus: performance. There is a lot of speculation about Elbrus performance, and the opinions are polar opposites. Pessimists say that Elbrus performance today is “nothing” and that it will take decades to catch up with the top manufacturers (which, in the current reality, means never). Optimists say that Elbrus 8C already shows good results and that within the next couple of years, with the release of new processor versions (Elbrus 16C and 32C), we will be able to “catch up and overtake” the world’s leading processor manufacturers.

We at Aerodisk are practical people, so we took the simplest and (for us) most understandable route: test, record the results, and only then draw conclusions. We ran a fairly large number of tests, discovered a number of characteristics of the Elbrus 8C and its e2k architecture (including pleasant ones), and, of course, compared the results with a similar storage system based on Intel Xeon processors of the amd64 architecture.

By the way, we will talk in more detail about tests, results and the future development of storage systems on Elbrus at our next webinar “About IT” on 15.10.2020 at 15:00. You can register at the link below.

REGISTRATION FOR THE WEBINAR

Test stand

We built two stands. Each consists of a Linux server connected through 16G FC switches to two storage controllers with 12 SAS SSD 960 GB disks installed (11.5 TB of raw capacity, or 5.7 TB of usable capacity in RAID-10).

Schematically, the stand looks like this.

Stand No. 1 e2k (Elbrus)

The hardware configuration is as follows:

  • Linux server (2x Intel Xeon E5-2603 v4 (6 cores, 1.70 GHz), 64 GB DDR4, 2x FC adapter 16G, 2 ports) – 1 pc.
  • FC 16G switch – 2 pcs.
  • Aerodisk Vostok 2-E12 storage (2x Elbrus 8C (8 cores, 1.20 GHz), 32 GB DDR3, 2x FE FC adapter 16G, 2 ports, 12x SAS SSD 960 GB) – 1 pc.

Stand No. 2 amd64 (Intel)

For comparison, the amd64 stand uses the same storage configuration with a processor whose characteristics are similar to those of the e2k one:

  • Linux server (2x Intel Xeon E5-2603 v4 (6 cores, 1.70 GHz), 64 GB DDR4, 2x FC adapter 16G, 2 ports) – 1 pc.
  • FC 16G switch – 2 pcs.
  • Aerodisk Engine N2 storage (2x Intel Xeon E5-2603 v4 (6 cores, 1.70 GHz), 32 GB DDR4, 2x FE FC adapter 16G, 2 ports, 12x SAS SSD 960 GB) – 1 pc.

Important note: the Elbrus 8C processors used in the test support only DDR3 RAM. This is, of course, “bad, but not for long”: the Elbrus 8CB (we do not have it yet, but will soon) supports DDR4.

Test methodology

To generate the load, we used the popular and time-tested Flexible I/O Tester (fio).

Both storage systems are configured according to our tuning recommendations for high-performance block access, so we use DDP (Dynamic Disk Pool) disk pools. To avoid distorting the results, compression, deduplication and the RAM cache are disabled on both storage systems.

8 D-LUNs were created in RAID-10, 500 GB each, the total usable volume is 4 TB (i.e., approximately 70% of the possible usable capacity of this configuration).
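The capacity figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 1000 GB) and that RAID-10 halves the raw capacity; the variable names are ours and purely illustrative.

```python
# Capacity arithmetic for the test stand (decimal units: 1 TB = 1000 GB).
DISK_COUNT = 12       # SAS SSDs per storage system
DISK_SIZE_GB = 960
LUN_COUNT = 8         # D-LUNs created for the tests
LUN_SIZE_GB = 500

raw_tb = DISK_COUNT * DISK_SIZE_GB / 1000       # all disks, no redundancy
usable_tb = raw_tb / 2                          # assumption: RAID-10 keeps two copies of every block
allocated_tb = LUN_COUNT * LUN_SIZE_GB / 1000   # space handed out as LUNs
fill_ratio = allocated_tb / usable_tb           # share of usable capacity under test

print(f"raw: {raw_tb:.2f} TB, usable: {usable_tb:.2f} TB")
print(f"allocated: {allocated_tb:.1f} TB ({fill_ratio:.0%} of usable)")
```

The result lands close to the rounded values quoted in the text: roughly 11.5 TB raw, 5.7 TB usable, and about 70% of the usable capacity allocated.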

The tests cover the main, most common scenarios for using storage systems, in particular:

The first two tests simulate the operation of a transactional DBMS. In this group of tests, we are interested in IOPS and latency.

1) Random read in small 4k blocks
a. Block size = 4k
b. Read / Write = 100% / 0%
c. Number of jobs = 8
d. Queue depth = 32
e. Load type = Full Random

2) Random write in small 4k blocks
a. Block size = 4k
b. Read / Write = 0% / 100%
c. Number of jobs = 8
d. Queue depth = 32
e. Load type = Full Random

The second two tests simulate the work of the analytical part of a DBMS. In this group of tests, we are also interested in IOPS and latency.

3) Sequential read in small 4k blocks
a. Block size = 4k
b. Read / Write = 100% / 0%
c. Number of jobs = 8
d. Queue depth = 32
e. Load type = Sequential

4) Sequential write in small 4k blocks
a. Block size = 4k
b. Read / Write = 0% / 100%
c. Number of jobs = 8
d. Queue depth = 32
e. Load type = Sequential

The third group of tests emulates streaming reads (for example, online broadcasts or restoring backups) and streaming writes (for example, video surveillance or writing backups). In this group of tests, we are interested not in IOPS but in MB/s, and also in latency.

5) Sequential read in large 128k blocks
a. Block size = 128k
b. Read / Write = 100% / 0%
c. Number of jobs = 8
d. Queue depth = 32
e. Load type = Sequential

6) Sequential write in large 128k blocks
a. Block size = 128k
b. Read / Write = 0% / 100%
c. Number of jobs = 8
d. Queue depth = 32
e. Load type = Sequential

Each test lasts one hour, excluding a 7-minute warm-up time for the array.
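For readers who want to repeat a similar run, the six profiles above map naturally onto fio command-line options. The sketch below only builds the command strings; it is not the exact set of options we used. The device path is a hypothetical placeholder, and `--ioengine=libaio`, `--direct=1` and mapping "number of jobs" to `--numjobs` are assumptions.

```python
import shlex

# Test matrix from the list above: (job name, fio rw mode, block size).
TESTS = [
    ("rand-read-4k", "randread", "4k"),
    ("rand-write-4k", "randwrite", "4k"),
    ("seq-read-4k", "read", "4k"),
    ("seq-write-4k", "write", "4k"),
    ("seq-read-128k", "read", "128k"),
    ("seq-write-128k", "write", "128k"),
]


def fio_command(name, rw, bs, device="/dev/mapper/lun0"):
    """Build one fio invocation; `device` is a hypothetical placeholder path."""
    args = [
        "fio",
        f"--name={name}",
        f"--filename={device}",
        f"--rw={rw}",
        f"--bs={bs}",
        "--numjobs=8",         # assumption: "number of jobs" maps to numjobs
        "--iodepth=32",
        "--ioengine=libaio",   # assumption: async Linux engine, needed for queue depth > 1
        "--direct=1",          # assumption: bypass the host page cache
        "--time_based",
        "--runtime=3600",      # one hour of measurement
        "--ramp_time=420",     # 7-minute warm-up, excluded from the results
        "--group_reporting",
    ]
    return shlex.join(args)


commands = [fio_command(*test) for test in TESTS]
for command in commands:
    print(command)
```

Keeping the matrix in one list makes it easy to check that every test differs only in access pattern and block size, which is what makes the results comparable.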

Test results

The test results are summarized in two tables.

Elbrus 8C (Aerodisk Vostok 2-E12 storage)

Intel Xeon E5-2603 v4 (Aerodisk Engine N2 storage)

The results are very interesting. In both cases the storage processors were well loaded (70-90% utilization), and under these conditions the pros and cons of both processors are clearly visible.

In both tables, the tests where the processors “feel confident” and show good results are highlighted in green, while situations that the processors “do not like” are highlighted in orange.

For random load with small blocks:

  • in random reads, Intel is clearly ahead of Elbrus, by a factor of two;
  • in random writes, it is clearly a draw: both processors showed approximately equal and decent results.

In a sequential load with small blocks, the picture is different:

  • Intel is significantly (2 times) ahead of Elbrus in both reading and writing. Elbrus’s IOPS is lower than Intel’s but still looks decent (200-300 thousand); latency, however, is an obvious problem (three times higher than Intel’s). Conclusion: the current version of Elbrus 8C “does not like” sequential loads in small blocks. There is clearly something to work on.
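To put the 200-300 thousand IOPS quoted above on the same scale as the MB/s figures used for the streaming tests, IOPS can be converted to throughput by multiplying by the block size. This is a minimal sketch; the function name is ours, and it assumes "4k" means 4 KiB.

```python
def throughput_mib_s(iops, block_size_kib):
    """Convert an IOPS figure to MiB/s for a fixed block size given in KiB."""
    return iops * block_size_kib / 1024


# The IOPS range quoted for Elbrus on sequential 4k load.
for iops in (200_000, 300_000):
    print(f"{iops} IOPS at 4 KiB = {throughput_mib_s(iops, 4):.1f} MiB/s")
```

At this block size, the quoted range corresponds to roughly 0.8-1.2 GiB/s of data moved.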

But in a sequential load with large blocks, the picture is exactly the opposite:

  • both processors showed approximately equal results in MB/s, but there is one BUT. Elbrus latency is 10 (ten, Karl!!!) times better (i.e. lower) than that of the comparable Intel processor (0.4 / 0.5 ms versus 5.1 / 6.5 ms). At first we thought it was a glitch, so we re-ran the test, and the second run showed the same picture. This is a serious advantage of Elbrus (and the e2k architecture in general) over Intel (and, accordingly, the amd64 architecture). Let’s hope this success is developed further.
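The "ten times" figure can be checked directly against the quoted latencies. This is a small sketch; it assumes that the first number in each pair is the read latency and the second is the write latency, which the text does not state explicitly.

```python
# Latencies in milliseconds as quoted above.
# Assumption: each pair is ordered read / write.
elbrus_ms = {"read": 0.4, "write": 0.5}
intel_ms = {"read": 5.1, "write": 6.5}

# How many times higher the Intel latency is for each operation.
ratios = {op: intel_ms[op] / elbrus_ms[op] for op in elbrus_ms}

for op, ratio in ratios.items():
    print(f"{op}: Intel latency is about {ratio:.0f}x the Elbrus latency")
```

Both ratios come out slightly above ten, so "10 times" is, if anything, a conservative rounding.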

Elbrus has another interesting feature that an attentive reader may notice in the tables. For Intel, reads outperform writes by about 50% or more on average across all tests. This is the norm that everyone (including us) is used to. For Elbrus, write figures are much closer to read figures: reads outperform writes by 10-30% as a rule, no more.

What does this mean? That Elbrus “loves” writing, which in turn suggests that this processor will be very useful in tasks where writes clearly prevail over reads (did someone say Yarovaya’s law?). This is also an undoubted advantage of the e2k architecture, and one that needs to be developed.

Conclusions and near future

Comparative tests of Elbrus and Intel mid-range processors for data storage tasks showed approximately equal and equally decent results, with each processor showing its own interesting features.

Intel greatly outperformed Elbrus in random reads in small blocks, and in sequential reads and writes in small blocks.

In random writes in small blocks, both processors show equal results.

In terms of latency, Elbrus looks much better than Intel under streaming load, i.e. in sequential reads and writes in large blocks.

In addition, Elbrus, unlike Intel, copes almost equally well with read and write loads, while Intel always reads much better than it writes.

Based on the results obtained, we conclude that Aerodisk Vostok data storage systems based on the Elbrus 8C processor are well suited to the following tasks:

  • information systems with a predominance of write operations;
  • file access;
  • online broadcasts;
  • CCTV;
  • backup;
  • media content.

The MCST team still has things to work on, but the result of their work is already visible, which is good to see.

These tests were carried out on Linux kernel 4.19 for e2k. A Linux kernel 5.4-e2k is currently in beta testing (at MCST, at Basalt SPO, and here at Aerodisk); among other things, it has a seriously reworked scheduler and many optimizations for high-speed solid-state drives. In addition, MCST JSC is releasing a new version of the LCC compiler, 1.25, specifically for kernels of the 5.x.x branch. According to preliminary results, on the same Elbrus 8C processor, the new kernel, kernel environment, system utilities, libraries and the Aerodisk VOSTOK software itself, all built with the new compiler, will deliver an even more significant performance gain. And this without replacing any equipment: the same processor at the same frequencies.

We expect the release of the version of Aerodisk VOSTOK based on kernel 5.4 by the end of the year, and as soon as the work on the new version is completed, we will update the test results and publish them here as well.

Now let us return to the beginning of the article and answer the question of who is right: the pessimists, who say that Elbrus is “nothing” and will never catch up with the leading processor manufacturers, or the optimists, who say that “we have almost caught up and will soon overtake”? If we proceed not from stereotypes and religious prejudices but from real tests, then the optimists are definitely right.

Elbrus already shows good results when compared with mid-range amd64 processors. The Elbrus 8C is, of course, far from the top of the Intel or AMD server processor lines, but it was never aimed there; that is what the 16C and 32C processors will be released for. Then we’ll talk.

We understand that after this article there will be even more questions about Elbrus, so we decided to organize another online webinar “About IT” in order to give answers to these questions live.

This time our guest will be the Deputy General Director of MCST, Konstantin Trushkin. You can sign up for the webinar at the link below.

REGISTRATION FOR THE WEBINAR

Thank you all. As usual, we look forward to constructive criticism and interesting questions.
