E1.S: micro … Supermicro

In the comments to one of the previous posts, we were asked about the tests of the Supermicro platform based on E1.S.

Fortunately, it turned out that at that moment we had a group of servers available in a laboratory in the Netherlands. So we ran the tests and are now ready to tell you how it went.

But first, let’s briefly explain what kind of drives these are and what they are for. Let’s start from the basics.

How the EDSFF form factor came about

Having gotten rid of outdated data access methods, NVMe drive manufacturers turned their attention to the form factor. That is how the new EDSFF standard appeared, which, according to its developers at Intel, is better suited to operating in a data center environment. EDSFF stands for Enterprise and Data Center SSD Form Factor, and many know it as “Ruler”. It was created to solve a single problem: minimizing the total cost of flash storage at data center scale, following a simple principle: the drive takes up less space => you can shove more drives (around 1PB worth) into one 1U unit => the resulting cost of storage goes down.

But, of course, some metrics had to be sacrificed with this approach, per-drive performance being one of them.

Supermicro has servers that support two variants of these drives: long (E1.L) and short (E1.S). Both form factors are described in the SNIA specifications.

Besides raw density in TB, IOps and GBps, drives from the EDSFF family can also significantly reduce power consumption: in their marketing materials, Intel and Supermicro claim savings of tens of percent compared to U.2.

Unfortunately, in our laboratory we cannot verify this statement 🙂

How did our tests go?

So, our colleagues from Supermicro set up an SSG-1029P-NES32R server in a Dutch laboratory. It is positioned as a server for databases, IOps-intensive applications and hyper-converged infrastructures. It is based on the X11DSF-E motherboard with two sockets for second-generation Intel Xeon Scalable processors; in our case it held a pair of Intel® Xeon® Gold 6252 CPUs, 8 x 32 GB DDR4-2933 DIMMs and 32 Intel® SSD DC P4511 Series drives. The platform, by the way, also supports Intel® Optane™ DCPMM.

There were the following interfaces for communicating with the “outside world”:

  • 2 PCI-E 3.0 x16 (FHHL) slots,

  • 1 PCI-E 3.0 x4 (LP) slot.

It is worth noting that Supermicro supplies such platforms only fully assembled. As a vendor ourselves, we understand them, but as a potential buyer, we do not approve 🙂

We will not list the rest of the technical specifications; if needed, all the information is available on the vendor’s website. Instead, let’s pay a bit more attention to the test configuration and the test results.

FIO configurations

[global]
filename=/dev/era_dimec
ioengine=libaio
direct=1
group_reporting=1
runtime=900
norandommap
random_generator=tausworthe64

[seq_read]
stonewall
rw=read
bs=1024k
offset_increment=10%
numjobs=8

[seq_write]
stonewall
rw=write
bs=1024k
offset_increment=10%
numjobs=8

[rand_read]
stonewall
rw=randread
bs=4k
numjobs=48

[rand_write]
stonewall
rw=randwrite
bs=4k
numjobs=48

The iodepth was varied with a script.
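The actual sweep script is not included in the post. Purely as an illustration, here is a minimal Python sketch of such a queue-depth sweep; it assumes the job file above is saved as jobs.fio with an extra iodepth=${IODEPTH} line in its [global] section (fio expands ${VAR} environment variables inside job files), and the file names and list of depths are placeholders of our own:

import os
import subprocess

# Sweep the queue depth over the fio job file shown above.
# Assumption: jobs.fio contains "iodepth=${IODEPTH}" in [global].
for qd in (1, 2, 4, 8, 16, 32, 64, 128):
    env = dict(os.environ, IODEPTH=str(qd))
    subprocess.run(
        ["fio", "--output-format=json", f"--output=qd{qd}.json", "jobs.fio"],
        env=env,
        check=True,
    )

Each run leaves a qd<N>.json file with the results for that queue depth.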

System configuration

OS: Ubuntu Server 20.04.3 LTS

Kernel: 5.11.0-34-generic

RAIDIX ERA: raidix-era-3.3.1-321-dkms-ubuntu-20.04-kver-5.11

BOOT_IMAGE=/vmlinuz-5.11.0-34-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off tsx=on tsx_async_abort=off mitigations=off

So, we had 32 Intel® SSD DC P4511 installed in our system.

As we have advised before, you should start with a calculation: how much can you squeeze out of these drives in theory?

According to the specification, the capabilities of each drive are as follows:

  • maximum sequential read speed – 2800 MB/s;

  • maximum sequential write speed – 2400 MB/s;

  • random read speed – 610,200 IOps (4K blocks);

  • random write speed – 75,000 IOps (4K blocks).

But when we ran a simultaneous performance test across all the drives, we reached only about 9,999,000 IOps.

Almost 10 million! Yet the expected performance should be close to 20 million IOps… Our first thought was that we had messed something up ourselves, but a closer study showed that the problem lies in oversubscription of the PCIe lanes. In such a system, the per-drive maximum can only be reached when just half of the drives are loaded.
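A quick back-of-the-envelope check makes the picture clear (the 2:1 ratio below is our reading of the “half load” observation above, not a published figure):

# Rough sanity check of the aggregate 4k random-read ceiling.
drives = 32
per_drive_iops = 610_200                       # spec 4K random read per drive
theoretical = drives * per_drive_iops          # ~19.5M IOps in an ideal world
pcie_oversubscription = 2                      # assumed 2:1 lane oversubscription
achievable = theoretical / pcie_oversubscription
print(f"{theoretical:,} theoretical vs ~{achievable:,.0f} achievable IOps")
# -> 19,526,400 theoretical vs ~9,763,200 achievable, close to the ~10M we measured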

By reducing the block size to 512b, we were able to achieve a total disk performance of 12 million IOps for reading and 5 million IOps for writing.

It is a shame, of course, to lose half the performance, but 12 million IOps per 1U is more than enough. You are unlikely to find an application that could generate such a load on the storage system from just two CPU sockets.

But we wouldn’t be us …

if we hadn’t run two tests with RAIDIX on board 🙂

As usual, we tested RAIDIX ERA in comparison with mdraid. Here is a summary of the results at 1U:

| Parameter | RAIDIX ERA RAID 5/50 | Linux SW RAID 10 | Linux SW RAID 5 |
| --- | --- | --- | --- |
| Rand Read Performance (IOps / Latency) | 11,700,000 / 0.2 ms | 2,700,000 / 1.7 ms | 2,000,000 / 1.5 ms |
| Rand Write Performance (IOps / Latency) | 2,500,000 / 0.6 ms | 350,000 / 5.3 ms | 150,000 / 10 ms |
| Useful capacity | 224 TB | 128 TB | 250 TB |
| Sequential Read Performance | 53 GBps | 56.2 GBps | 53.1 GBps |
| Sequential Write Performance | 45 GBps | 24.6 GBps | 1.7 GBps |
| Sequential Read Performance in degraded mode | 42.5 GBps | 42.6 GBps | 1.4 GBps |
| Mean CPU load at MAX perf | 13% | 24% | 37% |
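The post does not spell out how the Linux SW RAID baselines were assembled. Purely as an illustration, a software RAID 5 array over all 32 drives could be created roughly like this (the device names and the lack of extra tuning are our assumptions, not the exact setup we used):

import subprocess

# Hypothetical device list: 32 NVMe namespaces; adjust to the real enumeration.
devices = [f"/dev/nvme{i}n1" for i in range(32)]

# Create a Linux software RAID 5 array with mdadm; chunk size and other
# tunables are left at their defaults here, which may not match our runs.
subprocess.run(
    ["mdadm", "--create", "/dev/md0", "--level=5",
     f"--raid-devices={len(devices)}", *devices],
    check=True,
)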

We also collected results for ERA under various workloads and configurations:

| Workload / Configuration | Performance |
| --- | --- |
| 4k Random Reads / 32 drives, RAID 5 | 9,999,000 IOps, latency 0.25 ms |
| 512b Random Reads / 32 drives, RAID 5 | 11,700,000 IOps, latency 0.2 ms |
| 4k Random Reads / 16 drives, RAID 5 | 5,380,000 IOps, latency 0.25 ms |
| 512b Random Reads / 16 drives, RAID 5 | 8,293,000 IOps, latency 0.2 ms |
| 4k Random Writes / 32 drives, RAID 50 | 2,512,000 IOps, latency 0.6 ms |
| 512b Random Writes / 32 drives, RAID 50 | 1,644,000 IOps, latency 0.7 ms |
| 4k Random Writes / 16 drives, RAID 50 | 1,548,000 IOps, latency 0.6 ms |
| 512b Random Writes / 16 drives, RAID 50 | 859,000 IOps, latency 0.7 ms |
| 1024k Sequential Reads / RAID 5, RAID 50 | 53 GBps |
| 1024k Sequential Writes / RAID 50, ss = 64, merges = 1, mm = 7000, mw = 7000 | 45.6 GBps |

If you are interested, our method of preparing the system is described in this article… Of course, all the drives were preconditioned for the tests according to the SNIA methodology. During the runs, we varied the load (the queue depth).
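To give a rough idea of what such preparation looks like (this is a simplified sketch, not the exact procedure from that article: the device name, queue depths and run time are placeholders), SNIA-style preconditioning boils down to a full sequential fill followed by sustained random writes until performance settles:

import subprocess

DEV = "/dev/nvme0n1"  # placeholder; repeat for every drive under test

def fio(*opts):
    subprocess.run(
        ["fio", "--name=precondition", f"--filename={DEV}",
         "--ioengine=libaio", "--direct=1", *opts],
        check=True,
    )

# 1) Sequential fill: write the whole device twice with large blocks.
fio("--rw=write", "--bs=1024k", "--iodepth=32", "--loops=2")

# 2) Random-write preconditioning: 4k random writes long enough for the
#    drive to reach steady state (the one-hour runtime is a guess).
fio("--rw=randwrite", "--bs=4k", "--iodepth=32", "--numjobs=4",
    "--time_based", "--runtime=3600")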

You may ask: at what queue depth was the quoted latency obtained?

It is the lowest latency at which the performance plateau (the “shelf”) is reached; on average, that is at a queue depth of about 16.
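If you want to find that point programmatically, a small sketch along these lines would do; it assumes the qd<N>.json files produced by the sweep script above and uses an arbitrary 95% threshold for “reaching the shelf”:

import json

depths = (1, 2, 4, 8, 16, 32, 64, 128)
results = {}
for qd in depths:
    with open(f"qd{qd}.json") as f:
        job = json.load(f)["jobs"][0]
    read = job["read"]
    # Store IOps and mean completion latency in milliseconds.
    results[qd] = (read["iops"], read["clat_ns"]["mean"] / 1e6)

best_iops = max(iops for iops, _ in results.values())
knee = min(qd for qd, (iops, _) in results.items() if iops >= 0.95 * best_iops)
print(f"plateau reached at QD={knee}, latency ~ {results[knee][1]:.2f} ms")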

During the tests, we encountered another peculiarity of the platform (or rather, of the Intel drives): with a low offset_increment value, these devices begin to lose performance, and quite noticeably. It seems they really do not like it when only a short time passes between accesses to the same LBA.

Conclusions

The use cases for a system based on the SSG-1029P-NES32R platform are, of course, not limitless. The reasons are the rather high cost of the system and the small number of PCIe slots for a storage subsystem of this capacity.

On the other hand, we managed to achieve excellent performance results, which is rare for E1.S drives. We knew, of course, that RAIDIX ERA would boost the IOps, but witnessing a 5-10x advantage on random workloads once again, and at just 13% CPU load, is always a pleasure.

Do you need all this? Ask yourself (and a couple of other people at work). It may well be that everything suits you as it is, in which case let the servers, the form factor and the performance stay “as is”. But if you want something more modern and faster, you have just read about one alternative.

And we will continue to write about interesting tests, including at your request.
