Testing SSDs. To increase IOPS, you just need to… change the PCIe version?

As you already know from the lead-in, we are going to torture an SSD with benchmarks on motherboards that expose different PCIe versions to it. On the "operating table" we have a Samsung PM9A3 U.2 with a capacity of 1.92 TB. This miracle of the South Korean semiconductor industry differs from consumer SSDs in its form factor: it looks like a familiar 2.5" SATA SSD, but its connector is neither SATA nor M.2. In terms of characteristics it is similar to an M.2 drive; in both of our tests it runs over four PCIe lanes. The drive sits in a hot-swap bay for U.2 disks, which is connected either to a SlimSAS connector or, as in our second test, through a riser in a PCIe slot.

Initially, when testing this SSD, we were interested in how it would behave under various loads with the software stack we use. Only at the very end of our tests did we move it to a server with a different motherboard, which supported PCIe 4.0 instead of 3.0. We did not expect the version change to make any difference, assuming that the controller would stay well within the bandwidth of PCIe 3.0. However, a surprise was waiting for us.

Preparing for the experiment

First, let's define units of measurement – the most painful and confusing part when it comes to benchmarks.

Let's start with volume. In 1998, an organization whose name you will forget immediately after reading it, the International Electrotechnical Commission (IEC), decided to standardize units of measurement in the computer industry in order to reduce the chaos and confusion of formats. In practice, the chaos only grew, because the confusion around the already existing units was joined by confusion around the "official" ones added within the SI system.
Where the power-of-two quantities so familiar to computer scientists used to be named with the decimal prefixes kilo (KB), mega (MB) and giga (GB), it was now proposed to use kibi (KiB), mebi (MiB) and gibi (GiB). The byte itself was not touched: in both systems 1 byte is 8 bits (fortunately, there are no bibytes, bababytes or bebytes).
Drive manufacturers typically use the decimal form, if only because it looks like a bigger number in the specifications. When converted to binary units, 1 terabyte (TB) is not 1 tebibyte (TiB) but about 0.9 TiB.

It is calculated as follows: the number of bytes in 1 TB divided by the number of bytes in 1 TiB.
For 1 TB that is 10^12 (1,000,000,000,000), and for 1 TiB it is 2^40 (1,099,511,627,776).

Thus, the capacity of the SSD in our test, expressed in TiB, is 1.92 × 10^12 / 2^40 ≈ 1.75 TiB. In practice, though, both values will be called terabytes, because, frankly, "tebibyte" sounds so-so. And in the notation we will meet a complete zoo: TB, TiB, and also Tb, which is usually used in the context of computer networks and denotes terabits – a third unit, 8 times smaller than a TB. And, of course, there is the lowercase tb, which does not correspond to any standardized unit at all.
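
If you want to double-check this arithmetic from the shell, a one-liner with bc does the job (assuming bc is installed):

```bash
# 1.92 TB expressed in TiB: decimal bytes divided by 2^40
echo "scale=3; 1.92 * 10^12 / 2^40" | bc   # prints 1.746, i.e. roughly 1.75 TiB
```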

Speed. Here again there is a zoo, because the third player mentioned above joins in – bits (b/s), kilobits (Kb/s), megabits (Mb/s) and so on, per second. And as if that were not enough, for some reason the word "per" is sometimes kept in the abbreviation, which is how we end up with both Mbps and Mbit/s.
In the context of data storage, MB/s, GB/s or MBps are usually used, where everything is based on 1000, not 1024.

Since there are 8 bits in a byte, converting one to the other is a simple division by 8. For example, 1000 Mb/s divided by 8 gives 125 MB/s.
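
The same division is easy to sanity-check in the shell:

```bash
# 1000 megabits per second expressed in megabytes per second
echo "1000 / 8" | bc   # 125
```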

I bet your eyes are already glazing over – the whole difference is in one single letter: an uppercase B (MB) means bytes, a lowercase b (Mb) means bits.

IOPS (Input/Output Operations Per Second). Here, thank God, there is only one notation – just IOPS. But in terms of measurement it is the trickiest value, because it does not correlate directly with the speed of the bus or of the memory cells. IOPS depends above all on how many input/output operations the drive can perform per unit of time, which, first of all, depends on how fast the controller operates.
With a throughput of, say, 7 GB/s, one controller may perform 10 operations with a 40 ms (millisecond) delay between them, while another performs 100 operations with a 10 ms delay. The throughput of both is still 7 GB/s, but the IOPS of the second one is higher.
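
For the 4 KiB random I/O used throughout this article, IOPS and bandwidth are tied together by the block size (bandwidth ≈ IOPS × block size), which is also an easy way to cross-check fio's output. A quick sketch with bc, using the read IOPS figure from our first test:

```bash
# bandwidth ≈ IOPS * block size; 83.8k read IOPS at 4 KiB blocks:
echo "83800 * 4 / 1024" | bc   # ≈ 327 MiB/s, matching the BW-R column in the first test
```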

Testing Methodology

For both tests, with PCIe 3.0 and with 4.0, we used the fio console utility. fio calculates IOPS with the following formula in [fio/stat.c](https://github.com/axboe/fio/blob/master/stat.c), function `static void show_ddir_status`:

```C
iops = (1000 * (uint64_t)ts->total_io_u[ddir]) / runt; // multiply by 1000 to convert the runtime from milliseconds to seconds
```

The total number of input/output operations is the sum of the two types of operations – sequential and random – and is computed in the same [fio/stat.c](https://github.com/axboe/fio/blob/master/stat.c), in the function `void sum_thread_stats`.
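
If you would rather not parse fio's human-readable report, the same IOPS figures can be taken from its JSON output. A minimal sketch with an illustrative short job (the jq paths follow fio's usual JSON layout; adjust them if your fio version differs):

```bash
# Emit the report as JSON instead of plain text
fio --filename=/dev/nvme0n1 --direct=1 --rw=randrw --bs=4k --ioengine=libaio \
    --iodepth=16 --name=check --size=1G --rwmixwrite=30 --runtime=60 \
    --output-format=json --output=result.json

# Pull the read and write IOPS out of the report
jq '.jobs[0].read.iops, .jobs[0].write.iops' result.json
```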

System parameters for the test

As mentioned at the very beginning, our SSD of choice is the Samsung PM9A3 U.2, four drives of 1.92 TB each; the rated characteristics are given in Samsung's documentation.

The test system:

- CPU: AMD EPYC 7F72, 24 cores / 3.2 GHz
- RAM: Samsung DDR4-3200, 256 GB (16×16 GB)
- Boot drives: another 2×32 GB SSDs, but in DOM form factor
- OS: Proxmox 8.1; VM: AlmaLinux 9.3 with a custom 6.1.62-1.el9.x86_64 kernel
- Motherboard: Supermicro H11SSW-NT

The parameters of the second test were identical; the main difference was that the PCIe slots on the motherboard were version 4.0 rather than 3.0.
Full specifications: motherboard ASRock B650D4U, CPU AMD Ryzen 9 7950X, RAM 128 GB DDR5. Since the two tests were carried out at different times, with a large gap between them, the drive firmware and some test parameters differ.

First test

PCIe version – 3.0, four lanes, SSD firmware version – GDC5602Q.
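
Before each run it is worth confirming what the drive has actually negotiated on the bus and which firmware it is running. A quick check with nvme-cli and sysfs (the device name nvme0 is a placeholder, substitute your own):

```bash
# Firmware revision reported by the controller (the "fr" field)
nvme id-ctrl /dev/nvme0 | grep -i '^fr '

# Negotiated PCIe link speed and width of the NVMe device
cat /sys/class/nvme/nvme0/device/current_link_speed   # e.g. "8.0 GT/s PCIe" for Gen3
cat /sys/class/nvme/nvme0/device/current_link_width   # e.g. "4"
```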

```bash
fio --filename=/dev/nvme0n1  --direct=1 --rw=randrw --bs=4k --ioengine=io_uring --iodepth=16 --name=test7 --size=384G --rwmixwrite=30 --runtime=600 
```

| DEV | R-IOPS | W-IOPS | BW-R (MiB/s) | BW-W (MiB/s) | R-clat (usec) | W-clat (usec) |
|---|---|---|---|---|---|---|
| nvme0n1 | 83.8k | 35.9k | 327 | 140 | 166.05 | 39.94 |

```bash
# MDADM RAID1 (2 disks)
fio --filename=/dev/md127  --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=16 --name=test5 --size=384G --rwmixwrite=30 --runtime=600
```
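
The post does not show how the md device was assembled; for reference, a typical two-drive RAID1 setup looks roughly like this (the device names and /dev/md127 are assumptions, not taken from our configuration):

```bash
# Hypothetical example: mirror two of the U.2 drives into a RAID1 array
mdadm --create /dev/md127 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1

# Let the initial resync finish before benchmarking, otherwise it skews the results
cat /proc/mdstat
```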

| DEV | R-IOPS | W-IOPS | BW-R (MiB/s) | BW-W (MiB/s) | R-clat (usec) | W-clat (usec) |
|---|---|---|---|---|---|---|
| md127 | 50.7k | 21.7k | 198 | 84.9 | 121.39 | 422.69 |

Second test

PCIe version is 4.0, four lanes, SSD firmware version is GDC7302Q.

```bash
fio --filename=/dev/nvme0n1 --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=16 --name=test --size=500G --rwmixwrite=30 --runtime=120
```

| DEV | R-IOPS | W-IOPS | BW-R (MiB/s) | BW-W (MiB/s) | R-clat (usec) | W-clat (usec) |
|---|---|---|---|---|---|---|
| nvme0n1 | 228k | 97.9k | 892 | 383 | 57.37 | 23.25 |

```bash
fio --filename=/dev/md128 --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=16 --name=test --size=500G --rwmixwrite=30 --runtime=120
```

| DEV | R-IOPS | W-IOPS | BW-R (MiB/s) | BW-W (MiB/s) | R-clat (usec) | W-clat (usec) |
|---|---|---|---|---|---|---|
| md127 | 179k | 76.8k | 700 | 300 | 64.38 | 47.17 |

Reasoning

The results are, frankly speaking, astonishing: on PCIe 4.0 the SSD significantly outperforms the very same drive on PCIe 3.0.
Unfortunately, we were pressed for time, as the server used for testing had to be put into production. Other fio runs did not make it into this post because they were carried out with different parameters and cannot be compared fairly. But overall, a significant advantage is observed everywhere in favor of the system with PCIe 4.0 and the more recent firmware; everything else remained unchanged. Although the collective wisdom of the forums claims there should not be such a significant difference between interface versions and firmware revisions, we nevertheless observe the opposite.

From this we obtain two hypotheses, which, unfortunately, we will only be able to test in the future:

1. The SSD controller turned out to be fast enough that on PCIe 3.0 it was limited by the bandwidth of the interface itself.

2. A more recent version of the firmware fixed some bugs that slowed down the SSD.

Conclusions

Next time we will definitely need to retest the SSD under conditions as identical as possible. But overall, the experiment clearly shows that the generally accepted opinion and the characteristics stated by the manufacturer are one thing, while independent tests are quite another. Neither components nor software exist in a vacuum: there are so many combinations of versions that they can ultimately yield completely different results, even with the same test parameters and the same workloads. And since everyone's workloads are different, there is little point in predicting anything in advance – there is no way around testing.
That is why we at cdnnow! prefer not only to test everything ourselves for our internal needs, but also to provide demo access to our clients so that they can evaluate the services we offer on their own workloads. To do that, just leave a request on our website.
