The story of how we built fast storage in 2022

It was 2022. As a cloud provider, we were tasked with building the fastest possible storage for a project, with a capacity of 150 TB or more. The market was changing rapidly at the time: traditional SAS SSDs were being replaced by 2.5″ drives with a PCIe Gen4 x4 NVMe interface. These drives had a number of advantages over classic SAS SSDs. In particular, their read and write speeds were 3-4 times higher (6700/4000 MB/s for the model we chose), and latency was significantly lower thanks to the direct connection to the processors. Another important factor was that these drives cost 15% less than their SAS analogues.

Yes, SAS storage could be expanded with extra shelves connected through expanders, whereas NVMe drives had to be connected directly to the processor's PCIe lanes, which could mean additional costs for platforms with processors that provide the required number of lanes. But even taking this into account, the savings on the drives themselves kept us profitable. The transition to NVMe drives promised only benefits.

We weighed all the pros and cons, chose the technology and were ready to start buying equipment. However, 2022 brought unexpected obstacles: the largest server equipment manufacturers, such as HP, Dell, Lenovo and Supermicro, left the Russian market, and their representative offices were closed. Previously you could almost go out into the street and shout that you needed a server, and representatives of these companies would immediately come running, offering their equipment. Now everything had changed: in response to such cries, we heard only a sad echo, reminiscent of sanctions.

In this new reality, no one helped with the choice of servers any more, and we had to take all responsibility for the decisions ourselves. The most difficult part was that we wanted a server with 24 NVMe disks, each operating at maximum speed. Many factors had to be taken into account, from the number of PCIe lanes and their correct distribution to what the platform itself supported. At the same time, we had to be sure that the processor platform could cope with moving data at such a high speed and would not become a bottleneck in the system.

All of this depended on many small technical details that vendors used to handle for us and that we now had to work out ourselves.

The first important detail is the choice of processor. Everything had to start there, since the processor determines the supported PCIe version (in our case, Gen4) and the number of PCIe lanes, which are needed not only for disks but also for peripherals such as network cards and other controllers.

Let's consider one of the most popular processors at that time: the Intel Xeon Gold 6248R. It was an excellent chip with remarkable performance, but there were some nuances here too.

According to the specifications on the manufacturer's website, this processor supports only PCIe 3.0 and has a maximum of 48 lanes. It would seem that for a standard dual-processor system this should be enough: 2 × 48 = 96 lanes, which looks like an impressive number. But each NVMe drive consumes 4 PCIe lanes, so 96 / 4 = 24 disks, and every lane is taken by the disks alone. As a result, you cannot connect any network cards or other peripheral equipment.

It's like a tank from Nina Ricci: it looks nice, but it doesn't drive or shoot, it's just a luxury item. 2nd generation Xeon processors are simply not suitable for our task. The only solution was to look at the recently released 3rd generation.

A small digression: why is it so important for us to have PCIe 4.0 rather than PCIe 3.0? To answer that, let's turn to Wikipedia and its wonderful bandwidth table.

The table clearly shows that PCIe 4.0 has twice the bandwidth of PCIe 3.0. Four PCIe Gen 3 lanes provide only about 4 GB/s in one direction, while our drives can read at up to 6.7 GB/s, a speed that can only be reached with PCIe 4.0. Yes, PCIe 3.0 and PCIe 4.0 are compatible, but deliberately reducing performance by limiting ourselves to the older standard is clearly not in our interest, especially for high-performance storage. That's why in such a project you can't overlook even the smallest details: anything can affect the final performance, and our goal is maximum performance from every disk. And remember, all of this has to be done by ourselves, without vendor support.
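The figures from the bandwidth table can be reproduced with a few lines of arithmetic. The sketch below assumes the published per-lane line rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) and 128b/130b encoding, and ignores protocol overhead, so real throughput will be somewhat lower. The function and constant names are illustrative only.

```python
# Approximate one-direction PCIe bandwidth from line rate and encoding.
# Assumption: protocol overhead (packet headers, flow control) is ignored.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}  # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130         # 128b/130b encoding used by Gen 3 and Gen 4


def link_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Return the theoretical one-direction bandwidth of a PCIe link in GB/s."""
    gbit_per_lane = TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY
    return gbit_per_lane / 8 * lanes  # bits to bytes, then scale by lane count


DRIVE_READ_GB_S = 6.7  # sequential read speed of the chosen drive model

for gen in (3, 4):
    bw = link_bandwidth_gb_s(gen, lanes=4)
    verdict = "enough" if bw >= DRIVE_READ_GB_S else "a bottleneck"
    print(f"PCIe Gen{gen} x4: {bw:.2f} GB/s, {verdict} for a {DRIVE_READ_GB_S} GB/s drive")
```

A Gen 3 x4 link comes out at roughly 3.94 GB/s and a Gen 4 x4 link at roughly 7.88 GB/s, which is why only the latter leaves headroom for a 6.7 GB/s drive.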

Now let's get back to processor specifications and look at the Intel Xeon Gold 6348, which supports PCIe Gen 4.

According to its specifications, the Intel Xeon Gold 6348 supports the PCIe Gen 4 we need and has 64 PCIe lanes, which gives us 128 lanes in a dual-processor system. Of these, 96 go to the NVMe drives, leaving 32 lanes for network cards. That, by the way, is not much: in practice, just two free PCIe x16 slots.
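The lane arithmetic for the two processors discussed here can be written down as a small budget calculation. This is a simplified sketch: it assumes every CPU lane is available for drives or slots and ignores lanes a real motherboard may reserve for onboard devices. The `Cpu` class and `lane_budget` function are hypothetical helpers, not part of any vendor tool.

```python
from dataclasses import dataclass

LANES_PER_NVME = 4  # each drive needs a full x4 link to run at full speed


@dataclass
class Cpu:
    name: str
    pcie_gen: int
    lanes: int  # PCIe lanes per socket, from the manufacturer's specification


def lane_budget(cpu: Cpu, sockets: int, drives: int) -> dict:
    """Split the platform's CPU lanes between NVMe drives and everything else."""
    total = cpu.lanes * sockets
    used = drives * LANES_PER_NVME
    spare = total - used
    return {
        "total": total,
        "drives": used,
        "spare": spare,
        "x16_slots": max(spare, 0) // 16,  # full-width slots that still fit
    }


for cpu in (Cpu("Xeon Gold 6248R", 3, 48), Cpu("Xeon Gold 6348", 4, 64)):
    budget = lane_budget(cpu, sockets=2, drives=24)
    print(f"{cpu.name} (PCIe Gen{cpu.pcie_gen}): {budget}")
```

With 24 drives in a dual-socket system, the 6248R ends up with no spare lanes at all, while the 6348 keeps 32 lanes, which is the two x16 slots mentioned above.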

We had finally settled on the processor generation and the specific model. It would seem that everything is ready: just take any platform that supports 24 NVMe disks and start working. But even here, unpleasant surprises may await you. Given the limited choice of manufacturers, let's look at what was available on the market. For example, there is the ASUS RS720-E10-RS24U platform, which is advertised as a dual-processor server in a 2U case that supports 3rd generation Intel Xeon processors, up to 32 DIMMs, 24 NVMe drives, nine PCIe 4.0 slots, OCP 3.0, dual M.2 slots and the ASUS ASMB10-iKVM management module.

At first glance, everything looks perfect – support for 24 NVMe, big numbers. But as soon as you start delving into the specifications, questions arise: there are a suspiciously large number of expansion slots.

Hop-hey, la-la-lay: the full platform documentation shows that, indeed, fewer PCIe lanes are routed to the disks, and as a result each disk gets only 2 lanes. The overwhelming majority of platforms on the market are like this. It was almost impossible to find a system that could fully use the capabilities of NVMe drives. We were lucky that at the time there were two platforms on the market that met our requirements. Not two manufacturers with different options, but literally two specific models.
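To see what two lanes per disk cost in practice, one can cap each drive's read speed by the bandwidth of its link. The sketch below assumes a theoretical Gen 4 rate of 16 GT/s per lane with 128b/130b encoding and uses the 6.7 GB/s read speed of our drives. The aggregate figure is only an upper bound, since it ignores CPU, memory and network limits.

```python
# Per-drive read speed is limited by the slower of the drive and its PCIe link.
GEN4_LANE_GB_S = 16.0 * (128 / 130) / 8  # about 1.97 GB/s per lane, one direction
DRIVE_READ_GB_S = 6.7                    # sequential read speed of one drive
DRIVES = 24


def effective_read_gb_s(lanes_per_drive: int) -> float:
    """Return the read speed one drive can reach over a Gen 4 link of this width."""
    return min(DRIVE_READ_GB_S, GEN4_LANE_GB_S * lanes_per_drive)


for lanes in (2, 4):
    per_drive = effective_read_gb_s(lanes)
    lost = 1 - per_drive / DRIVE_READ_GB_S
    print(
        f"x{lanes}: {per_drive:.2f} GB/s per drive, "
        f"{per_drive * DRIVES:.1f} GB/s across {DRIVES} drives, "
        f"{lost:.0%} of drive speed lost"
    )
```

On an x2 link each drive is held to roughly 3.94 GB/s, about 41% below what it can deliver, which is the same ceiling as a Gen 3 x4 link.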

One of them was based on Supermicro equipment, which in 2022 became almost unavailable. The second was the Gigabyte R282-NO0 platform, with a procurement queue three months long. As it turned out, we were not the only ones working out which equipment could handle this task. Once people realized that this platform allowed the disks to be used at full capability, everyone rushed to order it.

Will there be a conclusion to this story? Well, I can give this advice: always check all the parameters, study the specifications down to the smallest detail, check compatibility, and do not leave a single unclear point unresolved. In today's environment, there is no longer a kind vendor who will do everything for you.
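Once a server is actually in hand, the same check can be done on the running system. The sketch below is one possible approach, assuming a Linux host where each NVMe controller under `/sys/class/nvme` links to its PCI device and that device exposes the standard sysfs link attributes (`current_link_width`, `max_link_width` and so on). The helper names `read_link` and `is_degraded` are made up for this example.

```python
from pathlib import Path


def read_link(device_dir: Path) -> dict:
    """Read negotiated and maximum PCIe link parameters of one PCI device."""
    def attr(name: str) -> str:
        return (device_dir / name).read_text().strip()

    return {
        "current_speed": attr("current_link_speed"),
        "current_width": int(attr("current_link_width")),
        "max_speed": attr("max_link_speed"),
        "max_width": int(attr("max_link_width")),
    }


def is_degraded(link: dict) -> bool:
    """True if the link runs narrower or slower than the device is capable of."""
    return (
        link["current_width"] < link["max_width"]
        or link["current_speed"] != link["max_speed"]
    )


if __name__ == "__main__":
    for ctrl in sorted(Path("/sys/class/nvme").glob("nvme*")):
        device = ctrl / "device"
        if not (device / "current_link_width").exists():
            continue  # not a PCIe-attached controller (for example NVMe over Fabrics)
        link = read_link(device)
        status = "DEGRADED" if is_degraded(link) else "ok"
        print(f"{ctrl.name}: x{link['current_width']} @ {link['current_speed']} ({status})")
```

A drive capable of x4 that reports a current width of 2 is exactly the situation described above: the platform simply does not route enough lanes to the bay.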

For example, processors and motherboards that support PCIe Gen 5 already exist, but 95% of platforms still ship with PCIe Gen 4 drive cages. What we ultimately assembled and what speed we achieved is a completely different story.
