RAIDIX: a thorny path to bright happiness? Taking it apart down to the screws

What will the article be about?

In this article I will tell you what you may run into when using this product. It may be useful to some when choosing a storage system, because you will definitely not read about any of this in marketing materials, let alone hear it at presentations.

Of course, I will draw parallels with famous players in the storage market (I just can't help it), because my journey with storage began back in those distant times when IBM dominated with its DS series and EMC with its CLARiiON series. I have handled countless storage systems, and I am an ardent fan of classic storage arrays, no matter how sideways the fans of HCI and dedicated servers with built-in disks look at me. I remain of the opinion that storage systems are the standard of convenience, scalability and fault tolerance.

A remark: I will only talk about the dual-controller option, because a single-controller configuration is not for production load; there is no fault tolerance and no real storage principle in it. You could just as well take a server with a RAID controller or SAS HBA plus a JBOD and not spend the extra money. Well, that is my personal gripe…

Preamble

I will begin my story with how I got acquainted with this system, periodically hated it, spat and cursed the day I got behind the wheel of this vacuum cleaner, and then resigned myself and calmed down once it turned out that it does, in fact, work.

The very first thing you hear at presentations is: "you have no hardware restrictions, and you can build it on anything you have at hand." In fact, this is a myth – the manufacturer has a whole long hardware compatibility list, and only with the hardware on it will this storage system take off and fly. Although almost anything can fly (just not for long, like the plywood over Paris).

Let's start in order. We are all used to the fact that an entry- or mid-range storage system comes out of the box with 2 controllers in the head unit and 24 2.5" or 12 3.5" disk bays; in the hi-end segment there is often a separate controller enclosure with disk shelves attached to it. I am talking here about the usual, most common mid-range options, a la Dell EMC Unity, Fujitsu DX, Huawei Dorado, NetApp FAS.

And here we come to the point: we need to find a piece of hardware that houses two servers in one chassis at once, plus a built-in backplane with disks accessible to both servers. Or we take two servers, tape them together with electrical tape – necessarily 2U servers, not 1U (I will explain later why) – plus a JBOD with two SAS controllers, and all of this already takes up 6U. As a result, we get a big sandwich that eats up a lot of space in the rack.

Looking for a suitable platform

There are two manufacturers of such servers on the world market: AIC and Gigabyte. Unfortunately, today only AIC is available here – and not all models at that; if you want a specific model, expect to wait 6-9 months for the order. I will also note right away that these models may not be on the compatibility list, and nobody will guarantee they will work – but in my experience everything worked; I ran plenty of tests "on cats".

Assembling Frankenstein

We found suitable hardware from AIC in the "High Availability Servers" section. There we see two models in a 2U form factor that suit us – a classic of the genre, with two server nodes. We rejoice, jump, clap our hands, purchase the necessary models, go to the lab, open the software manual with its requirements and meticulously assemble our "Frankenstein".

We install 2 CPUs and 64 GB of RAM per node. Then, for the 2 controllers to work together, we need a network card with two 25G ports, or InfiniBand, to synchronize the internal cache (here, too, there are restrictions on vendors and models). We also install an external SAS HBA card, which again takes a long search through the compatibility list.

And here is the funniest part: each server node has only 2 PCIe slots, and so far we have only installed the back-end interfaces, while for the front end we still need an FC HBA or an iSCSI HBA.

  • In the case of iSCSI, we can get by with the integrated 10G card (yes, someone will say "for production you need 25G, come on, we live in the 21st century, what 10G are you even talking about" – but for some it is more than enough).

  • In the case of an FC HBA, we are stuck. Well, I think, okay, I will test without an external shelf, only on the internal drive cage. We throw out the SAS HBA, take out our beloved QLogic 2692 and immediately recall Homer Simpson while trying to install the card – nothing works. First, the card is long and the CPU cooler is in the way; second, "well, I still want to install it and get all this stuff running."

Cursing the Chinese designers, I remove the cooler and the CPU, move the memory over to the primary bank and try to install the HBA again – and here I simply lose my temper. I suspect my indignation was heard within a kilometer radius and made the windows rattle, because the HBA bracket does not fit into the cutout in the rear panel.

At that point I sent the whole thing to hell. I have no complaints about RAIDIX here, other than that it needs so many PCIe slots, but I did start to hate the AIC designers. Still, I did not stop: there is also a 4U platform with 24 universal disk bays – for both 2.5" and 3.5" disks. Again, we get a large, irrational monster in 4U, and this is only the head unit.

Frankenstein Revived

With the 4U model, the assembly of the test subject was a success. Everything went in as it should, and I even connected an additional external shelf for 24 disks. Of course, it was quite a task to find 25G DAC cables with the right firmware for the Mellanox cards. "Why not the original ones?" you ask. Because in Russia you cannot find them for love or money, and I will not write who supplied me with the DAC cables with the right firmware, so that nobody gets in trouble.

The same fun awaited me with the SAS cables for connecting the external shelf, so if someone tells me "sure, you can buy components for it in any retail chain", I will send them exactly there… And how longingly I remembered the times when I unboxed factory-assembled storage systems from EMC, NetApp, Fujitsu, Huawei and HP, put together without a hitch, where every cable clicked into place like clockwork and the kit even included a set of tools and gloves.

Another problem from the series "sure, I will just go and buy any disk for the self-assembled box in the nearest retail chain": a month before the build I bought nine of the 1.8 TB 2.5" SAS HDDs I needed, then realized I had miscalculated the quantity for the planned tests – and alas, an ambush: they are nowhere to be found, they have been discontinued. And that, mind you, in under 5 years. Here we are.

It's alive!

As a result, my Frankenstein is assembled, with bundles of wires hanging off its rear, because the interconnect between the nodes goes through two network cards, two ports by two, Karl! Whoever buys this and does not extend the warranty with the supplier will just love servicing such a miracle: you must not mix up the ports and, as I already said, there are problems with the disks and with other spare parts.

The installation of the RAIDIX software itself went off without a hitch; the main thing is to keep the instructions at hand so you can pick the right parameter at the right moment. Then I tested everything on virtualization from the "hostile" vendor VMware, connected as DAS over 16G FC. The storage configuration was as follows: 28 cores per controller, 64 GB of RAM per controller, 9 × 3.84 TB SAS SSDs and 15 × 1.8 TB SAS HDDs.

Well, let's move on to the sweet part: the dual-controller setup on RAIDIX came together without any problems. If you are interested, read the manufacturer's instructions – everything is laid out clearly enough there, though it is not at all intuitive.

I had licenses for all the functionality and even for ERA, so I was allowed to practice "on cats" to the fullest.

Configuration

The first thing we do, of course, is assemble RAID groups: I only have two of them.

  • The first is RAID 5 on SSDs: I chose the number five because I needed to make a comparison for the customer against his current old IBM storage, which was already 10 years old, and he already had SSDs in it.

  • The second RAID group is RAID 7.3: fast access to the data stored there was not particularly important to the customer. There were also 5 MLC SSDs for the SSD cache.

I started with the SSDs – and here, like in a fairy tale, we immediately find ourselves at a crossroads by a stone: create RAID 5 or RAID 5i. Considering that 5i initializes for many hours and it is simply impossible to work with the group at all until initialization finishes, I remembered the first mammoth I started working with at the very beginning of my career, the IBM DS4700 – even it let you work with a RAID group during initialization, to say nothing of modern EMC, Fujitsu, Huawei and even Sugon, assembled by Uncle Lao in a Shanghai basement – there, everything works too.

Okay, I create a regular RAID 5 – with it you can work without waiting for initialization to finish. The cherry on the cake: you have to look up in the guide what cache size to allocate for the RAID depending on the number of disks selected. Everything seems logical: the more disks, the more cache is needed; the cache is carved out of the RAM available in the node; and on top of that, the RAID group is strictly bound to a specific node.

Checklist “to remember”:

Fig. 1

  • we should always keep in mind how much memory we have allocated in total, so that if one node fails, there is enough memory on the second (see the sketch after this list);

  • also, we must not forget about the stripe size, and then we go back to the guide;

  • after creation, it is necessary to check that the cache has been synchronized (Fig. 1); if it has not, then in the event of a fuckup on the node to which the RAID group is bound, the group will not fail over to the neighboring one.
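
To make the first point of the checklist concrete, here is a tiny back-of-the-envelope sketch (my own illustration, not a RAIDIX utility): the caches allocated to all RAID groups on both nodes have to fit into a single node's RAM, otherwise a failover has nowhere to land. All numbers below are hypothetical.

```python
# Back-of-the-envelope failover headroom check (illustration only, not a RAIDIX tool).
# Idea: the caches of ALL RAID groups from BOTH nodes must fit into ONE node's RAM.

NODE_RAM_GB = 64            # RAM per controller node (as in my test rig)
OS_RESERVE_GB = 8           # hypothetical reserve for the OS and storage services

# Cache sizes per RAID group as allocated at creation time (hypothetical values)
cache_per_group_gb = {
    "raid5_ssd_on_node1": 16,
    "raid73_hdd_on_node2": 24,
}

total_cache = sum(cache_per_group_gb.values())
headroom = NODE_RAM_GB - OS_RESERVE_GB - total_cache

print(f"total cache across both nodes: {total_cache} GB")
if headroom < 0:
    print(f"NOT safe: the surviving node is short {-headroom} GB after a failover")
else:
    print(f"OK: {headroom} GB of headroom left on a single surviving node")
```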

Performance testing and checking the marketing claims

I launched the first test on the RAID 5 SSD group and saw a disappointing picture in the "parrots" (the arbitrary benchmark units I measure in).

Here I recalled the marketers' words: "You will get the most gold if you buy the ERA option, which gives maximum performance on SSDs."… Okay, RAIDIX had issued me a full set of licenses. So I go step by step again:

1. I tear down my SSD RAID group.

2. I reassemble it, this time with the ERA RAID 5 option.

3. I set all the related parameters according to the official instructions.

4. I mount the disk.

5. I run the same test again.

This time I get a result 3+ times higher than without the ERA option, but still not enough: 21,573 parrots on 8 SSD drives.

Then I decided to check what would happen if I created a regular generic RAID 5i without the ERA option (5i is the initialized variant) and set the accompanying parameters similarly to the ERA setup. Even with a smaller number of workers I already got 49,379 parrots.

And then I realized that this system demands knowledge of fine-tuning parameters – something we have long forgotten, because the mastodons of the storage market – EMC, NetApp, IBM, Huawei – have spoiled us with full automation.

Having resorted in part to the method of scientific poking, I arrived at the following set of parameters (screenshot below), with which I hit 240,000 parrots on RAID 5i SSD. On RAID 5 SSD with ERA the result was better: 280,000 parrots. For me the difference is not that noticeable, but for someone it may be critical.
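
I count in parrots, i.e. in whatever the load generator reports, so purely as an illustration here is a minimal sketch of the kind of random-read test I have in mind – assuming fio is installed on a Linux test host (or VM) where the LUN shows up as /dev/sdX; the device name, block size, queue depth and worker count are all hypothetical and need tuning for a specific setup.

```python
# Minimal sketch: drive fio against a presented LUN and read back the IOPS ("parrots").
# Assumes fio is installed; /dev/sdX is a hypothetical placeholder for the test LUN.
import json
import subprocess

DEVICE = "/dev/sdX"  # hypothetical device name of the presented LUN - change it!

cmd = [
    "fio",
    "--name=randread",
    f"--filename={DEVICE}",
    "--rw=randread",            # random reads; use randrw for a mixed profile
    "--bs=4k",                  # block size - align with the chosen stripe size
    "--ioengine=libaio",
    "--direct=1",               # bypass the page cache and hit the array itself
    "--iodepth=32",             # queue depth per worker (hypothetical)
    "--numjobs=8",              # number of workers (hypothetical)
    "--runtime=60",
    "--time_based",
    "--group_reporting",
    "--output-format=json",
]

report = json.loads(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
print(f"random read IOPS: {report['jobs'][0]['read']['iops']:.0f}")
```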

A few flies in the ointment

I found these during testing.

  1. No automatic fail-back of a node after recovery

The story is this: if, in a dual-controller deployment, a node goes down for some reason, the RAID groups quietly move to the second, working node and continue to live there even after the failed node is restored. The restored node will not be synchronized or used until you manually go in and bring it back into operation.

In effect, we are left living on a single working node, which is quite unusual for me, since all modern vendors have automated this – and with RAIDIX you can end up in a very unpleasant mess.

This manual fail-back is documented as a feature, not a bug.

  2. No LUN UID display in the RAIDIX system

    More precisely, some ID is present there, but when you present a disk to VMware virtualization, or even just to a physical Linux server, the UID the host sees does not correspond to the internal ID in the RAIDIX system. And this is not great: there are tasks where you need to create several identical disks and present them to a server, and then you stare at these twins with nothing to tell them apart by (see the sketch below).
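
Since the ID shown by RAIDIX does not match what the host sees, about the only thing left to compare on the Linux side is the WWN/serial of the block devices. A minimal host-side sketch of that check, assuming util-linux's lsblk is available (the filtering is deliberately simplified):

```python
# Host-side sketch: list the WWN and serial of the presented block devices so that
# "twin" LUNs of the same size can at least be told apart on the server.
# Assumes lsblk from util-linux (standard on modern Linux distributions).
import json
import subprocess

out = subprocess.run(
    ["lsblk", "--json", "-d", "-o", "NAME,SIZE,VENDOR,MODEL,WWN,SERIAL"],
    capture_output=True, text=True, check=True,
).stdout

for dev in json.loads(out)["blockdevices"]:
    # Print everything; in practice you would filter by the array's vendor/model.
    print(f'{dev["name"]:>8}  {dev["size"]:>10}  '
          f'wwn={dev.get("wwn")}  serial={dev.get("serial")}')
```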

Some conclusions

What made me really happy is that RAIDIX is extremely difficult to break; it is as reliable as a tank. I powered it off on the fly many times, and after booting everything worked; I pulled disks out on the fly, the rebuild kicked in, and the system kept working; I disconnected disk shelves on the fly, and it recovered once the shelves were reconnected; I disconnected one of the two paths to a disk shelf, and it still worked.

Of course, after EMC, NetApp, Fujitsu and IBM, RAIDIX storage really does feel inconvenient and crude – it takes us back to the early 2000s, when you had to calculate the disk load, the queues and many other parameters yourself, things many have already completely forgotten about. It does not have the RAID migration functionality we are all accustomed to; with virtualization this functionality is not needed in principle, but with dedicated servers it can become a problem.

But it does perform its main function: you can store data on it. It is quite reliable; the main thing is to follow the instructions and not forget to perform the fail-back manually. I really hope that in another 3-4 years the developers and architects at RAIDIX will catch up with the functionality of EMC and NetApp.
