OCP Experience Lab – how we built a mini data center in the office
It all started with building a test bench for our own servers. Then the bench grew, and we decided to turn it into a small data center for piloting various software solutions. It is now the only OCP Experience Lab in Russia and the second in Europe.
I am a relative newcomer to computing hardware development. It always seemed impossibly difficult, something accessible only to huge teams with very large budgets, and that is probably why it is so attractive. Last year the stars aligned: I managed to attract investors and find launch customers for my own project to create a Russian server. It turned out that a couple of dozen teams in Russia, of completely different sizes and specializations, are working in this direction. Some simply bring in kits of parts from Taiwanese manufacturers and only do the assembly in Russia, while others dig much deeper, right down to writing their own BIOS. In short, it became clear that the task, although difficult, is quite feasible.
To start development, we chose servers built to the OCP standard. OCP is the Open Compute Project, an open community in which the design documentation for all products is made publicly available for free use. It is real open source hardware and, as a result, the most progressive, fastest-growing and most promising standard; moreover, it is promoted mainly by equipment consumers rather than by suppliers. In addition to all the technical advantages, open documentation was supposed to make it easier for us to start development and to get up to speed faster in a field as heavy as server hardware. But that is probably a topic for a separate article.
And the company, by the way, is called GAGAR>IN. You will hear more about it soon.
My personal acquaintance with OCP began five years ago, when I was promoting the solutions of the American company Stack Velocity on the Russian market. Even then we had the idea of localizing their production and making servers assembled in Russia, with open documentation, for state-owned companies and government customers. But import substitution was not yet in fashion, and all potential customers ultimately preferred to buy Taiwanese equipment. That was when OCP first began to gain ground in Russia: Yandex installed OCP-like servers from AIC, a small Taiwanese vendor, in its new data center, while Sberbank, Russian Railways and Mail were testing full-fledged OCP solutions from the giant Quanta, the world’s largest manufacturer of computer hardware.
Quite a lot of time has passed since then, so the first step of my plan was to visit all the major vendors and the nearest OCP distributors in order to make friends, form partnerships, and see and touch real hardware. Before the quarantine restrictions began, I miraculously managed to visit a dozen suppliers in Russia, Taiwan, China and Europe. It was a fast-paced and very productive tour that made a lot of things clear. As the saying goes, it is not gods who fire the pots: we definitely have a chance to reproduce an OCP server successfully and, moreover, to make it a little better in terms of specifications.
One of the discoveries on entering the world of servers was that vendors do not show samples of real products. Most often we looked at pictures in a presentation, and at best the engineers brought in a prototype of a single model. Only once did we see something resembling a showroom: a Dutch distributor of OCP equipment had put together a very elegant demo stand in a data center near Amsterdam.
Building a test bench had always been one of the first items in the company’s development plan, but we liked what we saw in Holland so much that we decided to combine a test bench for our hardware, a showroom and a laboratory for debugging clients’ software and hardware solutions in one place, and to call it the OCP Experience Lab. In other words, we would create a laboratory in the spirit of open communities: a convenient and easily accessible place where both end users and integrators can touch and test real, production-grade, state-of-the-art hardware with their own hands.
All these beautiful plans were derailed by the quarantine. The first test benches were assembled in makeshift fashion in the residential districts of the capital, while the laboratory room we had rented specially in the center of Moscow stood empty.
As far as I know, during the quarantine even the workplaces of engineers at the world’s largest corporations looked much the same.
In June, as soon as the quarantine eased a little, we were able to start assembling our dream lab.
At last we were able to gather all the purchased equipment in one place and mount it properly in racks. We were lucky that the room had a three-phase power feed and that it was possible to lay optical fiber. Nevertheless, at first, while we were agreeing the project with the landlord, pulling the fiber and bringing power to the racks, all our equipment ran from an ordinary wall outlet and an LTE modem. And because of the thick walls and the dense surrounding buildings, the modem had to be taped to the window.
And in the very first days it became clear that the idea of seating system engineers in the laboratory permanently was a big mistake: the equipment is so noisy that working there for more than half an hour is possible only with ear protection.
We collected components for the laboratory from all over the world, ordering everything new and interesting in the OCP world. As a result, we have three racks from three different manufacturers, a dozen and a half different compute servers, several disk arrays and as many as six switches! This variety made it possible to run two or three functional test setups at the same time and to carry out long-term tests on them.
At first we populated and connected only the central rack. The left one was kept for mechanical tests and for storing disconnected equipment, and the right one was set aside as a reserve for future expansion.
What and how we test
Obviously, the primary task of the test bench is to check the hardware we develop for functionality, reliability and compatibility. Someday I will describe in more detail how we conduct these tests, write test procedures and keep test reports. But even at the start, when our own servers existed only on paper, the laboratory did not stand idle: we tested products from competitors and partners, chose peripherals, and prepared and tried out those very procedures.
It all started with basic hardware benchmarks. We ran many components through testing: memory modules from Samsung, Micron and Hynix; SSDs from Samsung, Micron and Intel; and network cards from Mellanox, Broadcom, Emulex and Intel. We even compared Intel Skylake and AMD EPYC 2 processors.
But a laboratory is clearly not only a place for testing new hardware. Customers do not run benchmarks; they need working software and hardware configurations. So we gradually began assembling configurations with various software and checking that they work and perform well. We started with the Russian Linux distributions: Alt, Astra and Rosa. In basic tests everything went without surprises, so it is probably worth doing deeper research and comparison on more complex tasks. Then we built several test setups with virtualization systems. First we tried VMware, Proxmox and Virtuozzo, and these also went smoothly, even boringly. We saved the configurations and decided to return to these systems later, with real client tasks.
Since the main idea of OCP is no-frills equipment, all the variety of functionality has been moved to the software level. In effect, any configuration is assembled from two “building blocks”: a compute server and a JBOD (Just a Bunch Of Disks) disk array attached to it. We gathered several different versions of both servers and disk arrays in the laboratory, and the next logical step was to test how they work together.
With all these tests and the constant configuration and reconfiguration of servers and networks, it became clear that we could not cope without full monitoring of the systems, and from that moment on we have been running Zabbix.
One unexpected discovery after launching Zabbix was that the temperature in the laboratory rises at night. It turned out that, since we are in an ordinary office building, the landlord turns off the central air conditioning at night. In the hot summer it could barely cope with cooling our racks even during the day, and at night the temperature in our makeshift data center regularly exceeded 35 degrees.
By the way, one of the advantages of OCP equipment is its ability to operate at temperatures of up to 30 degrees, with a maximum allowable temperature of exactly 35 degrees. So, without meaning to, we had set up a kind of stress test for our servers. Still, leaving a server room without air conditioning is dangerous: a few more servers and the temperature will creep up toward forty, and it is awkward to bring clients into such a hot room.
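The two limits mentioned above lend themselves to a simple automated check. The following is only a minimal Python sketch of that idea, not the Zabbix configuration actually used in the lab: it assumes the temperatures are in degrees Celsius, and the function names and the sample readings are hypothetical, invented purely for illustration.

```python
from datetime import datetime

# Limits taken from the text (assumed to be degrees Celsius):
# normal operation up to 30, maximum allowable 35.
OPERATING_LIMIT_C = 30.0
MAX_ALLOWED_C = 35.0


def classify(temp_c: float) -> str:
    """Return 'ok', 'warning' or 'critical' for a single reading."""
    if temp_c > MAX_ALLOWED_C:
        return "critical"
    if temp_c > OPERATING_LIMIT_C:
        return "warning"
    return "ok"


def overheated_hours(samples):
    """Return the sorted hours of day in which any reading was 'critical'."""
    return sorted({ts.hour for ts, temp in samples if classify(temp) == "critical"})


# Hypothetical readings, invented for illustration only (not real lab data).
samples = [
    (datetime(2020, 7, 1, 15, 0), 27.5),
    (datetime(2020, 7, 1, 21, 0), 31.0),
    (datetime(2020, 7, 2, 1, 0), 35.5),
    (datetime(2020, 7, 2, 3, 0), 36.2),
    (datetime(2020, 7, 2, 9, 0), 28.0),
]

if __name__ == "__main__":
    for ts, temp in samples:
        print(ts.isoformat(), temp, classify(temp))
    print("Hours above the maximum:", overheated_hours(samples))
```

Grouping the critical readings by hour of day is what makes a pattern like "the air conditioning goes off at night" stand out, whereas a plain list of alerts would hide it.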
Our salespeople had always asked for a beautiful showroom, but until the very last moment I hoped to get by with minimal expense. Clean, comfortable, functional, and that would be enough. However, we planned to make a big announcement of the laboratory’s opening in October, and for that we needed to shoot a short video. We called in a film crew and received a harsh verdict: retouching the footage to make it look good would cost not much less than a proper, full-fledged renovation. As a result, the grand opening of the laboratory was postponed by a couple of months, and designers were brought in to “make it beautiful”.
We ordered the design and renovation from builders of exhibition stands, who know better than anyone how to create striking designs. A month of approvals, a month of construction, and the result was ready.
Now we could hold the official opening and shoot a full video.
As a result, we now have not only a laboratory and a showroom but also a convenient set for producing a series of videos about Open Compute equipment.
To be continued!