Small "raspberries" in a large data center

In May we got new … And the guys from … (our company's informal Telegram chat) often asked how we managed to integrate them into our automated delivery system. After a while, we looked back at what we had done and are now ready to answer this question in detail.

The idea of using “raspberries” on an industrial scale is not new: you can find many publications on the Internet about assembling computing clusters. At first, these were home-made farms, where the most interesting part was the ability to add more boards or adapt a water cooling system. But as interest grew, ready-made brackets for mounting “raspberries” in a standard 19-inch rack began to appear. There have even been attempts to create blade options.

But despite all the optimism in the media, the question of whether we could integrate “raspberries” into Selectel data centers at all remained open.

General requirements for placement in a data center

To understand what could get in the way, let’s look at the main differences between “raspberries” and “standard” servers. By “standard” we mean the Chipcore series servers built on desktop hardware.

To place a server in a data center, it must meet the following requirements:

Support for network interfaces. The most obvious requirement: the server must be reachable by the client at a “white” (public) IP address. On the “raspberry”, this is provided by the Gigabit Ethernet port.
Rack placement. During our tests we used a 3D-printed sled that holds 12 “raspberries” per rack unit (1U).
Remote power management. By default, power is supplied through the dedicated USB Type-C port, which allows only manual control.
However, a PoE HAT module can be connected through the 40-pin header. In this case, power can be controlled remotely from the switch by turning power on or off on the port.
Independence from local disk devices. Renting a dedicated server implies full client access to the hardware. Moreover, disks can be replaced on demand. In that case, they must be completely wiped and contain no extra data, and server deployment must not rely on them.
Boot over the network (PXE). The least clear and most questionable point. We knew that the “raspberries” supported PXE booting, but how exactly it worked remained unknown.

Let’s look at the last point in more detail. Keep in mind that the Raspberry Pi is hardware for embedded systems, and on such devices the initialization and boot process differs noticeably from the usual one.

The boot process of a “regular” server

For comparison, let’s take a quick look at how a “regular” server boots.

BIOS / UEFI. The system starts with BIOS / UEFI, which initializes the hardware and transfers control to the first device in the boot list. By default, this is the first disk in the system, but for PXE boot the appropriate network interface is moved to the top of the list. It is important that BIOS / UEFI provides an interface for modifying this list and can save it.
NIC. Once the network interface receives control, it broadcasts a DHCP request, receives the TFTP settings (tftp-server-name and boot-file-name), and uses them to download the file to which control is then transferred.
iPXE. In our case, the file we return is a customized iPXE bootloader, which uses the chain command to request an iPXE script and obtains the server's boot script.
Autoinstall / autoload. Depending on the chosen distribution, a unique script for the specific server (mostly its connection data) is generated from a template. When the autoinstall completes, the boot script changes, and next time iPXE receives a different boot template. Booting over the network lets us control the server's boot flexibly: we only change the response of the server that iPXE queries.
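To illustrate the last step, here is a minimal bash sketch of the server-side logic. The same request returns a different iPXE script depending on the server's state. The function name render_ipxe_script, the install/local states and the boot.example.com URLs are hypothetical placeholders, not our actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose the iPXE script for a server by its deployment
# state. The states and the boot.example.com URLs are placeholders.

render_ipxe_script() {
  local state="$1"                               # "install" or "local"
  local base_url="${2:-http://boot.example.com}" # placeholder HTTP boot server

  echo '#!ipxe'
  case "$state" in
    install)
      # Autoinstall: download the installer kernel and initrd, then boot them.
      # Installer-specific kernel arguments are omitted in this sketch.
      echo "kernel ${base_url}/vmlinuz"
      echo "initrd ${base_url}/initrd.gz"
      echo "boot"
      ;;
    local)
      # Installation finished: leave iPXE so the BIOS/UEFI moves on to the
      # next device in its boot list (the local disk).
      echo "exit"
      ;;
    *)
      echo "unknown state: $state" >&2
      return 1
      ;;
  esac
}
```

For example, render_ipxe_script local prints a two-line script, #!ipxe followed by exit. Only this response changes after installation, while the DHCP/TFTP part of the chain stays the same.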

Raspberry loading process

The boot process of the Raspberry Pi 4 is described in detail in the official documentation. Here we limit ourselves to the details that matter for the comparison.

EEPROM. The “raspberry” starts by running the firmware stored in a non-volatile EEPROM chip. By default, this firmware looks for the files needed to initialize components on the /boot partition of the SD card. In this sense, it can be compared to a BIOS without a graphical interface. To change the settings and the boot order (BOOT_ORDER), you have to rebuild the whole bootloader image with options from a text file and write it back to the chip.
Boot directory (/boot). By default, the EEPROM bootloader reads the first partition of the SD card, which holds the files needed for further booting. First of all, these are the GPU firmware, the DTB files (device tree descriptions) and the kernel image (kernel*.img), from which the operating system is loaded. On an installed and booted system, this partition is mounted at /boot. Changing the BOOT_ORDER option essentially changes only the device from which the bootloader expects to receive these files, not the list of files itself.

Yes, the process is very different, and this raises many questions about integrating it into our existing scheme. Let’s deal with them one at a time.

EEPROM update

To update the EEPROM, we need an installed system with the vcgencmd utility and the rpi-eeprom tools. The easiest way to get one is to download a Raspberry Pi OS image and write it directly to an SD card.

wget https://downloads.raspberrypi.org/raspios_lite_armhf/images/raspios_lite_armhf-2021-05-28/2021-05-07-raspios-buster-armhf-lite.zip

unzip -p 2021-05-07-raspios-buster-armhf-lite.zip | sudo dd of=/dev/sdX bs=4M conv=fsync status=progress

We put the SD card back into the “raspberry” and boot the operating system from it. The default login and password are:

pi / raspberry

Now we can view the current EEPROM bootloader options:

pi@raspberrypi:~ $ vcgencmd bootloader_config
  BOOT_UART=0
  WAKE_ON_GPIO=1
  POWER_OFF_ON_HALT=0
  FREEZE_VERSION=0

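Next, we copy a firmware image and extract its configuration into a text file (the exact file name and subdirectory under /lib/firmware/raspberrypi/bootloader/ depend on the installed rpi-eeprom version):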

cp /lib/firmware/raspberrypi/bootloader/pieeprom-2019-11-18.bin new-pieeprom.bin

rpi-eeprom-config new-pieeprom.bin > bootconf.txt

In bootconf.txt, change the BOOT_ORDER value to set the boot order. Since we are interested in network boot, we specify network boot (2) first, then the SD card (1), and repeat the sequence in a loop (f). The bootloader reads the digits from right to left, so this order is written as 0xf12. You can also limit the number of network boot attempts:

BOOT_ORDER=0xf12
NET_BOOT_MAX_RETRIES=1
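Because the digit order is easy to get backwards, a small bash sketch can decode a BOOT_ORDER value. The mode codes follow the Raspberry Pi bootloader documentation as we understand it, and only the common ones are listed. The function names are our own.

```shell
#!/usr/bin/env bash
# Sketch: decode a Raspberry Pi 4 BOOT_ORDER value. The bootloader reads the
# hex digits from right to left (least significant digit first).

boot_mode_name() {
  case "$1" in
    0) echo "SD CARD DETECT" ;;
    1) echo "SD CARD" ;;
    2) echo "NETWORK" ;;
    3) echo "RPIBOOT" ;;
    4) echo "USB-MSD" ;;
    5) echo "BCM-USB-MSD" ;;
    6) echo "NVME" ;;
    e) echo "STOP" ;;
    f) echo "RESTART" ;;
    *) echo "UNKNOWN($1)" ;;
  esac
}

decode_boot_order() {
  local value rest
  # Lower-case the hex digits and strip the 0x prefix.
  value=$(printf '%s' "$1" | tr 'A-FX' 'a-fx')
  value=${value#0x}
  # Walk from the last (rightmost) digit to the first one.
  while [ -n "$value" ]; do
    rest=${value%?}                     # everything except the last digit
    boot_mode_name "${value#"$rest"}"   # the last digit
    value=$rest
  done
}
```

For 0xf12, decode_boot_order prints NETWORK, SD CARD and RESTART, one per line. This matches the order described above.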

After preparing bootconf.txt, you need to apply the settings from it to the firmware file:

rpi-eeprom-config --out netboot-pieeprom.bin --config bootconf.txt new-pieeprom.bin

Finally, we pass the new firmware file with our options to rpi-eeprom-update, which stages it and writes it to the EEPROM on the next reboot:

sudo rpi-eeprom-update -d -f ./netboot-pieeprom.bin
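Editing bootconf.txt by hand works for one board, but for a batch of boards the steps can be scripted. Below is a minimal bash sketch that assumes GNU sed (as on Raspberry Pi OS). set_conf_option and build_netboot_image are hypothetical helper names, and the rpi-eeprom-config calls are the same ones shown above.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive editing of bootconf.txt (hypothetical helpers).

# Set KEY=VALUE in a bootconf.txt-style file: replace an existing KEY line,
# or append a new one if KEY is not present yet. Requires GNU sed (-i).
set_conf_option() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Build a network-boot firmware image from a source pieeprom-*.bin file,
# using the same rpi-eeprom-config calls as the manual steps.
build_netboot_image() {
  local src="$1" out="${2:-netboot-pieeprom.bin}"
  rpi-eeprom-config "$src" > bootconf.txt
  set_conf_option bootconf.txt BOOT_ORDER 0xf12
  set_conf_option bootconf.txt NET_BOOT_MAX_RETRIES 1
  rpi-eeprom-config --out "$out" --config bootconf.txt "$src"
}
```

The resulting netboot-pieeprom.bin would then be written with rpi-eeprom-update, just as in the manual procedure.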


Booting the “raspberry” over PXE

After rebooting, we see on the monitor that the server starts booting over the network. The board successfully obtains an address from the DHCP server and starts requesting files from the TFTP server. Boot error! Of course: we have not yet prepared a replacement for the /boot directory on the TFTP server.

To do this, simply copy the contents of the /boot directory of the installed Raspberry Pi OS to the remote TFTP server. For example:

scp -r /boot/* root@tftp-server:/srv/tftp/

In our case, it is better to take these files from the /boot folder of the official repository (or via the alternative link).

After this preparation, the “raspberry” successfully starts booting over the network, receives the requested files via TFTP and boots into the already installed system.

Let’s pause and assess what is happening. Using the TFTP server logs, we carefully review the list of files the “raspberry” requests over the network. For a better understanding, we will compare it with the documentation and explain only the minimum required set of files.

RRQ from 10.51.228.22 filename 04f2ea0a/start4.elf
RRQ from 10.51.228.22 filename config.txt
RRQ from 10.51.228.22 filename start4.elf
RRQ from 10.51.228.22 filename fixup4.dat
RRQ from 10.51.228.22 filename recovery.elf
RRQ from 10.51.228.22 filename config.txt
RRQ from 10.51.228.22 filename dt-blob.bin
RRQ from 10.51.228.22 filename recovery.elf
RRQ from 10.51.228.22 filename config.txt
RRQ from 10.51.228.22 filename bootcfg.txt
RRQ from 10.51.228.22 filename bcm2711-rpi-4-b.dtb
RRQ from 10.51.228.22 filename overlays/overlay_map.dtb
RRQ from 10.51.228.22 filename overlays/rpi-poe.dtbo
RRQ from 10.51.228.22 filename config.txt
RRQ from 10.51.228.22 filename overlays/vc4-fkms-v3d.dtbo
RRQ from 10.51.228.22 filename cmdline.txt
RRQ from 10.51.228.22 filename recovery8.img
RRQ from 10.51.228.22 filename recovery8-32.img
RRQ from 10.51.228.22 filename recovery7l.img
RRQ from 10.51.228.22 filename recovery7.img
RRQ from 10.51.228.22 filename recovery.img
RRQ from 10.51.228.22 filename kernel8.img
RRQ from 10.51.228.22 filename kernel8-32.img
RRQ from 10.51.228.22 filename kernel7l.img
RRQ from 10.51.228.22 filename armstub8-32-gic.bin
RRQ from 10.51.228.22 filename kernel7l.img
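A log like this quickly gets long, especially when several boards boot at once. A minimal bash sketch that prints each requested file once, in the order of the first request, might look like the following. It assumes the "RRQ from <ip> filename <name>" lines shown above, and requested_files is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list the files requested over TFTP, once each, in the order of the
# first request. Reads "RRQ from <ip> filename <name>" lines on stdin (the
# format shown above; other TFTP servers or syslog setups may differ).
# An optional argument restricts the output to one client IP address.

requested_files() {
  local client_ip="${1:-}"
  awk -v ip="$client_ip" '
    /RRQ from/ {
      from = ""; name = ""
      # Look for the words "from" and "filename" anywhere in the line,
      # so a syslog timestamp prefix does not break the parsing.
      for (i = 1; i < NF; i++) {
        if ($i == "from") from = $(i + 1)
        if ($i == "filename") name = $(i + 1)
      }
      if (name != "" && (ip == "" || from == ip) && !seen[name]++) print name
    }'
}
```

For example, piping the log above through requested_files 10.51.228.22 lists config.txt once instead of four times.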

On the first line, you can see that files are being requested with a prefix corresponding to the serial number of the board. If there is no such directory, all other files are requested from the root.
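If different boards need different boot files, these per-serial directories can keep them apart. Below is a sketch that computes the directory name on the board itself. It assumes, consistent with the 04f2ea0a prefix above, that the prefix is the last 8 hex digits of the Serial field in /proc/cpuinfo, in lower case. tftp_prefix is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: compute the per-board TFTP directory name on a Raspberry Pi 4.
# Assumption: the prefix is the last 8 hex digits of the "Serial" field in
# /proc/cpuinfo, in lower case (e.g. 1000000004f2ea0a -> 04f2ea0a).

tftp_prefix() {
  local cpuinfo="${1:-/proc/cpuinfo}"   # a path argument makes testing easier
  awk -F': *' '
    /^Serial/ {
      s = tolower($2)
      print substr(s, length(s) - 7)   # keep the last 8 characters
      found = 1
    }
    END { exit !found }' "$cpuinfo"
}
```

The board's files could then be placed in a subdirectory with this name under /srv/tftp/ on the TFTP server, next to the shared files copied with scp above.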

start4.elf and fixup4.dat – the firmware blob and its matching linker file, required to initialize the VideoCore GPU, since the GPU is initialized before the CPU.

config.txt – a file with user parameters that affect the behavior of the boot firmware and the initialization of the hardware. If we draw an analogy between the EEPROM and a BIOS, this file stores all the settings a user would change through the graphical menu. Active values can be viewed with the vcgencmd utility.

bcm2711-rpi-4-b.dtb – the base file describing the board’s device tree (device tree binary). The topic of DTB files (and the DTBO files in the overlays/ folder) deserves a separate article. In our context, it is enough to know that this tree is later passed to the Linux kernel.

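kernel8.img and cmdline.txt – the Linux kernel from which the operating system starts booting, and the file through which additional kernel parameters are passed. Judging by the last lines of the log, with our 32-bit system the firmware ended up loading kernel7l.img instead.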
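Based on this list, a quick sanity check of a TFTP directory before booting a board might look like the following bash sketch. The file list reflects our reading of the log and the documentation rather than an official minimum. check_boot_dir is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: check that a TFTP root holds the minimal Raspberry Pi 4 boot files
# discussed above. Missing files are reported on stderr; the exit status is
# non-zero if anything is missing.

check_boot_dir() {
  local dir="$1" missing=0 found_kernel=0 f k
  for f in start4.elf fixup4.dat config.txt bcm2711-rpi-4-b.dtb cmdline.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Any kernel image will do (kernel8.img, kernel7l.img, ...).
  for k in "$dir"/kernel*.img; do
    if [ -f "$k" ]; then found_kernel=1; fi
  done
  if [ "$found_kernel" -eq 0 ]; then
    echo "missing: kernel*.img" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, check_boot_dir /srv/tftp returns a non-zero status and names the missing files on stderr if the copy from /boot was incomplete.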

In our case, when we copied the /boot directory from the installed OS, we also copied its cmdline.txt file. Its contents make it clear why, even though we booted over the network, we ended up in the operating system installed on the SD card.

cat cmdline.txt

console=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait dwc_otg.lpm_enable=0

The root option, which specifies the root file system, points to the second partition of the SD card (/dev/mmcblk0p2), where Raspberry Pi OS was previously installed.
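To see at a glance where a given cmdline.txt will send the kernel, a small bash sketch can print a single parameter from it. kernel_param is a hypothetical helper. It takes the first NAME=VALUE word whose name matches exactly.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one kernel parameter from cmdline.txt.
# Returns a non-zero status if the parameter is not present.

kernel_param() {
  local name="$1" file="$2"
  awk -v name="$name" '
    {
      for (i = 1; i <= NF; i++) {
        # Match "name=..." exactly, so "root" does not match "rootfstype=".
        if (index($i, name "=") == 1) {
          print substr($i, length(name) + 2)
          found = 1
          exit
        }
      }
    }
    END { exit !found }' "$file"
}
```

For the file above, kernel_param root cmdline.txt prints /dev/mmcblk0p2, which is the SD card partition.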

Results and further plans

We have figured out exactly how the Raspberry Pi 4 boots over the network. What remains is to decide what will replace, in this scheme, the iPXE bootloader through which we control server booting.

Wait. Do we really need a replacement for iPXE? Maybe we can somehow boot the “raspberry” directly into iPXE?

I will tell you whether we managed to carry out this plan in the next article. In the meantime, subscribe to our blog so you don't miss the sequel.