Bare-Metal Provisioning Infrastructure from Scratch

Firmware upload process

Greetings, Habr. My name is Roman, and I am an embedded systems developer at Getmobit. I would like to share a case study of deploying software to a large number of devices on a production line, starting from scratch. I did not want to make production staff run along the conveyor with a flash drive, but to automate the process I needed a clear understanding of which steps it can be divided into, what should happen at each stage, and, most importantly, how to modify the standard process so that it ends up performing the operations specific to our product. The solution is built on booting devices over the network (PXE), and that is what this article is about.

What are we deploying on?

The “bare metal” in our case is the GM-Box. This is how users see it:

GM-Box

Depending on the modification, the device may include different plug-in modules (Wi-Fi, LTE, etc.).
During production, each device must pass functional tests of the installed modules: for example, Wi-Fi finds access points and delivers its nominal speed, the buttons on the device can be pressed and respond, the Bluetooth module finds a device with a “control” MAC address, and so on.
Every unit that passes the tests should then receive the final software (firmware).

A bit of theory

Preboot eXecution Environment (PXE)

PXE allows a device to boot an operating system through its network card; in our case, UEFI initiates the network boot.
The device learns which bootloader to use (pxelinux, grub and the like) from DHCP options. This in turn gives flexibility in managing operating system configurations for target devices.
From an infrastructure perspective, a typical PXE boot process looks like this:

Boot sequence diagram

Here, Device is the target device for OS deployment and customization, and Provisioning host is a host in the infrastructure or a standalone machine.
In the diagram, after power-on and initial boot, UEFI on the Device requests DHCP parameters, which specify the name of the Zero image file and where to find it (the address of the TFTP server). UEFI then downloads that image (for example, pxelinux) and hands control to it.
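
To make the DHCP step more concrete, here is a minimal Python sketch of how boot parameters can be carried in the DHCP options area. It assumes the parameters travel as options 66 (TFTP server name) and 67 (bootfile name) from RFC 2132; a server may instead place them in the fixed BOOTP header fields, so treat this as an illustration of the encoding rather than a description of our setup. The function names are hypothetical.

```python
from __future__ import annotations

# DHCP option codes from RFC 2132.
OPT_PAD = 0
OPT_TFTP_SERVER_NAME = 66
OPT_BOOTFILE_NAME = 67
OPT_END = 255


def encode_option(code: int, value: bytes) -> bytes:
    """Encode one option as code, length, value."""
    return bytes([code, len(value)]) + value


def parse_dhcp_options(raw: bytes) -> dict[int, bytes]:
    """Parse the options area (the bytes after the magic cookie)."""
    options: dict[int, bytes] = {}
    i = 0
    while i < len(raw):
        code = raw[i]
        if code == OPT_END:
            break
        if code == OPT_PAD:  # single-byte padding, no length field
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option header")
        length = raw[i + 1]
        value = raw[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated option value")
        options[code] = value
        i += 2 + length
    return options


# Sample options area using the values from this article's configuration.
sample = (
    encode_option(OPT_TFTP_SERVER_NAME, b"192.168.30.1")
    + encode_option(OPT_BOOTFILE_NAME, b"pxelinux.0")
    + bytes([OPT_END])
)
boot = parse_dhcp_options(sample)
print(boot[OPT_TFTP_SERVER_NAME].decode(), boot[OPT_BOOTFILE_NAME].decode())
```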

Next, the program from the minimal image (Zero image) takes over. It reads the boot configuration from the same TFTP server and executes it. In our case, it automatically downloads the Provision agent for further installation and passes it the parameters obtained in the previous step. The agent itself consists of a Linux kernel and an initial RAM disk, and it runs the installation script. The contents of the script depend on the task. In its minimal form, it performs the following operations:

  • detects the hardware of the loaded device
  • performs disk partitioning
  • configures the operating system
  • installs the required packages

The agent can be, for example, preseed, kickstart, or a self-built distribution with deployment scripts.
After running the deployment scripts, the Provision agent automatically reboots the device.
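
The minimal sequence above can be sketched as an ordered command plan. This is only an illustration under stated assumptions: a Debian-style target, a single GPT partition, and hypothetical arguments (disk name, mount point, suite, package list). It builds the commands without executing them, so it is safe to run anywhere.

```python
from __future__ import annotations


def build_install_plan(
    disk: str, mount_point: str, suite: str, packages: list[str]
) -> list[list[str]]:
    """Return the ordered commands of a minimal install without running them."""
    # Assumption: the first partition is named <disk>1. That holds for
    # /dev/sda, but NVMe and eMMC devices use a "p1" suffix instead.
    partition = f"{disk}1"
    return [
        # 1. Detect the hardware: list whole disks with size and model.
        ["lsblk", "--nodeps", "--output", "NAME,SIZE,MODEL"],
        # 2. Partition the disk: one GPT partition spanning the device.
        ["parted", "--script", disk, "mklabel", "gpt"],
        ["parted", "--script", disk, "mkpart", "primary", "ext4", "1MiB", "100%"],
        ["mkfs.ext4", partition],
        ["mount", partition, mount_point],
        # 3. Lay down and configure a base operating system.
        ["debootstrap", suite, mount_point],
        # 4. Install the required packages inside the new root.
        ["chroot", mount_point, "apt-get", "install", "--yes", *packages],
        # Finally, reboot into the installed system.
        ["reboot"],
    ]


for command in build_install_plan("/dev/sda", "/mnt", "bookworm", ["openssh-server"]):
    print(" ".join(command))
```

Keeping the plan as data makes it easy to log, review, or replay on a test bench before handing it to something like `subprocess.run`.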

PXE-less boot

For systems without PXE support, the classic approach is a ready-made live image on external media. But if, for example, you need to monitor the deployment remotely, or the deployment depends on resources in the infrastructure, you can add a discovery image to the boot scheme instead of PXE. Like a live image, it is loaded from external media, but its only job is to deliver the provision agent; from that point on, the boot chain is identical to the one in the diagram.

Preparing the infrastructure

The preparation looks quite simple: each stage has to be supplied with the data it needs from the infrastructure, which requires the following components:

  1. PXE-enabled devices
  2. DHCP server (e.g. isc-dhcp-server)
  3. TFTP server (e.g. tftp-hpa)
  4. Repository (e.g. Ubuntu or python packages repository)
  5. Zero image (for example, the previously mentioned pxelinux.0)
  6. Provision agent (Linux kernel and initrd)

There are plenty of guides on the Internet for installing and configuring components 1-4, so I will give only sample configurations. The diagram below shows the infrastructure that these sample configurations describe. Here all services are located on the same host.

Infrastructure diagram

Examples of configuration files

DHCP configuration /etc/dhcp/dhcpd.conf

option domain-name "provisioner";
option domain-name-servers 8.8.8.8;

default-lease-time 600;
max-lease-time 7200;

log-facility local7;

authoritative;

subnet 192.168.30.0 netmask 255.255.255.0 {
  range 192.168.30.100 192.168.30.200;
  option subnet-mask 255.255.255.0;
  option domain-name-servers 192.168.30.1;
  option domain-name "prod.provisioner";
  option domain-search "prod.provisioner";
  option broadcast-address 192.168.30.255;
  # Name of the bootloader file
  filename "pxelinux.0";
  # Address of the server that serves the bootloader
  next-server 192.168.30.1;
}
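
Addressing mistakes in the subnet block above are easy to make and hard to spot on the production floor, so a quick sanity check can help. The sketch below uses Python's standard `ipaddress` module; the function name and the specific rules are assumptions chosen for the single-host layout described here, not requirements of the DHCP server.

```python
from __future__ import annotations

import ipaddress


def check_pxe_subnet(
    subnet: str, range_start: str, range_end: str, next_server: str
) -> list[str]:
    """Return a list of human-readable problems; empty means it looks sane."""
    net = ipaddress.ip_network(subnet)
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    server = ipaddress.ip_address(next_server)

    problems = []
    if start not in net or end not in net:
        problems.append("dynamic range is outside the subnet")
    if start > end:
        problems.append("range start is above range end")
    if server not in net:
        problems.append("next-server is outside the subnet")
    if start <= server <= end:
        problems.append("next-server lies inside the dynamic range")
    return problems


# Values taken from the dhcpd.conf example in this article.
print(check_pxe_subnet(
    "192.168.30.0/255.255.255.0",
    "192.168.30.100",
    "192.168.30.200",
    "192.168.30.1",
))
```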

TFTP configuration /etc/xinetd.d/tftp

service tftp
{
   protocol     = udp
   port         = 69
   socket_type  = dgram
   wait         = yes
   user         = user
   server       = /usr/sbin/in.tftpd
   server_args  = /var/lib/tftpboot
   disable      = no
}

You can get pxelinux for Preseed 5 and 6 here and then place it in a directory accessible via TFTP (in our case /var/lib/tftpboot).
According to these configurations, the target device will download pxelinux.0 and then deploy the Provision agent for installation.
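
Before powering on the first device, it can be useful to confirm that the TFTP server actually serves the bootloader. The following Python sketch builds a read request (RRQ) and decodes the first reply as defined in RFC 1350. The `probe` helper is hypothetical and deliberately simple: it never acknowledges the data block, so the server will retransmit and eventually give up, which is acceptable for a one-off check.

```python
import socket
import struct

# TFTP opcodes from RFC 1350.
OP_RRQ, OP_DATA, OP_ERROR = 1, 3, 5


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a read request: opcode, filename, NUL, mode, NUL."""
    return (
        struct.pack("!H", OP_RRQ)
        + filename.encode("ascii") + b"\0"
        + mode.encode("ascii") + b"\0"
    )


def parse_reply(packet: bytes):
    """Decode the first reply into a (kind, number, payload) tuple."""
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode == OP_DATA:
        (block,) = struct.unpack("!H", packet[2:4])
        return ("data", block, packet[4:])
    if opcode == OP_ERROR:
        (code,) = struct.unpack("!H", packet[2:4])
        message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return ("error", code, message)
    return ("other", opcode, packet[2:])


def probe(host: str, filename: str, timeout: float = 3.0):
    """Send one RRQ to port 69 and return the parsed first reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (host, 69))
        packet, _ = sock.recvfrom(4 + 512)  # header plus one full data block
    return parse_reply(packet)
```

With the configuration from this article, calling `probe("192.168.30.1", "pxelinux.0")` from a machine on the provisioning network should return a `"data"` tuple for block 1 if the file is reachable, and an `"error"` tuple otherwise.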

Example configuration for pxelinux

DEFAULT linux
LABEL linux
    KERNEL boot/vmlinuz
    APPEND initrd=boot/initrd.gz ramdisk_size=10800 root=/dev/rd/0 rw auto console-setup/ask_detect=false console-setup/layout=USA console-setup/variant=USA keyboard-configuration/layoutcode=us localechooser/translation/warn-light=true localechooser/translation/warn-severe=true locale=en_US
    IPAPPEND 2

Here you can pass additional parameters to the Linux kernel, for example links to local resources and the preseed configuration.
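
Since different device modifications may need different boot entries, it helps to know where pxelinux looks for its configuration. As far as I know, it first tries a file named after the hardware type and MAC address, then the client IP address in uppercase hexadecimal with one digit dropped at a time, and finally `default` (a lookup by client UUID, when the firmware provides one, comes first and is omitted here). The Python sketch below reproduces that order; the MAC and IP values are made up for illustration.

```python
from __future__ import annotations

import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """List the config files pxelinux tries, in order (UUID lookup omitted)."""
    # "01" is the ARP hardware type for Ethernet.
    mac_name = "01-" + mac.lower().replace(":", "-")
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"
    names = [mac_name]
    names += [ip_hex[:n] for n in range(8, 0, -1)]  # C0A81E64, C0A81E6, ...
    names.append("default")
    return ["pxelinux.cfg/" + name for name in names]


def bootif_argument(mac: str) -> str:
    """Kernel argument that IPAPPEND 2 adds to identify the boot interface."""
    return "BOOTIF=01-" + mac.lower().replace(":", "-")


for path in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.30.100"):
    print(path)
print(bootif_argument("88:99:AA:BB:CC:DD"))
```

The `BOOTIF` value is what lets the provision agent tell which network interface the device booted from.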

Finally, how we applied it

So, to solve the original problem, we need to modify the provision agent, because hardware modules must be tested before deploying the software. Testing, in turn, does not have a fixed scenario and can vary.
In our case, the provision agent is a self-built live image with specialized testing software and the ability to install the GM Soft Kit into the device’s memory. The test script for the current configuration is fetched from a dedicated HTTP server.
Using the described infrastructure, we can build a production process as shown in the diagram:

Device manufacturing process diagram
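
The core rule of this process, flash only what has passed every test, can be sketched in a few lines of Python. Everything here is a stand-in: the test callables and the `flash` function are hypothetical placeholders for the real test script fetched from the HTTP server and the real installation step.

```python
from __future__ import annotations

from typing import Callable


def run_station(
    tests: dict[str, Callable[[], bool]], flash: Callable[[], None]
) -> dict:
    """Run every functional test, then flash only if all of them passed."""
    results: dict[str, bool] = {}
    for name, test in tests.items():
        try:
            results[name] = bool(test())
        except Exception:
            # A crashing test counts as a failure, not as a reason to stop.
            results[name] = False
    passed = all(results.values())
    if passed:
        flash()
    return {"passed": passed, "flashed": passed, "results": results}


# Stand-in tests for illustration; real ones would exercise the hardware.
report = run_station(
    {"wifi": lambda: True, "buttons": lambda: True, "bluetooth": lambda: False},
    flash=lambda: print("flashing firmware"),
)
print(report)
```

Running all tests even after the first failure gives the operator a complete picture of what is wrong with a rejected unit.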

Updating the software that is installed on the production line (continuous delivery) fits well into this scheme, but that is a separate topic.

Pitfalls

Preseed does not scale well: across many installations, the same configuration can behave differently and freeze at unexpected moments. As a result, we had to abandon installing the distribution in the classic way. Instead, we build the firmware in advance and use our own provision agent, which now writes the image into the device’s memory over the network.
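
The article does not go into how our agent writes the image, so the following is only a generic Python sketch of the idea: stream a prebuilt image from a network source to the target in chunks and verify a SHA-256 checksum along the way. The function name, the chunk size, and the use of a checksum are all assumptions.

```python
import hashlib
import io
from typing import BinaryIO


def write_image(
    source: BinaryIO,
    target: BinaryIO,
    expected_sha256: str,
    chunk_size: int = 4 * 1024 * 1024,
) -> int:
    """Copy source to target in chunks; raise if the checksum does not match."""
    digest = hashlib.sha256()
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        target.write(chunk)
        digest.update(chunk)
        written += len(chunk)
    target.flush()
    if digest.hexdigest() != expected_sha256:
        raise ValueError("image checksum mismatch")
    return written


# In-memory demonstration; on a device the source could be an HTTP response
# object and the target a block device opened in binary write mode.
image = b"firmware" * 1024
target = io.BytesIO()
size = write_image(io.BytesIO(image), target, hashlib.sha256(image).hexdigest(), 1000)
print(size)
```

Streaming in chunks keeps memory use flat regardless of image size, which matters when the agent itself runs from a RAM disk.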

In the first iteration, we tried placing the Linux kernel and initrd on a simple HTTP server and putting links to them in pxelinux.cfg. With this configuration, the bootloader intermittently failed to load the kernel and hung completely. Placing the files on the same TFTP server that serves pxelinux itself solved the problem.

Conclusion

Of course, deploying software to a large number of devices is not a problem that gets solved in one go. The team and I are still developing the final pipeline for delivering software build results to the production line. We are well aware that network bandwidth still sets a “ceiling”, so we plan to further optimize how software is distributed over the network.

Additional materials

  1. One possible direction for further development is a special EFI application: wiki.archlinux.org/index.php/Systemd-boot#Preparing_a_unified_kernel_image… It can perform the functions of the zero image and the provision agent at the same time.
  2. How to configure the PXE infrastructure and prepare the image is described here: habr.com/ru/company/X5RetailGroup/blog/493124
