How we automated OS installation on our servers


Server leasing companies inevitably face the need to automate operating system installation. In the early years, we at HOSTKEY offered customers only a small number of installation options, but over time we have improved the service. Here is how we did it with minimal cost.

Disadvantages of manual installation

Initially, our engineers deployed systems on servers manually using DVD distributions and external USB optical drives, which weighed about a kilogram and needed a separate power supply (drives powered directly from the server's USB port appeared much later).

We then replaced the optical drives with Zalman USB enclosures with a hard drive inside, where an ISO image could be selected in a tiny menu using a slider.

Later came flash drives with the Ventoy bootloader. These were combined with IPMI and IP KVM, and often with a monitor and keyboard. We still connect such USB flash drives with an ISO library to client servers upon request.

With a relatively small number of machines this way of working was feasible, but the main problem with the manual approach is that it does not scale. As the server fleet grows, you have to hire more engineers and rental costs rise. Besides, the market did not stand still, and not offering self-service options started to look downright indecent.

Automation Challenges…

To begin with, we deployed a PXE server: this solved the scaling issue for a while, but as we grew, new difficulties arose. The main one was managing OS installation across different motherboard models. Plain PXE did not make this convenient, so we had to look for ways to simplify automatic installation that would not require extra actions from engineers or technical support specialists.

The way out was to introduce a stock Foreman to manage the PXE deployment procedure and OS configurations via its API. This gave us a more advanced automation system, and we created configurations for the main operating systems. But new problems emerged:

  1. The new deployment let us manage Linux installations, but installing Windows in UEFI mode caused problems: loading WIM or ISO images into the iPXE ramdisk did not work. We fixed this by deploying via our own CentOS LiveCD, which started the process and prepared the server for installation with Windows PE. How that deployment works is a story for another time. This experience laid the foundation for changing the Linux installations as well.

  2. Once the Windows issue was largely resolved, Canonical dropped support for Debian-Installer in Ubuntu 20.04. We had to build an unattended installation around Casper, which at that time was underdeveloped and rather inconvenient.

Solving problems as they came up turned out to be time-consuming and inefficient from a business point of view, so we decided on an integrated approach and compiled a list of requirements for the future system:

  1. No recurring problems with supporting various installers in the future.

  2. Easier support for deploying Unix-like systems, since the configuration of Casper is radically different from Anaconda, which in turn is nothing like Debian-Installer, to say nothing of Mikrotik RouterOS, OpenSUSE or ArchLinux.

  3. A unified procedure for partitioning disks and setting up volumes, so that it can later be managed through the Web API of our hosting.

…and their solution

Our experience with Windows Server helped us a lot. For automation we use a LiveCD based on CentOS 8, which is built through Jenkins and stored in Git. We can control the software set and hardware support, and we can change the behavior of the image at boot time through the Foreman API by passing trigger parameters. This allows us to start testing and formatting a server, collect information about the components of a dedicated server, and install Windows and Unix systems. How this kitchen is arranged deserves a separate article.

When creating the Unix installation, we started from the premise that it does not require a complex installation procedure. It is enough to partition the disk, write the OS files to it and make basic settings:

  • set hostname;

  • configure file system mounts via fstab;

  • set up the network;

  • create a service user with the specified password and keys;

  • make additional settings (set the locale, etc.);

  • perform an OS update.

The procedure is very similar to installing ArchLinux according to the classic beginner's guide. The first launch of the new installations was planned for the most popular distributions: Debian, Ubuntu, CentOS.

Stages of automation

  • Preparing an image with files. This is a fairly simple procedure: install the OS, then shrink the image by removing the kernel (via the package manager), clearing the caches and resetting the network configuration. Operations on the OS are performed via chroot on the mounted partition with the root file system, after which its contents are packed into a tar.gz archive. Subsequent updates or additions of standard software are done the same way, but in reverse order: download the image from the mirror, add software, update, clear the caches and pack everything into an archive again. The finished image then lies on the mirror.
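
A minimal sketch of this procedure, assuming the reference installation is mounted at /mnt/rootfs and the archive name and mirror path are purely illustrative:

#!/usr/bin/env bash
# Prepare an OS template: trim the installed system and pack it into a tar.gz archive
ROOTFS=/mnt/rootfs   # mounted root partition of the reference installation

# Remove the kernel, clear package caches and reset machine-specific state inside a chroot
chroot "$ROOTFS" /bin/bash -c '
   apt-get -y purge "linux-image-*" 2>/dev/null || yum -y remove "kernel*"
   apt-get clean 2>/dev/null || yum clean all
   rm -f /etc/ssh/ssh_host_* /etc/machine-id
   rm -f /etc/netplan/*.yaml /etc/sysconfig/network-scripts/ifcfg-e*
'

# Pack the root file system and publish the archive on the mirror
tar -czf /srv/mirror/Ubuntu_20.04.tar.gz -C "$ROOTFS" .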

  • Preparing the OS installation script. Our script is assembled from several parts. Foreman uses a separate entity for disk partition tables, each bound to an OS type. In the future we will move to a single partitioning format controlled from the API.

Since the new partitioning logic is a generic shell script run from the CentOS 8 LiveCD, we did not need to bind individual disk partition tables to specific systems. Each such table is just a reference to the universal script through a snippet and looks like this:

<%#
kind: ptable
name: Debian_LVM_HBA
oses:
- Debian
- Ubuntu
-%>
<%= snippet 'Linux_Default_part_HBA' %>

The real code is in the Linux_Default_part_HBA snippet and is not duplicated.

The script itself is written in shell and performs the following procedures:

  1. Analyzes the available block devices and selects the smallest one for the OS installation:

for device in ${blockdevices[*]};do
   if [[ `cat /sys/block/$device/queue/rotational` -eq 1 ]];then
        hdd_devs+=($device)
   elif [[ $(cut -d: -f1 < /sys/block/$device/dev) -ne 8 ]];then
        nvme_devs+=($device)
   else
        ssd_devs+=($device)
   fi
done
# Simply set first device by type and size priority
if [[ ! -z $(GET_SMALLEST_DRIVE ${ssd_devs[@]}) ]];then
   INST_DRIVE=$(GET_SMALLEST_DRIVE ${ssd_devs[@]})
elif [[ ! -z $(GET_SMALLEST_DRIVE ${nvme_devs[@]}) ]];then
   INST_DRIVE=$(GET_SMALLEST_DRIVE ${nvme_devs[@]})
elif [[ ! -z $(GET_SMALLEST_DRIVE ${hdd_devs[@]}) ]];then
   INST_DRIVE=$(GET_SMALLEST_DRIVE ${hdd_devs[@]})
fi
if [[ -z $INST_DRIVE ]];then
   ERROR_REPORT partitioning
   exit 1
fi
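
The GET_SMALLEST_DRIVE helper is not shown in this excerpt; a possible implementation, assuming it takes a list of device names and prints the one with the smallest capacity, could look like this:

# Hypothetical helper: print the smallest block device from the arguments
GET_SMALLEST_DRIVE () {
   local dev size smallest="" min_size=0
   for dev in "$@"; do
       size=$(cat /sys/block/$dev/size)   # capacity in 512-byte sectors
       if [[ -z $smallest || $size -lt $min_size ]]; then
           smallest=$dev
           min_size=$size
       fi
   done
   echo $smallest
}
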
  2. Cleans the existing disks of leftover file system signatures, LVM metadata and similar traces (a minimal sketch of this step follows the partitioning example below).

  3. Performs partitioning using parted, separately for installations in EFI or Legacy mode:

# Base partitioning
if [ -d /sys/firmware/efi ];then
   if [[ $(echo $INST_DRIVE | grep -c nvme) -eq 0 ]];then
       ESP_PART=${INST_DRIVE}1
       BOOT_PART=${INST_DRIVE}2
       ROOT_PART=${INST_DRIVE}3
   else
       ESP_PART=${INST_DRIVE}p1
       BOOT_PART=${INST_DRIVE}p2
       ROOT_PART=${INST_DRIVE}p3
   fi
   parted -s /dev/${INST_DRIVE} mklabel gpt mkpart fat32 1MiB 256MiB set 1 esp on
   parted -s /dev/${INST_DRIVE} mkpart $FILESYSTEM 256MiB 1GiB
   parted -s /dev/${INST_DRIVE} mkpart $FILESYSTEM 1GiB $ROOT_PART_SIZE
   wipefs -a /dev/$ESP_PART
   mkfs.vfat -F32 /dev/$ESP_PART
else
   if [[ $(echo $INST_DRIVE | grep -c nvme) -eq 0 ]];then
       BOOT_PART=${INST_DRIVE}1
       ROOT_PART=${INST_DRIVE}2
   else
       BOOT_PART=${INST_DRIVE}p1
       ROOT_PART=${INST_DRIVE}p2
   fi
   parted -s /dev/${INST_DRIVE} mklabel msdos
   parted -s /dev/${INST_DRIVE} mkpart primary $FILESYSTEM 1MiB 1GiB set 1 boot on
   parted -s /dev/${INST_DRIVE} mkpart primary $FILESYSTEM 1GiB $ROOT_PART_SIZE
fi
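
The cleanup from step 2 is not shown above; a minimal sketch of what it might look like, reusing the device lists collected earlier:

# Deactivate any LVM volume groups left over from a previous installation
vgchange -an 2>/dev/null || true

# Wipe leftover metadata from every detected drive before repartitioning
for device in ${hdd_devs[@]} ${ssd_devs[@]} ${nvme_devs[@]}; do
   # Remove file system, RAID and partition table signatures
   wipefs -a /dev/$device
   # Zero the first megabyte to clear any stray boot code
   dd if=/dev/zero of=/dev/$device bs=1M count=1 conv=fsync
done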

The examples above assume partitioning without RAID. If auto-layout is needed for more complex block device configurations, separate scripts are used, and we choose which one the OS is installed with via the Foreman API. In the future, we plan to move to a more complex system with flexible partitioning management through our own API and a friendly interface in the user control panel.

The result of all these manipulations on the disk is a mounted structure with a root for the new installation. There is always exactly one mount point (/mnt), and which file systems it contains does not matter to the following blocks of the script. This makes it a convenient point for error control during installation.
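
Such a check can be as simple as making sure that a root is actually mounted at /mnt before the copy step; a minimal sketch using the ERROR_REPORT helper seen in the partitioning script:

# Abort the installation if the partitioning step did not leave a mounted root at /mnt
if ! mountpoint -q /mnt; then
   ERROR_REPORT partitioning
   exit 1
fi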

The rest of the installation is handled by the main Linux_Default script, which includes the disk partitioning script. It solves the tasks common to installing all types of OS:

<%#
kind: provision
name: Linux_Default
model: ProvisioningTemplate
-%>
#!/usr/bin/env bash

STAGE_CALL provisioning_start 5

# Set the manualprovision host parameter to start the script manually
<% if host_param_true?('manualprovision') %>
sleep 5
echo "=============================="
echo -e "\n You can start /provision.sh manually \n"
echo "=============================="
exit 0
<% end -%>

# Perform tasks that are not OS-specific, for example set constants for the time server
<% if host_param('medium_fqdn') == "mirror.hostkey.com" -%>
TZ="Europe/Amsterdam"
NTP_SRV="ntpserver.example.com"
<% elsif host_param('medium_fqdn') == "mirror.hostkey.us" -%>
TZ="America/New_York"
NTP_SRV="ntpserver.example.us"
<% else -%>
TZ="Europe/Moscow"
NTP_SRV="ntpserver.example.ru"
<% end -%>

# Insert the partitioning here
<%= @host.diskLayout %>

# Download the OS template from the mirror and unpack the files into the new OS root
cd /mnt
curl -k -L --output - -s <%= @host.os.medium_uri(@host) %>/<%= @host.operatingsystem.name %>.tar.gz | tar xvz

# Bind the virtual file systems to the new root
mount --bind /dev /mnt/dev
mount --bind /dev/pts /mnt/dev/pts
mount --bind /sys /mnt/sys
mount --bind /proc /mnt/proc
mount --bind /run /mnt/run
<% if host_param_true?('uefi') %>
mkdir /mnt/boot/efi
mount /dev/$ESP_PART /mnt/boot/efi
<% end -%> 

STAGE_CALL provisioning_end 5

# Call OS-specific tasks
<%= snippet_if_exists(template_name + "_" + @host.operatingsystem.family) %>

# Additional non-specific tasks performed once the root is unpacked, e.g. generating fstab, setting the hostname, setting the root password, etc.

STAGE_CALL finish_template_start 5

# Unmount the root
<% if host_param_true?('uefi') %>
umount /dev/$ESP_PART
<% end -%>
umount /mnt/dev/pts
umount /mnt/*
umount /mnt
swapoff /dev/$VGNAME/swap

# Tell Foreman the installation is finished
wget --no-proxy --quiet --output-document=/dev/null <%= foreman_url('built') %>
sync

STAGE_CALL finish_template_end 5

reboot

Here you can also set the hostname, generate fstab (the genfstab script from the ArchLinux LiveCD helped us a lot here), create the user, set the locale and so on: in short, carry out the procedures that are the same for modern Linux distributions.
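
A minimal sketch of these common steps, assuming the new root is mounted at /mnt; the user name and the service_password host parameter are illustrative:

# Set the hostname from Foreman host data
echo "<%= @host.name %>" > /mnt/etc/hostname

# Generate fstab for the mounted structure (genfstab comes from the arch-install-scripts package)
genfstab -U /mnt >> /mnt/etc/fstab

# Create a service user inside the new root and set its password (names are illustrative)
chroot /mnt useradd -m -s /bin/bash serviceuser
echo "serviceuser:<%= host_param('service_password') %>" | chroot /mnt chpasswd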

The specific mandatory tasks are network setup, OS upgrade and software installation. Since network configuration is tied to adapter names and other machine-specific parameters, we use a firstinstall script. It is generated during installation and written by the main script into the OS file system. The script is started by systemd or rc, depending on the OS.

Here is an example network configuration for Ubuntu/Debian:

# Setting network up
CONNECTION_NAME=\$(ip l | grep -B1 -i '<%= @host.mac %>' | head -1 | cut -d: -f2)

<% if @host.operatingsystem.name.include?('debian') or @host.operatingsystem.name.include?('ubuntu_bionic') %>
cat << EON > /etc/network/interfaces
#loopback
auto lo
iface lo inet loopback
 
#
auto \$CONNECTION_NAME
allow-hotplug \$CONNECTION_NAME
iface \$CONNECTION_NAME inet static
address <%= @host.ip %>
gateway <%= @host.subnet.gateway %>
netmask <%= @host.subnet.mask %>
dns-nameservers <%= @host.subnet.dns_primary %>
dns-search <%= @host.domain %>
EON

ifdown \$CONNECTION_NAME
ifup \$CONNECTION_NAME
<% else %>
mkdir -p /etc/netplan
cat << EON > /etc/netplan/01-netcfg.yaml
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
  version: 2
  renderer: networkd
  ethernets:
    \$CONNECTION_NAME:
      addresses: [ <%= @host.ip %>/<%= @host.subnet.cidr %> ]
      gateway4: <%= @host.subnet.gateway %>
      nameservers:
        search: [ <%= @host.domain %> ]
        addresses:
          - "<%= @host.subnet.dns_primary %>"
EON

netplan apply
<% end -%>

The dollar signs are escaped here because this is the firstinstall script included in the body of the main script: it is written into a file in the root of the new OS via cat, so its variables must not be expanded by the outer heredoc.
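
How firstinstall is hooked into the first boot is not shown in the excerpt; under systemd it could be registered as a oneshot unit along these lines (the unit name and script path are assumptions):

# Register the generated firstinstall script as a one-shot systemd unit in the new root
cat << 'EOU' > /mnt/etc/systemd/system/firstinstall.service
[Unit]
Description=One-time post-install configuration
After=network-online.target

[Service]
Type=oneshot
ExecStart=/root/firstinstall.sh
ExecStartPost=/bin/systemctl disable firstinstall.service
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOU

chroot /mnt systemctl enable firstinstall.service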

So that we can see the progress of the installation, each stage of the process is reported through STAGE_CALL calls to our API, and if something goes wrong, it shows up in the logs. The installation is a single script that is easy to debug: just set the manualprovision parameter in Foreman to get a LiveCD with the assembled script but without the installation starting.

The main disadvantage of the approach is that, since the installation is carried out from a separate OS, hardware compatibility problems cannot be seen until the reboot stage. On the other hand, adding support for new hardware becomes even easier, as there is no need to maintain udeb packages for Debian/Ubuntu, for example.

Conclusion

By switching to the new scheme, we unified the process of deploying the OS and servicing servers of different generations: systems without UEFI (based on socket 1366 and the like), HP and IBM blade systems, Supermicro servers from generations X8 to X12, Gigabyte, Asus and Asrock boards, the custom BIOS of T-platforms and OCP Winterfell machines, as well as modern Dell servers and motherboards for EPYC and Ryzen, which in practice no longer support Legacy mode.

We now hand over almost 90% of machines to customers automatically: a standard or in-stock server ordered through the API or the website is completely ready in 5-15 minutes. The solution described in this article also made it possible to almost completely automate OS reinstallation by the client through the personal account or API. Engineers only need to step in in rare cases: machines without remote control or with non-standard variations of it, and complex disk subsystem configurations.

The release of a new OS branch or yet another installer no longer makes us want to leave the profession: it is now a routine event of the kind that happens once a quarter. We keep improving the deployment system every day, analyzing the logs of failed installations and testing new motherboards and server platforms in various modes. We are adding new features for disk partitioning, SSH key transfer, OS and network address setup, client notifications about installations and so on. The system is fully ready for further development and expansion.
