An example of configuring Linux for high-load Kubernetes clusters

Examples of tuning the Linux network stack for highly loaded systems are easy to find; however, some of these guides are badly outdated and do not follow the recommended approaches to configuring the system. In this overview, we'll look at preparing Linux for use as a Kubernetes control plane node.

Tuned

Tuned is a daemon that uses udev to keep track of connected devices and statically or dynamically configures system settings according to the selected profile.

Tuned is installed along with a collection of profiles for certain types of workload: network-latency, powersave, desktop and others.

To install tuned on distributions using yum/dnf:

yum install -y tuned
systemctl enable --now tuned
systemctl status tuned

To display the active profile:

tuned-adm active

Current active profile: virtual-host
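
In provisioning scripts it can be handy to check the active profile before continuing. The following is a minimal sketch that assumes the "Current active profile: <name>" output format shown above; the function name active_profile_from is hypothetical and not part of tuned:

```shell
#!/bin/sh
# Extract the profile name from `tuned-adm active` output read on stdin.
# Assumes the "Current active profile: <name>" line format shown above.
active_profile_from() {
    sed -n 's/^Current active profile: //p'
}

# Usage (requires tuned to be installed and running):
#   if [ "$(tuned-adm active | active_profile_from)" != "network-latency" ]; then
#       echo "unexpected tuned profile" >&2
#   fi
```

The function only parses text, so it can be tried against saved output on a machine without tuned.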

To list all available tuned profiles:

tuned-adm list
Available profiles:
- accelerator-performance     - Throughput performance based tuning with disabled higher latency STOP states
- aws                         - Optimize for aws ec2 instances
- balanced                    - General non-specialized tuned profile
- desktop                     - Optimize for the desktop use-case
- hpc-compute                 - Optimize for HPC compute workloads
- intel-sst                   - Configure for Intel Speed Select Base Frequency
- latency-performance         - Optimize for deterministic performance at the cost of increased power consumption
- network-latency             - Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
- network-throughput          - Optimize for streaming network throughput, generally only necessary on older CPUs or 40G+ networks
- optimize-serial-console     - Optimize for serial console use.
- powersave                   - Optimize for low power consumption
- throughput-performance      - Broadly applicable tuning that provides excellent performance across a variety of common server workloads
- virtual-guest               - Optimize for running inside a virtual guest
- virtual-host                - Optimize for running KVM guests
Current active profile: network-latency

To enable the “network-latency” profile:

tuned-adm profile network-latency
tuned-adm active
Current active profile: network-latency

Settings for the “network-latency” profile:

cat /usr/lib/tuned/network-latency/tuned.conf
#
# tuned configuration
#

[main]
summary=Optimize for deterministic performance at the cost of increased power consumption, focused on low latency network performance
include=latency-performance

[vm]
transparent_hugepages=never

[sysctl]
net.core.busy_read=50
net.core.busy_poll=50
net.ipv4.tcp_fastopen=3
kernel.numa_balancing=0

[bootloader]
cmdline_network_latency=skew_tick=1

With the “include” directive, this profile includes settings from “latency-performance”.
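
To see which profile a given profile builds on, the include line can be read straight from its tuned.conf. This is a small sketch that handles only the simple include=<name> form shown above; the helper name included_profile is hypothetical:

```shell
#!/bin/sh
# Print the parent profile named by the include= line of a tuned.conf, if any.
# Handles only the plain "include=<name>" form (optional spaces around "=").
included_profile() {
    sed -n 's/^include[[:space:]]*=[[:space:]]*//p' "$1"
}

# Usage:
#   included_profile /usr/lib/tuned/network-latency/tuned.conf
```

Running it again on the parent profile's tuned.conf follows the chain one level further.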

There are tuned profiles with settings for highly loaded databases:

yum search tuned-profiles
yum install -y tuned-profiles-mssql tuned-profiles-oracle

By default, tuned runs as a daemon, but this can be changed by setting daemon = 0 in /etc/tuned/tuned-main.conf.

Create a custom tuned profile

By default, tuned profiles are stored as /etc/tuned/<profile_name>/tuned.conf or /usr/lib/tuned/<profile_name>/tuned.conf.
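
When a profile with the same name exists in both places, the copy under /etc/tuned takes precedence. The lookup can be sketched as follows; profile_conf_path is a hypothetical helper, and the optional directory arguments exist only so the function can be tried on a scratch tree:

```shell
#!/bin/sh
# Print the tuned.conf that would be picked for a profile, checking the
# administrator directory before the distribution directory.
# Arguments 2 and 3 optionally override the two search directories.
profile_conf_path() {
    name=$1
    etc_dir=${2:-/etc/tuned}
    lib_dir=${3:-/usr/lib/tuned}
    for dir in "$etc_dir" "$lib_dir"; do
        if [ -f "$dir/$name/tuned.conf" ]; then
            printf '%s\n' "$dir/$name/tuned.conf"
            return 0
        fi
    done
    return 1   # profile not found in either directory
}

# Usage:
#   profile_conf_path network-latency
```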

Create a directory for your profile:

mkdir /usr/lib/tuned/custom

Create a new config in the profile directory:

touch /usr/lib/tuned/custom/tuned.conf

Make the file executable (optional; tuned only needs to read tuned.conf):

chmod +x /usr/lib/tuned/custom/tuned.conf

Open the newly created profile for editing (vim /usr/lib/tuned/custom/tuned.conf) and add the following lines:

[main]
summary=Custom tuned profile

The new profile should now appear in the list of available profiles:

tuned-adm list | grep custom
- custom                      - Custom tuned profile

Copy in the following example profile for a bare-metal host or virtual machine that will run Kubernetes control plane nodes:

[main]
summary=Custom tuned profile

[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[vm]
transparent_hugepages=never

[net]
nf_conntrack_hashsize=131072

[sysctl]
net.ipv4.ip_forward=1
net.core.busy_read=50
net.core.busy_poll=50

# Increase net.nf_conntrack_max and net.netfilter.nf_conntrack_max,
# which set the maximum number of tracked network connections.
net.netfilter.nf_conntrack_max=1048576
net.nf_conntrack_max=1048576

# https://datatracker.ietf.org/doc/html/rfc7413
# https://repository.ihu.edu.gr/xmlui/bitstream/handle/11544/29857/Grendas_Dimitrios_Dissertation_IHU_Cybersecurity_2021.pdf?sequence=1
# To build nginx with TCP Fast Open support, use the flag `-DTCP_FASTOPEN=23`
# https://github.com/VKCOM/nginx-quic/issues/1
net.ipv4.tcp_fastopen=3

# https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/reduce_tcp_performance_spikes
net.ipv4.tcp_timestamps=0

kernel.sched_autogroup_enabled=0
kernel.sched_migration_cost_ns=5000000
kernel.sched_min_granularity_ns=10000000

# https://github.com/kubernetes-sigs/kubespray/issues/8825
net.ipv4.conf.all.arp_announce=2
net.ipv4.neigh.default.gc_thresh1=8192
net.ipv4.neigh.default.gc_thresh2=32768
net.ipv4.neigh.default.gc_thresh3=65536
net.ipv6.neigh.default.gc_thresh1=8192
net.ipv6.neigh.default.gc_thresh2=32768
net.ipv6.neigh.default.gc_thresh3=65536

fs.inotify.max_user_watches=65536
fs.inotify.max_user_instances=8192

# Allow more pending SYN connections
net.ipv4.tcp_max_syn_backlog=100000
# And more queued connection requests per socket
net.core.somaxconn=100000

# Increase the default buffer sizes for TCP sockets. Three values: minimum, default, maximum.
net.ipv4.tcp_wmem="4096 12582912 16777216"
net.ipv4.tcp_rmem="4096 12582912 16777216"

# Widen the range of dynamic (ephemeral) ports
net.ipv4.ip_local_port_range="10240 65535"

vm.max_map_count=262144

[sysfs]
/sys/module/nvme_core/parameters/io_timeout=4294967295
/sys/module/nvme_core/parameters/max_retries=10
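
After the profile is applied, it is worth confirming that the kernel accepted the values. tuned-adm verify is the built-in check; as a complementary sketch, the [sysctl] section can be pulled out of the file and compared with the live values by hand. The helper name sysctl_section is hypothetical:

```shell
#!/bin/sh
# Print the key=value lines from the [sysctl] section of a tuned.conf,
# skipping blank lines and comments.
sysctl_section() {
    awk '
        /^\[/ { in_sysctl = ($0 == "[sysctl]"); next }   # track current section
        in_sysctl && NF > 0 && $1 !~ /^#/ { print }      # keep settings only
    ' "$1"
}

# Usage: show the wanted and the live value side by side.
#   sysctl_section /usr/lib/tuned/custom/tuned.conf |
#   while IFS='=' read -r key value; do
#       printf '%s: wanted %s, live %s\n' "$key" "$value" "$(sysctl -n "$key")"
#   done
```

Quoted multi-value settings are printed with their quotes, so the comparison is meant for reading by eye rather than for strict string matching.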

Apply this profile on the system:

tuned-adm profile custom

To check that tuned has applied the profile, run the following command:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
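
On a host with many cores the command above prints one line per CPU. A short sketch that condenses this into a count per governor is shown below; governor_summary is a hypothetical helper, and its optional argument only overrides the sysfs directory so it can be tried on a scratch tree:

```shell
#!/bin/sh
# Count how many CPUs use each scaling governor.
# The optional argument overrides the sysfs CPU directory.
governor_summary() {
    cpu_root=${1:-/sys/devices/system/cpu}
    cat "$cpu_root"/cpu*/cpufreq/scaling_governor | sort | uniq -c
}

# Usage:
#   governor_summary
```

A single output line means every CPU is using the same governor.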

The scaling_governor setting selects the CPU frequency scaling policy, which affects both power consumption and performance:

cpupower -c all frequency-info

If you now change scaling_governor to powersave (power-saving mode) and run the command again, the output changes accordingly:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
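
The text does not show how the governor is switched. One common way is to write the governor name to each CPU's scaling_governor file as root; cpupower frequency-set -g powersave is an alternative. A minimal sketch of the first approach follows, where set_governor is a hypothetical helper and the optional second argument exists only for trying it on a scratch directory:

```shell
#!/bin/sh
# Write the given governor name to every CPU's scaling_governor file.
# Needs root on a real system; the optional second argument overrides
# the sysfs CPU directory.
set_governor() {
    governor=$1
    cpu_root=${2:-/sys/devices/system/cpu}
    for f in "$cpu_root"/cpu*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue   # glob matched nothing (no cpufreq support)
        echo "$governor" > "$f"
    done
}

# Usage (as root):
#   set_governor powersave
```

Re-applying the profile with tuned-adm profile custom should then set the governor back to performance, since the profile's [cpu] section requests it.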
