Rolling out k8s 1.26 with ansible + jenkins
And, in general, a managed cluster built with your own hands in a thousand and one man-hours.
Greetings to all! A recent massive GitHub update (during which nothing worked there for hours) prompted me to share my experience of automating a k8s installation on bare metal.
So, the task: use CI/CD to deploy a kubernetes cluster of the latest version (1.26 at the time of writing) in minimal time (about 3 minutes on my hardware), and, more broadly, use this as the starting point for building your own cluster management tools.
This will require 3 servers running Ubuntu; the resulting cluster passes the sonobuoy conformance tests (adapting it to RHEL will require minor changes).
The process developed here is best suited to deploying a test environment; that is why the emphasis is on speed, and every step the cluster can fully function without is excluded. For a production cluster you would, at a minimum, have to add separate etcd nodes, which is beyond the scope of this article. Etcd members on the master nodes, however, are created by the process described here.
A few words about the role of the ansible inventory:
[masters]
k8s
[master]
k8s
[etcd]
k8s
[workers]
r01
r02
[jenkins]
k8s
[grafana]
k8s
Let’s keep it simple, very simple. Configure the DHCP service by hand or via your router’s API. It is a good idea to set the router’s domain: provide any identifier and it will be appended to your hostnames. Add the MAC addresses of your nodes’ network interfaces to the list of IP address reservations in your subnet. In other words, plan static addresses for your cluster nodes.

It probably only remains to remind you to copy the public SSH key of your ansible host to the nodes of the future cluster.
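If you want to do even that with ansible, here is a minimal sketch (the remote user "ubuntu" and the key path are assumptions, adjust them to your setup); run it once with --ask-pass while password authentication is still allowed:
- hosts: all
  tasks:
    - name: Distribute the controller's public key      # reads the key from the ansible host
      ansible.posix.authorized_key:
        user: ubuntu                                     # hypothetical remote user
        key: "{{ lookup('file', '/home/ansible/.ssh/id_rsa.pub') }}"   # hypothetical key path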
The strategy: from the overall process of installing and configuring the cluster, pick out the steps that are asynchronous and independent of each other and describe them as ansible roles. These processes will run as parallel stages of a jenkins pipeline.
pipeline {
    agent any
    options {
        parallelsAlwaysFailFast()
    }
    stages {
        stage('Deploy Kubernetes Cluster conforms to kubeadm 1.26 kubernetes.io official') {
            parallel {
                stage('Apply system requirements') {
                    steps {
                        sh 'ansible-playbook -i files/hosts init-phd.yaml'
                    }
                }
                stage('Installing containerd container runtime') {
                    steps {
                        sh 'ansible-playbook -i files/hosts init-phb.yaml'
                    }
                }
                stage('Configuring systemd cgroup driver') {
                    steps {
                        sh 'ansible-playbook -i files/hosts init-phc.yaml'
                    }
                }
                stage('Installing kubeadm, kubelet and kubectl') {
                    steps {
                        sh 'ansible-playbook -i files/hosts init-pha.yaml'
                    }
                }
            }
        }
        stage('Bootstrap cluster with kubeadm') {
            steps {
                sh 'ansible-playbook -i files/hosts init-masters.yaml'
            }
        }
    }
}
That is, we have 5 PLAYs: 4 asynchronous ones and a final PLAY that bootstraps the cluster.

Pay attention to this parameter in ansible.cfg:
[defaults]
forks = 50
Set it based on the total number of nodes in the inventory: the asynchronous playbooks do the same work on all nodes at once.
To improve performance, master nodes and worker nodes are deployed in separate pipelines, as shown below. This is due to how ansible works; with this approach you can start attaching worker nodes much sooner. Also, do not distribute downloaded tarballs and other large files from the master node: etcd will run there, and the access time to its database is critical.
The plays:
1. Applying system requirements: removing unnecessary services, installing the necessary packages, disabling swap
2. Installing containerd, the runc runtime and the container network interfaces
3. Configuring the systemd cgroup driver
4. Installing the kubernetes repository and components
5. Running kubeadm to initialize the master node
Now to the materials on the official kubernetes website. For version 1.26:
as usual, the compatible component versions are indicated
new instructions appear for ubuntu, due to the deprecation of apt-key
We write roles following the principle of simplicity and better code, where the best code is, preferably, no code at all. First, we structure the initial information for the containerd step in defaults:
# defaults file for containerd
content:
  github.com/containerd/containerd/releases/download/v1.6.4/:
    name: containerd-1.6.4-linux-amd64.tar.gz
    path: /usr/local
    tag: tar
  github.com/containernetworking/plugins/releases/download/v1.1.1/:
    name: cni-plugins-linux-amd64-v1.1.1.tgz
    path: /opt/cni/bin
    tag: tar
  github.com/opencontainers/runc/releases/download/v1.1.4/:
    name: runc.amd64
    path: /usr/local/sbin/runc
    tag: install
  raw.githubusercontent.com/containerd/containerd/main/:
    name: containerd.service
    path: /usr/local/lib/systemd/system
    tag: service
  containerd-file-config.toml:
    name: ./roles/containerd/files/config.toml
    path: /etc/containerd
    tag: file
  cni-template-j2file:
    name: ./roles/containerd/templates/10-containerd-net.conflist.j2
    path: /etc/cni/net.d/10-containerd-net.conflist
    tag: j2t
  cni-lo-file:
    name: ./roles/containerd/files/99-loopback.conf
    path: /etc/cni/net.d
    tag: file
Links to tarballs of the officially recommended versions carry the ‘tar’ tag
The runc binary, which the official guide recommends adding via the install utility, carries the ‘install’ tag
The file describing the containerd service carries the ‘service’ tag
The network interface configuration, a jinja template, carries the ‘j2t’ tag
The ready-made containerd configuration for systemd and the loopback interface for container networks carry the ‘file’ tag
More than half of some playbooks consists of routine work: downloading archives, unpacking them, fetching service unit files, calling install, creating directories, the same systemd reload and service restart+enable code over and over, rendering templates and copying role files. My playbooks no longer contain such code, although they do the same things: all of these actions are performed by an ansible role I developed, driven by tags. The variable declarations in defaults form a shared space inside the PLAY. The role:
# tasks file for wgi
- name: Wget content                       # download everything that has a URL key (all tags except 'file' and 'j2t')
  get_url:
    url: "https://{{ item.key }}{{ item.value.name }}"
    dest: "/tmp/{{ item.value.name }}"
    force: false
  loop: "{{ content | dict2items |
            rejectattr('value.tag', 'search', 'file') |
            rejectattr('value.tag', 'search', 'j2t') |
            list }}"
  loop_control:
    label: "{{ item.key }}"

- block:
    - name: Creates directory              # for 'install' and 'j2t' the path points at a file, so take its dirname
      file:
        path: "{{ item.value.path
                  if not (item.value.tag in ['install', 'j2t'])
                  else (item.value.path | dirname) }}"
        state: directory
        owner: root
        group: root
        mode: 0755
      loop: "{{ content | dict2items }}"
      loop_control:
        label: "{{ item.key }}"

    - name: Copy services content          # 'service': systemd unit files, trigger daemon-reload and restart
      copy:
        src: "/tmp/{{ item.value.name }}"
        dest: "{{ item.value.path }}"
        remote_src: yes
      register: content_restart
      notify:
        - reload systemd
        - restart systemd
      loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'service') | list }}"
      loop_control:
        label: "{{ item.key }}"

    - name: Extract archived               # 'tar': unpack downloaded tarballs into their target path
      unarchive:
        src: "/tmp/{{ item.value.name }}"
        dest: "{{ item.value.path }}"
        owner: root
        group: root
        mode: 0755
        remote_src: yes
      loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'tar') | list }}"
      loop_control:
        label: "{{ item.key }}"

    - name: Install module                 # 'install': single binaries added via the install utility
      shell: install -m 755 "/tmp/{{ item.value.name }}" "{{ item.value.path }}"
      args:
        creates: "{{ item.value.path }}"
      loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'install') | list }}"
      loop_control:
        label: "{{ item.key }}"

    - name: Copy plain artifacts           # 'plain': downloaded files copied to their target as-is
      copy:
        src: "/tmp/{{ item.value.name }}"
        dest: "{{ item.value.path }}"
        owner: root
        group: root
        mode: u=rw,g=r,o=r
        remote_src: yes
      loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'plain') | list }}"
      loop_control:
        label: "{{ item.key }}"

    - name: Copy role files                # 'file': files shipped inside the role
      copy:
        src: "{{ item.value.name }}"
        dest: "{{ item.value.path }}"
        owner: root
        group: root
        mode: u=rw,g=r,o=r
      loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'file') | list }}"
      loop_control:
        label: "{{ item.key }}"

    - name: Copy role templates            # 'j2t': jinja2 templates rendered to their target path
      template:
        src: "{{ item.value.name }}"
        dest: "{{ item.value.path }}"
        owner: root
        group: root
        mode: u=rw,g=r,o=r
      loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'j2t') | list }}"
      loop_control:
        label: "{{ item.key }}"
  become: yes
Perhaps one thing needs clarifying: systemd is reloaded via notify. The results for changed services end up in the registered variable; based on it, the restart systemd handler restarts the services it lists and enables them. The handlers are described within this role and, if necessary, can be overridden; that is how ansible works.
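The handlers themselves are not listed here; given the notify names and the registered content_restart variable, a minimal sketch of the role's handlers could look like this (the actual handlers in the role may differ):
# handlers file for wgi — a sketch consistent with the description above
- name: reload systemd
  systemd:
    daemon_reload: yes
- name: restart systemd                    # restart and enable every service unit that was changed
  systemd:
    name: "{{ item.item.value.name }}"     # e.g. containerd.service
    state: restarted
    enabled: yes
  loop: "{{ content_restart.results | selectattr('changed') | list }}"
  loop_control:
    label: "{{ item.item.key }}"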
The jinja2 template in templates calculates the address that will be assigned to the cni network interface.
In general, tasks/main.yml for the containerd role turns out to be completely without code. There is no code, but there is a feeling of satisfaction.
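For reference, one way to get a role whose tasks/main.yml stays empty is to declare the shared role as a dependency; a sketch, assuming the shared role above is called wgi:
# roles/containerd/meta/main.yml — a sketch; the containerd defaults shown earlier are then consumed by the wgi tasks
dependencies:
  - role: wgi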
What is important in the config.toml configuration file based on official requirements:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
...
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
And
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "registry.k8s.io/pause:3.2"
The default configuration file generated by containerd does not contain the required options.
Let’s look at step 4. The same role helps again: it downloads the key used to sign the repository.
content:
  packages.cloud.google.com/apt/doc/:
    name: apt-key.gpg
    path: /etc/apt/keyrings/
    tag: plain
  Add-the-Kubernetes-apt-repository:
    name: ./roles/k8s-install/files/kubernetes.list
    path: /etc/apt/sources.list.d
    tag: file
The kubernetes.list file, taken into the role from the official instructions, now looks like this:
deb [signed-by=/etc/apt/keyrings/apt-key.gpg] https://apt.kubernetes.io/ kubernetes-xenial main
There is no code again: the role for step 4 actually contains only defaults.
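The installation of the packages themselves is not shown here; a minimal sketch of that part (the play layout is mine, the package list and the version hold follow the official kubeadm instructions):
- hosts: all
  become: yes
  tasks:
    - name: Install kubeadm, kubelet and kubectl
      apt:
        name:
          - kubelet
          - kubeadm
          - kubectl
        update_cache: yes
    - name: Hold the kubernetes packages at the installed version
      dpkg_selections:
        name: "{{ item }}"
        selection: hold
      loop:
        - kubelet
        - kubeadm
        - kubectl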
And, as usual, don’t forget to apply these settings (an ansible sketch follows below):
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
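A sketch of how these settings, and the kernel modules the official container-runtime prerequisites ask for, can be applied idempotently; in the pipeline above this belongs to the 'Apply system requirements' stage, though the exact task layout there may differ:
- name: Load the required kernel modules      # overlay and br_netfilter, per the official prerequisites
  community.general.modprobe:
    name: "{{ item }}"
    state: present
  loop:
    - overlay
    - br_netfilter
- name: Apply the kubernetes sysctl settings  # persisted in /etc/sysctl.d and reloaded immediately
  ansible.posix.sysctl:
    name: "{{ item }}"
    value: "1"
    sysctl_file: /etc/sysctl.d/k8s.conf
    reload: yes
  loop:
    - net.bridge.bridge-nf-call-iptables
    - net.bridge.bridge-nf-call-ip6tables
    - net.ipv4.ip_forward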
The rest of the installation is step 5:
- hosts: master
  become: yes
  pre_tasks:
    - name: initialize the cluster
      shell: kubeadm init --pod-network-cidr=10.244.0.0/16
      args:
        chdir: $HOME
        creates: /etc/kubernetes/admin.conf
      register: kubeadm
    - debug:
        var: kubeadm.stdout_lines
  roles:
    - k8s-copy-admin
The creates check means that you can later attach additional master nodes simply by targeting the masters group: kubeadm init will not run again where it has already completed.
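The k8s-copy-admin role is not listed here; presumably it performs the standard post-init step from the official docs, roughly like this (kube_user is a placeholder variable of mine, not something from the role):
# tasks file for k8s-copy-admin — a guess at its contents
- name: Create the .kube directory
  file:
    path: "/home/{{ kube_user }}/.kube"
    state: directory
    owner: "{{ kube_user }}"
    mode: 0755
- name: Copy admin.conf to the user's kubeconfig
  copy:
    src: /etc/kubernetes/admin.conf
    dest: "/home/{{ kube_user }}/.kube/config"
    remote_src: yes
    owner: "{{ kube_user }}"
    mode: 0600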

The pipeline that attaches worker nodes can be launched in parallel with bootstrapping the master node. There are usually more worker nodes than masters, so the total execution time of the asynchronous part is at least the execution time on the slowest worker node. The launch of kubeadm on the workers has to be synchronized with the master; this is done by checking for the presence of the generated configuration file on the master node.
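The worker pipeline itself is not shown in this article; a sketch of how the synchronization and the join might be expressed (the task names and structure here are illustrative, not the exact playbook):
- hosts: master
  become: yes
  tasks:
    - name: Wait until kubeadm init on the master has produced its config
      wait_for:
        path: /etc/kubernetes/admin.conf
        timeout: 300
    - name: Generate a join command
      shell: kubeadm token create --print-join-command
      register: join_cmd

- hosts: workers
  become: yes
  tasks:
    - name: Join the worker to the cluster        # skipped if the node has already joined
      shell: "{{ hostvars[groups['master'][0]].join_cmd.stdout }}"
      args:
        creates: /etc/kubernetes/kubelet.conf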
The ability to quickly rebuild a cluster lets you test hundreds of the most improbable hypotheses and configurations that are not, and never will be, available in the clouds, deploy testing environments, and much more.
More material on this topic is in my repository: https://itoracl.github.io/k8s