Rolling out k8s 1.26 with ansible + jenkins

And, more broadly, a managed cluster built with your own hands over a thousand and one man-hours.

Greetings to all! A recent massive GitHub update (when nothing worked there for hours) prompted me to share my experience of automating a k8s installation on bare metal.

So, the task: deploy a Kubernetes cluster of the latest version (1.26 at the moment) via CI/CD in minimal time (about 3 minutes on my hardware) and, more generally, use this as a starting point for building your own cluster management tooling.

This requires 3 servers running Ubuntu so that the cluster passes the sonobuoy conformance tests (adapting the process to RHEL would require minor changes).

The process described here is aimed at deploying test environments; that is why the emphasis is on speed, and every step that the cluster can fully function without is left out. For a production cluster you would, at a minimum, have to add separate etcd nodes, which is beyond the scope of this article. However, etcd members on the master nodes are created as part of the process described here.

First, the ansible inventory:

[masters]
k8s
[master]
k8s
[etcd]
k8s
[workers]
r01
r02
[jenkins]
k8s
[grafana]
k8s

Let’s keep it simple, very simple. Configure the DHCP service manually or through your router’s API. It is a good idea to set the router’s domain: provide any identifier, and it will be appended to your hostnames. Add the MAC addresses of your nodes’ network interfaces to the list of reserved IP addresses in your subnet. In other words, plan static addresses for your cluster nodes.


For virtual machines, check that the MAC address of the network interface is static.

It only remains to remind you to copy the public SSH key of your ansible host to the nodes of the future cluster.
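If the keys are not there yet, a one-off playbook like the following can push them (a minimal sketch; the file name distribute-keys.yaml is hypothetical, password authentication is assumed to still work, and the authorized_key module comes from the ansible.posix collection):

# distribute-keys.yaml - a sketch, not part of the repository;
# run once with: ansible-playbook -i files/hosts distribute-keys.yaml --ask-pass
- hosts: all
  tasks:
    - name: Push the ansible host public key to every future cluster node
      ansible.posix.authorized_key:
        user: "{{ ansible_user_id }}"
        state: present
        key: "{{ lookup('file', '~/.ssh/id_rsa.pub') }}"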

Strategy: from the overall process of installing and configuring the cluster, pick out the steps that are independent of each other, describe them as ansible roles, and run them asynchronously as parallel stages of a jenkins pipeline.

pipeline { 
    agent any
    options {
        parallelsAlwaysFailFast()
    }
    stages { 
        stage('Deploy Kubernetes Cluster conforms to kubeadm 1.26 kubernetes.io official'){ 
           parallel {
             stage('Apply system requirements'){
                   steps { 
                     sh 'ansible-playbook -i files/hosts init-phd.yaml'
                   }
             }
             stage('Installing containerd container runtime'){
                   steps{
                     sh 'ansible-playbook -i files/hosts init-phb.yaml'
                   }
             }
             stage('Configuring systemd cgroup driver'){
                   steps{
                     sh 'ansible-playbook -i files/hosts init-phc.yaml'
                   }
             }
             stage('Installing kubeadm, kubelet and kubectl'){
                   steps { 
                     sh 'ansible-playbook -i files/hosts init-pha.yaml'
                   }
             }
           }
        }
         stage('Bootstrap cluster with kubeadm'){
            steps { 
                sh 'ansible-playbook -i files/hosts init-masters.yaml'
            }
         }
    }
}

That is, we have 5 plays: 4 asynchronous ones and a final play that bootstraps the first master node with kubeadm.


Bootstrapping the first master node with kubeadm

Pay attention to this parameter in ansible.cfg:

[defaults]
forks = 50

Set it according to the total number of nodes in the inventory: the asynchronous plays do the same work on every node.

To improve performance, master nodes and worker nodes are deployed by different pipelines, as shown below. This is due to how ansible schedules work; with this approach you can start attaching worker nodes much sooner. Also, do not distribute downloaded tarballs and other large files from the master node: etcd will run there, and the access time to its database is critical.

Plays:

  1. Applying system requirements: removing unnecessary services, installing the required packages, disabling swap (a sketch of typical tasks follows this list)

  2. Installing containerd, the runc runtime, and the CNI network plugins

  3. Configuring the systemd cgroup driver

  4. Adding the kubernetes repository and installing the components

  5. Running kubeadm to initialize the master node
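As an illustration of what play 1 boils down to, here is a minimal sketch of its typical tasks (illustrative only; the actual role in the repository may differ):

# Typical tasks of the "system requirements" play - a sketch, not the real role
- name: Disable swap immediately
  command: swapoff -a
  when: ansible_swaptotal_mb | int > 0

- name: Comment out swap entries in /etc/fstab
  replace:
    path: /etc/fstab
    regexp: '^([^#].*\sswap\s.*)$'
    replace: '# \1'

- name: Load kernel modules needed by containerd and kube-proxy
  command: "modprobe {{ item }}"
  changed_when: false
  loop:
    - overlay
    - br_netfilter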

Now to the materials on the official kubernetes website. In version 1.26:

  • as usual, the compatible component versions are listed

  • new instructions appeared for ubuntu, because apt-key is deprecated

We follow the principle of simplicity when writing roles: the best code is, preferably, no code at all. First, let's structure the input data for step 3 in the role's defaults:

# defaults file for containerd
content:
    github.com/containerd/containerd/releases/download/v1.6.4/ : 
      name: containerd-1.6.4-linux-amd64.tar.gz
      path: /usr/local
      tag:  tar
    github.com/containernetworking/plugins/releases/download/v1.1.1/ : 
      name: cni-plugins-linux-amd64-v1.1.1.tgz
      path: /opt/cni/bin
      tag:  tar
    github.com/opencontainers/runc/releases/download/v1.1.4/ : 
      name: runc.amd64
      path: /usr/local/sbin/runc
      tag:  install
    raw.githubusercontent.com/containerd/containerd/main/ : 
      name: containerd.service
      path: /usr/local/lib/systemd/system
      tag:  service
    containerd-file-config.toml:
      name: ./roles/containerd/files/config.toml
      path: /etc/containerd
      tag: file
    cni-template-j2file:
      name: ./roles/containerd/templates/10-containerd-net.conflist.j2
      path: /etc/cni/net.d/10-containerd-net.conflist
      tag: j2t
    cni-lo-file:
      name: ./roles/containerd/files/99-loopback.conf
      path: /etc/cni/net.d
      tag: file
  • Links to tarballs of the officially recommended versions carry the ‘tar’ tag

  • runc is recommended to be installed with the install utility, hence the ‘install’ tag

  • The containerd service unit file gets the ‘service’ tag

  • The network interface configuration file is a jinja2 template, tag ‘j2t’

  • The ready-made containerd config for the systemd cgroup driver and the loopback interface for container networks carry the ‘file’ tag

More than half of many playbooks consists of routine work: downloading archives, unpacking them, fetching service units, calling install, creating directories, identical systemd restart+enable code, rendering templates, and copying role files. My playbooks never contain such code, although they do the same work: all these actions are performed by a generic ansible role I developed, driven by the tags above. Variable declarations in defaults share a common space inside a play. The role:

# tasks file for wgi

- name: Wget content
  get_url:
    url: "https://{{ item.key }}{{ item.value.name }}"
    dest: "/tmp/{{ item.value.name }}"
    force: false
  loop: "{{ content | dict2items |
            rejectattr('value.tag', 'search', 'file' ) |
            rejectattr('value.tag', 'search', 'j2t' ) |
            list }}"
  loop_control:
     label: "{{ item.key }}"
 
- block:

  - name: Creates directory
    file:
      path: "{{
               item.value.path
                  if not (item.value.tag in ['install','j2t'])
                     else ( item.value.path | dirname )
              }}"
      state: directory
      owner: root
      group: root
      mode: 0755
    loop: "{{ content | dict2items  }}"
    loop_control:
      label: "{{ item.key }}"

  - name: Copy services content
    copy:
      src:  "/tmp/{{ item.value.name }}"
      dest: "{{ item.value.path }}"
      remote_src: yes
    register: content_restart
    notify:
      - reload systemd
      - restart systemd
    loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'service' ) | list }}"
    loop_control:
      label: "{{ item.key }}"
        
  - name: Extract archived
    unarchive:
      src:  "/tmp/{{ item.value.name }}"
      dest: "{{ item.value.path }}"
      owner: root
      group: root
      mode:  0755
      remote_src: yes
    loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'tar' ) | list }}"
    loop_control:
      label: "{{ item.key }}"

  - name: Install module
    shell: install -m 755 "/tmp/{{ item.value.name }}" "{{ item.value.path }}"
    args:
      creates: "{{ item.value.path }}"
    loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'install' ) | list }}"
    loop_control:
      label: "{{ item.key }}"
      
  - name: Copy plain artifacts
    copy:
      src:  "/tmp/{{ item.value.name }}"
      dest: "{{ item.value.path }}"
      owner: root
      group: root
      mode: u=rw,g=r,o=r
      remote_src: yes
    loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'plain' ) | list }}"
    loop_control:
      label: "{{ item.key }}"

  - name: Copy role files
    copy:
      src:  "{{ item.value.name }}"
      dest: "{{ item.value.path }}"
      owner: root
      group: root
      mode: u=rw,g=r,o=r
    loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'file' ) | list }}"
    loop_control:
      label: "{{ item.key }}"
      
  - name: Copy role templates
    template:
      src:  "{{ item.value.name }}"
      dest: "{{ item.value.path }}"
      owner: root
      group: root
      mode: u=rw,g=r,o=r
    loop: "{{ content | dict2items | selectattr('value.tag', 'search', 'j2t' ) | list }}"
    loop_control:
      label: "{{ item.key }}"

  become: yes

One clarification is probably needed: systemd is reloaded via notify. The results for changed service files end up in the registered variable; based on it, the restart systemd handler restarts the services listed there and enables them. The handlers are defined in this role's handlers file and can be overridden if necessary; that is how ansible works.
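For reference, the handlers could look roughly like this (a sketch under the assumption that the registered content_restart results are used to pick the changed service files; the real role may differ):

# handlers/main.yml for the wgi role - a sketch, not the exact implementation
- name: reload systemd
  systemd:
    daemon_reload: yes

- name: restart systemd
  systemd:
    name: "{{ item.item.value.name }}"   # e.g. containerd.service
    state: restarted
    enabled: yes
  loop: "{{ content_restart.results | selectattr('changed') | list }}"
  loop_control:
    label: "{{ item.item.key }}"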

The jinja2 template in templates/ computes the address range that will be assigned to the CNI network interface on each node.
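For illustration, such a template could look roughly like this (a sketch modelled on the conflist example from the containerd documentation; deriving the per-node subnet from the last octet of the node IP is my assumption, the repository's template may compute it differently):

{# roles/containerd/templates/10-containerd-net.conflist.j2 - a sketch #}
{
  "cniVersion": "1.0.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{ "subnet": "10.244.{{ ansible_default_ipv4.address.split('.') | last }}.0/24" }]
        ],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}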

As a result, tasks/main.yml for the containerd role turns out to contain no code at all. There is no code, but there is a feeling of satisfaction.
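One possible way to wire this up (my assumption; the repository may include the generic role differently) is to let the containerd role only supply defaults and list the generic role next to it in the play:

# A sketch of a play where tasks/main.yml of containerd stays empty
- hosts: all
  become: yes
  roles:
    - containerd   # provides the `content` dictionary in its defaults
    - wgi          # generic download/copy/unpack/install role driven by tags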

What is important in the config.toml configuration file based on official requirements:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]

  ...

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]

    SystemdCgroup = true

And

[plugins."io.containerd.grpc.v1.cri"]

  sandbox_image = "registry.k8s.io/pause:3.2"

The default configuration file generated by containerd does not contain the required options.

Let’s look at point 4.

Again, the role described above helps: it downloads the key used to sign the repository.

content:
  packages.cloud.google.com/apt/doc/:
    name: apt-key.gpg
    path:  /etc/apt/keyrings/
    tag: plain
  Add-the-Kubernetes-apt-repository:
    name: ./roles/k8s-install/files/kubernetes.list
    path: /etc/apt/sources.list.d
    tag: file

The kubernetes.list file shipped with the role, taken from the official source, now looks like this:

deb [signed-by=/etc/apt/keyrings/apt-key.gpg] https://apt.kubernetes.io/ kubernetes-xenial main

Again there is no code: the role for step 4 really contains only defaults.

And, as usual, do not forget to apply these kernel parameters (an ansible sketch follows the listing):

net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
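A minimal sketch of applying and persisting these settings with the ansible.posix.sysctl module (the target file name under /etc/sysctl.d is illustrative):

- name: Apply kernel parameters required by Kubernetes networking
  ansible.posix.sysctl:
    name: "{{ item }}"
    value: '1'
    sysctl_file: /etc/sysctl.d/99-kubernetes-cri.conf
    reload: yes
  become: yes
  loop:
    - net.bridge.bridge-nf-call-iptables
    - net.bridge.bridge-nf-call-ip6tables
    - net.ipv4.ip_forward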

The rest of the installation is step 5:

- hosts: master
  become: yes
  pre_tasks:
    - name: initialize the cluster
      shell: kubeadm init --pod-network-cidr=10.244.0.0/16
      args:
        chdir: $HOME
        creates: /etc/kubernetes/admin.conf
      register: kubeadm

    - debug: 
        var: kubeadm.stdout_lines
  roles:
    - k8s-copy-admin

The creates check makes it possible to attach master nodes later simply by listing them in the masters group: kubeadm will not run again on nodes that are already initialized.
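The k8s-copy-admin role referenced above is not shown here; roughly, it does something like the following (my guess at its contents, making kubectl usable on the master; the real role may differ):

# roles/k8s-copy-admin/tasks/main.yml - a sketch, not the actual role
- name: Ensure /root/.kube exists
  file:
    path: /root/.kube
    state: directory
    mode: "0700"

- name: Install admin.conf as the root kubeconfig
  copy:
    src: /etc/kubernetes/admin.conf
    dest: /root/.kube/config
    remote_src: yes
    mode: "0600"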

Worker nodes are attached as usual

The pipeline that attaches worker nodes can run in parallel with the bootstrap of the master node. There are usually more worker nodes than master nodes, so the total execution time of the asynchronous part is determined by the slowest worker node. The kubeadm join has to be synchronized with the master; this is done by waiting for the generated configuration file to appear on the master node.
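A sketch of how that synchronization and the join itself could look (illustrative; the actual worker playbook in the repository may differ):

- hosts: workers
  become: yes
  tasks:
    - name: Wait for kubeadm on the master to produce admin.conf
      wait_for:
        path: /etc/kubernetes/admin.conf
        timeout: 600
      delegate_to: "{{ groups['master'][0] }}"

    - name: Ask the master for a join command
      command: kubeadm token create --print-join-command
      delegate_to: "{{ groups['master'][0] }}"
      register: join_cmd

    - name: Join the worker to the cluster
      shell: "{{ join_cmd.stdout }}"
      args:
        creates: /etc/kubernetes/kubelet.conf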

The ability to quickly rebuild a cluster lets you test hundreds of the most improbable hypotheses and configurations that are not, and never will be, available in the clouds, spin up testing environments, and much more.

More material on this topic is in my repository: https://itoracl.github.io/k8s
