Deploying a Kubernetes cluster using Kubernetes
As part of the DevOps Practices and Tools course, we have prepared a translation of a useful article for you.
We also invite you to an open webinar on the topic “Prometheus: quick start”. At the webinar, participants, together with an expert, will examine the Prometheus architecture and how it works with metrics, and will figure out how to generate alerts and events in the system.
Wait… what? Yes, I’ve heard similar reactions to my suggestion to use Kubernetes to build Kubernetes clusters.
But for cloud infrastructure automation, nothing better comes to mind than Kubernetes itself. Using one central K8s cluster, we create and manage hundreds of other K8s clusters. In this article, I’ll show you how to do it.
Note: SAP Concur uses AWS EKS, but the concepts discussed here apply to Google GKE, Azure AKS, and any other cloud provider that offers Kubernetes.
Production readiness
Deploying a Kubernetes cluster to any of the major cloud providers is very easy. You can create and run a cluster on AWS EKS with one command:
$ eksctl create cluster
However, creating a production-ready Kubernetes cluster requires more. Although everyone understands “production readiness” differently, SAP Concur uses the following four steps to create and deliver Kubernetes clusters.
Four stages of assembly
Preflight tests. A set of baseline tests against the target AWS environment to verify that all requirements are met before actually creating the cluster. For example: available IP addresses in each subnet, AWS exports, SSM parameters, and other variables.
EKS control plane and nodegroup. The actual build of the AWS EKS cluster, with worker nodes attached.
Installing add-ons. This is what makes your cluster nicer 🙂 Install add-ons such as Istio, logging integration, autoscaler, etc. This list of add-ons is not exhaustive, and all of them are optional.
Cluster validation. At this stage we validate the cluster (the EKS core components and the add-ons) from a functional point of view before handing it over for production use. The more tests you write, the better you sleep. (Especially if you are the tech support person on duty!)
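Conceptually, the four stages form a strict dependency chain: each stage runs only if the previous one succeeded. A minimal shell sketch of that ordering (the functions below are stand-ins we invented for illustration, not the real build steps):

```shell
#!/bin/sh
# Stand-ins for the four build stages; each just echoes instead of doing real work.
preflight()  { echo "1: preflight tests"; }
build_eks()  { echo "2: EKS control plane and nodegroup"; }
add_addons() { echo "3: install add-ons"; }
validate()   { echo "4: cluster validation"; }

# && stops the chain as soon as one stage fails, like workflow dependencies do.
preflight && build_eks && add_addons && validate
```

The `&&` chaining is only a mental model; the real pipeline, as shown later, expresses these dependencies declaratively.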
Glue the stages together
Each of these four stages uses different tools and techniques, which I will discuss later. We were looking for a single tool that would glue all the stages together, support sequential and parallel execution, be event-driven, and, preferably, offer build visualization.
And we found Argo, in particular Argo Events and Argo Workflows. They both run on Kubernetes as CRDs and use declarative YAML, just like other Kubernetes deployments.
We found the ideal combination: imperative orchestration with declarative automation.
Argo workflows
Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs in Kubernetes. Argo Workflows is implemented as Kubernetes CRD.
Note: if you are familiar with K8s YAML, then I promise this will be easy for you.
Let’s see what the build stages above might look like in Argo Workflows.
1. Preflight tests
To write tests, we use the BATS framework. Writing a test in BATS is very simple:
#!/usr/bin/env bats

@test "More than 100 available IP addresses in subnet MySubnet" {
  AvailableIpAddressCount=$(aws ec2 describe-subnets --subnet-ids MySubnet | jq -r '.Subnets[0].AvailableIpAddressCount')
  [ "${AvailableIpAddressCount}" -gt 100 ]
}
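The jq expression in the test pulls AvailableIpAddressCount out of the describe-subnets JSON. You can sanity-check that extraction locally against a mocked response (the JSON below is invented for illustration, not real AWS output):

```shell
#!/bin/sh
# Mocked `aws ec2 describe-subnets` response (values are made up)
response='{"Subnets":[{"SubnetId":"MySubnet","AvailableIpAddressCount":251}]}'

# Same extraction as in the BATS test above
AvailableIpAddressCount=$(echo "$response" | jq -r '.Subnets[0].AvailableIpAddressCount')
echo "$AvailableIpAddressCount"

# Same assertion shape as in the BATS test
[ "$AvailableIpAddressCount" -gt 100 ] && echo "enough IPs"
```

This requires jq locally but no AWS credentials, which makes it handy for iterating on the jq filter itself.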
Running BATS tests in parallel (the test above, avail-ip-addresses.bats, and three more) in an Argo Workflow might look like this:
- name: preflight-tests
  templateRef:
    name: argo-templates
    template: generic-template
  arguments:
    parameters:
    - name: command
      value: "{{item}}"
  withItems:
  - "bats /tests/preflight/accnt-name-export.bats"
  - "bats /tests/preflight/avail-ip-addresses.bats"
  - "bats /tests/preflight/dhcp.bats"
  - "bats /tests/preflight/subnet-export.bats"
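withItems fans the single generic-template out into one pod per command, all running in parallel. In plain shell, the same fan-out/join pattern looks roughly like this (echo stands in for the real bats invocations):

```shell
#!/bin/sh
# One background job per test file, then wait for all of them (fan-out/join).
for t in accnt-name-export avail-ip-addresses dhcp subnet-export; do
  # The real workflow would run: bats /tests/preflight/$t.bats
  echo "preflight: $t" > "/tmp/preflight-$t.out" &
done
wait  # join: block until every background test has finished
cat /tmp/preflight-*.out
```

The difference in Argo is that each item becomes its own pod, so the parallelism is cluster-wide rather than limited to one machine.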
2. EKS control plane and nodegroup
You can choose from various tools to create an EKS cluster: eksctl, CloudFormation, or Terraform. A two-step EKS build with dependencies, using the CloudFormation templates eks-controlplane.yaml and eks-nodegroup.yaml, might look like this in an Argo Workflow:
- name: eks-controlplane
  dependencies: ["preflight-tests"]
  templateRef:
    name: argo-templates
    template: generic-template
  arguments:
    parameters:
    - name: command
      value: |
        aws cloudformation deploy \
          --stack-name {{workflow.parameters.CLUSTER_NAME}} \
          --template-file /eks-core/eks-controlplane.yaml \
          --capabilities CAPABILITY_IAM
- name: eks-nodegroup
  dependencies: ["eks-controlplane"]
  templateRef:
    name: argo-templates
    template: generic-template
  arguments:
    parameters:
    - name: command
      value: |
        aws cloudformation deploy \
          --stack-name {{workflow.parameters.CLUSTER_NAME}}-nodegroup \
          --template-file /eks-core/eks-nodegroup.yaml \
          --capabilities CAPABILITY_IAM
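Before the container runs, Argo substitutes {{workflow.parameters.CLUSTER_NAME}} into the command. For a hypothetical cluster name "demo", the rendered nodegroup command would be the string below (we only echo it here rather than actually calling AWS):

```shell
#!/bin/sh
CLUSTER_NAME="demo"  # hypothetical value of {{workflow.parameters.CLUSTER_NAME}}

# The command after parameter substitution; echoed instead of executed.
cmd="aws cloudformation deploy --stack-name ${CLUSTER_NAME}-nodegroup --template-file /eks-core/eks-nodegroup.yaml --capabilities CAPABILITY_IAM"
echo "$cmd"
```

Note how the nodegroup stack name is derived from the cluster name, which keeps the two stacks of one cluster grouped together in CloudFormation.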
3. Installing add-ons
You can install add-ons using kubectl, helm, kustomize, or a combination of them. For example, installing metrics-server with helm template and kubectl, provided that metrics-server is required, might look like this in Argo Workflows:
- name: metrics-server
  dependencies: ["eks-nodegroup"]
  templateRef:
    name: argo-templates
    template: generic-template
  when: "'{{workflow.parameters.METRICS-SERVER}}' != none"
  arguments:
    parameters:
    - name: command
      value: |
        helm template /addons/{{workflow.parameters.METRICS-SERVER}}/ \
          --name "metrics-server" \
          --namespace "kube-system" \
          --set global.registry={{workflow.parameters.CONTAINER_HUB}} | \
          kubectl apply -f -
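The pipe here means helm only renders the chart to plain manifests client-side; kubectl does the actual apply. The same pipe shape with local stand-ins (no cluster, helm, or kubectl needed; both functions are invented for illustration):

```shell
#!/bin/sh
# Stand-in for: helm template /addons/metrics-server/ --set global.registry=...
render_chart() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server
  namespace: kube-system
EOF
}

# Stand-in for: kubectl apply -f -  (just reports what it would apply)
fake_apply() { awk '/^  name:/ { print "applying", $2 }'; }

render_chart | fake_apply
```

One practical consequence of this pattern: since helm never talks to the cluster, there are no helm releases to manage; the rendered manifests are the single source of truth.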
4. Cluster validation
To test the functionality of the add-ons, we use DETIK, an excellent BATS library that makes writing K8s tests easier.
#!/usr/bin/env bats

load "lib/utils"
load "lib/detik"

DETIK_CLIENT_NAME="kubectl"
DETIK_CLIENT_NAMESPACE="kube-system"

@test "verify the deployment metrics-server" {
  run verify "there are 2 pods named 'metrics-server'"
  [ "$status" -eq 0 ]
  run verify "there is 1 service named 'metrics-server'"
  [ "$status" -eq 0 ]
  run try "at most 5 times every 30s to find 2 pods named 'metrics-server' with 'status' being 'running'"
  [ "$status" -eq 0 ]
  run try "at most 5 times every 30s to get pods named 'metrics-server' and verify that 'status' is 'running'"
  [ "$status" -eq 0 ]
}
Executing the above BATS DETIK test file (metrics-server.bats), provided that metrics-server is installed, might look like this in Argo Workflows:
- name: test-metrics-server
  dependencies: ["metrics-server"]
  templateRef:
    name: worker-containers
    template: addons-tests-template
  when: "'{{workflow.parameters.METRICS-SERVER}}' != none"
  arguments:
    parameters:
    - name: command
      value: |
        bats /addons/test/metrics-server.bats
Imagine how many tests you can plug in here: Sonobuoy conformance tests, Popeye (a Kubernetes cluster sanitizer), Fairwinds’ Polaris, and more. Connect them via Argo Workflows!
If you have made it this far, you have a fully working, production-ready AWS EKS cluster with metrics-server installed and tested. Great job!
But we are not saying goodbye yet: I have saved the most interesting part for the end.
WorkflowTemplate
Argo Workflows supports templates (WorkflowTemplate), which let you create reusable workflows. Each of the four build stages is a WorkflowTemplate. In essence, we have created building blocks that can be combined as needed. Using one “main” workflow, you can run all the build stages in order (as in the example above), or run each stage independently. This flexibility is achieved with Argo Events.
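For illustration, the reusable generic-template referenced by templateRef in the snippets above could be defined in a WorkflowTemplate along these lines. This is a sketch under assumptions: the container image and exact spec fields are ours, not taken from the article.

```shell
#!/bin/sh
# Print a sketch of the argo-templates WorkflowTemplate (fields follow the
# Argo Workflows CRD; the worker image name is a placeholder we invented).
wt=$(cat <<'EOF'
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: argo-templates
spec:
  templates:
  - name: generic-template
    inputs:
      parameters:
      - name: command
    container:
      image: example/worker:latest   # placeholder image
      command: [sh, -c]
      args: ["{{inputs.parameters.command}}"]
EOF
)
echo "$wt"
```

The key idea is that the template takes an arbitrary command as an input parameter, which is why every stage in the article can reuse it by only changing the command value.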
Argo Events
Argo Events is an event-driven workflow automation framework for Kubernetes that lets you trigger K8s objects, Argo Workflows, serverless workloads, and more on events from various sources such as webhooks, S3, schedules, message queues, GCP Pub/Sub, SNS, SQS, etc.
The cluster build is triggered by an API call (Argo Events) with a JSON payload. In addition, each of the four build stages (WorkflowTemplate) has its own API endpoint. Kubernetes maintainers can benefit greatly from this:
Not sure about the state of the cloud environment? Call the preflight-tests API.
Want to build a bare EKS cluster? Call the eks-core (control plane and nodegroup) API.
Want to install or reinstall add-ons on an existing EKS cluster? There is the addons API.
Something strange happening to the cluster and you need to run tests quickly? Call the tests API.
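Each of those API calls is ultimately just an HTTP request to an Argo Events webhook endpoint. The endpoint URL and payload below are invented placeholders to show the shape of such a call; we echo the curl command instead of executing it:

```shell
#!/bin/sh
WEBHOOK_BASE="http://argo-events.example.internal:12000"     # placeholder URL
payload='{"CLUSTER_NAME":"demo","METRICS-SERVER":"v0.5.0"}'  # hypothetical parameters

# For example, re-run only the add-ons stage of an existing cluster:
cmd="curl -X POST ${WEBHOOK_BASE}/addons -H 'Content-Type: application/json' -d '${payload}'"
echo "$cmd"
```

Because the payload carries the workflow parameters, the same endpoint can target any cluster just by changing CLUSTER_NAME.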
Argo features
Both Argo Events and Argo Workflows come with a large set of features that you do not need to implement yourself.
Here are the seven most important to us:
Parallelism
Dependencies
Retries – note the failed (red) preflight and validation tests in the screenshots above: Argo retried them automatically, and they subsequently completed successfully.
Conditions
S3 support
Workflow Templates (WorkflowTemplate)
Event Sensor parameters
Conclusion
We were able to combine various tools that work together to define the desired state of the infrastructure, which gives the project flexibility and high velocity. We plan to use Argo Events and Argo Workflows for other automation tasks as well. The possibilities are endless.