Building High-Availability Kubernetes Cluster with Ansible¶
Introduction¶
This blog series covers building a Kubernetes cluster with highly available control planes and an external etcd cluster, using Ansible.
I have been doing homelab scrap-and-build every once in a while, trying out something new each time. I now have a machine powerful enough to run as a hypervisor, and decided to do this homelab build with the following new challenges:
- Servers
  - More virtual machines
- Build and run process
  - More automated setup tasks using ansible
- Kubernetes cluster design
  - External etcd topology
  - Highly available control plane using haproxy and keepalived
  - Use cilium instead of calico as the network add-on for the kubernetes cluster
    - Replaces MetalLB for the load balancer and L2 advertisement features
    - Replaces NGINX Gateway Fabric as the Gateway API implementation
  - Use longhorn instead of minio directpv for volume provisioning
  - Test out percona everest to see if it can replace the database operators I'm currently using
What's covered in this series of blog posts¶
In this blog series I'll try to cover the following items:
- Preparing Ansible project to build the kubernetes cluster
- Setting up DNS servers
- Setting up etcd cluster
- Setting up kubernetes cluster
- Demo on using cilium features
- Demo on using longhorn features
The project content is available in a public git repository, though there are some modifications you will need to make before running the same ansible playbooks. I will insert the repository links wherever applicable.
Link to the repository: https://github.com/pkkudo/homelab-v3-k8s
Explaining the design¶
Let me explain the very first kubernetes cluster I built, and then what I am going to build this time.
Basic Kubernetes Cluster¶
The very first cluster was composed of three nodes: one control plane and two worker nodes. You prepare three machines, install and configure the prerequisites, initialize the cluster on the control plane node, and then have the worker nodes join the cluster.
---
title: basic kubernetes cluster
---
flowchart TD
subgraph kubernetes[Kubernetes Cluster]
subgraph worker[Worker Nodes]
worker1
worker2
end
subgraph control_plane[Control Plane Node]
cp1
end
end
Kubernetes Cluster to be built in this series¶
This time around, since I have more capacity to boot up servers as virtual machines, I am going to have an etcd cluster with three nodes and a kubernetes cluster with three control plane nodes and one worker node.
---
title: kubernetes cluster with external etcd cluster
---
flowchart LR
subgraph etcd_cluster[External etcd Cluster]
etcd1
etcd2
etcd3
end
subgraph kubernetes[Kubernetes Cluster]
subgraph control_plane[Control Plane Nodes]
cp1
cp2
cp3
end
subgraph worker[Worker Nodes]
worker1
end
end
control_plane --- etcd_cluster
Each of the three control plane nodes runs keepalived to host a highly available virtual IP address (VIP), and haproxy to listen for kube-apiserver requests on the VIP and load-balance them across the control plane nodes.
Here is a simple diagram of keepalived. The instances talk to each other and elect one node to host the VIP, 192.0.2.8 in this case.
---
title: keepalived on each control plane node
---
flowchart TD
subgraph control_plane[control plane nodes]
subgraph lab-cp1[lab-cp1 192.0.2.1]
keepalived1[keepalived vip 192.0.2.8]
end
subgraph lab-cp2[lab-cp2 192.0.2.2]
keepalived2[keepalived vip 192.0.2.8]
end
subgraph lab-cp3[lab-cp3 192.0.2.3]
keepalived3[keepalived vip 192.0.2.8]
end
end
Here is a diagram of haproxy. Each instance is configured to listen on port 8443 and pass traffic on to any available kube-apiserver.
---
title: haproxy on each control plane node
---
flowchart TD
subgraph control_plane[control plane nodes]
subgraph lab-cp1[lab-cp1 192.0.2.1]
haproxy1[haproxy:8443] --- kubeapi1[kube-apiserver:6443]
end
subgraph lab-cp2[lab-cp2 192.0.2.2]
haproxy2[haproxy:8443] --- kubeapi2[kube-apiserver:6443]
end
subgraph lab-cp3[lab-cp3 192.0.2.3]
haproxy3[haproxy:8443] --- kubeapi3[kube-apiserver:6443]
end
end
haproxy1 --- kubeapi2
haproxy1 --- kubeapi3
haproxy2 --- kubeapi1
haproxy2 --- kubeapi3
haproxy3 --- kubeapi1
haproxy3 --- kubeapi2
Server List¶
Finally, here is the list of servers. The IP addresses listed here are for documentation purposes, and they are used consistently throughout the series as well as in the git repository.
The last two nodes, lab-ns1 and lab-ns2, will be the DNS servers.
| hostname | ipaddr | role | os | cpu | memory | disk | hypervisor |
| --- | --- | --- | --- | --- | --- | --- | --- |
| lab-cp1 | 192.0.2.1 | kubernetes control plane | debian | 4 | 4GB | 64GB | hyper-v |
| lab-cp2 | 192.0.2.2 | kubernetes control plane | rocky | 4 | 4GB | 64GB | proxmox |
| lab-cp3 | 192.0.2.3 | kubernetes control plane | ubuntu | 4 | 4GB | 64GB | proxmox |
| lab-worker1 | 192.0.2.4 | kubernetes worker node | debian | 4 | 4GB | 64GB | hyper-v |
| lab-etcd1 | 192.0.2.5 | etcd node | debian | 2 | 4GB | 64GB | hyper-v |
| lab-etcd2 | 192.0.2.6 | etcd node | debian | 2 | 4GB | 64GB | proxmox |
| lab-etcd3 | 192.0.2.7 | etcd node | oracle | 2 | 4GB | 64GB | proxmox |
| lab-ns1 | 192.0.2.16 | run dns server using docker | rhel | 1 | 2GB | 32GB | proxmox |
| lab-ns2 | 192.0.2.17 | run dns server using docker | debian | 1 | 1GB | 10GB | hyper-v |
Project Preparation¶
Let me start with the project directory preparation and an introduction to the following ansible components:
- ansible config
- ansible-galaxy collection
- ansible-vault
- ansible inventory and variables
Dedicated repository for this project¶
You can clone the public repository I prepared for this series to get started. I will cover what should be modified and what should be created anew on this project as we move on with the build.
Alternatively, you can start from scratch, even without a VCS.
Installing ansible on ansible master node¶
Ultimately, just install ansible whichever way you prefer; many different ways are described in the official documentation, the first of which is to use pipx.
I use mise to manage different versions of different programming languages on my machine. I place .mise.toml at the project root to specify which version of python to use, and then install poetry to manage the packages used in this project.
The getting started page shows different ways to install mise on different OS.
Here is what I did on debian to install mise.
# installation
curl https://mise.run | sh
# activation script in .bashrc as instructed by the mise installer
echo "eval \"\$(/home/$USER/.local/bin/mise activate bash)\"" >> ~/.bashrc
# reload
source .bashrc
# verify
mise doctor
Let's say the cloned repository "homelab-v3-k8s" is at ~/homelab-v3-k8s. Continue with the following steps to prepare python.
cd homelab-v3-k8s
# run "mise up" to install what's configured in the .mise.toml, [email protected] in this case
mise up
# mise warns about untrusted .mise.toml file
mise trust
# mise warns of about python venv activation
mise settings experimental=true
# mise warns that there is no virtual env created
python -m venv ~/homelab-v3-k8s/.venv
# .venv is listed in the .gitignore
I then install poetry following its official instructions.
The cloned repository already has the pyproject.toml file poetry can use to install the requirements.
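With poetry installed, pulling in the project dependencies from the existing pyproject.toml is a short sketch like this (assuming you are at the project root with the venv created above):
# install the python dependencies (ansible, etc.) defined in pyproject.toml
poetry install
# confirm ansible is available in the project environment
poetry run ansible --version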
Generating pyproject.toml using poetry from scratch¶
The following two commands are all you need if you want to set up an ansible project with no pyproject.toml file; a sketch follows.
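A minimal sketch of what those two commands could look like (the package name and options are assumptions; pin versions as you see fit):
# create a new pyproject.toml interactively
poetry init
# add ansible as a project dependency
poetry add ansible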
Preparing Ansible Configuration File¶
This is the ansible configuration file, ansible.cfg.
[defaults]
inventory={{CWD}}/inventory/hosts.yml
vault_password_file={{CWD}}/.vault_pass
roles_path={{CWD}}/roles
collections_path={{CWD}}/collections
# callbacks_enabled = ansible.posix.profile_tasks
timeout = 60
[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null
- inventory specifies where the ansible inventory file is
- vault_password_file specifies where the ansible vault password file is
- roles_path specifies where the ansible roles directory is
- collections_path specifies where the ansible collections directory is
- callbacks_enabled is optional; it can modify the output format when you run ansible plays. It is commented out here, but you can enable it and see how the output changes
- ssh_args sets ssh options for how the ansible master accesses ansible-managed machines over ssh
  - I'm ignoring ssh host keys because everything is in my homelab, and it is more convenient when I re-create a VM using the same IP address
Sample Ansible Configuration File¶
You can generate a sample file with every available option by running ansible-config init --disabled -t all > ansible.cfg, and see for yourself what's available.
Ansible Community Plugins and requirements.yml File¶
There are ansible community collections I want to use, and here is the requirements file I prepared at the project root, requirements.yml.
# ansible-galaxy collection install -r requirements.yml
collections:
  - name: community.docker
    # https://github.com/ansible-collections/community.docker
    version: 4.0.1
  - name: community.general
    # https://github.com/ansible-collections/community.general
    version: 10.0.1
  - name: community.crypto
    # https://github.com/ansible-collections/community.crypto
    version: 2.22.3
  - name: ansible.posix
    # https://github.com/ansible-collections/ansible.posix
    version: 1.6.2
  - name: kubernetes.core
    # https://github.com/ansible-collections/kubernetes.core
    version: 5.1.0
Run ansible-galaxy collection install -r requirements.yml to have ansible install the collections listed in the file.
Using Ansible Vault to Encrypt Data on the Project Repository¶
https://docs.ansible.com/ansible/latest/vault_guide/vault.html
Ansible Vault encrypts variables and files so you can protect sensitive content such as passwords or keys rather than leaving it visible as plaintext in playbooks or roles.
Although I keep my repositories mainly on my self-hosted GitLab, I still encrypt sensitive data to keep up the good habit.
All you need to use ansible vault is the .vault_pass file specified in the ansible configuration file, containing one line of text to be used for encryption and decryption. This file should not be in a git repository, so be sure to add it to the .gitignore file.
# generating random 31 characters string used as ansible vault password
tr -dc '[:alnum:]' < /dev/urandom | head -c 31 > .vault_pass
# add it in the gitignore
echo ".vault_pass" >> .gitignore
Inventory and Variable¶
The list of ansible-managed machines is found in the inventory file, whose location is specified in the ansible configuration file (inventory={inventory_file_path}).
Variables are used in combination with ansible plays. There will be tons of examples throughout this series, but I hope the following example conveys the idea (a minimal playbook sketch follows the list).
- server1 is listed in the inventory file and the variable role=webserver is set
- there is an ansible playbook with tasks to install packages
- one task says it's for web servers (when: role == "webserver") and the task is to install nginx using apt
- there may be server2 with role=webserver set, and server3 with role=database; this task runs on server2 but not on server3
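Here is what such a task could look like, as a minimal sketch (a hypothetical playbook using the role variable and the nginx package from the example above):
- hosts: all
  become: true
  tasks:
    - name: install nginx on web servers only
      ansible.builtin.apt:
        name: nginx
        state: present
      when: role == "webserver"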
Inventory File¶
Here is the inventory/hosts.yml ansible inventory file I am going to use in this blog series.
# prepare the directory
mkdir inventory
# hosts.yml file
cat >inventory/hosts.yml <<EOF
lab:
  children:
    lab_docker:
      hosts:
        lab-ns1: # debian on proxmox
        lab-ns2: # debian on hyper-v
    lab_kubernetes:
      children:
        lab_k8s_cp:
          hosts:
            lab-cp1: # debian on hyper-v
            lab-cp2: # rocky on hyper-v
            lab-cp3: # ubuntu on proxmox
        lab_k8s_worker:
          hosts:
            lab-worker1: # debian on hyper-v
    lab_etcd:
      hosts:
        lab-etcd1: # debian on hyper-v
        lab-etcd2: # debian on proxmox
        lab-etcd3: # oracle linux on proxmox
EOF
I wanted servers with mixed OSes to practice writing ansible scripts, so here they are. They are all running as VMs on either Windows 11 Pro Hyper-V or Proxmox VE.
Here is the breakdown of the inventory:
- lab-ns1 and lab-ns2 under the lab_docker group as hosts running services using docker
- lab-cp1, lab-cp2, and lab-cp3 under the lab_k8s_cp group as kubernetes control plane nodes
- lab-worker1 under the lab_k8s_worker group as kubernetes worker node(s)
- lab-etcd1, lab-etcd2, and lab-etcd3 under the lab_etcd group as etcd nodes
- examples
  - limit the ansible play target to lab_etcd to run tasks only on the three etcd nodes
  - limit the ansible play target to lab_kubernetes to run tasks on all nodes that are part of the kubernetes cluster
I actually have the top groups named lab and hlv3 (for homelab version 3), and sub-groups such as kubernetes, docker, and bastion. Let's say I have a common ansible playbook to run apt upgrade or dnf upgrade. I can run the playbook against all nodes in my homelab, but I can also specify a certain group for the package upgrade task. Also, when I am testing things out, I run the test playbook targeting only the lab group, see how it goes, and then run it against the hlv3 group when all is good.
Validating Hosts and Groups in the Ansible Inventory File¶
# list all hosts in inventory/hosts.yml file
ansible all --list-hosts -i inventory/hosts.yml
# -i option to specify inventory file can be omitted since it's configured in the configuration file
# list hosts in lab group
ansible lab --list-hosts
# and other groups
ansible lab_docker --list-hosts
ansible lab_kubernetes --list-hosts
ansible lab_k8s_cp --list-hosts
ansible lab_k8s_worker --list-hosts
ansible lab_etcd --list-hosts
Variable File¶
Variables can be set per host and per group.
There are different ways to set variables. In this project, I place vars.yml (and vault.yml) files in a directory named after the corresponding host or group.
In the example below, ./inventory/group_vars/lab_kubernetes/vars.yml contains the variables that apply to hosts in the "lab_kubernetes" group, and the vault.yml file sitting next to it contains variables in encrypted form. Likewise, ./inventory/host_vars/lab-etcd2/vars.yml contains vars for the specific host "lab-etcd2".
.
|-.vault_pass
|-inventory
| |-group_vars
| | |-lab_kubernetes
| | | |-vars.yml
| | | |-vault.yml
| |-host_vars
| | |-lab-etcd2
| | | |-vars.yml
| | | |-vault.yml
| |-hosts.yml
|-ansible.cfg
Example vars.yml and vault.yml¶
Next, let me go over the variable file and encrypted vault file for the lab_kubernetes group as an example. Each file and variable will be covered later when it's actually used.
Here is the group variable file inventory/group_vars/lab_kubernetes/vars.yml. The variables set here can be used in ansible play tasks. For example, kube_svc_cidr can be used in a task that prepares the kubeadm configuration file.
# group variable
hlv3_function: kubernetes
# kube-endpoint
kube_endpoint: "{{ vault_kube_endpoint }}"
kube_endpoint_vip: "{{ vault_kube_endpoint_vip }}"
kube_endpoint_port: "{{ vault_kube_endpoint_port }}"
kube_clustername: lab
kube_cluster_domain: lab.example.net
# kubeadm config
kube_svc_cidr: 10.96.0.0/16
kube_pod_cidr: 10.244.0.0/24
As you can see, the value of the variable kube_endpoint cannot be determined by just looking at this file. It points to the vault_kube_endpoint variable, which is stored in the encrypted vault inventory/group_vars/lab_kubernetes/vault.yml.
Run ansible-vault create inventory/group_vars/lab_kubernetes/vault.yml to create and edit the file. Create a file that looks like the one below.
vault_kube_endpoint: lab-kube-endpoint.lab.example.net
vault_kube_endpoint_vip: 192.0.2.8
vault_kube_endpoint_port: 8443
If you take a look at the file on disk, say using cat, you will only see encrypted, meaningless strings.
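For reference, an ansible-vault encrypted file looks something like this (ciphertext shortened here):
$ cat inventory/group_vars/lab_kubernetes/vault.yml
$ANSIBLE_VAULT;1.1;AES256
62356338363432386435623066626533...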
You can edit and view the file anytime by running ansible-vault edit inventory/group_vars/lab_kubernetes/vault.yml and ansible-vault view inventory/group_vars/lab_kubernetes/vault.yml.
First Playbook - bootstrap¶
I'd like to run the first playbook to create, on each ansible-managed node, a new user account dedicated to the ansible master, with password-less ssh and sudo. Once this is done, the subsequent ansible plays will be executed as this ansible user.
This is how this plays out:
- the servers are running and I have the credentials to access them
- run the bootstrap playbook specifying the logon username and password (and/or ssh private key) for ansible to use to access the servers
- ansible play executes tasks:
- to check if sudo is installed
- to create the new account which ansible will use going forward
- to install sudo if missing and setup password-less sudo for the created ansible account
Name Resolution¶
Since I already have DNS servers running in my homelab, I added records for lab-cp1 and the other hosts there to make them accessible by name from the ansible master machine.
For the sake of this blog series, I am going to cover the steps to prepare DNS servers along the way. Until those DNS servers are running, I am going to use the /etc/hosts file on the ansible master to handle name resolution.
# append the content of inventory/hosts-list.txt file to the /etc/hosts on the local, ansible master
sudo tee -a /etc/hosts < inventory/hosts-list.txt
Here is the inventory/hosts-list.txt file.
# temporarily added on ansible master to help access lab nodes by name
192.0.2.1 lab-cp1.lab.example.net lab-cp1
192.0.2.2 lab-cp2.lab.example.net lab-cp2
192.0.2.3 lab-cp3.lab.example.net lab-cp3
192.0.2.4 lab-worker1.lab.example.net lab-worker1
192.0.2.5 lab-etcd1.lab.example.net lab-etcd1
192.0.2.6 lab-etcd2.lab.example.net lab-etcd2
192.0.2.7 lab-etcd3.lab.example.net lab-etcd3
192.0.2.16 lab-ns1.lab.example.net lab-ns1
192.0.2.17 lab-ns2.lab.example.net lab-ns2
The domain "lab.example.net" is there because that is the domain suffix ansible master uses along with the hostnames listed in the inventory file.
Let me also explain about the variables set in the inventory/group_vars/lab/vars.yml
file.
# environment
hlv3_environment: lab
# ansible remote access
ansible_user: "{{ vault_ansible_user }}"
ansible_username: "{{ vault_ansible_user }}" # required by bootstrap playbook
ansible_ssh_private_key_file: "{{ playbook_dir }}/playbooks/files/ssh/id_ed25519_ansible"
ansible_ssh_pubkey: "{{ playbook_dir }}/playbooks/files/ssh/id_ed25519_ansible.pub"
domain_suffix: "{{ vault_domain_suffix }}"
ansible_host: "{{ inventory_hostname_short }}.{{ domain_suffix }}"
This ansible_host is the actual target ansible tries to access. It is the combination of inventory_hostname_short, which is the hostname from the inventory file, and domain_suffix, which is defined in the same file (pointing to the encrypted vault_domain_suffix) and is set to "lab.example.net".
For example, if you run the bootstrap playbook against the lab_etcd group, ansible runs the play against the lab-etcd1.lab.example.net, lab-etcd2.lab.example.net, and lab-etcd3.lab.example.net hosts, and the local ansible master needs to be able to resolve those names and reach them over TCP/IP.
Ansible User¶
There are also ansible_user and ansible_username variables set in this file. The former is the username ansible uses to access other hosts. Combined with ansible_host, the access looks like ssh ${ansible_user}@${ansible_host}. In this blog series, let's just assume the ansible username is ansiblemaster-lab, so the access to lab-etcd1 will be ssh ansiblemaster-lab@lab-etcd1.lab.example.net.
Now, the latter, ansible_username, has the same vault_ansible_user value. This ansible_username is used in a task in the bootstrap playbook to create the new ansible account on the hosts being onboarded. It's a bit confusing, but the bootstrap process runs before the ansible account exists, so the ansible_user value is overridden with an existing user account; a slightly different variable name is therefore used just for the sake of the bootstrap tasks.
SSH Key for Ansible User¶
This ansible_ssh_private_key_file is the ssh private key used for that ssh ${ansible_user}@${ansible_host} access. Its public key, ansible_ssh_pubkey, is used by a bootstrap task and added to the ~/.ssh/authorized_keys list of the newly created ansible account.
So, let's generate this new ssh key pair.
# prepare playbooks dir
# and files/ssh directory to place ssh key pair used by ansible master
mkdir -p playbooks/files/ssh
cd playbooks/files/ssh
ssh-keygen -t ed25519 -f id_ed25519_ansible
# the ssh directory is listed in the gitignore list
# echo "playbooks/files/ssh" >> {prj_root}/.gitignore
Ansible ping-pong test¶
Let's say the username already available on the remote hosts is "happyansibleuser". You can run the ansible ping-pong test to see whether the ansible master can access the target hosts.
# with "-k" option, ansible will prompt you to enter ssh password to use to logon as happyansibleuser on target hosts
ansible all -m ping -e ansible_user=happyansibleuser -k
If sshpass is not installed¶
If you get an error message like the one below, install sshpass (sudo apt install sshpass, for example).
lab-etcd1 | FAILED! => {
"msg": "to use the 'ssh' connection type with passwords or pkcs11_provider, you must install the sshpass program"
}
Use existing ssh private key¶
If you are using an ssh private key located at ~/.ssh/keyfile to access the target hosts, you can run the ping test like this.
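A sketch, reusing the key path and the username from above (adjust both to your environment):
# use the existing private key instead of password authentication
ansible all -m ping -e ansible_user=happyansibleuser -e ansible_ssh_private_key_file=~/.ssh/keyfile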
Run bootstrap playbook¶
# this will try ssh key logon and then password
# and use "happyansibleuser" username to connect
ansible-playbook playbooks/bootstrap.yml -e ansible_ssh_private_key_file=playbooks/files/ssh/id_ed25519_ansible -e ansible_user=happyansibleuser -k -K
Verification¶
Once the play has completed successfully, you can run the same playbook again without any extra options, and you can also run the ping test to confirm.
ansible-playbook playbooks/bootstrap.yml
ansible all -e ansible_ssh_private_key_file=playbooks/files/ssh/id_ed25519_ansible -m ping
Gather facts¶
I have prepared a playbook to gather facts and save the result as a JSON file per host on the local ansible master.
# run the playbook
ansible-playbook playbooks/gather_facts.yml
# check the facts json files
ls playbooks/facts
cat playbooks/facts/lab-cp1.json
Setting up docker-ready nodes¶
As mentioned earlier, these two hosts are going to run services using docker, and I am going to install docker on them with an ansible playbook.
Here are the tasks executed, in brief:
- install the ansible community.docker requirements on the target host using the package manager (apt, dnf, etc.)
  - packages to install are defined in ./roles/docker/defaults/main.yml
- uninstall existing docker packages and clean up docker directories
  - also defined in ./roles/docker/defaults/main.yml
- install docker
  - add the official docker repository to the package manager
  - install docker packages from the official repository
    - version defined in ./roles/docker/defaults/main.yml
  - lock the package version
- make sure docker is enabled on systemd
- add the ansible user to the docker group
- reboot
There is "test" tag to verify the installation by checking installed docker version and running hello-world.
Running DNS server using docker¶
The way I do this in my real homelab is to prepare a git repository including a docker compose file and DNS configuration files, clone the repository on the remote docker hosts, and spin it up.
For the sake of this blog series, I have prepared a playbook that deploys a similar DNS service.
Unbound DNS¶
The playbook uploads the DNS configuration files and runs the DNS server.
The image I'm going to use is mvance/unbound, available on Docker Hub. Here are the links to the GitHub repository and Docker Hub.
https://github.com/MatthewVance/unbound-docker
https://hub.docker.com/r/mvance/unbound
Edit the ./roles/dns/templates/env.j2 file to change the image tag. It is set to "1.21.1", the latest as of this writing. The image uses Cloudflare DNS over TLS for upstream name resolution (root forwarder destinations).
Preparing DNS configuration file¶
Customize the ./roles/dns/templates/a-records.j2 file to match the names and IP addresses of your actual homelab environment.
The file stored in the repository contains the hosts listed in the server list section using the example domain and example IP address for each host (example.net in rfc6761 and 192.0.2.0/24 TEST-NET-1 in rfc5737).
Also note that there is a record for the kubernetes apiserver virtual IP address, lab-kube-endpoint.lab.example.net. This will be the VIP for the highly available kube-apiserver hosted by the three control plane nodes, lab-cp1, lab-cp2, and lab-cp3.
local-data: "lab-cp1.lab.example.net. IN A 192.0.2.1"
local-data: "lab-cp2.lab.example.net. IN A 192.0.2.2"
local-data: "lab-cp3.lab.example.net. IN A 192.0.2.3"
local-data: "lab-worker1.lab.example.net. IN A 192.0.2.4"
local-data: "lab-etcd1.lab.example.net. IN A 192.0.2.5"
local-data: "lab-etcd2.lab.example.net. IN A 192.0.2.6"
local-data: "lab-etcd3.lab.example.net. IN A 192.0.2.7"
local-data: "lab-ns1.lab.example.net. IN A 192.0.2.16"
local-data: "lab-ns2.lab.example.net. IN A 192.0.2.17"
local-data: "lab-kube-endpoint.lab.example.net. IN A 192.0.2.8"
Running DNS playbook¶
ansible-playbook playbooks/dns.yml # running without tags just displays the tags available in this playbook and exits
# start
# - stop if already running
# - upload files, verify docker compose, and pull image if missing
# - start the service
# - test name resolution from the localhost using dig command
ansible-playbook playbooks/dns.yml --tags start
# stop
ansible-playbook playbooks/dns.yml --tags stop
# enable/disable on systemd
ansible-playbook playbooks/dns.yml --tags enable
ansible-playbook playbooks/dns.yml --tags disable
Change nameservers settings¶
Once the DNS service is ready on lab-ns1 and lab-ns2, let us change the nameserver settings on each host to use it.
There are many variations and combinations of services managing network-related settings these days. I barely managed to get things working, at least for what I have running in my lab and actual homelab environments.
# to double check what you have
ansible-playbook playbooks/nameservers.yml --tags check
# to update the settings
ansible-playbook playbooks/nameservers.yml --tags update
The IP addresses of the nameservers are retrieved with the host command run locally, so they come either from the /etc/hosts file set up previously or from your existing DNS server.
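If you want to double-check a host manually, commands along these lines help, depending on which network stack the host runs (all standard tools, nothing specific to this project):
# plain /etc/resolv.conf (networking / ifupdown hosts)
cat /etc/resolv.conf
# NetworkManager-managed hosts
nmcli device show | grep IP4.DNS
# systemd-resolved / netplan hosts
resolvectl status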
Here is the list of what is done, in brief:
- identify what is running: networking, networkd (netplan), or NetworkManager
- get the IP addresses of lab-ns1 and lab-ns2 by looking them up
- NetworkManager
  - apply the new IP4.DNS using nmcli
  - if no IP4.DNS is present, modify /etc/resolv.conf
- networking
  - modify /etc/resolv.conf
- networkd (netplan)
  - identify the netplan yaml file in use, upload a new templated one, then apply it
Setting up kubernetes-ready nodes¶
Multiple items must be installed and configured to make a host a kubernetes-ready node. I will run my kubernetes playbook to check the configuration and the installed kubernetes component versions, and also to install and configure whatever is needed.
# check and generate a report in a markdown file
ansible-playbook playbooks/kubernetes.yml --tags check
# make a kubernetes-ready host
ansible-playbook playbooks/kubernetes.yml --tags prepare
Here is the list of things done in 'prepare' tagged tasks:
- disable swap memory
- enable ipv4 forwarding
- install and setup containerd, runc, and cni
- disable selinux and firewalld
- install kubeadm, kubelet, and kubectl
- install other necessary packages
The version of each component is defined in the defaults file at ./roles/kubernetes/defaults/main.yml, along with all the variables used in the playbook. The additional packages to install are also listed there, and the package installation tasks refer to that list.
The 'check' tasks check the items above and list them in tables. See the ./playbooks/files/kubernetes/*.md files after running --tags check.
Note that the external etcd cluster nodes have slightly different requirements than the kubernetes cluster nodes, but the same preparatory changes are applied in this project.
Example markdown report on kube-ready nodes¶
# Kubernetes readiness check report for lab environment
## Host Summary
| hostname | product_uuid | swap | systemd swap unit | ipv4_forward | cgroup | selinux | firewalld |
| ----------- | ------------------------------------ | ---- | ----------------- | ----------------------- | --------- | ---------- | --------- |
| lab-cp1 | bd45cd73-1fd5-44b1-b66a-481b8100deb6 | 0 | none | net.ipv4.ip_forward = 1 | cgroup2fs | n/a | n/a |
| lab-cp2 | f4862e7e-49d3-4ae8-9c84-d4aada7ee01d | 0 | none | net.ipv4.ip_forward = 1 | cgroup2fs | Permissive | n/a |
| lab-cp3 | 86936edc-36f7-4730-bdbe-6ddbfc9a5226 | 0 | none | net.ipv4.ip_forward = 1 | cgroup2fs | n/a | n/a |
| lab-worker1 | 9b1b24d3-5d9b-4d11-92d6-6eb6dd8d4f7e | 0 | none | net.ipv4.ip_forward = 1 | cgroup2fs | Permissive | n/a |
## kubernetes Packages
| hostname | kubeadm | kubelet | kubectl |
| ----------- | ------- | ------- | ------- |
| lab-cp1 | v1.32.2 | v1.32.2 | v1.32.2 |
| lab-cp2 | v1.32.2 | v1.32.2 | v1.32.2 |
| lab-cp3 | v1.32.2 | v1.32.2 | v1.32.2 |
| lab-worker1 | v1.32.2 | v1.32.2 | v1.32.2 |
## Dependencies
| hostname | containerd | runc | cni |
| ----------- | ---------- | ----- | ------ |
| lab-cp1 | v2.0.2 | 1.2.5 | v1.6.0 |
| lab-cp2 | v2.0.2 | 1.2.5 | v1.6.0 |
| lab-cp3 | v2.0.2 | 1.2.5 | v1.6.0 |
| lab-worker1 | v2.0.2 | 1.2.5 | v1.6.0 |
Setting up etcd cluster¶
Here are the tasks to set up the etcd cluster:
- configure kubelet to manage the etcd service
- generate the etcd CA certs on one node and copy them over to the other etcd nodes
- use the common CA certs to generate the other certs required by the etcd cluster members and also by the kubernetes control plane nodes
  - a copy of the certs is downloaded to the ansible master to be used later when setting up the kubernetes cluster
- generate the static pod manifest to run etcd and form a new etcd cluster
# spin up a new etcd cluster
ansible-playbook playbooks/etcd.yml --tags cluster
# run health checks and display results
ansible-playbook playbooks/etcd.yml --tags healthcheck
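For reference, a manual health check roughly equivalent to what the healthcheck tag reports can be run from one of the etcd nodes like this (a sketch assuming etcdctl is installed and the certs sit at the kubeadm default locations):
ETCDCTL_API=3 etcdctl \
  --endpoints=https://192.0.2.5:2379,https://192.0.2.6:2379,https://192.0.2.7:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/peer.crt \
  --key=/etc/kubernetes/pki/etcd/peer.key \
  endpoint health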
Setting up kubernetes cluster¶
In the previous steps, all the nodes were made kubernetes-ready and the etcd cluster that the kubernetes control planes are going to use was built. Finally, in this step, I am going to set up a new kubernetes cluster.
Here is the list of tasks:
- prepare kubeadm configuration file
- prepare configuration files and static pod manifests to setup highly-available kube-apiservers
- keepalived to setup VIP with health check monitoring kube-apiserver availability on localhost
- haproxy to setup kube-apiserver loadbalancer with health check monitoring kube-apiserver on all control plane nodes
- upload certs generated by etcd to control plane nodes
- spin up the kubernetes cluster
- initiate the cluster on one control plane node
- copy necessary certs to the other control plane nodes
- join the other control plane nodes to the cluster
- join worker nodes to the cluster
kubeadm configuration for control plane nodes¶
The custom kubernetes cluster configuration is set using the kubeadm config file. The same was actually done when generating the etcd cluster certs in the previous step, by providing custom etcd service details and generating certs based on that configuration.
The two important customizations to be made here are using the external etcd cluster and changing the kube-apiserver endpoint.
The official documentation on this is here.
etcd configuration in kubeadmcfg¶
Below is the etcd section of the kubeadm config file, written as a jinja2 template.
With this, the control plane nodes know that the etcd service is available on the external etcd cluster, with endpoints on the three etcd nodes listening on port 2379, and that the specified certs must be used to access this external etcd service.
etcd:
  external:
    endpoints:
{% for etcd_ipaddr in lst_etcdipaddr %}
      - https://{{ etcd_ipaddr }}:2379
{% endfor %}
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
Note that the etcd node IP address list in lst_etcdipaddr gets populated from the actual running systems.
kube-endpoint configuration in kubeadmcfg¶
The second key customization is the kube-endpoint. Admins, operators, kubernetes cluster components, external tools, and so on use the kube-apiserver to communicate with the kubernetes cluster, and the endpoint is the destination they send their requests to.
In the case of the basic kubernetes cluster described at the beginning of this post, the kube-apiserver was present on a single control plane node, so the endpoint was simply that kube-apiserver listening on the control plane.
When you have multiple kube-apiservers (control plane nodes) running in a kubernetes cluster, that is when you should point the kube-endpoint at something other than an individual kube-apiserver listening on a control plane node.
---
title: flow from client to endpoint to apiserver
---
flowchart LR
client[operators<br/>k8s components<br/>external tools] --> haproxy[kube-endpoint + loadbalancer<br/>lab-kube-endpoint.lab.example.net:8443] --> kubeapi[any cp node running healthy kube-apiserver<br/>cp-node:6443]
In this blog series, the endpoint is "lab-kube-endpoint.lab.example.net:8443" and its IP address is 192.0.2.8. The endpoint config portion of the kubeadmcfg is shown below.
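This line is taken verbatim from the full kubeadmcfg jinja2 template shown later in this section:
controlPlaneEndpoint: "{{ kube_endpoint }}:{{ kube_endpoint_port }}"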
These variables are found in the defaults file at ./roles/kubernetes/defaults/main.yml. When running the playbook in an actual environment, update the variables there or in a group inventory variables file such as ./inventory/group_vars/lab_kubernetes/vars.yml.
Customizing the kube-endpoint is about pointing kube-apiserver access at a load balancer. The load balancer setup details will be covered shortly.
Rest of the settings in kubeadmcfg¶
Before moving on to the load balancer setup, let me briefly explain the other customizations made in the kubeadmcfg file.
Here is the entire kubeadmcfg jinja2 template, i.e. the kubernetes cluster configuration.
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "{{ kube_endpoint }}:{{ kube_endpoint_port }}"
clusterName: {{ kube_clustername }}
networking:
  dnsDomain: {{ kube_cluster_domain }}
  podSubnet: {{ kube_pod_cidr }}
  serviceSubnet: {{ kube_svc_cidr }}
etcd:
  external:
    endpoints:
{% for etcd_ipaddr in lst_etcdipaddr %}
      - https://{{ etcd_ipaddr }}:2379
{% endfor %}
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
controllerManager:
  extraArgs:
    - name: "allocate-node-cidrs"
      value: "true"
    - name: "cluster-cidr"
      value: "{{ kube_cluster_cidr }}"
proxy:
  disabled: true
I will be using Cilium as the network add-on for this kubernetes cluster. Different network add-ons have different CIDR settings and other requirements. To do what I wanted with Cilium, I disabled kube-proxy, turned on "allocate-node-cidrs", and set the cluster CIDR range for the controller manager.
The minimum kubeadmcfg customizations required for the highly available cluster are the endpoint and the etcd settings; the official documentation covers this and includes a sample config file.
highly-available kube-apiserver preparation¶
The next topic is setting up the highly available kube-apiserver using a load balancer. There are different ways to implement it, and the details are described in the official documentation.
In this project, I will be running keepalived and haproxy as static pods.
The keepalived service provides a virtual IP managed by a configurable health check
The haproxy service can be configured for simple stream-based load balancing thus allowing TLS termination to be handled by the API Server instances behind it
VIP - keepalived¶
The keepalived service running on each control plane node negotiates with its peers to decide which node takes the VIP. In addition to the simple availability check of keepalived itself, a kube-apiserver health check against localhost:6443 is taken into account. So if lab-cp2 is up and its keepalived service is running but its kube-apiserver is not, the keepalived on lab-cp2 decides not to take the VIP.
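To make the mechanism concrete, here is a minimal sketch of a keepalived configuration for this setup (the interface name, priority, router id, and health-check script path are assumptions; the actual template lives in the repository's role):
vrrp_script check_apiserver {
    # a small script that curls https://localhost:6443/healthz
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    fall 10
    rise 2
    weight -2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 101
    advert_int 5
    virtual_ipaddress {
        192.0.2.8
    }
    track_script {
        check_apiserver
    }
}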
loadbalancer - haproxy¶
So keepalived is about hosting the VIP. haproxy is about listening on port 8443, receiving the request traffic, and load-balancing it to any one of the available kube-apiservers. haproxy also runs its own health check against the kube-apiserver on every control plane node to make sure it passes the request traffic only to a healthy kube-apiserver.
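And a minimal sketch of the corresponding haproxy configuration (backend name and health-check details are assumptions; the repository's template is authoritative):
frontend kube-apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend kube-apiserver-backend

backend kube-apiserver-backend
    mode tcp
    balance roundrobin
    # health check against each kube-apiserver's /healthz endpoint
    option httpchk GET /healthz
    http-check expect status 200
    option ssl-hello-chk
    server lab-cp1 192.0.2.1:6443 check
    server lab-cp2 192.0.2.2:6443 check
    server lab-cp3 192.0.2.3:6443 check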
Preparing certs files¶
In a basic setup with the stacked etcd design, the cluster initialization phases set up etcd and also generate the certs that secure communication between the control plane and the etcd service. Since the etcd cluster here was built independently, the certs required by the control plane were also generated independently of the kubernetes cluster initialization phases.
These certs were already generated and downloaded to the ansible master as part of the tasks that set up the etcd cluster.
./playbooks/files/etcd-ca/ca.crt
./playbooks/files/etcd-certs/apiserver-etcd-client.crt
./playbooks/files/etcd-certs/apiserver-etcd-client.key
spin up the kubernetes cluster¶
Here is the same list from the beginning of this section; all of the above is done by this playbook.
- prepare kubeadm configuration file
- prepare configuration files and static pod manifests to setup highly-available kube-apiservers
- keepalived to setup VIP with health check monitoring kube-apiserver availability on localhost
- haproxy to setup kube-apiserver loadbalancer with health check monitoring kube-apiserver on all control plane nodes
- upload certs generated by etcd to control plane nodes
- spin up the kubernetes cluster
- initiate the cluster on one control plane node
- copy necessary certs to the other control plane nodes
- join the other control plane nodes to the cluster
- join worker nodes to the cluster
# init the kubernetes cluster and join all available control plane nodes
ansible-playbook playbooks/kubernetes.yml --tags cluster
# join all available worker nodes to the kubernetes cluster
ansible-playbook playbooks/kubernetes.yml --tags worker
The kubeadm init command run to spin up the cluster on the first control plane node prints a lot of output. It is saved to the files ./playbooks/files/kubernetes/kubeadm.log and ./playbooks/files/kubernetes/kubeadm.err.
The kubectl get nodes -o wide output is also captured and saved to ./playbooks/files/kubernetes/kubectl_get_nodes.txt.
Install network-addon - Cilium¶
As you may have seen in the kubectl get nodes output, all the nodes are shown as "NotReady". The next thing to install is a network add-on.
The quick installation route would be to install the cilium CLI and use it to install cilium on the cluster, but we cannot go down that path this time.
As briefly mentioned, there are features I wanted to try, and each feature has its own requirements. I already customized the cluster through the kubeadmcfg cluster configuration file; some of those customizations were for the external etcd cluster and the highly available kube-apiservers, and others were for the items in the following links.
https://docs.cilium.io/en/stable/installation/k8s-install-external-etcd/#requirements
https://docs.cilium.io/en/stable/network/l2-announcements/#prerequisites
https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/gateway-api/#prerequisites
And so the cilium installation must be customized as well. I chose to do this using helm, since I can make the customizations in the "values" file and document and track the changes in VCS. It's GitOps, almost. I mean, GitOps for the cluster cannot be set up until the cluster is functional with a network add-on installed, but the point is that I can record the changes in a GitOps repository.
You need to work on a host you use to operate the kubernetes cluster. That host may be the ansible master or one of the control plane nodes. Let's just go with a control plane node this time.
Here is the list of tasks:
- install helm
- identify the cilium version to use
- download the values file of the cilium chart on the version you are going to install
- edit the values file
- install the helm chart
Here is the list of commands executed:
# on one of the control plane node
# install helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
# add cilium repository on helm
helm repo add cilium https://helm.cilium.io/
# confirm the latest version of cilium
helm search repo cilium
helm search repo cilium -l # to see all available versions
# download the values file for version 1.17.1
helm show values cilium/cilium --version 1.17.1 > values.yaml
# edit the values file
# create secret for cilium containing etcd cert files
sudo cp /etc/kubernetes/pki/etcd/ca.crt .
sudo cp /etc/kubernetes/pki/apiserver-etcd-client.crt client.crt
sudo cp /etc/kubernetes/pki/apiserver-etcd-client.key client.key
sudo chown $USER:$USER *.crt
sudo chown $USER:$USER *.key
kubectl create secret generic -n kube-system cilium-etcd-secrets \
--from-file=etcd-client-ca.crt=ca.crt \
--from-file=etcd-client.key=client.key \
--from-file=etcd-client.crt=client.crt
sudo rm *.crt *.key
# install
helm install cilium cilium/cilium --version 1.17.1 --values values.yaml -n kube-system
# it took a little less than 20 minutes until everything was up and running
# for a cluster composed of VMs running on personal-use Proxmox and Hyper-V
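A couple of quick checks once helm reports the release as deployed (plain kubectl, nothing project-specific):
# cilium, hubble, and the rest of kube-system should settle into Running
kubectl -n kube-system get pods -o wide
# the nodes should eventually report Ready
kubectl get nodes -o wide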
Here is the list of changes made to the cilium values file; the entire file is stored at ./playbooks/files/cilium/values.yaml:
- k8sServiceHost: lab-kube-endpoint.lab.example.net
- k8sServicePort: "8443"
- k8sClientRateLimit.qps: 33
- k8sClientRateLimit.burst: 50
- kubeProxyReplacement: "true"
- kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"
- l2announcements.enabled: true
- l2announcements.leaseDuration: 3s
- l2announcements.leaseRenewDeadline: 1s
- l2announcements.leaseRetryPeriod: 200ms
- externalIPs.enabled: true
- gatewayAPI.enabled: true
- etcd.enabled: true
- etcd.ssl: true
- etcd.endpoints: ["https://192.0.2.5:2379", "https://192.0.2.6:2379", "https://192.0.2.7:2379"]
- hubble.ui.enabled: true
- hubble.relay.enabled: true
- hubble.peerService.clusterDomain: lab.example.net
Demo¶
The kubernetes cluster is now functional with the network add-on installed. Let me do a demo on name lookups inside the cluster, and then some more demos using cilium features.
Name lookups¶
Let's create a temporary namespace named "test" and create a pod there.
# again on any one of the control plane node...
# create test namespace
kubectl create namespace test
# add dnsutils pod in the test namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: test
spec:
  containers:
    - name: dnsutils
      image: registry.k8s.io/e2e-test-images/agnhost:2.39
      command:
        - sleep
        - "infinity"
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
You can execute commands on the test pod using kubectl exec. You can see from the output below that:
- it's configured to use the nameserver at 10.96.0.10
- 10.96.0.10 is the IP address of the kube-dns service in the kube-system namespace
- you can look up {service-name}.{namespace}.svc.{cluster-domain} for any available service in the kubernetes cluster
  - for example, kube-dns.kube-system.svc.lab.example.net.
  - for example, kubernetes.default.svc.lab.example.net.
  - for example, hubble-ui.kube-system.svc.lab.example.net.
- workloads in the "test" namespace have the search suffix "test.svc.lab.example.net"
  - if there is a service "web" in this test namespace, "web.test.svc.lab.example.net" is the destination for that "web" service
  - thanks to the search suffix list, workloads in the same "test" namespace can reach it merely by the name "web"
$ kubectl exec -t pod/dnsutils -n test -- cat /etc/resolv.conf
search test.svc.lab.example.net svc.lab.example.net lab.example.net
nameserver 10.96.0.10
options ndots:5
$ kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 19h
kube-system cilium-envoy ClusterIP None <none> 9964/TCP 19h
kube-system hubble-peer ClusterIP 10.96.128.27 <none> 443/TCP 19h
kube-system hubble-relay ClusterIP 10.96.155.61 <none> 80/TCP 17h
kube-system hubble-ui ClusterIP 10.96.227.21 <none> 80/TCP 17h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 19h
$ kubectl exec -t pod/dnsutils -n test -- dig kubernetes.default +search +noall +answer
kubernetes.default.svc.lab.example.net. 30 IN A 10.96.0.1
$ kubectl exec -t pod/dnsutils -n test -- dig kube-dns.kube-system +search +noall +answer
kube-dns.kube-system.svc.lab.example.net. 30 IN A 10.96.0.10
$ kubectl exec -t pod/dnsutils -n test -- dig hubble-ui.kube-system +search +noall +answer +stats
hubble-ui.kube-system.svc.lab.example.net. 30 IN A 10.96.227.21
;; Query time: 1 msec
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Fri Feb 28 01:02:50 UTC 2025
;; MSG SIZE rcvd: 139
Cleaning up the test namespace¶
Nothing complex was created in this namespace, so to clean up you can simply delete the namespace, and the pod will be gone with it.
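For example:
kubectl delete namespace test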
Cilium L2Advertisement¶
All these 10.96.*.* IP addresses shown as service IPs are not accessible from outside the kubernetes cluster. One solution is the layer 2 advertisement feature. Here I am going to use the hubble-ui service, which I enabled in the cilium helm chart, to demonstrate it.
I first look for an appropriate label to identify the hubble-ui pods.
# looking at the defined labels on the hubble-ui deployment
$ kubectl get deploy hubble-ui -n kube-system -o jsonpath='{.spec.template.metadata.labels}'
{"app.kubernetes.io/name":"hubble-ui","app.kubernetes.io/part-of":"cilium","k8s-app":"hubble-ui"}o
# double check that the label works
$ kubectl get pods -l 'k8s-app=hubble-ui' -n kube-system
NAME READY STATUS RESTARTS AGE
hubble-ui-68bb47466-6gkwb 2/2 Running 0 100m
I will then create another service for hubble-ui, this time with the LoadBalancer type.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: l2-hubble-ui
  namespace: kube-system
  labels:
    app.kubernetes.io/name: l2-hubble-ui
spec:
  type: LoadBalancer
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8081
  selector:
    k8s-app: hubble-ui
EOF
The new service is created to access the same hubble-ui pods as the existing "hubble-ui" service.
# the created service with "pending" external IP address allocation
$ kubectl get svc l2-hubble-ui -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
l2-hubble-ui LoadBalancer 10.96.177.88 <pending> 80:32442/TCP 23s
Now I create a cilium IP pool for the created "l2-hubble-ui" service.
cat <<EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "ippool-hubble-ui"
spec:
  blocks:
    - start: "192.0.2.24"
      stop: "192.0.2.24"
  serviceSelector:
    matchExpressions:
      - { key: app.kubernetes.io/name, operator: In, values: [l2-hubble-ui] }
EOF
Now the external IP address gets assigned to the service.
$ kubectl get svc l2-hubble-ui -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
l2-hubble-ui LoadBalancer 10.96.177.88 192.0.2.24 80:32442/TCP 5m48s
YES! Now, is it reachable? Not yet, as no one is advertising on the LAN that this IP address is in use and available. So next, the l2 announcement policy needs to be created.
cat <<EOF | kubectl apply -f -
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: l2-hubble-ui
spec:
  serviceSelector:
    matchLabels:
      app.kubernetes.io/name: l2-hubble-ui
  interfaces:
    - ^eth[0-9]+
    - ^eno[0-9]+
    - ^enp[0-9]s[0-9]+
  loadBalancerIPs: true
EOF
Now the IP address gets advertised on the LAN, and I can connect to the hubble UI from a web browser on other machines on my home LAN.
Hubble UI¶
https://github.com/cilium/hubble-ui
Observability & Troubleshooting for Kubernetes Services
Since this is a tool for observing what is going on in the cluster, we want something running. There is a cilium post-installation connectivity check available, introduced in the installation document. Let's go ahead and use it.
https://docs.cilium.io/en/latest/installation/k8s-install-helm/#validate-the-installation
It is as simple as spinning up the name-lookup test pod earlier: create a namespace, run kubectl apply, and delete the namespace to clean everything up.
# create the namespace cilium-test
kubectl create ns cilium-test
# run the connectivity check pods in the cilium-test namespace
kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/1.17.1/examples/kubernetes/connectivity-check/connectivity-check.yaml
# clean up
kubectl delete ns cilium-test
Here is the screen capture of the hubble UI for the cilium-test namespace.