k8s with HA kube-apiserver and external etcd cluster¶
This page is about building a kubernetes cluster with a loadbalancer for the kube-apiserver and an external etcd cluster.
References to Official Documents¶
The HA setup (highly available kube-apiserver behind a loadbalancer) is described in the official kubernetes documentation here.
The external etcd topology is explained in the official kubernetes documentation here.
ToC¶
- preparing machines
- setting up kube-ready nodes
- building etcd cluster
- preparing loadbalancer for kube-apiserver on control plane nodes
- initializing kubernetes cluster
- demo
Preparing Machines¶
I prepared physical and virtual machines, all on AMD64 and with mixed OSes, just to practice composing an ansible playbook capable of handling different kinds of machines.
Baremetal Servers¶
As for the baremetal servers, it is just a matter of getting a minimal OS installer image onto a USB stick, plugging it into the machines, and following the installation wizard to install the OS with minimal system utilities and an ssh server. I won't go into details on this.
Virtual Servers¶
As for the virtual servers, I decided to build the main virtual machines on Proxmox, and I also did some testing on Hyper-V using my gaming desktop running Windows 11 Pro.
Building a virtual machine on the Proxmox web GUI is as straightforward as installing an OS on a baremetal machine. Here I leave notes on building virtual machines using templated cloud-init images on Proxmox.
Rocky 9 on Proxmox Using Cloud-init Image¶
Here is what's done below:
- download cloud-init image and verify it
- prepare the ssh public key to import into ~/.ssh/authorized_keys on the new machine
- create a virtual machine using the image and set basic parameters such as CPU, memory, architecture, and NIC
- convert the machine into a template
- clone the template to build the actual machine to run
- set cloud-init parameters such as IP address, username, and ssh public key to allow access
- start the cloned virtual machine (shown at the end of the command listing below)
# on proxmox hypervisor
# prepare the directory to place downloaded cloud-init image files
mkdir -p /opt/pve/dnld
# download and verify the file
cd /opt/pve/dnld
wget https://dl.rockylinux.org/pub/rocky/9/images/x86_64/Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2
wget https://dl.rockylinux.org/pub/rocky/9/images/x86_64/Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2.CHECKSUM
sha256sum -c Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2.CHECKSUM
qemu-img info Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2
# prepare ansible user ssh public key
mkdir /opt/pve/ssh_pub
cd /opt/pve/ssh_pub
# get the ssh public key from wherever and place it in the created directory
# or generate one
# environment variables
path_ciimg=/opt/pve/dnld/Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2
path_ssh_public_key=/opt/pve/ssh_pub/id_ed25519.pub
ciusername=username_here
# create a VM using the downloaded cloud-init image
qm create 9000 --memory 2048 --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --cpu cputype=x86-64-v3 --ostype l26
qm set 9000 --scsi0 local-zfs:0,import-from=$path_ciimg
qm set 9000 --ide2 local-zfs:cloudinit
qm set 9000 --boot order=scsi0
# template it
qm template 9000
# check what's created
qm config 9000
qm showcmd 9000
# create a new VM using the template
qm clone 9000 1308 --name name_of_this_rocky9_vm
qm set 1308 --sshkey $path_ssh_public_key
qm set 1308 --ipconfig0 ip=192.168.111.30/24,gw=192.168.111.1
qm set 1308 --ciuser $ciusername
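Finally, start the cloned virtual machine and check that it is running (1308 is the example VMID used above):
# start the clone and confirm its status
qm start 1308
qm status 1308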
While many of the steps are identical to what's described in the official document, here are some additional notes on minor deviations:
- storage is local-zfs instead of local, just because I prepared a different storage named local-zfs
- cputype is x86-64-v3, and I referred to these links to choose one for my machine running Proxmox
- resizing disk
- for most of the OSes I used, I resized the disk of the downloaded qcow2 image itself, and the final cloned virtual machine had all the disk space automatically recognized and mounted
# qemu-img resize debian-12-generic-amd64.qcow2 64G
- for some, I resized the disk using the qm command like the following, in which case I had to do the partitioning, pvcreate, vgextend, lvextend, xfs_growfs, and the like
qm disk resize 1312 scsi0 128G
- obviously, changing the disk size on the original cloud-init image is easier
Setting Up Kube-Ready Nodes Using Ansible¶
Here is the list of machines I prepared.
One of the goals of rebuilding my homelab is to be able to compose whatever ansible automation I want, and here I am with a number of machines running mixed OSes. I ran ansible-playbook to make whatever changes were required to make these nodes kubernetes-ready, along with other machines not listed here whose roles are to run services using docker or to serve as jump hosts.
| hostname | role | os | cpu | memory | disk (+additional disk) | machine |
| --- | --- | --- | --- | --- | --- | --- |
| cp1 | k8s control plane | debian | 4 | 8GB | 128GB ssd | baremetal |
| cp2 | k8s control plane | rocky | 4 | 6GB | 128GB | vm on proxmox |
| worker1 | k8s worker node | debian | 4 | 16GB | 128GB ssd +6TB | baremetal |
| worker2 | k8s worker node | rhel | 4 | 6GB | 128GB +200GB | vm on proxmox |
| worker3 | k8s worker node | debian | 2 | 8GB | 128GB ssd +500GB | baremetal |
| worker4 | k8s worker node | debian | 16 | 16GB | 512GB ssd | baremetal |
| worker5 | k8s worker node | ubuntu | 4 | 6GB | 128GB +200GB | vm on proxmox |
| etcd1 | etcd node | debian | 4 | 8GB | 64GB ssd | baremetal |
| etcd2 | etcd node | debian | 2 | 4GB | 64GB | vm on proxmox |
| etcd3 | etcd node | oracle | 2 | 4GB | 64GB | vm on proxmox |
And here is the list of tasks executed using Ansible on all of the nodes listed above. #TODO Links to the repository will be added for each task once I make them available on a public git repository.
- bootstrap
- to create the user account for the ansible master and enable password-less sudo
- skipped for machines created using cloud-init images, as the ssh key and username I set during the build were already those of the ansible master
- posture check
- sshd hardening
- ordinary package upgrades
- install certain packages, such as nfs, which I wanted in common across the different machine roles in my homelab
- configuration changes and package installation (a shell sketch of these steps follows this list)
- disable swap
- enable ipv4 forwarding
- install and setup containerd, runc, and cni
- disable selinux
- disable firewalld
- install kubernetes packages such as kubeadm, kubelet, and kubectl
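For illustration, here is roughly what those node-preparation tasks boil down to when done by hand on a single node; the sysctl file name is my own choice, not something taken from the playbook.
# disable swap now and keep it disabled across reboots
swapoff -a
sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
# enable ipv4 forwarding required by kubernetes networking
cat <<EOF > /etc/sysctl.d/99-kubernetes.conf
net.ipv4.ip_forward = 1
EOF
sysctl --system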
Building Etcd Cluster¶
I have followed what's described in the official document on setting up the etcd cluster.
Here is what's done in brief:
- configure kubelet through systemd to have kubelet look for static pod manifest files and run them
- prepare a kubeadm config file with the etcd node details for each etcd node that will form the etcd cluster
- in my case, three different kubeadm config files for etcd1, etcd2, and etcd3
- generate the CA certificate and key for etcd using the kubeadm command
- generate certificates and keys for the different communications among etcd and kubernetes components on each etcd node
- generate the static pod manifest that runs etcd on each etcd node
- kubelet now picks up the generated manifest and spins up the etcd service on each node
- install the etcdctl command
- run etcdctl to verify that etcd is working fine on each etcd node
TODO: Here is the link (to be added) to the repo containing ansible tasks
One difference between the steps described in the official document (linked at the beginning of this section) and the ansible tasks is that in the former a single etcd node prepares all the certificates, keys, and configuration files and then distributes them to the other etcd nodes, while in the latter the files are generated on each etcd node that actually uses them. The exception, of course, is the etcd CA, which is generated at the beginning on one etcd node and then shared with the other etcd nodes so that all of them work under the common CA.
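For reference, here is a rough command-level sketch of the per-node steps from the official guide; /root/kubeadmcfg.yaml stands in for the per-node kubeadm config file mentioned above, and my ansible tasks run the equivalents.
# on one etcd node only: generate the etcd CA, then share ca.crt and ca.key with the other etcd nodes
kubeadm init phase certs etcd-ca
# on each etcd node, against its own kubeadm config file
kubeadm init phase certs etcd-server --config=/root/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/root/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/root/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/root/kubeadmcfg.yaml
# writes /etc/kubernetes/manifests/etcd.yaml, which kubelet picks up as a static pod
kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml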
Verification Output of Etcd¶
PLACEHOLDER
TODO: place example output of ss -tlpn, crictl ps, and etcdctl endpoint health
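Until the output is added, here are the verification commands as I would run them on an etcd node, with the etcdctl flags assuming the default kubeadm certificate paths (outputs omitted).
# the etcd client port should be listening
ss -tlpn | grep 2379
# the etcd container should show up as Running
crictl ps
# health check against the local member
ETCDCTL_API=3 etcdctl \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  --endpoints https://127.0.0.1:2379 \
  endpoint health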
Preparing Loadbalancer for Kube-apiserver on Control Plane Nodes¶
I have two control plane nodes prepared in my homelab environment, and this section covers the preparation steps to set up a highly available control plane (kube-apiserver) using haproxy and keepalived running as static pods on the two control plane nodes.
The official kubernetes documentation on setting up a cluster with an external etcd cluster is available here. The documentation on a highly available control plane is available here, and the actual details on how to set up the loadbalancer are available here.
Here are the steps in brief:
- copy the etcd CA certificate to the control plane nodes
- copy the client certificate and key that kube-apiserver uses to communicate with the etcd cluster to the control plane nodes
- prepare a kubeadm config file on one control plane node
- configured to use the external etcd cluster using the certificates and key files prepared in the previous steps
- the control plane endpoint configured to point to the VIP and port to be served by the loadbalancer prepared in the following steps
- prepare a keepalived configuration file, the check script keepalived uses to monitor the health of the kube-apiserver (control plane service; a minimal sketch follows this list), and an haproxy configuration file
- prepare static pod manifest files for keepalived and haproxy
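For reference, a minimal sketch of that check script, loosely following the kubeadm HA guide; port 8443 is an assumption matching the loadbalancer port referenced later in the Cilium values.
#!/bin/sh
# /etc/keepalived/check_apiserver.sh (sketch): exit non-zero when the local endpoint is unhealthy
errorExit() {
  echo "*** $*" 1>&2
  exit 1
}
# 8443 is the assumed haproxy frontend port for the kube-apiserver
curl -sfk --max-time 2 https://localhost:8443/healthz -o /dev/null \
  || errorExit "Error GET https://localhost:8443/healthz"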
Initializing Kubernetes Cluster¶
Run kubeadm init --config {kubeadm config file} on a control plane node to set up the kubernetes cluster.
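For reference, a minimal sketch of such a kubeadm config, assuming the v1beta3 kubeadm API, the loadbalancer VIP on port 8443, and the default kubeadm certificate paths; the addresses are illustrative, not my actual ones.
# kubeadm config pointing at the VIP and the external etcd cluster (sketch)
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "192.168.111.100:8443"
etcd:
  external:
    endpoints:
      - https://192.168.111.41:2379
      - https://192.168.111.42:2379
      - https://192.168.111.43:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF
kubeadm init --config kubeadm-config.yaml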
The kubeadm init run generates the certificate and key files used by the control plane node. These files are copied to the second control plane node as part of the ansible playbook tasks, and then kubeadm join is executed on the second control plane node to join it to the newly created kubernetes cluster.
A similar kubeadm join command is executed on the rest of the nodes, the worker nodes, to join them to the cluster.
Now I have a new kubernetes cluster composed of two control plane nodes with a loadbalancer running for kube-apiserver access (for example, whatever you run with kubectl goes through it), worker nodes, and an external etcd cluster of three etcd nodes running independently from the kubernetes cluster members.
Demo¶
I'll continue a little further with what I do after initializing the kubernetes cluster.
Managing Cluster from Other Machines¶
I could log on to the control plane nodes to manage the cluster, but I would rather do so without logging onto the control plane.
I copy the /etc/kubernetes/admin.conf file to ~/.kube/config on the same machine where I run ansible playbooks. This way I can execute bulk, scripted ansible changes and also manage the kubernetes cluster from the same machine.
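A minimal sketch of that copy, assuming the password-less sudo set up by the bootstrap tasks; cp1 is one of the control plane nodes from the table above.
# pull the admin kubeconfig from a control plane node to the management machine
mkdir -p ~/.kube
ssh cp1 'sudo cat /etc/kubernetes/admin.conf' > ~/.kube/config
chmod 600 ~/.kube/config
# confirm access from this machine
kubectl get nodes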
By the way, if you have multiple clusters with respective /etc/kubernetes/admin.conf files to use, I would prepare aliases like the ones below in ~/.bashrc, ~/.bash_aliases, or any file that works, to switch between the clusters to work on.
# kubectl unset (default, ~/.kube/config)
alias kcu="export KUBECONFIG="
# set to manage different k8s clusters
alias kchv="export KUBECONFIG=~/.kube/hv-config"
alias kchlv3="export KUBECONFIG=~/.kube/hlv3-config"
alias kclab="export KUBECONFIG=~/.kube/lab-config"
Installing Network Addon¶
The first component to install is the network addon. There are a number of options, as listed here.
This time I am trying out Cilium, and I'm installing it using helm.
- add helm repo
- run helm search and identify the version
- download the values file of the chart locally
- analyze and edit the values file as necessary
- install the helm chart
- specify the modified values file to use, if any changes were made, with --values {modified_values_file.yaml}
# helm installation: https://helm.sh/docs/intro/install/
# add cilium repository
helm repo add cilium https://helm.cilium.io/
# confirm the latest version
helm search repo cilium
# helm search repo cilium -l # to see the list of older versions available
# download the values file
helm show values cilium/cilium --version 1.16.6 > cilium-values.yaml
# edit the values file as necessary
# install
helm install cilium cilium/cilium --version 1.16.6 --namespace kube-system --values cilium-values.yaml
# monitor cilium pods and wait until all are up
kubectl get pods -n kube-system
# confirmation using helm commands
helm status cilium -n kube-system
helm get values cilium -n kube-system
# upgrade the existing release
# with revised values file when there's any change
helm upgrade cilium cilium/cilium -f cilium-values.yaml -n kube-system
# rollout restart for cilium daemonsets and deployments
# when necessary after helm release upgrade
kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium
# uninstall the helm release "cilium"
helm uninstall cilium -n kube-system
Cilium Values File¶
Here is the list of changes made to the cilium chart 1.16.6 values (an equivalent --set form is sketched after the list):
- k8sServiceHost: {kube_endpoint_ip_or_fqdn}
- k8sServicePort: "8443"
- kubeProxyReplacement: "true"
- kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"
- l2announcements.enabled: true
- externalIPs.enabled: true
- hubble.relay.enabled: true
- hubble.ui.enabled: true
- gatewayAPI.enabled: true
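The same settings could also be passed inline with --set-string/--set instead of editing the values file; a sketch, reusing the placeholder endpoint host from the list above.
# equivalent inline form of the values-file changes listed above
helm upgrade --install cilium cilium/cilium --version 1.16.6 -n kube-system \
  --set-string k8sServiceHost={kube_endpoint_ip_or_fqdn} \
  --set-string k8sServicePort=8443 \
  --set-string kubeProxyReplacement=true \
  --set-string kubeProxyReplacementHealthzBindAddr=0.0.0.0:10256 \
  --set l2announcements.enabled=true \
  --set externalIPs.enabled=true \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true \
  --set gatewayAPI.enabled=true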
Update CoreDNS Configuration¶
As the next step, I'd like to make changes to the CoreDNS configuration. The two changes I want to make are:
- to point the root forwarder to the two DNS servers running in my homelab outside the kubernetes cluster
- to add the cluster DNS domain name to the search suffix list
The changes can be made by preparing a modified configmap manifest for coredns and applying it.
# retrieve the configmap
kubectl get configmap coredns -n kube-system -o yaml > configmap-coredns-original.yaml
cp configmap-coredns-original.yaml configmap-coredns-custom-domain.yaml
# edit the configmap and then apply it
kubectl replace -f configmap-coredns-custom-domain.yaml
# restart coredns deployment to recreate coredns pods with the updated configmap
kubectl -n kube-system rollout restart deployment coredns
Here is the jinja2 template of the coredns configmap, with the kube_cluster_domain variable adding the domain suffix and the nameservers[0] and nameservers[1] variables setting the two forwarders. I actually ran an ansible playbook to make the above changes.
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes {{ kube_cluster_domain }} cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . {{ nameservers[0] }} {{ nameservers[1] }} {
           health_check 5s
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
Pod Security Administration¶
Applying pod security standards to the cluster is one of the items listed in the best-practices section of the documentation that follows the kubeadm setup.
https://kubernetes.io/docs/setup/best-practices/enforcing-pod-security-standards/
Let me just apply the non-enforcing baseline policy as a starter.
kubectl label --overwrite ns --all \
pod-security.kubernetes.io/audit=baseline \
pod-security.kubernetes.io/warn=baseline
Test Namespace and Pods¶
The kubernetes cluster becomes functional after installing the network addon, and with the necessary CoreDNS changes made, I am ready to test running something in the cluster.
- create test namespace
- create test deployment
- get inside the pods spun up and see how they look
# create test namespace
kubectl create namespace test
# add dnsutils pod in the test namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: test
spec:
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/agnhost:2.39
    command:
    - sleep
    - "infinity"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
# view /etc/resolv.conf file
kubectl exec -t pod/dnsutils -n test -- cat /etc/resolv.conf
# internal DNS records using suffix list
kubectl exec -t pod/dnsutils -n test -- dig +search cp1
kubectl exec -t pod/dnsutils -n test -- dig +search worker1
# external records
kubectl exec -t pod/dnsutils -n test -- dig google.com.
kubectl exec -t pod/dnsutils -n test -- dig cloudflare.com.
# clean up
kubectl delete namespace test
- more tests using cilium (a manifest sketch of the L2 announcement pieces follows these notes), https://sreake.com/blog/learn-about-cilium-l2-announcement/
- [x] done in cilium_demo, 2025-01-24
- [x] update qps and burst value to 8 and 16 respectively and upgrade cilium hr
- [x] deploy web server
- [x] create svc with loadbalancer
- [x] add ipam
- [x] add l2advertisement
- gateway examples, https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/http/
- [x] done in cilium_demo, 2025-01-24
- TODO: cilium gateway and cert-manager, https://docs.cilium.io/en/stable/network/servicemesh/gateway-api/https/
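For reference, a sketch of the L2 announcement pieces exercised in those tests, roughly following the cilium documentation; the resource names, pool CIDR, and interface regex are placeholders, not my actual homelab values.
# ip pool for LoadBalancer services plus an L2 announcement policy (sketch)
cat <<EOF | kubectl apply -f -
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: homelab-pool
spec:
  blocks:
  - cidr: 192.168.111.224/27
---
apiVersion: cilium.io/v2alpha1
kind: CiliumL2AnnouncementPolicy
metadata:
  name: homelab-l2
spec:
  interfaces:
  - "^eth[0-9]+"
  externalIPs: true
  loadBalancerIPs: true
EOF
# a Service of type LoadBalancer should now get an IP from the pool, announced over L2 by cilium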