k8s with HA kube-apiserver and external etcd cluster

This page is about building a Kubernetes cluster with a loadbalancer in front of kube-apiserver and an external etcd cluster.

References to Official Documents

The HA (highly available) kube-apiserver behind a loadbalancer is described in the official Kubernetes documentation here.

The external etcd topology is explained in the official Kubernetes documentation here.

ToC

  • preparing machines
  • setting up kube-ready nodes
  • building etcd cluster
  • preparing loadbalancer for kube-apiserver on control plane nodes
  • initializing kubernetes cluster
  • demo

Preparing Machines

I prepared physical and virtual machines, all AMD64 with mixed OSes, partly to practice composing an ansible playbook capable of handling different kinds of machines.

Baremetal Servers

As for baremetals, it is just a matter of putting the minimal OS installer image on a USB stick, plugging it into the machines, and following the installation wizard to install the OS with minimal system utilities and an ssh server. I won't go into details on this.

Virtual Servers

As for virtual servers, I decided to build the main virtual machines on Proxmox, and I also did some testing on Hyper-V using my gaming desktop machine running Windows 11 Pro.

Building a virtual machine on the Proxmox web GUI is as straightforward as installing an OS on a baremetal machine. Here I leave notes on building virtual machines from templated cloud-init images on Proxmox.

Rocky 9 on Proxmox Using Cloud-init Image

Here is what's done below:

  • download the cloud-init image and verify it
  • prepare the ssh public key to be imported into ~/.ssh/authorized_keys on the new machine
  • create a virtual machine using the image and set basic parameters such as CPU, memory, architecture, and NIC
  • convert the machine into a template
  • clone the template to build the actual machine to run
  • set cloud-init parameters such as IP address, username, and ssh public key to allow access
  • start the cloned virtual machine
# on proxmox hypervisor

# prepare the directory to place downloaded cloud-init image files
mkdir -p /opt/pve/dnld

# download and verify the file
cd /opt/pve/dnld
wget https://dl.rockylinux.org/pub/rocky/9/images/x86_64/Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2
wget https://dl.rockylinux.org/pub/rocky/9/images/x86_64/Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2.CHECKSUM
sha256sum -c Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2.CHECKSUM
qemu-img info Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2

# prepare ansible user ssh public key
mkdir /opt/pve/ssh_pub
cd /opt/pve/ssh_pub
# get the ssh public key from wherever and place it in the created directory
# or generate one

# environment variables
path_ciimg=/opt/pve/dnld/Rocky-9-GenericCloud-LVM-9.5-20241118.0.x86_64.qcow2
path_ssh_public_key=/opt/pve/ssh_pub/id_ed25519.pub
ciusername=username_here

# create a VM using the downloaded cloud-init image
qm create 9000 --memory 2048 --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --cpu cputype=x86-64-v3 --ostype l26
qm set 9000 --scsi0 local-zfs:0,import-from=$path_ciimg
qm set 9000 --ide2 local-zfs:cloudinit
qm set 9000 --boot order=scsi0

# template it
qm template 9000

# check what's created
qm config 9000
qm showcmd 9000

# create a new VM using the template
qm clone 9000 1308 --name name_of_this_rocky9_vm
qm set 1308 --sshkey $path_ssh_public_key
qm set 1308 --ipconfig0 ip=192.168.111.30/24,gw=192.168.111.1
qm set 1308 --ciuser $ciusername
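
# start the cloned VM (the last step in the list above)
qm start 1308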

While many of the steps are identical to what's described in the official document, here are some additional notes on minor deviations:

  • storage is local-zfs instead of local just because I prepared a different storage named local-zfs
  • cputype is x86-64-v3, and I referred to these links to choose one for my machine running Proxmox
  • resizing disk
    • for most of the OSes I used, I resized the downloaded qcow2 image itself, and the final cloned virtual machine had all the disk space automatically recognized and mounted
      • # qemu-img resize debian-12-generic-amd64.qcow2 64G
    • for some, I resized the disk with the qm command like the following, in which case I had to do the partitioning, pvcreate, vgextend, lvextend, xfs_growfs, and the like inside the guest (see the sketch after this list)
      • qm disk resize 1312 scsi0 128G
      • obviously, changing the disk size on the original cloud-init image is easier
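
For those cases, the in-guest follow-up looked roughly like the following. This is only a sketch: it assumes an LVM-on-XFS layout, and the device, volume group, and logical volume names (sda4, rl, root) are placeholders to adjust to the actual layout.

# inside the guest, after qm disk resize has grown the virtual disk
# (device, VG, and LV names below are placeholders)
fdisk /dev/sda                        # create a new partition, e.g. /dev/sda4, in the added space
pvcreate /dev/sda4                    # turn the new partition into an LVM physical volume
vgextend rl /dev/sda4                 # add it to the existing volume group
lvextend -l +100%FREE /dev/rl/root    # extend the root logical volume
xfs_growfs /                          # grow the XFS filesystem online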

Setting Up Kube-Ready Nodes Using Ansible

Here is the list of machines I prepared.

One of the goals of rebuilding my homelab is to be able to compose whatever ansible automation I want, and here I am with a number of machines running mixed OSes. I ran ansible-playbook to make whatever changes were required to make these nodes kubernetes-ready, along with other machines not listed here whose roles are to run services using docker or to serve as jump hosts.

hostname | role              | os     | cpu | memory | disk (+additional disk) | machine
cp1      | k8s control plane | debian | 4   | 8GB    | 128GB ssd               | baremetal
cp2      | k8s control plane | rocky  | 4   | 6GB    | 128GB                   | vm on proxmox
worker1  | k8s worker node   | debian | 4   | 16GB   | 128GB ssd +6TB          | baremetal
worker2  | k8s worker node   | rhel   | 4   | 6GB    | 128GB +200GB            | vm on proxmox
worker3  | k8s worker node   | debian | 2   | 8GB    | 128GB ssd +500GB        | baremetal
worker4  | k8s worker node   | debian | 16  | 16GB   | 512GB ssd               | baremetal
worker5  | k8s worker node   | ubuntu | 4   | 6GB    | 128GB +200GB            | vm on proxmox
etcd1    | etcd node         | debian | 4   | 8GB    | 64GB ssd                | baremetal
etcd2    | etcd node         | debian | 2   | 4GB    | 64GB                    | vm on proxmox
etcd3    | etcd node         | oracle | 2   | 4GB    | 64GB                    | vm on proxmox

And here is the list of tasks executed using Ansible on all of the nodes listed above. #TODO: links to the repository will be added for each task once I make them available on a public git repository.

  • bootstrap
    • create a user account for the ansible master and enable password-less sudo
    • skipped for machines created from cloud-init images, as the ssh key and username set during the build were already those of the ansible master
  • posture check
    • sshd hardening
    • ordinary package upgrades
    • install certain packages, such as nfs, that I want in common across the different machine roles in my homelab
  • configuration changes and package installation (a condensed sketch of the swap and sysctl portion follows this list)
    • disable swap
    • enable ipv4 forwarding
    • install and set up containerd, runc, and CNI plugins
    • disable selinux
    • disable firewalld
    • install kubernetes packages: kubeadm, kubelet, and kubectl
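
The swap and networking portion boils down to a few well-known commands from the standard kubeadm prerequisites; here is a condensed sketch of the kind of changes those tasks make (run as root on each node):

# disable swap now and keep it disabled across reboots
swapoff -a
sed -ri '/\sswap\s/ s/^/#/' /etc/fstab

# kernel modules required for container networking
cat <<EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

# enable ipv4 forwarding (and make bridged traffic visible to iptables)
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system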

Building Etcd Cluster

I have followed what's described in the official document on setting up the etcd cluster.

Here is what's done in brief:

  • configure kubelet through systemd to have kubelet look for static pod manifest files and run them
  • prepare a kubeadm config file with etcd node details for each etcd node that will form the etcd cluster
    • in my case, three different kubeadm config files for etcd1, etcd2, and etcd3
  • generate the CA certificate and key for etcd using the kubeadm command
  • generate certificates and keys for the different communications among etcd and kubernetes components on each etcd node
  • generate the static pod manifest that runs etcd on each etcd node
  • kubelet now picks up the generated manifest and spins up the etcd service on each node
  • install the etcdctl command
  • run etcdctl to verify that etcd is working fine on each etcd node (see the sketch after this list)
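
The generation steps map onto a handful of kubeadm init phase commands, and the final check is a single etcdctl call. A condensed sketch, where the kubeadm config path and the endpoint IP are placeholders for my environment:

# on the node holding the etcd CA (then share ca.crt/ca.key with the other etcd nodes)
kubeadm init phase certs etcd-ca

# on each etcd node, with its own kubeadm config file (path is a placeholder)
kubeadm init phase certs etcd-server --config=/root/kubeadmcfg.yaml
kubeadm init phase certs etcd-peer --config=/root/kubeadmcfg.yaml
kubeadm init phase certs etcd-healthcheck-client --config=/root/kubeadmcfg.yaml
kubeadm init phase certs apiserver-etcd-client --config=/root/kubeadmcfg.yaml
kubeadm init phase etcd local --config=/root/kubeadmcfg.yaml

# verify from any etcd node (endpoint IP is a placeholder)
ETCDCTL_API=3 etcdctl \
  --cert /etc/kubernetes/pki/etcd/peer.crt \
  --key /etc/kubernetes/pki/etcd/peer.key \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --endpoints https://192.168.111.41:2379 \
  endpoint health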

TODO: Here is the link (to be added) to the repo containing ansible tasks

One difference between the steps described in the official document (linked at the beginning of this section) and my ansible tasks is that in the former a single etcd node generates all the certificates, keys, and configuration files and then distributes them to the other etcd nodes, whereas in the latter the files are generated on each etcd node that actually uses them. The exception is, of course, the etcd CA, which is generated on one etcd node at the beginning and then shared with the other etcd nodes so that all of them operate under a common CA.

Verification Output of Etcd

PLACEHOLDER

TODO place example output of ss -tlpn, crictl ps, and etcdctl endpoint health

Preparing Loadbalancer for Kube-apiserver on Control Plane Nodes

I have two control plane nodes prepared in my homelab environment, and this section covers the steps to set up a highly available control plane (kube-apiserver) using haproxy and keepalived running as static pods on the two control plane nodes.

The official Kubernetes documentation on setting up a cluster with an external etcd cluster is available here. The documentation on a highly available control plane is available here, and the actual details on how to set up the loadbalancer are available here.

Here are the steps in brief:

  • copy the etcd CA certificate to the control plane nodes
  • copy the client certificate and key that kube-apiserver uses to communicate with the etcd cluster to the control plane nodes
  • prepare the kubeadm config file on one control plane node (a sketch of the relevant stanzas follows this list)
    • configured to use the external etcd cluster with the certificate and key files prepared in the previous steps
    • the control plane endpoint configured to point to the VIP and port served by the loadbalancer prepared in the following steps
  • prepare a keepalived configuration file, the check script keepalived uses to monitor the health of the kube-apiserver (control plane service), and a haproxy configuration file
  • prepare static pod manifest files for keepalived and haproxy
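
For reference, here is a rough sketch of the external etcd and control plane endpoint stanzas of that kubeadm config file. The VIP, port, and etcd node IPs are placeholders for my environment, and the certificate paths are the locations the files were copied to in the earlier steps:

# a sketch of the relevant parts of the kubeadm config on the first control plane node
# (VIP, port, and etcd IPs below are placeholders; adjust to the actual environment)
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "192.168.111.100:8443"
etcd:
  external:
    endpoints:
      - https://192.168.111.41:2379
      - https://192.168.111.42:2379
      - https://192.168.111.43:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF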

Initializing Kubernetes Cluster

Run kubeadm init --config {kubeadm config file} on a control plane node to set up the kubernetes cluster.

It generates the certificate and key files used by the control plane node. These files are copied to the second control plane node as part of the ansible playbook tasks, and then kubeadm join is executed on the second control plane node to join the newly created kubernetes cluster.

A similar kubeadm join command is then executed on the rest of the worker nodes to join the cluster.
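
For reference, the join commands look roughly like the following; the VIP and port are placeholders, and kubeadm init prints the actual token and CA certificate hash to use:

# on the second control plane node (certificates already copied over by ansible)
kubeadm join 192.168.111.100:8443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane

# on each worker node
kubeadm join 192.168.111.100:8443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>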

Now I have the new kubernetes cluster composed of two control plane nodes with a loadbalancer in front of kube-apiserver (for example, whatever you run with kubectl goes through it), worker nodes, and the external etcd cluster of three etcd nodes running independently from the kubernetes cluster members.

Demo

I'll continue a little further with what I do after initializing the kubernetes cluster.

Managing Cluster from Other Machines

I could log on to the control plane nodes to manage the cluster, but I would rather do so without logging onto the control plane.

I copy the /etc/kubernetes/admin.conf file to ~/.kube/config on the same machine I run ansible playbooks from. This way I can execute bulk, scripted ansible changes and also manage the kubernetes cluster from the same machine.
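
A minimal sketch of that copy, assuming the ssh user on cp1 can read the file via sudo:

# pull admin.conf from a control plane node onto the ansible/management machine
mkdir -p ~/.kube
ssh cp1 'sudo cat /etc/kubernetes/admin.conf' > ~/.kube/config
chmod 600 ~/.kube/config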

By the way, if you have multiple clusters with respective /etc/kubernetes/admin.conf files to use, I would prepare aliases like the ones below in ~/.bashrc, ~/.bash_aliases, or any file that works, to switch between the clusters to work on.

# kubectl unset (default, ~/.kube/config)
alias kcu="export KUBECONFIG="
# set to manage different k8s clusters
alias kchv="export KUBECONFIG=~/.kube/hv-config"
alias kchlv3="export KUBECONFIG=~/.kube/hlv3-config"
alias kclab="export KUBECONFIG=~/.kube/lab-config"

Installing Network Addon

The first component to install is the network addon. There are a number of options, as listed here.

This time I am trying out Cilium, and I'm installing it using helm.

  • add helm repo
  • run helm search and identify the version
  • download the values file of the chart locally
  • analyze and edit the values file as necessary
  • install the helm chart
    • specify the modified values file to use if any changes were made with --values {modified_values_file.yaml}
# helm installation: https://helm.sh/docs/intro/install/

# add cilium repository
helm repo add cilium https://helm.cilium.io/

# confirm the latest version
helm search repo cilium
# helm search repo cilium -l  # to see the list of older versions available

# download the values file
helm show values cilium/cilium --version 1.16.6 > cilium-values.yaml

# edit the values file as necessary

# install
helm install cilium cilium/cilium --version 1.16.6 --namespace kube-system --values cilium-values.yaml

# monitor cilium pods and wait until all are up
kubectl get pods -n kube-system

# confirmation using helm commands
helm status cilium -n kube-system
helm get values cilium -n kube-system

# upgrade the existing release
# with revised values file when there's any change
helm upgrade cilium cilium/cilium -f cilium-values.yaml -n kube-system

# rollout restart for cilium daemonsets and deployments
# when necessary after helm release upgrade
kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium

# uninstall the helm release "cilium"
helm uninstall cilium -n kube-system

Cilium Values File

Here is the list of changes made on the cilium chart 1.16.6 (a sketch of how they nest in the values file follows this list):

  • k8sServiceHost: {kube_endpoint_ip_or_fqdn}
  • k8sServicePort: "8443"
  • kubeProxyReplacement: "true"
  • kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"
  • l2announcements.enabled: true
  • externalIPs.enabled: true
  • hubble.relay.enabled: true
  • hubble.ui.enabled: true
  • gatewayAPI.enabled: true
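
For reference, here is a sketch of how those dotted keys nest when written as a small standalone overrides file; the endpoint host is a placeholder, and such a file can be passed to helm with an additional -f flag instead of editing the full downloaded values file:

cat <<'EOF' > cilium-overrides.yaml
k8sServiceHost: "kube-vip.example.internal"   # placeholder: VIP or FQDN of the kube-apiserver loadbalancer
k8sServicePort: "8443"
kubeProxyReplacement: "true"
kubeProxyReplacementHealthzBindAddr: "0.0.0.0:10256"
l2announcements:
  enabled: true
externalIPs:
  enabled: true
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
gatewayAPI:
  enabled: true
EOF
# helm install cilium cilium/cilium --version 1.16.6 -n kube-system -f cilium-overrides.yaml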

Update CoreDNS Configuration

As the next step, I'd like to make changes to the CoreDNS configuration. The two changes I want to make are:

  • to point root forwarder to the two DNS servers running on my homelab outside the kubernetes cluster
  • to add the cluster DNS domain name to the search suffix list

The changes can be made by preparing a modified configmap manifest for coredns and applying it.

# retrieve the configmap
kubectl get configmap coredns -n kube-system -o yaml > configmap-coredns-original.yaml
cp configmap-coredns-original.yaml configmap-coredns-custom-domain.yaml
# edit the configmap and then apply it
kubectl replace -f configmap-coredns-custom-domain.yaml
# restart coredns deployment to recreate coredns pods with the updated configmap
kubectl -n kube-system rollout restart deployment coredns

Here is the jinja2 template of the coredns configmap file, with variables to add the domain suffix via the kube_cluster_domain variable and the two forwarders via the nameservers[0] and nameservers[1] variables. I actually ran an ansible playbook to make the above changes.

configmap-coredns-custom-domain.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes {{ kube_cluster_domain }} cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . {{ nameservers[0] }} {{ nameservers[1] }} {
           health_check 5s
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

Pod Security Administration

Applying pod security standards to the cluster is one of the things listed in the best practices section of the documentation, which follows the kubeadm section.

https://kubernetes.io/docs/setup/best-practices/enforcing-pod-security-standards/

Let me just apply the non-enforcing, baseline policy as a starter.

kubectl label --overwrite ns --all \
  pod-security.kubernetes.io/audit=baseline \
  pod-security.kubernetes.io/warn=baseline
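
To confirm the labels landed on every namespace:

kubectl get namespaces --show-labels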

Test Namespace and Pods

The kubernetes cluster becomes functional after installing the network addon, and I have also made the necessary changes to CoreDNS in the cluster, so I am ready to test running something in it.

  • create test namespace
  • create test deployment
  • get inside the pods spun up and see how they look
# create test namespace
kubectl create namespace test

# add dnsutils pod in the test namespace
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: test
spec:
  containers:
    - name: dnsutils
      image: registry.k8s.io/e2e-test-images/agnhost:2.39
      command:
        - sleep
        - "infinity"
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF

# view /etc/resolv.conf file
kubectl exec -t pod/dnsutils -n test -- cat /etc/resolv.conf
# internal DNS records using suffix list
kubectl exec -t pod/dnsutils -n test -- dig +search cp1
kubectl exec -t pod/dnsutils -n test -- dig +search worker1
# external records
kubectl exec -t pod/dnsutils -n test -- dig google.com.
kubectl exec -t pod/dnsutils -n test -- dig cloudflare.com.

# clean up
kubectl delete namespace test