Longhorn installation¶
Longhorn is a powerful, lightweight, and easy-to-use distributed block storage system designed specifically for Kubernetes. Built to deliver high availability, reliability, and performance, Longhorn simplifies the complexities of persistent storage in cloud-native environments. It seamlessly integrates with Kubernetes to provide persistent volumes that are resilient to node failures, making it an ideal solution for running stateful applications in production. With features like snapshot, backup, and volume cloning, Longhorn offers enterprise-grade capabilities while remaining open-source and developer-friendly.
Environment check script¶
There is an environment check script you can run to check whether anything is missing in your environment before running Longhorn.
In my case, the script gave me three issues:
- kernel module iscsi_tcp missing
- cryptsetup package missing
- a warning regarding multipathd on Ubuntu hosts
$ curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.8.1/scripts/environment_check.sh | bash
[INFO] Required dependencies 'kubectl jq mktemp sort printf' are installed.
[INFO] All nodes have unique hostnames.
[INFO] Waiting for longhorn-environment-check pods to become ready (0/0)...
[INFO] Waiting for longhorn-environment-check pods to become ready (0/6)...
[INFO] All longhorn-environment-check pods are ready (6/6).
[INFO] MountPropagation is enabled
[INFO] Checking kernel release...
[INFO] Checking iscsid...
[ERROR] kernel module iscsi_tcp is not enabled on lab-cp3
[ERROR] kernel module iscsi_tcp is not enabled on lab-cp1
[ERROR] kernel module iscsi_tcp is not enabled on lab-worker2
[ERROR] kernel module iscsi_tcp is not enabled on lab-worker1
[ERROR] kernel module iscsi_tcp is not enabled on lab-cp2
[ERROR] kernel module iscsi_tcp is not enabled on lab-worker3
[INFO] Checking multipathd...
[WARN] multipathd is running on lab-cp3 known to have a breakage that affects Longhorn. See description and solution at https://longhorn.io/kb/troubleshooting-volume-with-multipath
[INFO] Checking packages...
[ERROR] cryptsetup is not found in lab-cp1.
[ERROR] cryptsetup is not found in lab-worker2.
[ERROR] cryptsetup is not found in lab-worker1.
[ERROR] cryptsetup is not found in lab-cp2.
[ERROR] cryptsetup is not found in lab-worker3.
[INFO] Checking nfs client...
[INFO] Cleaning up longhorn-environment-check pods...
[INFO] Cleanup completed.
multipathd remedy on lab-cp3¶
Add the block below in /etc/multipath.conf and restart the service by running sudo systemctl restart multipathd.service.
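Following the Longhorn KB article linked in the warning above, the block blacklists the standard /dev/sdX device nodes so multipathd leaves the Longhorn volumes alone:
# /etc/multipath.conf
blacklist {
    devnode "^sd[a-z0-9]+"
}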
cryptsetup package¶
Install it using the package manager (apt or dnf).
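For example, depending on the host's distribution:
# Ubuntu / Debian hosts
sudo apt install -y cryptsetup
# Rocky Linux / RHEL hosts
sudo dnf install -y cryptsetup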
Kernel modules¶
Create a file /etc/modules-load.d/longhorn-requirements.conf, add the lines below, and reboot the system.
Here I am adding dm_crypt as well, because the Longhorn node description was showing a warning that this module was missing when I checked each node on the Longhorn UI.
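Based on the environment check errors above plus the dm_crypt warning, the file looks like this:
# /etc/modules-load.d/longhorn-requirements.conf
iscsi_tcp
dm_crypt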
Other optional requirements for greater performance using an experimental feature¶
These are for the Longhorn V2 Data Engine, which utilizes SPDK (Storage Performance Development Kit) and the "block" disk type. This is still an experimental feature as of April 2025 and is resource intensive (a local NVMe disk, 1 dedicated CPU, 2 GiB of memory for 1024 huge pages, and so on). I will skip it and stick with the filesystem disk type this time, but here is the list of requirements.
- SSE4.2 instruction set support
  - grep sse4_2 /proc/cpuinfo
- Linux kernel
  - 5.19 or later is required for NVMe over TCP support
  - v6.7 or later is recommended for improved system stability
- Linux kernel modules
  - vfio_pci
  - uio_pci_generic
  - nvme-tcp
- Huge page support, 2 GiB of 2 MiB-sized pages
Longhorn installation¶
I am skipping some detailed steps, but here is the list of what I am going to do:
- create longhorn-system namespace
- identify the helm chart version I want to use
- download the values file to my GitOps repository and edit
- create flux HelmRepository and HelmRelease manifests and include them in the existing infra-controllers flux kustomization
Creating longhorn-system namespace¶
I like to manage the namespace manifest independently of helm releases, so I am adding the namespace manifest to my flux-system kustomization.
I will be using the Cilium gateway to access the Longhorn UI, so I have added the gateway label, which allows HTTPRoutes in this namespace to use the gateway.
# ./clusters/lab-hlv3/namespaces/longhorn-system.yaml
---
kind: Namespace
apiVersion: v1
metadata:
  name: longhorn-system
  labels:
    service: longhorn
    type: infrastructure
    gateway: cilium
Longhorn helm chart and values¶
Let's add the helm repository hosting Longhorn and find the available versions.
helm repo add longhorn https://charts.longhorn.io
helm search repo longhorn
# helm repo update
# store values file locally
helm show values --version 1.8.1 longhorn/longhorn > longhorn-1.8.1-values.yaml
Changes made to the values file¶
- image registry to use local harbor
- longhorn ui replica count from 2 to 1
- persistence settings
  - default class replica count from 3 to 2
  - data locality setting from disabled to best-effort
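A minimal sketch of those overrides in longhorn-1.8.1-values.yaml; the Harbor registry hostname is a placeholder, and only one of the per-image repository fields is shown as an example:
# excerpt of longhorn-1.8.1-values.yaml
image:
  longhorn:
    manager:
      # harbor.lab.example is a placeholder for the local harbor registry
      repository: harbor.lab.example/longhornio/longhorn-manager
longhornUI:
  replicas: 1
persistence:
  defaultClassReplicaCount: 2
  defaultDataLocality: best-effort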
Flux helmrepo and helmrelease for longhorn¶
Here I am skipping the details, but I have prepared a shell script to generate the flux HelmRepository and HelmRelease manifests using the modified values file, and I have added the resulting file to the infra-controllers kustomization. Please see the similar example from the cert-manager installation.
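Skipping the generator script itself, a minimal sketch of the resulting manifests could look like the following, assuming Flux 2.3 or later APIs and that the modified values file is shipped as a ConfigMap named longhorn-values (the intervals and the valuesFrom approach are assumptions):
# sketch of the flux manifests for longhorn; intervals and valuesFrom are assumptions
---
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: longhorn
  namespace: flux-system
spec:
  interval: 1h
  url: https://charts.longhorn.io
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: longhorn
  namespace: flux-system
spec:
  interval: 1h
  targetNamespace: longhorn-system
  chart:
    spec:
      chart: longhorn
      version: 1.8.1
      sourceRef:
        kind: HelmRepository
        name: longhorn
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: longhorn-values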
Longhorn installation result¶
Many microservices spin up, run, and disappear during the installation. In the case of my lab VMs running on Hyper-V and Proxmox, it took around 20 minutes to complete.
Here is the part of the flux tree output covering just the longhorn HelmRepository and HelmRelease.
$ flux tree ks infra-controllers
Kustomization/flux-system/infra-controllers
├── HelmRelease/flux-system/longhorn
│ ├── PriorityClass/longhorn-critical
│ ├── ServiceAccount/longhorn-system/longhorn-service-account
│ ├── ServiceAccount/longhorn-system/longhorn-ui-service-account
│ ├── ServiceAccount/longhorn-system/longhorn-support-bundle
│ ├── ConfigMap/longhorn-system/longhorn-default-resource
│ ├── ConfigMap/longhorn-system/longhorn-default-setting
│ ├── ConfigMap/longhorn-system/longhorn-storageclass
│ ├── CustomResourceDefinition/backingimagedatasources.longhorn.io
│ ├── CustomResourceDefinition/backingimagemanagers.longhorn.io
│ ├── CustomResourceDefinition/backingimages.longhorn.io
│ ├── CustomResourceDefinition/backupbackingimages.longhorn.io
│ ├── CustomResourceDefinition/backups.longhorn.io
│ ├── CustomResourceDefinition/backuptargets.longhorn.io
│ ├── CustomResourceDefinition/backupvolumes.longhorn.io
│ ├── CustomResourceDefinition/engineimages.longhorn.io
│ ├── CustomResourceDefinition/engines.longhorn.io
│ ├── CustomResourceDefinition/instancemanagers.longhorn.io
│ ├── CustomResourceDefinition/nodes.longhorn.io
│ ├── CustomResourceDefinition/orphans.longhorn.io
│ ├── CustomResourceDefinition/recurringjobs.longhorn.io
│ ├── CustomResourceDefinition/replicas.longhorn.io
│ ├── CustomResourceDefinition/settings.longhorn.io
│ ├── CustomResourceDefinition/sharemanagers.longhorn.io
│ ├── CustomResourceDefinition/snapshots.longhorn.io
│ ├── CustomResourceDefinition/supportbundles.longhorn.io
│ ├── CustomResourceDefinition/systembackups.longhorn.io
│ ├── CustomResourceDefinition/systemrestores.longhorn.io
│ ├── CustomResourceDefinition/volumeattachments.longhorn.io
│ ├── CustomResourceDefinition/volumes.longhorn.io
│ ├── ClusterRole/longhorn-role
│ ├── ClusterRoleBinding/longhorn-bind
│ ├── ClusterRoleBinding/longhorn-support-bundle
│ ├── Service/longhorn-system/longhorn-backend
│ ├── Service/longhorn-system/longhorn-frontend
│ ├── Service/longhorn-system/longhorn-conversion-webhook
│ ├── Service/longhorn-system/longhorn-admission-webhook
│ ├── Service/longhorn-system/longhorn-recovery-backend
│ ├── DaemonSet/longhorn-system/longhorn-manager
│ ├── Deployment/longhorn-system/longhorn-driver-deployer
│ └── Deployment/longhorn-system/longhorn-ui
└── HelmRepository/flux-system/longhorn
Access to Longhorn UI¶
The pod and service for the Longhorn web frontend are created as part of the helm chart installation. Let's configure the gateway and add an HTTPRoute to enable web access to the Longhorn UI.
$ kubectl get svc longhorn-frontend -n longhorn-system
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
longhorn-frontend   ClusterIP   10.96.107.178   <none>        80/TCP    23h
Cilium gateway¶
Here is the updated gateway manifest. Once this change is pushed, the gateway spins up the new listener for longhorn.lab.blink-1x52.net and cert-manager prepares the TLS certificate.
# ./infrastructure/lab-hlv3/configs/cilium/gateway.yaml
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: cilium-gateway
  namespace: gateway
  annotations:
    cert-manager.io/issuer: issuer
spec:
  gatewayClassName: cilium
  addresses:
    - type: IPAddress
      value: 192.168.1.79
  listeners:
    - name: whoami-kube-http
      hostname: whoami-kube.lab.blink-1x52.net
      port: 80
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              gateway: cilium
    - name: whoami-kube-https
      hostname: whoami-kube.lab.blink-1x52.net
      port: 443
      protocol: HTTPS
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              gateway: cilium
      tls:
        mode: Terminate
        certificateRefs:
          - name: tls-whoami-kube
            kind: Secret
            namespace: gateway
    - name: longhorn-https
      hostname: longhorn.lab.blink-1x52.net
      port: 443
      protocol: HTTPS
      allowedRoutes:
        namespaces:
          from: Selector
          selector:
            matchLabels:
              gateway: cilium
      tls:
        mode: Terminate
        certificateRefs:
          - name: tls-longhorn
            kind: Secret
            namespace: gateway
HTTPRoute¶
And here is the HTTPRoute to connect the longhorn-https listener and longhorn-frontend service.
# ./infrastructure/lab-hlv3/configs/longhorn/httproutes.yaml
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: longhorn-https
  namespace: longhorn-system
spec:
  parentRefs:
    - name: cilium-gateway
      sectionName: longhorn-https
      namespace: gateway
  hostnames:
    - "longhorn.lab.blink-1x52.net"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: longhorn-frontend
          port: 80
Longhorn UI and Disks¶
You can check nodes and disks on the UI. By default, or rather by how I configured the values file and installed Longhorn, the system reserves some disk space for itself and makes the rest schedulable, ready to serve.
Next, I am going to add an additional 80 GB disk formatted with the XFS filesystem to two worker nodes, lab-worker2 and lab-worker3. I will keep the root filesystem disk space untouched for the time being.
Expanding the disk space on lab-cp2 running Rocky Linux¶
I found the node lab-cp2 showing a warning on the Longhorn UI that it does not have any disk space. I learnt that the disk resizing done on Proxmox using the qm command did make the disk bigger, but the Rocky Linux cloud-init template image did not use the expanded disk space.
# inside the VM...
sudo lsblk
# partition unused disk space
sudo fdisk /dev/sda
sudo lsblk -f
# /dev/sda5 was created
# create new physical volume using this available disk partition
sudo pvcreate /dev/sda5
# identify existing volume group name
sudo vgs
# add the pv to the existing vg
sudo vgextend rocky /dev/sda5
sudo vgdisplay # to confirm the added free space
# identify target lvm and expand
sudo lvs
sudo lvextend -l +100%FREE /dev/rocky/lvroot
# resize the xfs filesystem
sudo xfs_growfs /dev/rocky/lvroot
df -h
Adding disk on lab-worker2, VM running on Hyper-V¶
Here are the steps:
- create a new virtual disk
- look for an available SCSI controller to connect to on the target VM
- create a new SCSI controller if needed
- add the disk to the VM
The operations done using PowerShell are shown below.
# create a new vhdx
New-VHD -Path "F:\hyperv\vhd\lab-worker2-disk2.vhdx" -SizeBytes 80GB -Dynamic
# confirm the existing disks connected
Get-VMHardDiskDrive -VMName lab-worker2 -ControllerType SCSI
# confirm the controllers
# there should be a couple for disks and DVD drive
Get-VMScsiController -VMName lab-worker2
# add one controller
# and confirm the controller ID newly created and open for the new disk connection
Add-VMScsiController -VMName lab-worker2
Get-VMScsiController -VMName lab-worker2
# add the newly created vhdx to the VM using the available controller
Add-VMHardDiskDrive -VMName lab-worker2 -ControllerType SCSI -ControllerNumber 2 -Path "F:\hyperv\vhd\lab-worker2-disk2.vhdx"
Here are the steps done inside the VM:
- create a new partition
- format it (I initially planned ext4 but went with XFS, as noted in the commands below)
- mount it
# identify the device
sudo fdisk -l
# assuming the newly attached disk is on /dev/sda
sudo fdisk /dev/sda
# on fdisk menu,
# "p" to print existing partitions
# "d" to delete if there is any existing one
# "n" to create new partition, create "p" primary, one big partition
# and then "w" to write the change and exit fdisk
# confirm the result and should see /dev/sda1
sudo fdisk -l
# format /dev/sda1
# sudo mkfs.ext4 /dev/sda1
# update: go with xfs as that's the recommended filesystem for Minio S3
# sudo apt install xfsprogs # if mkfs.xfs command is missing
sudo mkfs.xfs /dev/sda1
# confirm UUID of the formatted disk space
sudo blkid
# prepare directory to mount this new disk
sudo mkdir -p /mnt/disk2
Edit the /etc/fstab file and add the line below so the new disk is mounted automatically.
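As a sketch, using the UUID reported by blkid (the UUID below is a placeholder):
# /etc/fstab -- replace the UUID with the value shown by blkid
UUID=<uuid-from-blkid>  /mnt/disk2  xfs  defaults  0  0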
Verify the result and then reboot to ensure the new disk is still properly mounted after a machine reboot.
And once the new filesystem is ready for Longhorn to use, I navigate to the node menu on the Longhorn UI and add the disk.
Adding disk on lab-worker3, VM running on Proxmox¶
I want to add another 80 GB disk to this VM running on Proxmox.
# shutdown the target VM
# on this VM) sudo systemctl poweroff, or sudo shutdown -h now
# or on proxmox) qm stop VMID
export VMID=1234 # VM ID
# see the status of each available storage on proxmox ve
pvesm status
# list the disks on specific storage
pvesm list local-lvm
# create and allocate 80GB disk
pvesm alloc local-lvm $VMID vm-$VMID-disk2 80G
# confirm vm config
qm config $VMID
# attach the created disk at scsi1
qm set $VMID --scsi1 local-lvm:vm-$VMID-disk2
# verify and start the VM
pvesm list local-lvm
qm config $VMID
qm start $VMID
Then follow the same steps as on the other VM: create a partition, format it, mount it, and add the new disk for Longhorn to use.
Adding the new disk to Longhorn¶
I have added /mnt/disk2 on the lab-worker2 and lab-worker3 nodes.
I have also disabled the control plane nodes and the root disks on the worker nodes, so the remaining disks available for scheduling are the ones I just added, giving me approximately 160 GB of disk space to use.
Test PVC¶
Here is the state after disabling all nodes and disks except for the two newly added ones on lab-worker2 and lab-worker3:
- there is 156 Gi of schedulable storage
- there are 2 nodes with schedulable storage
This is my test deployment, service, and persistent volume claim (PVC); a sketch of the manifests follows the list:
- a service to access nginx
- a deployment with 1 replica of nginx
- the Longhorn volume mounted at /usr/share/nginx/html
- the Recreate strategy type set, so that whenever this deployment needs to restart there will be downtime and a moment when zero pods are available
- a PVC requesting 1Gi from the "longhorn" storageclass
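Here is a sketch of those manifests; the deployment and service names are taken from the outputs further below, while the labels and the PVC name are assumptions:
# sketch of the test manifests; labels and the PVC name are assumptions
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-test-pvc
  namespace: longhorn-system
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: longhorn-test
  namespace: longhorn-system
spec:
  selector:
    app: my-longhorn-test
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-longhorn-test
  namespace: longhorn-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: my-longhorn-test
  template:
    metadata:
      labels:
        app: my-longhorn-test
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - name: html
              mountPath: /usr/share/nginx/html
      volumes:
        - name: html
          persistentVolumeClaim:
            claimName: longhorn-test-pvc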
Once this is pushed and reconciled by fluxcd, the persistent volume is created for the requested PVC. The pod was created on lab-worker2, and so was the volume. This is due to the data locality option set in the helm values file used to install Longhorn.
$ kubectl -n longhorn-system get pods -o wide | grep test
my-longhorn-test-676d6b57dc-gzqtc 1/1 Running 0 3m40s 10.0.3.202 lab-worker2 <none> <none>
When I add a nodeSelector for lab-worker1 to the deployment, the pod running on lab-worker2 gets destroyed, the PVC stays, a new pod gets created on lab-worker1, and the same PVC attaches to the pod now running on lab-worker1.
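For reference, the change is a nodeSelector added under the pod template spec, using the standard kubernetes.io/hostname label:
# added under spec.template.spec of the test deployment
nodeSelector:
  kubernetes.io/hostname: lab-worker1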
I was able to edit the file on the pod while it was running on lab-worker2, and then again on lab-worker1. Below is the curl command output.
# edit /usr/share/nginx/html/index.html on the pod running on lab-worker2
$ kubectl exec deploy/tools -n testbed -- curl http://longhorn-test.longhorn-system.svc.lab.blink-1x52.net
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 50 100 50 0 0 21758 0 --:--:-- --:--:-- --:--:-- 50000
<html>
<body>
<h1>hi, world.</h1>
</body>
</html>
# nodeSelector lab-worker1 added to the deployment, pod gets re-created
# edit /usr/share/nginx/html/index.html on the pod running on lab-worker1
$ kubectl exec deploy/tools -n testbed -- curl http://longhorn-test.longhorn-system.svc.lab.blink-1x52.net
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 76 100 76 0 0 30170 0 --:--:-- --:--:-- --:--:-- 38000
<html>
<body>
<h1>hi, world.</h1>
<h2>hi, world at h2.</h2>
</body>
</html>
The captures below show the volume when the pod was running on lab-worker2, and then on lab-worker1, with warnings about data locality but still functional.
Closing¶
All looks good! Next up, I will be adding a monitoring service, which will definitely require disks to store data, and then setting up an S3 service and a logging system that will use the S3 storage service to store log data.