building homelab cluster part 8¶
In the last part I set up the monitoring system, and in this part I am going to set up the logging system.
loki¶
https://github.com/grafana/loki
Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.
components¶
Agent - An agent or client, for example Promtail, which is distributed with Loki, or the Grafana Agent. The agent scrapes logs, turns the logs into streams by adding labels, and pushes the streams to Loki through an HTTP API.
Loki - The main server, responsible for ingesting and storing logs and processing queries. It can be deployed in three different configurations, for more information see deployment modes.
Grafana for querying and displaying log data. You can also query logs from the command line, using LogCLI or using the Loki API directly.
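As a taste of what querying will look like once everything is up, here is a minimal LogQL query over the Loki HTTP API; a sketch assuming in-cluster access (or a port-forward) to the read path on port 3100, with example service and label names:
# query recent matching log lines via the Loki HTTP API
# {namespace="loki"} is an example stream selector; |= filters on line content
curl -G -s "http://loki-read.loki:3100/loki/api/v1/query_range" \
--data-urlencode 'query={namespace="loki"} |= "error"' \
--data-urlencode 'limit=10'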
deployment modes¶
There are three deployment modes: monolithic, simple scalable, and microservices. An illustration and description of each is available and easy to understand.
https://grafana.com/docs/loki/latest/get-started/deployment-modes/
Simple scalable deployment mode: "It strikes a balance between deploying in monolithic mode or deploying each component as a separate microservice."
I am going with simple scalable mode.
loki values file¶
https://grafana.com/docs/loki/latest/setup/install/helm/
Here is the list of items you can see in the helm values file. I am omitting the enterprise section because that's not an option for me building a homelab.
- loki
    - storage
        - bucketNames
            - chunks, ruler, admin
        - type: s3
        - s3 access details
    - memcached
        - chunk_cache (enabled: false)
        - results_cache (enabled: false)
- monitoring
    - dashboard
    - rules (prometheus rules)
    - service monitor
    - self monitoring
    - loki canary
- write
- table manager (enabled: false, deprecated as per v2.9 doc)
- read
- backend
- single binary (replicas 0)
- ingress (enabled: false)
- memberlist service
- gateway (enabled: true) # changed this to false
- network policy (enabled: false)
- minio (enabled: false)
- sidecar
preparing s3 buckets¶
I will just follow the default bucket names found in the values file and create each bucket on my minio tenant.
- admin
- chunks
- ruler
I create a new group named loki-group and a user named loki, attach a read-write policy to the group, and generate an access key and secret for the user (a sketch with the mc CLI follows below).
The Loki helm chart does not seem to have an option to read the s3 credentials from a Kubernetes secret the way the gitlab-runner helm chart does.
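For reference, the same setup via the mc CLI might look like this; a sketch assuming an alias named myminio pointing at the tenant (the policy-attach syntax varies across mc versions):
# create the three buckets loki expects by default
mc mb myminio/chunks
mc mb myminio/ruler
mc mb myminio/admin
# create the user and group, then attach the built-in readwrite policy to the group
mc admin user add myminio loki 'REDACTED'
mc admin group add myminio loki-group loki
mc admin policy attach myminio readwrite --group loki-group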
preparing memcached¶
https://grafana.com/docs/loki/latest/operations/caching/
Two memcached clusters are recommended; they are named chunk_cache and results_cache in the values file.
Here are the recommended memcached settings:
- chunk_cache: --memory-limit=4096 --max-item-size=2m --conn-limit=1024
- results_cache: --memory-limit=1024 --max-item-size=5m --conn-limit=1024
The concurrent connection limit defaults to 1024 anyway, as per the official memcached wiki.
https://artifacthub.io/packages/helm/bitnami/memcached
# cd {homelab repo}/infrastructure/CLUSTERNAME/controllers/default-values
# confirm the version, 6.14.0 as of 2024-03-08 (memcached v1.6.24)
helm show chart oci://registry-1.docker.io/bitnamicharts/memcached
# get the values file
helm show values oci://registry-1.docker.io/bitnamicharts/memcached > memcached-values.yaml
cp memcached-values.yaml ../.
Here is my values file. This one is for chunk_cache; I have another copy named results-memcached-values.yaml for results_cache with the args changed to -m 1024 and -I 5m (-m and -I are the short forms of --memory-limit and --max-item-size).
20c20
< storageClass: "directpv-min-io"
---
> storageClass: ""
80,81c80,81
< registry: registry.blink-1x52.net
< repository: cache-dockerio/bitnami/memcached
---
> registry: docker.io
> repository: bitnami/memcached
102c102
< architecture: high-availability
---
> architecture: standalone
130,134c130
< args:
< - /run.sh
< - -m 4096
< - -I 2m
< - --conn-limit=1024
---
> args: []
152c148
< replicaCount: 3
---
> replicaCount: 1
229,235c225
< resources:
< requests:
< cpu: 250m
< memory: 256Mi
< limits:
< cpu: 1
< memory: 2048Mi
---
> resources: {}
326,327c316
< nodeSelector:
< app.kubernetes.io/part-of: directpv
---
> nodeSelector: {}
564c553
< enabled: true
---
> enabled: false
572c561
< storageClass: "directpv-min-io"
---
> storageClass: ""
Here is my script to generate the flux helmrepo and hr manifests.
#!/bin/bash
# add flux helmrepo to the manifest
flux create source helm bitnami \
--url=oci://registry-1.docker.io/bitnamicharts \
--interval=1h0m0s \
--export >memcached.yaml
# add flux helm release to the manifest including the customized values.yaml file
flux create helmrelease chunk-memcached \
--interval=10m \
--target-namespace=memcached \
--source=HelmRepository/bitnami \
--chart=memcached \
--chart-version=6.14.0 \
--values=chunk-memcached-values.yaml \
--export >>memcached.yaml
# add flux helm release to the manifest including the customized values.yaml file
flux create helmrelease results-memcached \
--interval=10m \
--target-namespace=memcached \
--source=HelmRepository/bitnami \
--chart=memcached \
--chart-version=6.14.0 \
--values=results-memcached-values.yaml \
--export >>memcached.yaml
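After committing the manifests, flux picks them up on the next sync; to nudge it along and check what landed (the kustomization name flux-system is an assumption about my setup):
# trigger a reconcile and list the results in the memcached namespace
flux reconcile kustomization flux-system --with-source
kubectl -n memcached get pods,svc,statefulsets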
NAME READY STATUS RESTARTS AGE
pod/memcached-chunk-memcached-0 1/1 Running 0 66s
pod/memcached-chunk-memcached-1 1/1 Running 0 66s
pod/memcached-chunk-memcached-2 1/1 Running 0 66s
pod/memcached-results-memcached-0 1/1 Running 0 66s
pod/memcached-results-memcached-1 1/1 Running 0 66s
pod/memcached-results-memcached-2 1/1 Running 0 66s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/memcached-chunk-memcached ClusterIP 10.102.190.128 <none> 11211/TCP 66s
service/memcached-results-memcached ClusterIP 10.104.25.227 <none> 11211/TCP 66s
NAME READY AGE
statefulset.apps/memcached-chunk-memcached 3/3 66s
statefulset.apps/memcached-results-memcached 3/3 66s
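As an optional sanity check, memcached answers the stats command over plain TCP, so a throwaway busybox pod can poke each service (service names as above):
# expect a page of STAT lines back; quit makes the server close the connection
kubectl -n memcached run -it --rm mc-check --image=busybox --restart=Never -- \
sh -c 'printf "stats\r\nquit\r\n" | nc memcached-chunk-memcached 11211 | head -n 5'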
loki values file, helmrepo, and helmrelease¶
# add repo
helm repo add grafana https://grafana.github.io/helm-charts
# update
helm repo update
# confirm the version, 5.43.6 as of 2024-03-12
helm search repo grafana/loki
helm show chart grafana/loki
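The default values file to diff against comes out the same way as with memcached, presumably:
# get the default values file, pinned to the chart version above
helm show values grafana/loki --version=5.43.6 > loki-values.yaml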
Here is the diff from the original values file.
280c280
< endpoint: s3.blink-1x52.net
---
> endpoint: null
282,283c282,283
< secretAccessKey: REDACTED
< accessKeyId: REDACTED
---
> secretAccessKey: null
> accessKeyId: null
330,331c330,331
< enabled: true
< host: "memcached-chunk-memcache.memcached.svc"
---
> enabled: false
> host: ""
336,337c336,337
< enabled: true
< host: "memcached-results-memcached.memcached.svc"
---
> enabled: false
> host: ""
844,845c844
< nodeSelector:
< app.kubernetes.io/part-of: directpv
---
> nodeSelector: {}
867c866
< storageClass: directpv-min-io
---
> storageClass: null
1023,1024c1022
< nodeSelector:
< app.kubernetes.io/part-of: directpv
---
> nodeSelector: {}
1041c1039
< storageClass: directpv-min-io
---
> storageClass: null
1127,1128c1125
< nodeSelector:
< app.kubernetes.io/part-of: directpv
---
> nodeSelector: {}
1144c1141
< size: 50Gi
---
> size: 10Gi
1150c1147
< storageClass: directpv-min-io
---
> storageClass: null
1292c1289
< enabled: false
---
> enabled: true
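I did not paste my loki flux script, but it follows the same pattern as the memcached one; a sketch (the grafana helmrepo and the file names are assumptions mirroring my other scripts):
#!/bin/bash
# add flux helmrepo for the grafana charts to the manifest
flux create source helm grafana \
--url=https://grafana.github.io/helm-charts \
--interval=1h0m0s \
--export >loki.yaml
# add flux helm release to the manifest including the customized values.yaml file
flux create helmrelease loki \
--interval=10m \
--target-namespace=loki \
--source=HelmRepository/grafana \
--chart=loki \
--chart-version=5.43.6 \
--values=loki-values.yaml \
--export >>loki.yaml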
loki installed¶
I see a lot of things running.
The rollout may take longer than the flux helmrelease health check allows, exhausting the max retries and leaving the flux hr status stuck in "not ready". Running flux suspend hr loki and then flux resume hr loki resolves it.
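That is (hr is the flux shorthand for helmrelease):
# kick the helmrelease out of the failed state once everything is actually healthy
flux suspend hr loki
flux resume hr loki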
NAME READY STATUS RESTARTS AGE
pod/loki-backend-0 2/2 Running 2 (11m ago) 12m
pod/loki-backend-1 2/2 Running 3 (10m ago) 12m
pod/loki-backend-2 2/2 Running 3 (10m ago) 12m
pod/loki-canary-4xrzj 1/1 Running 0 12m
pod/loki-canary-p9qll 1/1 Running 0 12m
pod/loki-canary-qtqrq 1/1 Running 0 12m
pod/loki-canary-vwhnc 1/1 Running 0 12m
pod/loki-canary-wtzx5 1/1 Running 0 12m
pod/loki-loki-grafana-agent-operator-59b5949888-9nrgv 1/1 Running 0 12m
pod/loki-loki-logs-86vv7 2/2 Running 0 12m
pod/loki-loki-logs-bcdnh 2/2 Running 0 12m
pod/loki-loki-logs-j56zb 2/2 Running 0 12m
pod/loki-loki-logs-k5277 2/2 Running 0 12m
pod/loki-loki-logs-pnlbd 2/2 Running 0 12m
pod/loki-read-c9f55f985-hgscn 1/1 Running 0 12m
pod/loki-read-c9f55f985-qlbgh 1/1 Running 0 12m
pod/loki-read-c9f55f985-v56lg 1/1 Running 0 12m
pod/loki-write-0 1/1 Running 0 12m
pod/loki-write-1 1/1 Running 0 12m
pod/loki-write-2 1/1 Running 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/loki-backend ClusterIP 10.109.177.120 <none> 3100/TCP,9095/TCP 12m
service/loki-backend-headless ClusterIP None <none> 3100/TCP,9095/TCP 12m
service/loki-canary ClusterIP 10.99.208.197 <none> 3500/TCP 12m
service/loki-memberlist ClusterIP None <none> 7946/TCP 12m
service/loki-read ClusterIP 10.99.247.137 <none> 3100/TCP,9095/TCP 12m
service/loki-read-headless ClusterIP None <none> 3100/TCP,9095/TCP 12m
service/loki-write ClusterIP 10.104.241.140 <none> 3100/TCP,9095/TCP 12m
service/loki-write-headless ClusterIP None <none> 3100/TCP,9095/TCP 12m
service/query-scheduler-discovery ClusterIP None <none> 3100/TCP,9095/TCP 12m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/loki-canary 5 5 5 5 5 <none> 12m
daemonset.apps/loki-loki-logs 5 5 5 5 5 <none> 12m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/loki-loki-grafana-agent-operator 1/1 1 1 12m
deployment.apps/loki-read 3/3 3 3 12m
NAME DESIRED CURRENT READY AGE
replicaset.apps/loki-loki-grafana-agent-operator-59b5949888 1 1 1 12m
replicaset.apps/loki-read-c9f55f985 3 3 3 12m
NAME READY AGE
statefulset.apps/loki-backend 3/3 12m
statefulset.apps/loki-write 3/3 12m
more changes on loki values file¶
wip
- memcached service name - related to srv record lookup
- s3 path settings for the buckets access
- loki canary args to specify destination loki server
promtail¶
https://grafana.com/docs/loki/latest/send-data/promtail/installation/
Promtail is an agent which ships the contents of local logs to a private Grafana Loki instance or Grafana Cloud.
# confirm chart/version, 6.15.5 as of 2024-03-12
helm show chart grafana/promtail
# get values file
helm show values grafana/promtail --version=6.15.5 > promtail-values.yaml
Since the gateway is disabled in my loki installation, I modified the promtail logging destination in the promtail values file.
415c415
< - url: http://loki-gateway/loki/api/v1/push
---
> - url: http://loki-write:3100/loki/api/v1/push
The helmrepo is the same one used for loki, so I only prepared the script to generate the flux hr for promtail. This is installed in the same "loki" namespace.
#!/bin/bash
# add flux helm release to the manifest including the customized values.yaml file
flux create helmrelease promtail \
--interval=10m \
--target-namespace=loki \
--source=HelmRepository/grafana \
--chart=promtail \
--chart-version=6.15.5 \
--values=promtail-values.yaml \
--export >promtail.yaml
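To confirm promtail landed, the chart's standard labels can be used as a selector; a sketch (the label value is the chart default, an assumption here):
# one promtail pod per node is expected since the chart deploys a daemonset
kubectl -n loki get pods -l app.kubernetes.io/name=promtail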
check logs on grafana¶
Access grafana, navigate to Home > Connections > Your Connections > Data sources, and add loki data source.
The server http url in my case is "http://loki-read.loki:3100", where "loki-read" is the svc name and the ".loki" suffix is the namespace qualifier needed to reach the service in the "loki" namespace from the "monitoring" namespace where grafana runs.
Once that's added, navigate to Home > Explore, and select loki as data source and run query.
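The same query can also be run from the command line with LogCLI, as mentioned earlier; a sketch assuming a port-forward or in-cluster access to loki-read (the label value is an example):
# tail the latest lines from the loki namespace itself
logcli --addr=http://loki-read.loki:3100 query '{namespace="loki"}' --limit=10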