building homelab cluster part 7¶
In this part, I am going to set up a monitoring system for the cluster by installing kube-prometheus, which comes with Grafana and other monitoring components.
kube-prometheus¶
https://github.com/prometheus-operator/kube-prometheus
install crds¶
As described in the quick start, the way to install kube-prometheus is to first apply everything inside the ./manifests/setup directory, and then the ./manifests directory.
The manifest files inside ./manifests/setup are all custom resource definitions, so I will merge them into a single CRDs file and place it in my ./infrastructure/homelab/controllers/crds directory.
# clone the repository
mkdir -p ~/repos/github.com/prometheus-operator
cd ~/repos/github.com/prometheus-operator
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
# confirm the version
git branch -lr
# change the version
git checkout release-0.13
# remove namespace manifest as this will be created separately
cd manifests/setup
rm namespace.yaml
# merge all the crds files
cat *.yaml > kube-prometheus-v0.13.yaml
# add '---' separator line
sed -i '/^apiVersion/i ---' kube-prometheus-v0.13.yaml
# copy
cp kube-prometheus-v0.13.yaml {homelab repo}/infrastructure/hyper-v/controllers/crds/.
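As a quick sanity check that the merged file contains only CRDs and that every document got its separator, the two counts below should come out the same (one per CRD):
grep -c '^kind: CustomResourceDefinition' kube-prometheus-v0.13.yaml
grep -c '^---' kube-prometheus-v0.13.yaml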
Now, I have changed the way I add a new namespace: they all go in ./clusters/{cluster name}/namespace, so that the namespace and the secret from the homelab-sops repository are not affected by the trial and error of adding and pulling back manifests in the infra-controllers space.
I am planning to access Grafana through the gateway, so I'm adding the gateway label.
---
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    service: monitoring
    type: infrastructure
    gateway-available: "yes"
I update the infra-controllers kustomization to include the new kube-prometheus CRDs.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# CRDs
- crds/gateway-v1.0.0.yaml
- crds/directpv-v4.0.10.yaml
- crds/cert-manager-v1.14.3.yaml
- crds/kube-prometheus-v0.13.yaml
# infra-controllers
- sops.yaml
- metallb.yaml
- ngf.yaml
- minio-operator.yaml
- minio-tenant.yaml
- cert-manager.yaml
- gitlab-runner.yaml
Here is the result.
alertmanagerconfigs amcfg monitoring.coreos.com/v1alpha1 true AlertmanagerConfig
alertmanagers am monitoring.coreos.com/v1 true Alertmanager
podmonitors pmon monitoring.coreos.com/v1 true PodMonitor
probes prb monitoring.coreos.com/v1 true Probe
prometheusagents promagent monitoring.coreos.com/v1alpha1 true PrometheusAgent
prometheuses prom monitoring.coreos.com/v1 true Prometheus
prometheusrules promrule monitoring.coreos.com/v1 true PrometheusRule
scrapeconfigs scfg monitoring.coreos.com/v1alpha1 true ScrapeConfig
servicemonitors smon monitoring.coreos.com/v1 true ServiceMonitor
thanosrulers ruler monitoring.coreos.com/v1 true ThanosRuler
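The listing above can be reproduced by asking kubectl for the API resources in the Prometheus Operator group, for example:
kubectl api-resources --api-group=monitoring.coreos.com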
resource manifests¶
Now, there are tons of files in the ./manifests directory of the kube-prometheus repo.
alertmanager-alertmanager.yaml kubernetesControlPlane-serviceMonitorCoreDNS.yaml prometheusAdapter-podDisruptionBudget.yaml
alertmanager-networkPolicy.yaml kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml prometheusAdapter-roleBindingAuthReader.yaml
alertmanager-podDisruptionBudget.yaml kubernetesControlPlane-serviceMonitorKubelet.yaml prometheusAdapter-serviceAccount.yaml
alertmanager-prometheusRule.yaml kubernetesControlPlane-serviceMonitorKubeScheduler.yaml prometheusAdapter-serviceMonitor.yaml
alertmanager-secret.yaml kubeStateMetrics-clusterRoleBinding.yaml prometheusAdapter-service.yaml
alertmanager-serviceAccount.yaml kubeStateMetrics-clusterRole.yaml prometheus-clusterRoleBinding.yaml
alertmanager-serviceMonitor.yaml kubeStateMetrics-deployment.yaml prometheus-clusterRole.yaml
alertmanager-service.yaml kubeStateMetrics-networkPolicy.yaml prometheus-networkPolicy.yaml
blackboxExporter-clusterRoleBinding.yaml kubeStateMetrics-prometheusRule.yaml prometheusOperator-clusterRoleBinding.yaml
blackboxExporter-clusterRole.yaml kubeStateMetrics-serviceAccount.yaml prometheusOperator-clusterRole.yaml
blackboxExporter-configuration.yaml kubeStateMetrics-serviceMonitor.yaml prometheusOperator-deployment.yaml
blackboxExporter-deployment.yaml kubeStateMetrics-service.yaml prometheusOperator-networkPolicy.yaml
blackboxExporter-networkPolicy.yaml nodeExporter-clusterRoleBinding.yaml prometheusOperator-prometheusRule.yaml
blackboxExporter-serviceAccount.yaml nodeExporter-clusterRole.yaml prometheusOperator-serviceAccount.yaml
blackboxExporter-serviceMonitor.yaml nodeExporter-daemonset.yaml prometheusOperator-serviceMonitor.yaml
blackboxExporter-service.yaml nodeExporter-networkPolicy.yaml prometheusOperator-service.yaml
grafana-config.yaml nodeExporter-prometheusRule.yaml prometheus-podDisruptionBudget.yaml
grafana-dashboardDatasources.yaml nodeExporter-serviceAccount.yaml prometheus-prometheusRule.yaml
grafana-dashboardDefinitions.yaml nodeExporter-serviceMonitor.yaml prometheus-prometheus.yaml
grafana-dashboardSources.yaml nodeExporter-service.yaml prometheus-roleBindingConfig.yaml
grafana-deployment.yaml prometheusAdapter-apiService.yaml prometheus-roleBindingSpecificNamespaces.yaml
grafana-networkPolicy.yaml prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml prometheus-roleConfig.yaml
grafana-prometheusRule.yaml prometheusAdapter-clusterRoleBindingDelegator.yaml prometheus-roleSpecificNamespaces.yaml
grafana-serviceAccount.yaml prometheusAdapter-clusterRoleBinding.yaml prometheus-serviceAccount.yaml
grafana-serviceMonitor.yaml prometheusAdapter-clusterRoleServerResources.yaml prometheus-serviceMonitor.yaml
grafana-service.yaml prometheusAdapter-clusterRole.yaml prometheus-service.yaml
kubePrometheus-prometheusRule.yaml prometheusAdapter-configMap.yaml setup
kubernetesControlPlane-prometheusRule.yaml prometheusAdapter-deployment.yaml
kubernetesControlPlane-serviceMonitorApiserver.yaml prometheusAdapter-networkPolicy.yaml
So the naming convention looks like {component name}-{resource kind}.yaml.
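As a quick way to confirm that convention, the component names can be derived from the filenames themselves (run inside ./manifests; the setup directory is not a *.yaml file, so it won't appear):
ls *.yaml | cut -d- -f1 | sort -u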
Here is the list of different components included in kube-prometheus.
setup
alertmanager
blackboxExporter
grafana
kubePrometheus
kubernetesControlPlane
kubeStateMetrics
nodeExporter
prometheus
prometheusAdapter
prometheusOperator
Let me remove the setup directory since it's already taken care of, and look at the rest.
As stated in the repository README, here is the list of components, so let me have a look at each one from the top.
- Prometheus Operator
- Prometheus
- Alertmanager
- Node Exporter
- Prometheus Adapter
- kube-state-metrics
- Grafana
Prometheus Operator¶
These are the manifests for the service account "prometheus-operator" and what this account is allowed to do.
- prometheusOperator-clusterRoleBinding.yaml
- prometheusOperator-clusterRole.yaml
- prometheusOperator-serviceAccount.yaml
These are the deployment and service to expose it, and the network policy to apply.
- prometheusOperator-deployment.yaml
- prometheusOperator-service.yaml
- prometheusOperator-networkPolicy.yaml
These are the PrometheusRule (alert settings) and the ServiceMonitor for the Prometheus Operator.
- prometheusOperator-prometheusRule.yaml
- prometheusOperator-serviceMonitor.yaml
Prometheus¶
These are for the service account "prometheus-k8s" and the roles defining what it is allowed to do.
- prometheus-clusterRoleBinding.yaml
- prometheus-clusterRole.yaml
- prometheus-serviceAccount.yaml
- prometheus-roleSpecificNamespaces.yaml
- prometheus-roleBindingSpecificNamespaces.yaml
- prometheus-roleConfig.yaml
- prometheus-roleBindingConfig.yaml
These are the pods and services.
- prometheus-prometheus.yaml
- prometheus-service.yaml
- prometheus-podDisruptionBudget.yaml
- prometheus-networkPolicy.yaml
These are the prometheus rules and service monitor.
- prometheus-serviceMonitor.yaml
- prometheus-prometheusRule.yaml
Alertmanager¶
This is the service account "alertmanager-main".
- alertmanager-serviceAccount.yaml
These are for pods and network policy.
- alertmanager-alertmanager.yaml
- alertmanager-service.yaml
- alertmanager-podDisruptionBudget.yaml
- alertmanager-networkPolicy.yaml
This one seems to be the config file "alertmanager.yaml".
- alertmanager-secret.yaml
And the rules and monitor files.
- alertmanager-prometheusRule.yaml
- alertmanager-serviceMonitor.yaml
node-exporter¶
These are for service account "node-exporter" and roles.
- nodeExporter-serviceAccount.yaml
- nodeExporter-clusterRoleBinding.yaml
- nodeExporter-clusterRole.yaml
These are for pods and network policy.
- nodeExporter-daemonset.yaml
- nodeExporter-service.yaml
- nodeExporter-networkPolicy.yaml
And the usual, rules and service monitor.
- nodeExporter-prometheusRule.yaml
- nodeExporter-serviceMonitor.yaml
prometheus adapter¶
These are for the service account "prometheus-adapter", its roles, and the delegations set up for the APIService.
- prometheusAdapter-serviceAccount.yaml
- prometheusAdapter-clusterRole.yaml
- prometheusAdapter-clusterRoleBinding.yaml
- prometheusAdapter-roleBindingAuthReader.yaml
- prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml
- prometheusAdapter-clusterRoleBindingDelegator.yaml
- prometheusAdapter-clusterRoleServerResources.yaml
- prometheusAdapter-apiService.yaml
Pods and network policy.
- prometheusAdapter-deployment.yaml
- prometheusAdapter-configMap.yaml
- prometheusAdapter-service.yaml
- prometheusAdapter-podDisruptionBudget.yaml
- prometheusAdapter-networkPolicy.yaml
And then service monitor.
- prometheusAdapter-serviceMonitor.yaml
kube-state-metrics¶
These are for service account "kube-state-metrics" and roles.
- kubeStateMetrics-serviceAccount.yaml
- kubeStateMetrics-clusterRole.yaml
- kubeStateMetrics-clusterRoleBinding.yaml
Pods and network policy.
- kubeStateMetrics-deployment.yaml
- kubeStateMetrics-service.yaml
- kubeStateMetrics-networkPolicy.yaml
And the rules and service monitor.
- kubeStateMetrics-prometheusRule.yaml
- kubeStateMetrics-serviceMonitor.yaml
Grafana¶
Here is the service account "grafana".
- grafana-serviceAccount.yaml
Tons of Grafana dashboard definitions built into kube-prometheus.
- grafana-dashboardDefinitions.yaml
Pods, including configs that specify the Prometheus data source, and the network policy.
- grafana-deployment.yaml
- grafana-dashboardSources.yaml
- grafana-dashboardDatasources.yaml
- grafana-config.yaml
- grafana-service.yaml
- grafana-networkPolicy.yaml
And the rules and service monitor.
- grafana-prometheusRule.yaml
- grafana-serviceMonitor.yaml
blackbox-exporter¶
Continuing on to the components not in the list above.
These are for service account "blackbox-exporter".
- blackboxExporter-serviceAccount.yaml
- blackboxExporter-clusterRole.yaml
- blackboxExporter-clusterRoleBinding.yaml
Pods.
- blackboxExporter-configuration.yaml
- blackboxExporter-deployment.yaml
- blackboxExporter-service.yaml
- blackboxExporter-networkPolicy.yaml
And service monitor.
- blackboxExporter-serviceMonitor.yaml
prometheus rules¶
This one says it's a general rule.
- kubePrometheus-prometheusRule.yaml
service monitor¶
Prometheus rule and service monitors for services on the control plane.
- kubernetesControlPlane-prometheusRule.yaml
- kubernetesControlPlane-serviceMonitorApiserver.yaml
- kubernetesControlPlane-serviceMonitorCoreDNS.yaml
- kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml
- kubernetesControlPlane-serviceMonitorKubelet.yaml
- kubernetesControlPlane-serviceMonitorKubeScheduler.yaml
installing components¶
Since the list is enormous, I will merge the manifests by component.
# prepare "monitoring" directory
cd {homelab repo}/infrastructure/hyper-v/controllers
mkdir monitoring
# back to the kube-prometheus repo
cd ~/repos/github.com/prometheus-operator/kube-prometheus/manifests/
# move grafana
cat grafana*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/grafana.yaml
rm grafana*.yaml
# move kube-state-metrics
cat kubeStateMetrics*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/kube-state-metrics.yaml
rm kubeStateMetrics*.yaml
# move prometheus-adapter
cat prometheusAdapter*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/prometheus-adapter.yaml
rm prometheusAdapter*.yaml
# move blackbox-exporter
cat blackboxExporter*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/blackbox-exporter.yaml
rm blackboxExporter*.yaml
# move node-exporter
cat nodeExporter*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/node-exporter.yaml
rm nodeExporter*.yaml
# move alertmanager
cat alertmanager*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/alertmanager.yaml
rm alertmanager*.yaml
# move prometheus operator
cat prometheusOperator-*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/operator.yaml
rm prometheusOperator-*.yaml
# move prometheus
cat prometheus-*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/prometheus.yaml
rm prometheus-*.yaml
# remaining rule
cat *prometheusRule.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/prometheusrule.yaml
rm *prometheusRule.yaml
# move service monitor for kubernetes
cat kubernetesControlPlane-serviceMonitor*.yaml > ~/repos/cp.blink-1x52.net/gitops/homelab/infrastructure/hyper-v/controllers/monitoring/kube-servicemonitor.yaml
rm kubernetesControlPlane-serviceMonitor*.yaml
# make sure that there is no manifest missed
# separate manifest resources
cd {homelab repo}/infrastructure/hyper-v/controllers/monitoring
sed -i '/^apiVersion/i ---' *.yaml
I don't think I had to do this... oh well. Now I update the infra-controllers kustomization to include the monitoring items.
.
|-kustomization.yaml
|-minio-tenant-values.yaml
|-metallb.yaml
|-gitlab-runner.yaml
|-cert-manager-values.yaml
|-cert-manager.yaml
|-minio-tenant.sh
|-metallb.sh
|-minio-operator.yaml
|-default-values
| |-minio-tenant-values.yaml
| |-cert-manager-values.yaml
| |-ngf-values.yaml
| |-metallb-values.yaml
| |-gitlab-runner-values.yaml
|-ngf-values.yaml
|-gitlab-runner.sh
|-metallb-values.yaml
|-cert-manager.sh
|-crds
| |-cert-manager-v1.14.3.yaml
| |-gateway-v1.0.0.yaml
| |-directpv-v4.0.10.yaml
| |-kube-prometheus-v0.13.yaml
|-minio-tenant.yaml
|-sops.yaml
|-gitlab-runner-values.yaml
|-ngf.yaml
|-monitoring
| |-prometheus-adapter.yaml
| |-node-exporter.yaml
| |-alertmanager.yaml
| |-kube-state-metrics.yaml
| |-prometheus.yaml
| |-grafana.yaml
| |-prometheusrule.yaml
| |-blackbox-exporter.yaml
| |-operator.yaml
|-ngf.sh
|-minio-operator.sh
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
# CRDs
- crds/gateway-v1.0.0.yaml
- crds/directpv-v4.0.10.yaml
- crds/cert-manager-v1.14.3.yaml
- crds/kube-prometheus-v0.13.yaml
# infra-controllers
- sops.yaml
- metallb.yaml
- ngf.yaml
- minio-operator.yaml
- minio-tenant.yaml
- cert-manager.yaml
- gitlab-runner.yaml
# monitoring
- monitoring/operator.yaml
- monitoring/prometheus.yaml
- monitoring/prometheus-adapter.yaml
- monitoring/prometheusrule.yaml
- monitoring/alertmanager.yaml
- monitoring/kube-state-metrics.yaml
- monitoring/node-exporter.yaml
- monitoring/blackbox-exporter.yaml
- monitoring/kube-servicemonitor.yaml
- monitoring/grafana.yaml
starting over¶
No, merging the manifests just to keep the kustomization resources list short was not a good idea. Still, I cannot add over 80 manifest files one by one to the kustomization resources list.
What I will do instead is add another Flux Kustomization for monitoring. Since the kustomize-controller generates a kustomization covering every manifest in the path when none is committed, I don't have to enumerate the files at all.
First, I clean up the monitoring directory in infra-controllers, and then create a separate monitoring directory and put all the manifests there.
# clean up what's added
rm -rf {homelab repo}/infrastructure/homelab/controllers/monitoring
mkdir {homelab repo}/infrastructure/homelab/monitoring
# back to the kube-prometheus repo
cd ~/repos/github.com/prometheus-operator/kube-prometheus/manifests
git stash
cp *.yaml {homelab repo}/infrastructure/homelab/monitoring/.
And here is another Flux Kustomization (ks) to watch and reconcile these resources.
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-monitoring
  namespace: flux-system
spec:
  dependsOn:
    - name: infra-controllers
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/homelab/monitoring
  prune: true
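Once this is committed and Flux picks it up, the reconciliation status of the new Kustomization can be checked alongside the existing ones with, for example:
flux get kustomizations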
installed resources¶
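Here is what ended up in the monitoring namespace once infra-monitoring reconciled, which is roughly what the following shows:
kubectl get all -n monitoring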
NAME READY STATUS RESTARTS AGE
pod/alertmanager-main-0 2/2 Running 0 3m22s
pod/alertmanager-main-1 2/2 Running 0 3m22s
pod/alertmanager-main-2 2/2 Running 0 3m22s
pod/blackbox-exporter-6cfc4bffb6-f42h8 3/3 Running 0 3m40s
pod/grafana-748964b847-fwt5p 1/1 Running 0 3m40s
pod/kube-state-metrics-6b4d48dcb4-8k4wc 3/3 Running 0 3m40s
pod/node-exporter-47flx 2/2 Running 0 3m40s
pod/node-exporter-8g88d 2/2 Running 0 3m40s
pod/node-exporter-gkqvf 2/2 Running 0 3m40s
pod/node-exporter-v9mrt 2/2 Running 0 3m40s
pod/node-exporter-xb2kq 2/2 Running 0 3m40s
pod/prometheus-adapter-79c588b474-brvs7 1/1 Running 0 3m40s
pod/prometheus-adapter-79c588b474-zwwc9 1/1 Running 0 3m40s
pod/prometheus-k8s-0 2/2 Running 0 3m21s
pod/prometheus-k8s-1 2/2 Running 0 3m21s
pod/prometheus-operator-68f6c79f9d-w2bxs 2/2 Running 0 3m40s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/alertmanager-main ClusterIP 10.110.53.41 <none> 9093/TCP,8080/TCP 3m40s
service/alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 3m22s
service/blackbox-exporter ClusterIP 10.108.3.54 <none> 9115/TCP,19115/TCP 3m40s
service/grafana ClusterIP 10.101.187.130 <none> 3000/TCP 3m40s
service/kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 3m40s
service/node-exporter ClusterIP None <none> 9100/TCP 3m40s
service/prometheus-adapter ClusterIP 10.105.19.171 <none> 443/TCP 3m40s
service/prometheus-k8s ClusterIP 10.101.1.109 <none> 9090/TCP,8080/TCP 3m40s
service/prometheus-operated ClusterIP None <none> 9090/TCP 3m21s
service/prometheus-operator ClusterIP None <none> 8443/TCP 3m40s
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/node-exporter 5 5 5 5 5 kubernetes.io/os=linux 3m40s
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/blackbox-exporter 1/1 1 1 3m40s
deployment.apps/grafana 1/1 1 1 3m40s
deployment.apps/kube-state-metrics 1/1 1 1 3m40s
deployment.apps/prometheus-adapter 2/2 2 2 3m40s
deployment.apps/prometheus-operator 1/1 1 1 3m40s
NAME DESIRED CURRENT READY AGE
replicaset.apps/blackbox-exporter-6cfc4bffb6 1 1 1 3m40s
replicaset.apps/grafana-748964b847 1 1 1 3m40s
replicaset.apps/kube-state-metrics-6b4d48dcb4 1 1 1 3m40s
replicaset.apps/prometheus-adapter-79c588b474 2 2 2 3m40s
replicaset.apps/prometheus-operator-68f6c79f9d 1 1 1 3m40s
NAME READY AGE
statefulset.apps/alertmanager-main 3/3 3m22s
statefulset.apps/prometheus-k8s 2/2 3m21s
GUI access¶
https://github.com/prometheus-operator/kube-prometheus/blob/main/docs/access-ui.md
You can access Prometheus, Grafana, and Alertmanager, so I am going to create Gateway listeners and HTTPRoutes for them.
Here is one example for Grafana, adding a listener to the existing Gateway file.
- name: https-grafana
  hostname: grafana.blink-1x52.net
  port: 443
  protocol: HTTPS
  allowedRoutes:
    namespaces:
      from: Selector
      selector:
        matchLabels:
          gateway-available: "yes"
  tls:
    mode: Terminate
    certificateRefs:
      - name: tls-grafana-20240307
        namespace: gateway
        kind: Secret
And create a matching HTTPRoute, sketched below.
The sectionName "https-grafana" has to match the listener name defined in the Gateway, and the same goes for the hostname "grafana.blink-1x52.net".
The backend reference name "grafana" and its port have to match the Grafana service.
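Here is roughly what that HTTPRoute could look like; the parent Gateway name ("gateway" below) is a placeholder since the actual name depends on how the Gateway resource was defined, so adjust it accordingly.
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  parentRefs:
    - name: gateway # placeholder, use the actual Gateway resource name
      namespace: gateway
      sectionName: https-grafana
  hostnames:
    - grafana.blink-1x52.net
  rules:
    - backendRefs:
        - name: grafana
          port: 3000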
Now, since the cluster is using Calico, which supports network policies, the default network policies that come with the resource manifests are actually enforced, and they prevent the gateway from reaching these services in the monitoring namespace. I can edit the existing network policy files, adding another ingress rule to allow access from the "ngf" namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 9.5.3
  name: grafana
  namespace: monitoring
spec:
  egress:
    - {}
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 3000
          protocol: TCP
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ngf
      ports:
        - port: 3000
          protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: grafana
      app.kubernetes.io/name: grafana
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
    - Egress
    - Ingress
Now I have access to https://grafana.blink-1x52.net. I can use the default "admin:admin" to log in and set a new password.
And I add similar changes for Prometheus and Alertmanager.
pvc for grafana¶
I'd like Grafana to remember the changes I make, such as the dashboards I mark as favorites, so I am going to attach a PVC to the Grafana deployment. And since the PVC is served by directpv, I also set a node selector.
Below is part of the 250+ line grafana-deployment.yaml file. The volume "grafana-storage" is the default name used; I changed it from emptyDir to a PVC. The two lines for nodeSelector are something I added.
apiVersion: apps/v1
kind: Deployment
metadata:
spec:
  template:
    spec:
      containers:
        - env: []
          image: grafana/grafana:9.5.3
          name: grafana
      nodeSelector:
        app.kubernetes.io/part-of: directpv
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc
I add the PVC in a separate file. I set the storage class name to directpv-min-io so that the requested volume gets served by directpv.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 3Gi
  storageClassName: directpv-min-io
prometheus settings¶
As for persistence settings, the Prometheus kind has .spec.storage.volumeClaimTemplate available to set the PVC.
It appears that the default retention period is 24h according to this prometheus pvc example file. I'm changing it to 48 days.
I'll cover the additional scrape config next, but the change for that is also included here.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: k8s
  namespace: monitoring
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: scrape.yaml
  alerting:
    alertmanagers:
      - apiVersion: v2
        name: alertmanager-main
        namespace: monitoring
        port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.46.0
  nodeSelector:
    app.kubernetes.io/part-of: directpv
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.46.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
  retention: "48d"
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 80Gi
        storageClassName: directpv-min-io
  version: 2.46.0
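After reconciliation, the Grafana PVC and the claim generated from this template should both show up as bound volumes served by directpv, which can be checked with something like:
kubectl get pvc -n monitoring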
additional scrape config¶
https://kubectl.docs.kubernetes.io/references/kustomize/kustomization/patches/
If I have a node exporter running on 192.168.1.254:9100, I can prepare a scrape configuration YAML like this, turn it into a secret, and let Prometheus load it.
- job_name: "exporter_outside_k8s"
  static_configs:
    - targets: ["192.168.1.254:9100"]
      labels:
        service: node-exporter
        instance: node254
Generate a secret in the infra-monitoring flux ks directory.
kubectl create secret generic additional-scrape-configs \
--from-file=scrape.yaml \
--namespace=monitoring \
--dry-run=client \
-oyaml >>../../monitoring/additional-scrape-config.yaml
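The generated file is a plain Secret manifest along these lines, with the scrape configuration stored base64-encoded under the scrape.yaml key (value abbreviated here):
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
  namespace: monitoring
data:
  scrape.yaml: <base64-encoded contents of scrape.yaml>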
The settings required to use the additional scrape config are already in place in the Prometheus manifest at .spec.additionalScrapeConfigs.
I have a custom node exporter dashboard JSON definition to import as a Grafana dashboard, but since it's too long to share, I'll just skip it.
alerting¶
Log in to Grafana and navigate to Home > Alerting > Contact points, and add a contact point. In my case I added a Discord webhook destination. Make sure to test it.
Then navigate to Home > Alerting > Notification policies, edit the default policy, and change the destination from the default empty email to the new contact point you created and tested.
repository structure so far¶
I omitted lines not related to the kube-prometheus setup, and most of the 80+ kube-prometheus manifest files, so the list won't be too long.
.
|-clusters
| |-homelab
| | |-infrastructure.yaml
| | |-monitoring.yaml # flux kustomization infra-monitoring, so that I don't have to prepare a k8s kustomize with an 80+ resources list
| | |-flux-system
| | | |-kustomization.yaml
| | | |-gotk-sync.yaml
| | | |-gotk-components.yaml
| | |-sops.yaml
| | |-namespace
| | | |-metallb.yaml
| | | |-cert-manager.yaml
| | | |-runner.yaml
| | | |-monitoring.yaml # monitoring namespace with label to use gateway
| | | |-minio-operator.yaml
| | | |-gateway.yaml
| | | |-minio-tenant.yaml
| | | |-ngf.yaml
|-infrastructure
| |-homelab
| | |-configs
| | | |-kustomization.yaml
| | | |-metallb-config.yaml
| | | |-issuer.yaml
| | | |-monitoring.yaml
| | | |-gateway.yaml
| | | |-minio-tenant.yaml
| | | |-scrape
| | | | |-monitoring.sh # script to convert the scrape conf file into secret and place it in infra-monitoring directory
| | | | |-scrape.yaml # prometheus scrape configuration file
| | |-controllers
| | | |-kustomization.yaml # added kube-prometheus crds
| | | |-crds
| | | | |-kube-prometheus-v0.13.yaml # kube-prometheus crds
| | |-monitoring
| | | |-grafana-networkPolicy.yaml # add ingress rule to allow access from ngf
| | | |-additional-scrape-config.yaml # scrape config secret file
| | | |-prometheus-prometheus.yaml # add pvc, nodeSelector to choose nodes with directpv, retention settings, and additional scrape config
| | | |-alertmanager-networkPolicy.yaml # add ingress rule to allow access from ngf
| | | |-pvc.yaml # directpv pvc for grafana
| | | |-prometheus-networkPolicy.yaml # add ingress rule to allow access from ngf
| | | |-grafana-deployment.yaml # change emptydir to directpv pvc, nodeSelector to choose nodes with directpv
| | | |-... and 70+ more files...