
building homelab cluster part 7

In this part, I am going to set up a monitoring system for the cluster. I am going to install kube-prometheus, which comes with grafana and other monitoring components.


install crds

As described in the quick start, the way to install kube-prometheus is to first apply everything inside the ./manifests/setup directory, and then the ./manifests directory.

The manifest files inside the ./manifests/setup directory are all custom resource definitions, so I will merge them into a single crds file and place it in my ./infrastructure/homelab/controllers/crds directory.

# clone the repository
mkdir -p ~/repos/
cd ~/repos/
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus

# confirm the version
git branch -lr

# change the version
git checkout release-0.13

# remove namespace manifest as this will be created separately
cd manifests/setup
rm namespace.yaml

# merge all the crds files
cat *.yaml > kube-prometheus-v0.13.yaml

# add '---' separator line
sed -i '/^apiVersion/i ---' kube-prometheus-v0.13.yaml

# copy
cp kube-prometheus-v0.13.yaml {homelab repo}/infrastructure/hyper-v/controllers/crds/.
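As a quick sanity check, here is what that sed line does, reproduced on a throwaway two-document sample file (the file name and contents are made up for the demo):

```shell
# build a small sample file that looks like two concatenated manifests
printf 'apiVersion: v1\nkind: A\napiVersion: v1\nkind: B\n' > /tmp/merged-demo.yaml

# insert a '---' line before every top-level apiVersion line
sed -i '/^apiVersion/i ---' /tmp/merged-demo.yaml

cat /tmp/merged-demo.yaml
# ---
# apiVersion: v1
# kind: A
# ---
# apiVersion: v1
# kind: B
```

Each concatenated document now starts with its own `---` separator, which is what makes the merged file a valid multi-document YAML stream.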

Now, I have changed the way I add new namespaces: they all go in ./clusters/{cluster name}/namespace, so that the namespaces and secrets from the homelab-sops repository are not affected by the trial and error of adding and pulling back manifests in the infra-controllers space.

I am planning to gain access to grafana through gateway, so I'm adding the gateway label.

apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    service: monitoring
    type: infrastructure
    gateway-available: "yes"

I update the infra-controllers kustomization to include the new kube-prometheus crds.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # CRDs
  - crds/gateway-v1.0.0.yaml
  - crds/directpv-v4.0.10.yaml
  - crds/cert-manager-v1.14.3.yaml
  - crds/kube-prometheus-v0.13.yaml
  # infra-controllers
  - sops.yaml
  - metallb.yaml
  - ngf.yaml
  - minio-operator.yaml
  - minio-tenant.yaml
  - cert-manager.yaml
  - gitlab-runner.yaml

Here is the result.

kubectl api-resources | grep monitoring
alertmanagerconfigs   amcfg       true   AlertmanagerConfig
alertmanagers         am          true   Alertmanager
podmonitors           pmon        true   PodMonitor
probes                prb         true   Probe
prometheusagents      promagent   true   PrometheusAgent
prometheuses          prom        true   Prometheus
prometheusrules       promrule    true   PrometheusRule
scrapeconfigs         scfg        true   ScrapeConfig
servicemonitors       smon        true   ServiceMonitor
thanosrulers          ruler       true   ThanosRuler

resource manifests

Now, there are tons of files in the ./manifests directory of the kube-prometheus repo.

ls ~/repos/kube-prometheus/manifests
alertmanager-alertmanager.yaml                       kubernetesControlPlane-serviceMonitorCoreDNS.yaml                prometheusAdapter-podDisruptionBudget.yaml
alertmanager-networkPolicy.yaml                      kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml  prometheusAdapter-roleBindingAuthReader.yaml
alertmanager-podDisruptionBudget.yaml                kubernetesControlPlane-serviceMonitorKubelet.yaml                prometheusAdapter-serviceAccount.yaml
alertmanager-prometheusRule.yaml                     kubernetesControlPlane-serviceMonitorKubeScheduler.yaml          prometheusAdapter-serviceMonitor.yaml
alertmanager-secret.yaml                             kubeStateMetrics-clusterRoleBinding.yaml                         prometheusAdapter-service.yaml
alertmanager-serviceAccount.yaml                     kubeStateMetrics-clusterRole.yaml                                prometheus-clusterRoleBinding.yaml
alertmanager-serviceMonitor.yaml                     kubeStateMetrics-deployment.yaml                                 prometheus-clusterRole.yaml
alertmanager-service.yaml                            kubeStateMetrics-networkPolicy.yaml                              prometheus-networkPolicy.yaml
blackboxExporter-clusterRoleBinding.yaml             kubeStateMetrics-prometheusRule.yaml                             prometheusOperator-clusterRoleBinding.yaml
blackboxExporter-clusterRole.yaml                    kubeStateMetrics-serviceAccount.yaml                             prometheusOperator-clusterRole.yaml
blackboxExporter-configuration.yaml                  kubeStateMetrics-serviceMonitor.yaml                             prometheusOperator-deployment.yaml
blackboxExporter-deployment.yaml                     kubeStateMetrics-service.yaml                                    prometheusOperator-networkPolicy.yaml
blackboxExporter-networkPolicy.yaml                  nodeExporter-clusterRoleBinding.yaml                             prometheusOperator-prometheusRule.yaml
blackboxExporter-serviceAccount.yaml                 nodeExporter-clusterRole.yaml                                    prometheusOperator-serviceAccount.yaml
blackboxExporter-serviceMonitor.yaml                 nodeExporter-daemonset.yaml                                      prometheusOperator-serviceMonitor.yaml
blackboxExporter-service.yaml                        nodeExporter-networkPolicy.yaml                                  prometheusOperator-service.yaml
grafana-config.yaml                                  nodeExporter-prometheusRule.yaml                                 prometheus-podDisruptionBudget.yaml
grafana-dashboardDatasources.yaml                    nodeExporter-serviceAccount.yaml                                 prometheus-prometheusRule.yaml
grafana-dashboardDefinitions.yaml                    nodeExporter-serviceMonitor.yaml                                 prometheus-prometheus.yaml
grafana-dashboardSources.yaml                        nodeExporter-service.yaml                                        prometheus-roleBindingConfig.yaml
grafana-deployment.yaml                              prometheusAdapter-apiService.yaml                                prometheus-roleBindingSpecificNamespaces.yaml
grafana-networkPolicy.yaml                           prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml        prometheus-roleConfig.yaml
grafana-prometheusRule.yaml                          prometheusAdapter-clusterRoleBindingDelegator.yaml               prometheus-roleSpecificNamespaces.yaml
grafana-serviceAccount.yaml                          prometheusAdapter-clusterRoleBinding.yaml                        prometheus-serviceAccount.yaml
grafana-serviceMonitor.yaml                          prometheusAdapter-clusterRoleServerResources.yaml                prometheus-serviceMonitor.yaml
grafana-service.yaml                                 prometheusAdapter-clusterRole.yaml                               prometheus-service.yaml
kubePrometheus-prometheusRule.yaml                   prometheusAdapter-configMap.yaml                                 setup
kubernetesControlPlane-prometheusRule.yaml           prometheusAdapter-deployment.yaml
kubernetesControlPlane-serviceMonitorApiserver.yaml  prometheusAdapter-networkPolicy.yaml

So the naming convention looks like {component name}-{resource kind}.yaml.

Here is the list of different components included in kube-prometheus.

ls -1 | cut -d- -f1 | sort | uniq
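To show what that pipeline produces, here is a tiny reproduction using a handful of the manifest names hardcoded instead of an ls:

```shell
# a few of the manifest names, hardcoded for the demo
printf '%s\n' \
  alertmanager-secret.yaml \
  prometheus-clusterRole.yaml \
  prometheus-prometheus.yaml \
  prometheusAdapter-apiService.yaml |
  cut -d- -f1 | sort | uniq
# alertmanager
# prometheus
# prometheusAdapter
```

cut takes everything before the first hyphen, so the two prometheus-* files collapse into a single "prometheus" component name.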

Let me drop the setup directory from the list, as it's already taken care of, and look at the rest.

As stated in the repository README, here is the list of components; let me have a look at each one from the top.

  • Prometheus Operator
  • Prometheus
  • Alertmanager
  • Node Exporter
  • Prometheus Adapter
  • kube-state-metrics
  • Grafana

Prometheus Operator

These are the manifests for the service account "prometheus-operator" and what this account is allowed to do.

  • prometheusOperator-clusterRoleBinding.yaml
  • prometheusOperator-clusterRole.yaml
  • prometheusOperator-serviceAccount.yaml

These are the deployment and service to expose it, and the network policy to apply.

  • prometheusOperator-deployment.yaml
  • prometheusOperator-service.yaml
  • prometheusOperator-networkPolicy.yaml

These are the PrometheusRule (alert settings) and the ServiceMonitor applied to the prometheus operator.

  • prometheusOperator-prometheusRule.yaml
  • prometheusOperator-serviceMonitor.yaml


prometheus

These are for the service account "prometheus-k8s" and the roles defining what it is allowed to do.

  • prometheus-clusterRoleBinding.yaml
  • prometheus-clusterRole.yaml
  • prometheus-serviceAccount.yaml
  • prometheus-roleSpecificNamespaces.yaml
  • prometheus-roleBindingSpecificNamespaces.yaml
  • prometheus-roleConfig.yaml
  • prometheus-roleBindingConfig.yaml

These are the pods and services.

  • prometheus-prometheus.yaml
  • prometheus-service.yaml
  • prometheus-podDisruptionBudget.yaml
  • prometheus-networkPolicy.yaml

These are the prometheus rules and service monitor.

  • prometheus-serviceMonitor.yaml
  • prometheus-prometheusRule.yaml


alertmanager

This is the service account "alertmanager-main".

  • alertmanager-serviceAccount.yaml

These are for pods and network policy.

  • alertmanager-alertmanager.yaml
  • alertmanager-service.yaml
  • alertmanager-podDisruptionBudget.yaml
  • alertmanager-networkPolicy.yaml

This one seems to be the config file "alertmanager.yaml".

  • alertmanager-secret.yaml

And the rules and monitor files.

  • alertmanager-prometheusRule.yaml
  • alertmanager-serviceMonitor.yaml


node exporter

These are for the service account "node-exporter" and its roles.

  • nodeExporter-serviceAccount.yaml
  • nodeExporter-clusterRoleBinding.yaml
  • nodeExporter-clusterRole.yaml

These are for pods and network policy.

  • nodeExporter-daemonset.yaml
  • nodeExporter-service.yaml
  • nodeExporter-networkPolicy.yaml

And the usual, rules and service monitor.

  • nodeExporter-prometheusRule.yaml
  • nodeExporter-serviceMonitor.yaml

prometheus adapter

These are for the service account "prometheus-adapter", its roles, and the delegations set for the APIService.

  • prometheusAdapter-serviceAccount.yaml
  • prometheusAdapter-clusterRole.yaml
  • prometheusAdapter-clusterRoleBinding.yaml
  • prometheusAdapter-roleBindingAuthReader.yaml
  • prometheusAdapter-clusterRoleAggregatedMetricsReader.yaml
  • prometheusAdapter-clusterRoleBindingDelegator.yaml
  • prometheusAdapter-clusterRoleServerResources.yaml
  • prometheusAdapter-apiService.yaml

Pods and network policy.

  • prometheusAdapter-deployment.yaml
  • prometheusAdapter-configMap.yaml
  • prometheusAdapter-service.yaml
  • prometheusAdapter-podDisruptionBudget.yaml
  • prometheusAdapter-networkPolicy.yaml

And then service monitor.

  • prometheusAdapter-serviceMonitor.yaml


kube-state-metrics

These are for the service account "kube-state-metrics" and its roles.

  • kubeStateMetrics-serviceAccount.yaml
  • kubeStateMetrics-clusterRole.yaml
  • kubeStateMetrics-clusterRoleBinding.yaml

Pods and network policy.

  • kubeStateMetrics-deployment.yaml
  • kubeStateMetrics-service.yaml
  • kubeStateMetrics-networkPolicy.yaml

And the rules and service monitor.

  • kubeStateMetrics-prometheusRule.yaml
  • kubeStateMetrics-serviceMonitor.yaml


grafana

Here is the service account "grafana".

  • grafana-serviceAccount.yaml

Tons of kube-prometheus-builtin grafana dashboard definitions.

  • grafana-dashboardDefinitions.yaml

Pods, including configs to specify the prometheus data source, and network policy.

  • grafana-deployment.yaml
  • grafana-dashboardSources.yaml
  • grafana-dashboardDatasources.yaml
  • grafana-config.yaml
  • grafana-service.yaml
  • grafana-networkPolicy.yaml

And the rules and service monitor.

  • grafana-prometheusRule.yaml
  • grafana-serviceMonitor.yaml


blackbox exporter

Continuing on to components not in the list above, these are for the service account "blackbox-exporter" and its roles.

  • blackboxExporter-serviceAccount.yaml
  • blackboxExporter-clusterRole.yaml
  • blackboxExporter-clusterRoleBinding.yaml


Configuration, pods, and network policy.

  • blackboxExporter-configuration.yaml
  • blackboxExporter-deployment.yaml
  • blackboxExporter-service.yaml
  • blackboxExporter-networkPolicy.yaml

And service monitor.

  • blackboxExporter-serviceMonitor.yaml

prometheus rules

This one says it's a general rule.

  • kubePrometheus-prometheusRule.yaml

service monitor

These are the service monitors for services on the control plane, along with their prometheus rules.

  • kubernetesControlPlane-prometheusRule.yaml
  • kubernetesControlPlane-serviceMonitorApiserver.yaml
  • kubernetesControlPlane-serviceMonitorCoreDNS.yaml
  • kubernetesControlPlane-serviceMonitorKubeControllerManager.yaml
  • kubernetesControlPlane-serviceMonitorKubelet.yaml
  • kubernetesControlPlane-serviceMonitorKubeScheduler.yaml

installing components

Since the list is enormous, I will merge the manifests per component.

# prepare "monitoring" directory
cd {homelab repo}/infrastructure/hyper-v/controllers
mkdir monitoring

# back to the kube-prometheus repo
cd ~/repos/kube-prometheus/manifests

# move grafana
cat grafana*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/grafana.yaml
rm grafana*.yaml

# move kube-state-metrics
cat kubeStateMetrics*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/kube-state-metrics.yaml
rm kubeStateMetrics*.yaml

# move prometheus-adapter
cat prometheusAdapter*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/prometheus-adapter.yaml
rm prometheusAdapter*.yaml

# move blackbox-exporter
cat blackboxExporter*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/blackbox-exporter.yaml
rm blackboxExporter*.yaml

# move node-exporter
cat nodeExporter*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/node-exporter.yaml
rm nodeExporter*.yaml

# move alertmanager
cat alertmanager*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/alertmanager.yaml
rm alertmanager*.yaml

# move prometheus operator
cat prometheusOperator-*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/operator.yaml
rm prometheusOperator-*.yaml

# move prometheus
cat prometheus-*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/prometheus.yaml
rm prometheus-*.yaml

# remaining rules
cat *prometheusRule.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/prometheusrule.yaml
rm *prometheusRule.yaml

# move service monitors for the kubernetes control plane
cat kubernetesControlPlane-serviceMonitor*.yaml > {homelab repo}/infrastructure/hyper-v/controllers/monitoring/kube-servicemonitor.yaml
rm kubernetesControlPlane-serviceMonitor*.yaml

# make sure that no manifests were missed
ls

# separate manifest resources
cd {homelab repo}/infrastructure/hyper-v/controllers/monitoring
sed -i '/^apiVersion/i ---' *.yaml

I don't think I had to do this... oh well. Now I update infra-controllers kustomization to include monitoring items.

 | |-minio-tenant-values.yaml
 | |-cert-manager-values.yaml
 | |-ngf-values.yaml
 | |-metallb-values.yaml
 | |-gitlab-runner-values.yaml
 | |-cert-manager-v1.14.3.yaml
 | |-gateway-v1.0.0.yaml
 | |-directpv-v4.0.10.yaml
 | |-kube-prometheus-v0.13.yaml
 | |-prometheus-adapter.yaml
 | |-node-exporter.yaml
 | |-alertmanager.yaml
 | |-kube-state-metrics.yaml
 | |-prometheus.yaml
 | |-grafana.yaml
 | |-prometheusrule.yaml
 | |-blackbox-exporter.yaml
 | |-operator.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # CRDs
  - crds/gateway-v1.0.0.yaml
  - crds/directpv-v4.0.10.yaml
  - crds/cert-manager-v1.14.3.yaml
  - crds/kube-prometheus-v0.13.yaml
  # infra-controllers
  - sops.yaml
  - metallb.yaml
  - ngf.yaml
  - minio-operator.yaml
  - minio-tenant.yaml
  - cert-manager.yaml
  - gitlab-runner.yaml
  # monitoring
  - monitoring/operator.yaml
  - monitoring/prometheus.yaml
  - monitoring/prometheus-adapter.yaml
  - monitoring/prometheusrule.yaml
  - monitoring/alertmanager.yaml
  - monitoring/kube-state-metrics.yaml
  - monitoring/node-exporter.yaml
  - monitoring/blackbox-exporter.yaml
  - monitoring/kube-servicemonitor.yaml
  - monitoring/grafana.yaml

starting over

No, merging the manifests just to make them easy to add to the kustomization was not a good idea. Still, I cannot add over 80 manifest files one by one to the kustomization resources list.

What I will do instead is to add another flux kustomization for the monitoring.

First, I clean up the monitoring directory in infra-controllers, and then create a separate monitoring directory and put all the manifests there.

# clean up what's added
rm -rf {homelab repo}/infrastructure/homelab/controllers/monitoring
mkdir {homelab repo}/infrastructure/homelab/monitoring

# back to the kube-prometheus repo
cd ~/repos/
git stash
cp *.yaml {homelab repo}/infrastructure/homelab/monitoring/.

And here is another flux ks to watch and reconcile the resources.

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infra-monitoring
  namespace: flux-system
spec:
  dependsOn:
    - name: infra-controllers
  interval: 1h
  retryInterval: 1m
  timeout: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/homelab/monitoring
  prune: true

installed resources

kubectl -n monitoring get all
NAME                                       READY   STATUS    RESTARTS   AGE
pod/alertmanager-main-0                    2/2     Running   0          3m22s
pod/alertmanager-main-1                    2/2     Running   0          3m22s
pod/alertmanager-main-2                    2/2     Running   0          3m22s
pod/blackbox-exporter-6cfc4bffb6-f42h8     3/3     Running   0          3m40s
pod/grafana-748964b847-fwt5p               1/1     Running   0          3m40s
pod/kube-state-metrics-6b4d48dcb4-8k4wc    3/3     Running   0          3m40s
pod/node-exporter-47flx                    2/2     Running   0          3m40s
pod/node-exporter-8g88d                    2/2     Running   0          3m40s
pod/node-exporter-gkqvf                    2/2     Running   0          3m40s
pod/node-exporter-v9mrt                    2/2     Running   0          3m40s
pod/node-exporter-xb2kq                    2/2     Running   0          3m40s
pod/prometheus-adapter-79c588b474-brvs7    1/1     Running   0          3m40s
pod/prometheus-adapter-79c588b474-zwwc9    1/1     Running   0          3m40s
pod/prometheus-k8s-0                       2/2     Running   0          3m21s
pod/prometheus-k8s-1                       2/2     Running   0          3m21s
pod/prometheus-operator-68f6c79f9d-w2bxs   2/2     Running   0          3m40s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       ClusterIP     <none>        9093/TCP,8080/TCP            3m40s
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m22s
service/blackbox-exporter       ClusterIP      <none>        9115/TCP,19115/TCP           3m40s
service/grafana                 ClusterIP   <none>        3000/TCP                     3m40s
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            3m40s
service/node-exporter           ClusterIP   None             <none>        9100/TCP                     3m40s
service/prometheus-adapter      ClusterIP    <none>        443/TCP                      3m40s
service/prometheus-k8s          ClusterIP     <none>        9090/TCP,8080/TCP            3m40s
service/prometheus-operated     ClusterIP   None             <none>        9090/TCP                     3m21s
service/prometheus-operator     ClusterIP   None             <none>        8443/TCP                     3m40s

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
daemonset.apps/node-exporter   5         5         5       5            5    3m40s

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/blackbox-exporter     1/1     1            1           3m40s
deployment.apps/grafana               1/1     1            1           3m40s
deployment.apps/kube-state-metrics    1/1     1            1           3m40s
deployment.apps/prometheus-adapter    2/2     2            2           3m40s
deployment.apps/prometheus-operator   1/1     1            1           3m40s

NAME                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/blackbox-exporter-6cfc4bffb6     1         1         1       3m40s
replicaset.apps/grafana-748964b847               1         1         1       3m40s
replicaset.apps/kube-state-metrics-6b4d48dcb4    1         1         1       3m40s
replicaset.apps/prometheus-adapter-79c588b474    2         2         2       3m40s
replicaset.apps/prometheus-operator-68f6c79f9d   1         1         1       3m40s

NAME                                 READY   AGE
statefulset.apps/alertmanager-main   3/3     3m22s
statefulset.apps/prometheus-k8s      2/2     3m21s

GUI access

There are prometheus, grafana, and alertmanager UIs that you can access, so I am going to create gateway listeners and HTTPRoutes for them.

Here is one example for grafana, adding a listener to the existing gateway file.

- name: https-grafana
  port: 443
  protocol: HTTPS
  allowedRoutes:
    namespaces:
      from: Selector
      selector:
        matchLabels:
          gateway-available: "yes"
  tls:
    mode: Terminate
    certificateRefs:
      - name: tls-grafana-20240307
        namespace: gateway
        kind: Secret

And create a matching HTTPRoute like this.

Note that the parentRef sectionName "https-grafana" matches the listener name you defined in the gateway, and the same goes for the hostname "".

The backend reference name "grafana" and its port match the service.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  parentRefs:
    - name: gateway
      sectionName: https-grafana
      namespace: gateway
  hostnames:
    - ""
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: grafana
          port: 3000

kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP     <none>        9093/TCP,8080/TCP            111m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   111m
blackbox-exporter       ClusterIP      <none>        9115/TCP,19115/TCP           111m
grafana                 ClusterIP   <none>        3000/TCP                     111m
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            111m
node-exporter           ClusterIP   None             <none>        9100/TCP                     111m
prometheus-adapter      ClusterIP    <none>        443/TCP                      111m
prometheus-k8s          ClusterIP     <none>        9090/TCP,8080/TCP            111m
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     111m
prometheus-operator     ClusterIP   None             <none>        8443/TCP                     111m

Now, since the cluster is using calico, which supports network policies, the default network policies that came with the resource manifest files are in effect, and they prevent the gateway from reaching these services in the monitoring namespace. I can edit the existing network policy file, adding another ingress rule to allow access from the "ngf" namespace.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 9.5.3
  name: grafana
  namespace: monitoring
spec:
  egress:
    - {}
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 3000
          protocol: TCP
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ngf
      ports:
        - port: 3000
          protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/component: grafana
      app.kubernetes.io/name: grafana
      app.kubernetes.io/part-of: kube-prometheus
  policyTypes:
    - Egress
    - Ingress

Now I have access to grafana, and I can use the default "admin:admin" credentials to log in and set a new password.

And I add similar changes for prometheus and alertmanager.
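For prometheus, a sketch of what the matching HTTPRoute could look like, assuming a listener named https-prometheus was added to the gateway the same way as the grafana one (the hostname is left elided here as well); the backend is the prometheus-k8s service on port 9090 from the service list above:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: prometheus
  namespace: monitoring
spec:
  parentRefs:
    - name: gateway
      sectionName: https-prometheus  # assumed listener name, mirroring https-grafana
      namespace: gateway
  hostnames:
    - ""
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: prometheus-k8s
          port: 9090
```

The alertmanager route follows the same pattern against alertmanager-main on port 9093.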

pvc for grafana

I'd like grafana to remember the changes I make, such as dashboards I favorite, so I am going to attach a PVC to the grafana deployment. And since the PV comes from directpv, I also set a node selector.

Below is part of the 250+ line grafana-deployment.yaml file. The volume "grafana-storage" is the default name used, and I changed it from emptyDir to a PVC. The two nodeSelector lines are something I added.

apiVersion: apps/v1
kind: Deployment
# ...
      containers:
        - env: []
          image: grafana/grafana:9.5.3
          name: grafana
          # ...
      nodeSelector:
        directpv: ""
      volumes:
        - name: grafana-storage
          persistentVolumeClaim:
            claimName: grafana-pvc

I add the PVC in a separate file. I set the storage class name directpv-min-io so that the requested volume gets served by directpv.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 3Gi
  storageClassName: directpv-min-io

prometheus settings

As for persistence settings, the Prometheus kind can have a PVC set through its storage field.

It appears that the default retention period is 24h, according to this prometheus pvc example file. I'm changing it to 48 days.

I'll talk about the additional scrape config next, but the change for it is also included here.

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: k8s
  namespace: monitoring
spec:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: scrape.yaml
  alerting:
    alertmanagers:
      - apiVersion: v2
        name: alertmanager-main
        namespace: monitoring
        port: web
  enableFeatures: []
  externalLabels: {}
  nodeSelector:
    directpv: ""
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.46.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 1
  resources:
    requests:
      memory: 400Mi
  retention: "48d"
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  storage:
    volumeClaimTemplate:
      apiVersion: v1
      kind: PersistentVolumeClaim
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 80Gi
        storageClassName: directpv-min-io
  version: 2.46.0

additional scrape config

If I have a node exporter running on a host outside the cluster, I can prepare a scrape configuration yaml like this, turn it into a secret, and let prometheus load it.

- job_name: "exporter_outside_k8s"
  static_configs:
    - targets: [""]
      labels:
        service: node-exporter
        instance: node254

Generate a secret in infra-monitoring flux ks directory.

kubectl create secret generic additional-scrape-configs \
    --from-file=scrape.yaml \
    --namespace=monitoring \
    --dry-run=client \
    -o yaml > ../../monitoring/additional-scrape-config.yaml
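For reference, the generated manifest is just an ordinary Secret, with the whole scrape.yaml file stored base64-encoded under .data (the encoded value is elided here):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs
  namespace: monitoring
data:
  scrape.yaml: ""  # base64-encoded contents of scrape.yaml
```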

The settings required to use the additional scrape config are already in place in the prometheus manifest, at .spec.additionalScrapeConfigs.

I have a custom node exporter dashboard JSON definition to import as a grafana dashboard, but it's too long to share, so I'll just skip it.


alerting

Log in to grafana and navigate to Home > Alerting > Contact points, and add a contact point. In my case I added a discord webhook destination. Make sure to test it.

Then navigate to Home > Alerting > Notification policies, edit the default policy, and change the destination from the default empty email to the newly created and tested contact point.

repository structure so far

I omitted lines not related to the kube-prometheus setup, as well as most of the 80+ kube-prometheus manifest files, so the list won't get too long.

gitops/homelab repository
 |-clusters
 | |-homelab
 | | |-infrastructure.yaml
 | | |-monitoring.yaml         # flux kustomization infra-monitoring
                               # so that I don't have to prepare a k8s kustomize with 80+ resources list
 | | |-flux-system
 | | | |-kustomization.yaml
 | | | |-gotk-sync.yaml
 | | | |-gotk-components.yaml
 | | |-sops.yaml
 | | |-namespace
 | | | |-metallb.yaml
 | | | |-cert-manager.yaml
 | | | |-runner.yaml
 | | | |-monitoring.yaml       # monitoring namespace with label to use gateway
 | | | |-minio-operator.yaml
 | | | |-gateway.yaml
 | | | |-minio-tenant.yaml
 | | | |-ngf.yaml
 |-infrastructure
 | |-homelab
 | | |-configs
 | | | |-kustomization.yaml
 | | | |-metallb-config.yaml
 | | | |-issuer.yaml
 | | | |-monitoring.yaml
 | | | |-gateway.yaml
 | | | |-minio-tenant.yaml
 | | | |-scrape
 | | | | |       # script to convert the scrape conf file into secret and place it in infra-monitoring directory
 | | | | |-scrape.yaml         # prometheus scrape configuration file
 | | |-controllers
 | | | |-kustomization.yaml               # added kube-prometheus crds
 | | | |-crds
 | | | | |-kube-prometheus-v0.13.yaml     # kube-prometheus crds
 | | |-monitoring
 | | | |-grafana-networkPolicy.yaml       # add ingress rule to allow access from ngf
 | | | |-additional-scrape-config.yaml    # scrape config secret file
 | | | |-prometheus-prometheus.yaml       # add pvc, nodeSelector to choose nodes with directpv, retention settings, and additional scrape config
 | | | |-alertmanager-networkPolicy.yaml  # add ingress rule to allow access from ngf
 | | | |-pvc.yaml                         # directpv pvc for grafana
 | | | |-prometheus-networkPolicy.yaml    # add ingress rule to allow access from ngf
 | | | |-grafana-deployment.yaml          # change emptydir to directpv pvc, nodeSelector to choose nodes with directpv
 | | | |-... and 70+ more files...