# snmp monitor #monitoring-system/todo
Here I am documenting how to set up a monitoring system using open source software.
- snmp_exporter
    - gets snmp metrics from endpoints
- prometheus
    - stores the metric data
- grafana
    - uses prometheus as a data source and can be used to explore and visualize the obtained metric data
Note that this is somewhat different from what is covered in the other page [[building-homelab-cluster-part-7]] using the kube-prometheus stack. This page covers how to set up an snmp monitoring system from scratch on docker containers.
```mermaid
---
title: monitoring system
---
graph LR
    User -- web UI access --> grafana[grafana:3000] -- data source --> prometheus[prometheus:9090] -- obtain snmp metric data --> snmp[snmp_exporter:9116]
```
## Items to be covered are...

- explanation of the monitoring system to be set up, with words and illustrations
    - multiple remote devices to monitor in multiple sites
    - monitoring system in one location
- well-known technologies involved in the monitoring system
    - snmp v2c
        - this section should be very simple and short
    - docker, because the monitoring system will be running as docker containers using docker compose
        - this section should be very simple and short too
- components of the monitoring system covered in this document
    - snmp agent, the responder on the managed device
    - prometheus snmp exporter, an snmp manager which collects metric data from the managed devices
    - prometheus, an open source monitoring toolkit which scrapes metric data working with the snmp manager and stores it
    - grafana, an open source visualization platform with rich features including alerting
- building the monitoring system from the ground up as a single docker compose
    - snmp agent, a simple example using Cisco IOS configuration
    - snmp manager, the prometheus snmp exporter
        - first explain the ideas of the scrape target to be defined in the prometheus configuration file and the modules to be defined in the snmp exporter configuration file
            - illustrate this!
            - and explain in which order the document is going to cover them
        - briefly go through how to generate the default `snmp.yml` file using the generator available in the official repository
            - add a note that the snmp manager customization on which metrics to obtain must happen here
        - start with the default `if_mib` module, and add a section later about the customization
    - prometheus
        - scrape config in `prometheus.yml`
            - use `file_sd_config` to specify the monitor targets with necessary grouping and labeling
        - basic health check
            - navigate to "status > targets" to see the list of monitor targets
            - navigate to "status > tsdb status" to see the activity of the server by observing the numbers counting up
    - grafana
        - initial login credentials
            - mention the administrator section and note about adding teams and users as necessary
        - how to set up and use grafana
            - adding prometheus as the data source
            - explore, maybe use the `up` metric just to see things working
            - create a dashboard using the `up` metric in the gauge format
            - preferably also create a dashboard for the interface utilization, introducing the equation to use, to show how to come up with a complex condition for monitoring and even firing an alert
            - configure notification settings
            - create an alert using a discord webhook for example
    - how to add monitoring targets
        - add a note that there is no need to restart the prometheus server if you are just adding lines in the existing `file_sd_configs` target files
    - how to add a new monitoring metric to obtain
        - briefly touch upon OID and MIB
        - manually test the target OID against a test device using `snmpwalk`
        - add a new module in `generator.yml`, prepare the required MIB file, and run the generator to generate the `snmp.yml` file
        - replace the `snmp.yml` file and restart the snmp exporter container
        - manually test the new module using `curl` to the snmp exporter
        - add a new test job in the scrape config in `prometheus.yml` and restart the prometheus container
        - check the target status on the prometheus web UI
        - revise the `prometheus.yml` configuration file along with the target yaml files as necessary
            - do not forget to show an example!
- issues I encountered
    - scrape intervals and timeouts
    - sample size
## docker compose
In short, I put all three services above in a single docker compose file, creating volumes for data persistence, and placing and mounting configuration files to specify what data to obtain from which targets.
Here is the docker compose file with image tags as of Q2 2024.
```yaml
services:
  prometheus:
    image: quay.io/prometheus/prometheus:v2.52.0
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - prometheus-data:/prometheus
      - ./snmp_target:/etc/prometheus/file_sd_config
      - type: bind
        source: ./config/prometheus.yml
        target: /etc/prometheus/prometheus.yml
        read_only: true
    command:
      - --web.enable-remote-write-receiver
      - --config.file=/etc/prometheus/prometheus.yml
  snmp:
    image: quay.io/prometheus/snmp-exporter:v0.26.0
    container_name: snmp
    ports:
      - "9116:9116"
    volumes:
      - type: bind
        source: ./config/snmp.yml
        target: /etc/snmp_exporter/snmp.yml
        read_only: true
  grafana:
    image: grafana/grafana:11.0.0
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_USER=your_admin_username_here
      - GF_SECURITY_ADMIN_PASSWORD=your_admin_password_here
    volumes:
      - grafana-data:/var/lib/grafana
volumes:
  prometheus-data: {}
  grafana-data: {}
```
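The compose file mounts `./config/prometheus.yml` and the `./snmp_target` directory, whose contents are covered later in this page. For orientation, here is a minimal sketch of what they could contain, following the usual snmp_exporter relabeling pattern from its documentation; the job name, scrape interval, target file name, and labels below are my own examples, not taken from this exact setup.

```yaml
# hypothetical ./config/prometheus.yml - a minimal sketch
global:
  scrape_interval: 60s

scrape_configs:
  - job_name: snmp
    metrics_path: /snmp
    params:
      module: [if_mib]
      auth: [jGc9SRTxNNH4pZk]
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd_config/*.yml
    relabel_configs:
      # the address listed in the target file becomes the "target" URL parameter...
      - source_labels: [__address__]
        target_label: __param_target
      # ...and also the "instance" label on the stored metrics
      - source_labels: [__param_target]
        target_label: instance
      # while the actual scrape request goes to the snmp_exporter container
      - target_label: __address__
        replacement: snmp:9116
---
# hypothetical target file under ./snmp_target/, picked up by file_sd_configs
- targets:
    - 192.168.200.1
  labels:
    site: site-a
```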
## snmp_exporter
https://github.com/prometheus/snmp_exporter
This snmp_exporter is the component that actually sends snmp requests to the monitoring target nodes. It's capable of snmp v3, but here I will just cover snmp v2c.
As you can see in the docker compose file, it listens on tcp port 9116, and the prometheus server sends its requests there to have snmp_exporter issue snmp get requests and return the collected metric data.
You have to specify the target, module, and auth parameters when sending a request. The target is straightforward. The module and auth are defined in the snmp_exporter configuration file and will be covered shortly.
```shell
curl "http://{snmp_exporter}:9116/snmp?module={module_name}&auth={auth_name}&target={ipaddr}"
# snmp_exporter: in the docker compose example above, it's going to be "snmp"
#   if trying it out from outside the docker containers, it's the IP address of the server
#   and of course, "localhost" works if sending the request from the host machine running this docker compose
# module_name: name of the module defined in the snmp_exporter configuration file
# auth_name: snmpv2c community string defined in the snmp_exporter configuration file
# ipaddr: the monitoring target to send the snmp get request to

# example if trying it out on the server running this docker compose
curl "http://localhost:9116/snmp?module=ifModule&auth=jGc9SRTxNNH4pZk&target=192.168.200.1"
```
If the configuration file defines the authentication group "jGc9SRTxNNH4pZk" and a module "ifModule" covering metrics such as ifOperStatus (interface operational status), ifAdminStatus (interface admin status), and so on, the snmp_exporter sends the relevant snmp get requests using the community string "jGc9SRTxNNH4pZk" against the target node 192.168.200.1.
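This is also exactly the URL the prometheus server constructs on each scrape. As a tiny illustration of how the three parameters are assembled into the query string (the helper function is mine, not part of snmp_exporter or prometheus):

```python
from urllib.parse import urlencode

# illustrative helper (not part of any of these tools): build the
# snmp_exporter scrape URL from the three parameters described above
def snmp_scrape_url(exporter: str, target: str, module: str, auth: str) -> str:
    query = urlencode({"module": module, "auth": auth, "target": target})
    return f"http://{exporter}:9116/snmp?{query}"

print(snmp_scrape_url("localhost", "192.168.200.1", "ifModule", "jGc9SRTxNNH4pZk"))
# http://localhost:9116/snmp?module=ifModule&auth=jGc9SRTxNNH4pZk&target=192.168.200.1
```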
By the way, the node being monitored should be configured something like this to allow others to get the snmp data from the system.
```
snmp-server community jGc9SRTxNNH4pZk RO
snmp-server host {ipaddr_of_the_server_running_snmp_exporter_container} version 2c jGc9SRTxNNH4pZk
```
```mermaid
---
title: snmp
---
graph LR
    prometheus -- request --> snmp[snmp_exporter:9116] -- snmp request --> t[monitor target:161/udp]
```
## snmp and OID
There are countless kinds of data you can obtain using snmp, and each of those metrics is stored at a different address in the snmp system of the node being monitored.
The address of such an snmp data location is called an OID. When you want to see the system description, you request the information at OID `1.3.6.1.2.1.1.1`. When you want to see the system uptime, you request the information at OID `1.3.6.1.2.1.1.3`.
Here are some examples of manually obtaining snmp data using `snmpwalk`.
```shell
$ snmpwalk -v 2c -c public localhost .1.3.6.1.2.1.1.1
iso.3.6.1.2.1.1.1.0 = STRING: "Linux d12 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64"
$ snmpwalk -v 2c -c public localhost .1.3.6.1.2.1.1.3
iso.3.6.1.2.1.1.3.0 = Timeticks: (183916) 0:30:39.16
```
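Each `snmpwalk` output line has the same shape: the OID, the value type, and the value. A throwaway sketch (the helper is mine, not part of net-snmp or snmp_exporter) of pulling those three fields apart, which can be handy when eyeballing larger walks:

```python
import re

# one snmpwalk output line looks like: "<oid> = <TYPE>: <value>"
LINE = re.compile(r"^(\S+) = ([\w-]+): (.*)$")

def parse_snmpwalk_line(line):
    """Split an snmpwalk output line into (oid, type, value)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unexpected line: {line!r}")
    return m.group(1), m.group(2), m.group(3)

oid, vtype, value = parse_snmpwalk_line(
    "iso.3.6.1.2.1.1.3.0 = Timeticks: (183916) 0:30:39.16"
)
print(oid)    # iso.3.6.1.2.1.1.3.0
print(vtype)  # Timeticks
```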
## OID and MIB
OIDs form a hierarchy like DNS, where everything starts at the root `.`. Try this website for example, or any other website hosting OID information: you can dig down first to `.1`, then `.1.3`, and all the way down to `.1.3.6.1.2.1`. There, you find `.1.3.6.1.2.1.1` for "system", `.1.3.6.1.2.1.2` for "interfaces", and so on.
Have a look at "system", `.1.3.6.1.2.1.1`, and you'll see that the two OIDs mentioned earlier are "sysDescr" at `.1.3.6.1.2.1.1.1` and "sysUpTime" at `.1.3.6.1.2.1.1.3`. Monitoring systems can be configured to specify a target metric either by its OID or by its name, by importing the relevant dictionaries. Such a dictionary file containing the OID definitions is a MIB file.
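This "everything under a prefix" structure is also the idea behind the `walk` entries in the configuration files later on: walking a subtree returns every OID whose dotted components start with the requested prefix. A minimal sketch of that prefix relation (the helper names are mine, purely for illustration):

```python
# illustrative helpers (not part of any snmp tool): an OID sits inside a
# subtree iff its dotted components start with the subtree's components
def oid_parts(oid):
    return [int(p) for p in oid.strip(".").split(".")]

def is_under(oid, subtree):
    o, s = oid_parts(oid), oid_parts(subtree)
    return o[:len(s)] == s

system = ".1.3.6.1.2.1.1"
print(is_under(".1.3.6.1.2.1.1.1.0", system))    # sysDescr.0 -> True
print(is_under(".1.3.6.1.2.1.2.2.1.8", system))  # ifOperStatus -> False
```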
## module and auth in snmp_exporter
Ultimately, this is all you have to do to monitor a system.
- decide what data you want to collect
- identify the OID of the data
- send snmp request to the target device to obtain the data
Now, of course it's too much of a burden to maintain and execute, say, 30 lines of `snmpwalk` to collect interesting metric data for 10 devices each in 30 different office locations (9,000 lines!).
You also have to consider that there may be different sets of metrics (OIDs) to monitor for different roles of devices. Some device models offer fan rpm while others don't. On some network devices you may want to monitor BGP session status in addition to the standard interface metrics.
Let's say you want a module named "systemModule" to cover the metrics "sysDescr" and "sysUpTime", and another named "ifModule" to cover "ifOperStatus" and "ifAdminStatus". Below is something you would see in the snmp_exporter configuration file.
Now you can run `curl "http://localhost:9116/snmp?module=systemModule&auth=jGc9SRTxNNH4pZk&target=192.168.200.1"` to get the system description and system uptime of the node 192.168.200.1 through snmp_exporter. The other module, "ifModule", works too.
Still too complicated? There is no way you would write this configuration file from scratch, and as the comment line in it says, there is a generator tool available to generate the snmp_exporter configuration file.
```yaml
# WARNING: This file was auto-generated using snmp_exporter generator, manual changes will be lost.
auths:
  jGc9SRTxNNH4pZk:
    community: jGc9SRTxNNH4pZk
    security_level: noAuthNoPriv
    auth_protocol: MD5
    priv_protocol: DES
    version: 2
modules:
  systemModule:
    walk:
      - 1.3.6.1.2.1.31.1.1
    get:
      - 1.3.6.1.2.1.1.1.0
      - 1.3.6.1.2.1.1.3.0
      - 1.3.6.1.2.1.1.5.0
      - 1.3.6.1.2.1.1.6.0
    metrics:
      - name: sysDescr
        oid: 1.3.6.1.2.1.1.1
        type: DisplayString
        help: A textual description of the entity - 1.3.6.1.2.1.1.1
      - name: sysUpTime
        oid: 1.3.6.1.2.1.1.3
        type: gauge
        help: The time (in hundredths of a second) since the network management portion of the system was last re-initialized. - 1.3.6.1.2.1.1.3
  ifModule:
    walk:
    get:
    metrics:
      - name: ifOperStatus
        oid: 1.3.6.1.2.1.2.2.1.8
        type: gauge
        help: The current operational state of the interface - 1.3.6.1.2.1.2.2.1.8
        indexes:
          - labelname: ifIndex
            type: gauge
        lookups:
          - labels:
              - ifIndex
            labelname: ifAlias
            oid: 1.3.6.1.2.1.31.1.1.1.18
            type: DisplayString
          - labels:
              - ifIndex
            labelname: ifDescr
            oid: 1.3.6.1.2.1.2.2.1.2
            type: DisplayString
          - labels:
              - ifIndex
            labelname: ifName
            oid: 1.3.6.1.2.1.31.1.1.1.1
            type: DisplayString
          - labels:
              - ifIndex
            labelname: ifPhysAddress
            oid: 1.3.6.1.2.1.2.2.1.6
            type: PhysAddress48
        enum_values:
          1: up
          2: down
          3: testing
          4: unknown
          5: dormant
          6: notPresent
          7: lowerLayerDown
```
## generator for snmp_exporter
The default configuration is available as mentioned here on the snmp_exporter github repo, but to prepare different modules to meet your own needs, there is a generator tool you can use to generate the snmp_exporter configuration file from a simpler "generator.yml" file by following the format explained here.
For example, if there are Cisco IOS, NXOS, and ASA devices, and also NEC IX devices, you might just use the default "if_mib" module available in the example "generator.yml" file, and add something like below to also capture CPU and memory utilization data.
The generated "snmp.yml" configuration file can then be used to run snmp_exporter, which will then be ready to collect CPU and memory related metric data from an NEC IX router via `curl "http://localhost:9116/snmp?module=nec_ix_cpu_mem&auth=jGc9SRTxNNH4pZk&target=192.168.200.1"`.
Now, the revised steps to set up the snmp monitoring system would be:
- decide what data you want to collect
- identify the OID of the data
- prepare MIB files for the target metrics/OIDs
- edit "generator.yml" file and generate "snmp.yml" file
- run snmp_exporter using the generated "snmp.yml" file
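Putting these steps together with the docker compose shown earlier, the working directory ends up looking something like this (the file name under `snmp_target/` is my own example):

```
.
├── docker-compose.yml
├── config/
│   ├── prometheus.yml
│   └── snmp.yml
└── snmp_target/
    └── office-a.yml
```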
```yaml
---
auths:
  jGc9SRTxNNH4pZk:
    version: 2
    community: jGc9SRTxNNH4pZk
modules:
  ...
  ...
  if_mib:
    ...
  # nec ix router
  nec_ix_cpu_mem:
    walk:
      # cpu
      - picoSchedRtUtl5Sec
      - picoSchedRtUtl1Min
      # memory util
      - picoHeapUtil
  # cisco asa cpu
  asa_cpu:
    walk:
      - cpmCPUTotalMonIntervalValue # 5 sec
      - cpmCPUTotal1minRev # 1 min
      - cpmCPUTotal5minRev # 5 min
  # cisco ios cpu
  cisco_ios_cpu:
    walk:
      - cpmCPUTotal5secRev # 5 sec
      - cpmCPUTotal1minRev # 1 min
      - cpmCPUTotal5minRev # 5 min
  # cisco nxos memory
  cisco_nxos_memory:
    walk:
      - cempMemPoolHCUsed
      - cempMemPoolHCFree
  # cisco ios and asa memory
  cisco_ios_asa_memory:
    walk:
      - ciscoMemoryPoolUsed
      - ciscoMemoryPoolFree
```
## how to use generator
Here is an example of running the generator.
```shell
git clone --branch v0.26.0 https://github.com/prometheus/snmp_exporter.git
cd snmp_exporter/generator

# place necessary mib files under ./mibs/
cp {necessary mib files} mibs/.
# where to get mib files?
# https://github.com/prometheus/snmp_exporter/tree/main/generator#where-to-get-mibs

# edit generator.yml file

# execute generator to generate snmp.yml file
# you must have these files and docker installed:
#   ./generator.yml
#   ./mibs/{mib files}
make docker-generate
# or, you can still run generator without docker by building it
# see the instruction - https://github.com/prometheus/snmp_exporter/blob/main/generator/README.md
```
## running snmp_exporter on docker compose
You are now ready to run the `snmp_exporter` service once you have generated your `snmp.yml` file.
- place the generated `snmp.yml` file at `./config/snmp.yml`
- prepare the `./docker-compose.yml` file and run `docker compose up -d`
    - now the host is listening on port 9116
- test the added module against a target 192.168.210.50, assuming it's the IP address of an NXOS switch

```shell
curl "http://localhost:9116/snmp?module=cisco_nxos_memory&auth=jGc9SRTxNNH4pZk&target=192.168.210.50"
```
```yaml
services:
  snmp:
    image: quay.io/prometheus/snmp-exporter:v0.26.0
    container_name: snmp
    ports:
      - "9116:9116"
    volumes:
      - type: bind
        source: ./config/snmp.yml
        target: /etc/snmp_exporter/snmp.yml
        read_only: true
```