To monitor the health of each site in the Science Mesh, a custom fork of the Blackbox Exporter (BBE) for Prometheus is used.
Put simply, the BBE runs a so-called prober to perform a certain check on a provided target when called via URL:
https://bbe.sciencemesh.uni-muenster.de/probe?target=google.com&module=nagios_test
The response is the result of this check in the form of various Prometheus metrics.
This means that by periodically scraping a BBE target, Prometheus can (indirectly) perform active probing of target sites: whenever Prometheus calls the BBE target, the BBE in turn starts the prober to perform its checks and returns the results in a format consumable by Prometheus.
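As a rough illustration of this request/response cycle, the following Python sketch builds a probe URL like the one above and parses a metrics response in the Prometheus text format. The helper names (build_probe_url, parse_metrics) and the sample response are assumptions made for this example and are not taken from the BBE fork; the parser only handles simple samples without labels or timestamps.

```python
from urllib.parse import urlencode


def build_probe_url(base_url: str, target: str, module: str) -> str:
    """Build the URL that asks the BBE to probe `target` using `module`."""
    query = urlencode({"target": target, "module": module})
    return f"{base_url.rstrip('/')}/probe?{query}"


def parse_metrics(text: str) -> dict:
    """Parse unlabelled samples from a Prometheus text-format response."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics


# Hypothetical response body; the metrics actually returned depend on the prober.
SAMPLE_RESPONSE = """\
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
probe_duration_seconds 0.25
"""

url = build_probe_url(
    "https://bbe.sciencemesh.uni-muenster.de", "google.com", "nagios_test"
)
metrics = parse_metrics(SAMPLE_RESPONSE)
```

In a real deployment Prometheus itself issues this request on every scrape, so a script like this is mainly useful for checking a BBE module by hand.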
Nagios probes can be used to perform checks on specific targets; these can include simple pings, authentication tests or more advanced checks. There are several advantages when using Nagios probes to perform health checks:
Nagios probes do not require a special library to be used. Instead, they follow two simple conventions (we are leaving out support for performance data here, as it is currently not needed in the Science Mesh project):
- The probe's exit code indicates its status: 0 = Success, 1 = Warning, 2 = Error, 3 = Unknown
- The first line of output (stdout) is considered the probe's status message; any further output is considered additional information

When writing a probe for the Science Mesh project, these additional rules apply:
- The target host is passed to the probe via a command-line argument (--host)

Some further guidelines also apply:
The original BBE does not support Nagios probes. For this reason, a custom fork was created that adds a new prober type, nagios, which launches an external Nagios probe and converts its results into Prometheus metrics. More details can be found in the above repository.
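To make the idea concrete, here is a minimal Python sketch of what such a prober has to do: launch an external probe, interpret its exit code and first output line according to the Nagios conventions, and render the result as Prometheus metrics. This is not the fork's actual implementation. The function names, the metric name probe_nagios_status, and the inline demo probe are assumptions made for illustration, as is the choice to pass the target via --host.

```python
import subprocess
import sys

# Exit codes as used by Nagios probes.
STATUS_NAMES = {0: "Success", 1: "Warning", 2: "Error", 3: "Unknown"}

# Stand-in for an external probe (hypothetical): prints a status line plus
# extra information and exits with 0. A real probe would check the host.
DEMO_PROBE = (
    "import argparse, sys\n"
    "p = argparse.ArgumentParser()\n"
    "p.add_argument('--host', required=True)\n"
    "args = p.parse_args()\n"
    "print(f'OK: {args.host} is reachable')\n"
    "print('additional details would go here')\n"
    "sys.exit(0)\n"
)


def run_probe(command: list, host: str, timeout: float = 10.0):
    """Run a probe and return (exit_code, status_message, extra_output)."""
    result = subprocess.run(
        command + ["--host", host],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    lines = result.stdout.splitlines()
    # The first stdout line is the status message; the rest is extra info.
    message = lines[0] if lines else ""
    extra = "\n".join(lines[1:])
    # Treat any exit code outside 0..3 as Unknown (3).
    code = result.returncode if result.returncode in STATUS_NAMES else 3
    return code, message, extra


def to_prometheus(code: int) -> str:
    """Render the probe status as Prometheus text-format metrics."""
    # Assumption: only exit code 0 counts as success in this sketch.
    success = 1 if code == 0 else 0
    return (
        f"probe_success {success}\n"
        f"probe_nagios_status {code}\n"
    )


code, message, extra = run_probe(
    [sys.executable, "-c", DEMO_PROBE], "example.org"
)
metrics_text = to_prometheus(code)
```

Whether a Warning result should count as a successful probe is a design decision; this sketch takes the stricter reading and reports success only for exit code 0.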