Custom Monitoring Stack
Latest revision as of 21:09, 3 January 2026
About Custom Monitoring Stack
A lightweight, Prometheus-native monitoring system designed for small environments:
- Quick to set up – requires little more than a Prometheus container
- Custom exporter mode – exposes metrics in native Prometheus format
- Extensible plugin framework – simple, stateless checks (ping, port, process, http, cert)
- Highly flexible custom scheduler – orchestrates both local and remote plugin execution
- Minimal custom job runner – executes plugins remotely on behalf of the scheduler
- Grafana-ready by design – dashboards and alerting via Grafana and Grafana Alertmanager follow standard Prometheus workflows
- Mobile-friendly interface – an optional mobile app integrates with Grafana Alertmanager for basic alert monitoring and management
Overview
This system provides a lightweight monitoring framework built around:
| Component | Purpose |
|---|---|
| Custom-Exporter | Receives metrics and exposes Prometheus format |
| Plugins | Perform individual checks (ping, port, HTTP, cert, process) |
| Custom-JobRunner (optional) | Executes plugins remotely via HTTP |
| Custom-Scheduler (optional) | Orchestrates execution and intervals |
Design principles:
- Stateless checks
- JSON-only interfaces
- No SSH dependency
- Automatic metric expiry
- Prometheus-native output
Architecture
Logical Flow
The monitoring stack supports three execution models.
A: Direct Execution via System Scheduler (e.g. crontab)
In this model, plugins are executed directly by the local system scheduler, typically via crontab. No Scheduler or JobRunner components are involved.
In addition, check_plugins are capable of sending metric payloads directly to the Custom Exporter over the network from remote hosts.
This capability is provided to demonstrate plugin flexibility and is not intended as the primary or recommended execution model.
Typical use cases:
- Simple or standalone hosts
- Minimal dependencies
- One-off, legacy, or transitional checks
B: Local Execution Using the Scheduler
The Scheduler runs locally, orchestrates the checks, and executes plugins directly on the same system.

Typical use cases:
- Centralized scheduling
- Consistent intervals
- Local-only monitoring
C: Remote Execution Using Scheduler and JobRunner
In this model, the Custom Scheduler coordinates execution and dispatches requests to one or more JobRunner instances. The JobRunner executes plugins on remote hosts and returns the results to the Scheduler for evaluation.
After processing the results, the Custom Scheduler forwards metric payloads to the Custom Exporter. The Custom Exporter is not part of the execution chain and operates independently from plugin execution.
The following diagram illustrates an alternative execution capability where check_plugins can write metrics directly to the Custom Exporter over the network.
This mode exists to demonstrate plugin capabilities only and is not intended as the preferred or recommended execution model.
Combined Local and Remote Execution
The preferred execution model.
As illustrated in the diagram below, the Custom Scheduler acts as the central execution authority between local plugins, remote JobRunner instances, and the Prometheus Custom Exporter.
A single Scheduler instance can:
- Execute selected checks locally
- Dispatch other checks to one or more remote JobRunner instances
- Collect and evaluate all plugin results centrally
- Forward processed results to the Custom Exporter only after evaluation
Local and remote execution paths are selected per check by scheduler configuration, not by a global operating mode.
This enables mixed deployments where local services are checked directly, while remote, firewalled, or restricted hosts are monitored via JobRunner, without duplicating exporters or schedulers.
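Per-check path selection can be pictured as a small routing function. The sketch below is illustrative only: the `Check` fields, the `runner` attribute, and the names `ping-gw`, `http-dmz`, and `dmz-1` are hypothetical and do not reflect the actual scheduler configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    """Hypothetical check definition; the field names are assumptions."""
    name: str
    command: str
    runner: Optional[str] = None  # name of a remote JobRunner, or None for local


def route(check):
    """Return 'local' or 'remote:<runner>' for a single check."""
    if check.runner is None:
        return "local"
    return f"remote:{check.runner}"


def plan(checks):
    """Group check names by execution path; no global mode is consulted."""
    paths = {}
    for check in checks:
        paths.setdefault(route(check), []).append(check.name)
    return paths


checks = [
    Check("ping-gw", "check-ping"),
    Check("http-dmz", "check-http", runner="dmz-1"),
    Check("proc-docker", "check-proc"),
]
paths = plan(checks)
```

Each check carries its own routing decision, so adding a remote host only means adding a check that names a runner.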
Key characteristics
- JobRunner appears only once in the execution chain
- Downward flow represents execution requests
- Upward flow represents returned plugin results
- Custom-Scheduler receives and evaluates results
- Custom-Exporter is triggered only after result processing
- No direct plugin-to-exporter communication in this model
Typical use cases:
- Remote hosts
- Environments without SSH access
- Firewalled or segmented networks
- Centralized orchestration
Custom-Exporter
Purpose
The exporter receives metrics via HTTP, stores them temporarily, and exposes them for Prometheus scraping.
Metrics automatically expire if not refreshed.
This is not a Pushgateway replacement.
Features
- HTTP push ingestion
- Metric TTL (expiry)
- Optional SQLite persistence
- Thread-safe in-memory storage
- Mandatory label validation
- Optional check timestamp metrics
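The combination of TTL expiry and thread-safe in-memory storage can be sketched as follows. This is a minimal illustration using only the standard library, not the exporter's actual code; the class name `MetricStore` and its methods are assumptions.

```python
import threading
import time


class MetricStore:
    """Thread-safe in-memory store whose entries expire after a TTL."""

    def __init__(self, clock=time.time):
        self._clock = clock            # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._items = {}               # (name, sorted labels) -> (value, deadline)

    def put(self, name, labels, value, expiry):
        """Insert or refresh a sample; a refresh moves the deadline forward."""
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            self._items[key] = (value, self._clock() + expiry)

    def purge(self):
        """Drop expired entries and return how many were removed."""
        now = self._clock()
        with self._lock:
            dead = [k for k, (_, deadline) in self._items.items() if deadline <= now]
            for key in dead:
                del self._items[key]
        return len(dead)

    def snapshot(self):
        """Return live entries as {key: value}, skipping expired ones."""
        now = self._clock()
        with self._lock:
            return {k: v for k, (v, deadline) in self._items.items() if deadline > now}
```

A background thread calling `purge()` periodically would correspond to the expiry cleanup thread listed in the startup sequence.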
Supported Metrics
| Metric | Description |
|---|---|
| check_ping | ICMP latency (ms) |
| check_tcp_port | TCP connect latency (ms) |
| check_proc | Process memory (RSS bytes) |
| check_http | HTTP latency (ms), or a negative code on failure (see check-http return values) |
| check_cert_expiry_days | Days until TLS expiry |
Mandatory Labels
| Label | Required | Meaning |
|---|---|---|
| source | Yes | Origin of the check |
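Mandatory label validation can be as simple as the check below. It is a sketch under assumptions: the helper name `missing_labels` is hypothetical, and how the exporter responds to an invalid payload is not documented here.

```python
REQUIRED_LABELS = ("source",)  # taken from the table above


def missing_labels(payload):
    """Return the required label names that are absent or empty."""
    labels = payload.get("labels") or {}
    return [name for name in REQUIRED_LABELS if not labels.get(name)]
```

An empty result means the payload carries every mandatory label.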
HTTP Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /push | POST | Accept metric JSON |
| /metrics | GET | Prometheus scrape endpoint |
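For orientation, a sample served on /metrics looks like the line produced below. The stack lists prometheus_client as a dependency for this job; the hand-written renderer here is only an illustration of the Prometheus text exposition format, and the function names are assumptions.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    if labels:
        body = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        return f"{name}{{{body}}} {value}"
    return f"{name} {value}"


line = render_sample("check_ping", {"host": "example.com", "source": "scheduler"}, 12.3)
# line == 'check_ping{host="example.com",source="scheduler"} 12.3'
```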
Example Payload
{
  "metric_name": "check_ping",
  "value": 12.3,
  "expiry": 300,
  "time_label": 1,
  "labels": {
    "host": "example.com",
    "source": "scheduler"
  }
}
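A payload like the one above could be posted to /push along these lines. This is a sketch: the base URL `http://exporter.example:8080` is a placeholder, and the `Content-Type` header is an assumption about what the exporter expects.

```python
import json
import urllib.request


def build_push_request(base_url, payload):
    """Build (but do not send) a POST request for the exporter's /push endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/push",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


payload = {
    "metric_name": "check_ping",
    "value": 12.3,
    "expiry": 300,
    "time_label": 1,
    "labels": {"host": "example.com", "source": "scheduler"},
}
request = build_push_request("http://exporter.example:8080", payload)
```

Sending is then a matter of `urllib.request.urlopen(request, timeout=5)`; the metric expires 300 seconds after the push unless it is refreshed.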
Security and Exposure
If the custom exporter is exposed beyond localhost or a trusted internal network, it **must** be placed behind a reverse proxy such as nginx.
Direct internet exposure of the custom exporter service is discouraged.
The reverse proxy is responsible for:
- TLS termination
- Authentication (Basic Auth, mTLS, or equivalent)
- IP allowlisting
- Rate limiting
- Request size limits
The custom exporter itself intentionally remains simple and does not replace an edge security layer.
Custom Exporter Startup Arguments
The custom exporter is configured **exclusively via command-line arguments**. There is no static configuration file.
All runtime behaviour is derived from incoming metric payloads.
Supported Arguments
| Argument | Description | Default |
|---|---|---|
| --db | Path to SQLite database for metric persistence | Disabled |
Startup Load Sequence
- Parse CLI arguments
- Initialize SQLite (if enabled)
- Load non-expired metrics from DB
- Start expiry cleanup thread
- Start HTTP server
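The first three steps of the sequence above can be sketched as follows. Only the `--db` argument comes from this page; the table name `metrics` and its columns are hypothetical, and the cleanup thread and HTTP server are left out.

```python
import argparse
import sqlite3
import time


def parse_args(argv):
    """Parse CLI arguments; persistence stays disabled unless --db is given."""
    parser = argparse.ArgumentParser(description="exporter startup sketch")
    parser.add_argument("--db", default=None, help="path to SQLite database")
    return parser.parse_args(argv)


def init_db(path):
    """Open the database and create the (hypothetical) metrics table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics ("
        "key TEXT PRIMARY KEY, value REAL NOT NULL, deadline REAL NOT NULL)"
    )
    return conn


def load_live(conn, now=None):
    """Return {key: (value, deadline)} for rows that have not yet expired."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT key, value, deadline FROM metrics WHERE deadline > ?", (now,)
    )
    return {key: (value, deadline) for key, value, deadline in rows}
```

Filtering on the stored deadline at load time means a long outage never resurrects metrics that should already have expired.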
Plugins
Common Behaviour
All plugins:
- Are standalone executables
- Output structured JSON
- Support retries
- Support optional Basic Auth
- Support a no-export mode, which leaves forwarding of the payload to the scheduler
- Can run locally or remotely via the JobRunner
Plugin Summary
| Plugin | Function |
|---|---|
| check-ping | ICMP reachability |
| check-port | TCP connection latency |
| check-proc | Process memory usage |
| check-http | HTTP latency and content checks |
| check-cert | TLS certificate expiry |
check-ping
- Uses system ping
- Measures round-trip time
- Retries on failure
check-ping --host google.de --expiry 300 --time-metric
check-port
- TCP connect test
- Measures latency
- Supports arbitrary ports
check-port --host mail.example.com --port 587
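The measurement behind a TCP connect check can be sketched with the standard library. This is not the plugin's source; the function name and the choice to return `None` on failure are assumptions.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return None
```

`time.monotonic()` is used rather than `time.time()` so that clock adjustments cannot produce negative latencies.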
check-proc
- Uses psutil
- Reports RSS memory
- Fails if process is missing
check-proc --name dockerd
check-http
- Measures HTTP latency
- Optional regex content search
- Optional redirect handling
Return values:
| Value | Meaning |
|---|---|
| >0 | Latency in ms |
| -2 | Pattern not found |
| -3xx | Redirect blocked |
check-http -w https://example.com -p "Welcome"
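When reading check_http samples, the return values above can be decoded as in this sketch. It assumes that `-3xx` means the negated redirect status code (for example `-301`); that reading is an interpretation of the table, and the function name is hypothetical.

```python
def describe_http_value(value):
    """Translate a check_http sample into a human-readable meaning."""
    if value > 0:
        return f"latency {value} ms"
    if value == -2:
        return "pattern not found"
    if -399 <= value <= -300:
        # Assumption: -3xx is the negated redirect status, e.g. -301.
        return f"redirect blocked (HTTP {int(-value)})"
    return "unknown"
```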
check-cert
- Opens TLS connection
- Reads certificate notAfter field
- Returns remaining days
check-cert --host example.com
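The three steps above map onto the standard `ssl` module roughly as follows. This is a sketch, not the plugin's code; the function names are assumptions.

```python
import socket
import ssl
import time


def days_until(not_after, now=None):
    """Whole days between `now` and a certificate notAfter string."""
    now = time.time() if now is None else now
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - now) // 86400)


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a TLS connection and return the peer certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

With the default context an already expired certificate fails the handshake, so `fetch_not_after` raises instead of returning a date in the past; a real check has to decide how to report that case.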
Custom JobRunner (Optional)
Purpose
JobRunner allows controlled remote execution of plugins via HTTP.
Features
- HTTP API
- Alias-based execution
- Sync or async execution
- Syslog logging
- No shell injection
- JSON output
Example Request
POST /run?alias=check-ping&args=--host example.com
Alias Configuration
joblist:
  - alias: check-ping
    cmd: /usr/local/bin/check-ping
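Alias-based execution without shell injection can be sketched as below. The alias table mirrors the joblist above; the base URL `http://runner.example:8080` and all function names are hypothetical, and the real JobRunner may parse `args` differently.

```python
import shlex
import subprocess
from urllib.parse import urlencode

ALIASES = {"check-ping": "/usr/local/bin/check-ping"}  # mirrors the joblist above


def run_url(base_url, alias, args):
    """Build the /run URL with properly encoded query parameters."""
    return base_url.rstrip("/") + "/run?" + urlencode({"alias": alias, "args": args})


def build_argv(alias, args, aliases=ALIASES):
    """Resolve an alias to its command and split args without using a shell."""
    if alias not in aliases:
        raise KeyError(f"unknown alias: {alias}")
    return [aliases[alias]] + shlex.split(args)


def execute(argv, timeout=30):
    """Run the command as an argument list; subprocess uses no shell by default."""
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
```

Because the command path comes only from the alias table and the arguments are passed as a list, characters such as `;` or `|` reach the plugin as literal text and are never interpreted by a shell.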
Configuration Encryption
Sensitive values in **jobrunner.yaml** and **exporter.yaml** may be stored encrypted.
If plaintext values are detected during load, JobRunner automatically encrypts them using **Fernet** and persists the encrypted form.
This prevents accidental long-term storage of secrets in readable form while keeping configuration management simple.
Security and Exposure
If JobRunner is accessible from outside the local host or trusted network, it **must** be placed behind a reverse proxy such as nginx.
The reverse proxy should provide:
- TLS termination
- Authentication
- IP-based access control
- Rate limiting
JobRunner deliberately avoids implementing complex security logic and assumes a protected deployment environment.
Custom Scheduler (Optional)
Purpose
The scheduler coordinates which checks run, where, and how often.
Configuration Files
| File | Purpose |
|---|---|
| hosts.yaml | Host inventory |
| services.yaml | Service definitions |
| commands.yaml | Local command paths |
| jobrunner.yaml | Remote runners |
| exporter.yaml | Exporter targets |
Execution Logic
- Load configuration
- Replace %hostname% placeholders
- Apply random startup delay
- Execute checks at interval
- Parse JSON output
- Forward metrics to exporter
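Three of the steps above (placeholder replacement, random startup delay, JSON parsing) are small enough to sketch directly. The function names are hypothetical and the real scheduler may differ in detail.

```python
import json
import random


def expand(template, hostname):
    """Replace every %hostname% placeholder with the concrete host name."""
    return template.replace("%hostname%", hostname)


def startup_delay(max_seconds, rng=random):
    """Pick a random delay so checks do not all fire at the same instant."""
    return rng.uniform(0.0, max_seconds)


def parse_plugin_output(stdout):
    """Parse a plugin's JSON output; reject anything that is not an object."""
    data = json.loads(stdout)
    if not isinstance(data, dict):
        raise ValueError("plugin output must be a JSON object")
    return data
```

For example, `expand("check-ping --host %hostname%", "example.com")` yields the command line for one host from a shared service definition.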
Interval Syntax
| Format | Meaning |
|---|---|
| 30s | 30 seconds |
| 5m | 5 minutes |
| 1h | 1 hour |
| 1d | 1 day |
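Converting this syntax to seconds takes only a few lines. The sketch below accepts exactly the four units in the table; whether the scheduler supports further forms (for example combined values such as `1h30m`) is not stated, so they are rejected here.

```python
import re

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_PATTERN = re.compile(r"(\d+)([smhd])")


def parse_interval(text):
    """Convert an interval such as '5m' into seconds."""
    match = _PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]
```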
Python Module Requirements
Overview
The monitoring stack is written in Python and relies on a small set of well-known third-party modules in addition to the Python standard library.
| Component | Purpose | Python Modules |
|---|---|---|
| Exporter | Prometheus metrics endpoint | flask, prometheus_client, requests, cryptography |
| Scheduler | Job orchestration | pyyaml |
| JobRunner | Remote execution API | flask, pyyaml, cryptography |
| Plugins | Monitoring checks | requests, psutil |
Custom Exporter
from flask import Flask
from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client.core import GaugeMetricFamily
import requests
import sqlite3
Scheduler
import yaml
import threading
import subprocess
import time
JobRunner
from flask import Flask, request, jsonify
from cryptography.fernet import Fernet
import yaml
import subprocess
import logging
Plugins
import requests
import psutil
import socket
import ssl
import argparse
import json
import time
Python Version
- Python 3.8 or newer
- Python 3.11 recommended
Minimal Installation
pip install flask prometheus_client requests pyyaml psutil cryptography
Summary of Custom Monitoring Stack
This monitoring stack provides:
- Stateless plugin execution
- Prometheus-compatible metrics
- Automatic metric expiry
- Remote execution without SSH
- Central scheduling
- Clear JSON boundaries
- Automatic self-healing encryption of sensitive configuration values
It is intentionally simple, predictable, and easy to reason about.




