Custom Monitoring Stack
Modular monitoring with an Exporter, Plugins, a JobRunner, and a Scheduler
Overview
This system provides a lightweight monitoring framework built around:
| Component | Purpose |
|---|---|
| Exporter | Receives metrics and exposes Prometheus format |
| Plugins | Perform individual checks (ping, port, HTTP, cert, process) |
| JobRunner (optional) | Executes plugins remotely via HTTP |
| Scheduler (optional) | Orchestrates execution and intervals |
Design principles:
- Stateless checks
- JSON-only interfaces
- No SSH dependency
- Automatic metric expiry
- Prometheus-native output
Architecture
Logical Flow
The monitoring stack supports three execution models.
A: Direct execution via system scheduler (e.g. crontab)
In this model, plugins are executed directly by the system scheduler. No scheduler daemon or remote execution is involved.
[ Crontab ] (optional)
|
v
[ Plugins ]
|
v
[ Exporter ] ---> /metrics ---> Prometheus
Typical use cases:
- Simple hosts
- Minimal dependencies
- One-off or legacy checks
B: Local execution using the Scheduler
The Scheduler runs locally and executes plugins directly on the same system.
[ Scheduler ] (optional)
|
v
[ Plugins ]
|
v
[ Exporter ] ---> /metrics ---> Prometheus
Typical use cases:
- Centralized scheduling
- Consistent intervals
- Local-only monitoring
C: Remote execution using Scheduler and JobRunner
In this model, the Scheduler coordinates execution and sends requests to the JobRunner. The JobRunner executes plugins on remote hosts and returns the results to the Scheduler.
The Scheduler then forwards metric payloads to the Exporter. The Exporter is not part of the execution chain and operates independently.
[ Scheduler ] -> [ Exporter ] ---> /metrics ---> Prometheus
| ^
| (exec) | (push)
v |
[ JobRunner (HTTPS) ] |
| |
| (exec) |
v |
[ Plugins ] ------------------+
(JSON result)
Key characteristics:
- JobRunner appears only once in the execution chain
- Downward flow represents execution requests
- Upward flow represents returned plugin results
- Scheduler receives and evaluates results
- Exporter is triggered only after results are processed
- No direct plugin-to-exporter communication
Typical use cases:
- Remote hosts
- No SSH access
- Firewalled or segmented environments
- Centralized orchestration
Exporter
Purpose
The exporter receives metrics via HTTP, stores them temporarily, and exposes them for Prometheus scraping.
Metrics automatically expire if not refreshed.
This is not a Pushgateway replacement.
Features
- HTTP push ingestion
- Metric TTL (expiry)
- Optional SQLite persistence
- Thread-safe in-memory storage
- Mandatory label validation
- Optional check timestamp metrics
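The TTL and thread-safety features can be illustrated with a minimal sketch. It is not the exporter's actual code: the class name `ExpiringMetricStore`, its methods, and the injectable `clock` are assumptions made for illustration.

```python
import threading
import time


class ExpiringMetricStore:
    """In-memory metric store whose entries vanish once their TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock            # injectable so expiry can be tested
        self._lock = threading.Lock()  # guards _entries across threads
        self._entries = {}             # key -> (value, deadline)

    def put(self, name, labels, value, expiry):
        """Store or refresh a sample; `expiry` is the TTL in seconds."""
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            self._entries[key] = (value, self._clock() + expiry)

    def snapshot(self):
        """Drop expired samples and return the live ones as key -> value."""
        now = self._clock()
        with self._lock:
            self._entries = {
                key: entry for key, entry in self._entries.items() if entry[1] > now
            }
            return {key: entry[0] for key, entry in self._entries.items()}
```

Purging on read keeps the sketch free of background threads: a sample that is never refreshed simply stops appearing in the next snapshot.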
Supported Metrics
| Metric | Description |
|---|---|
| check_ping | ICMP latency (ms) |
| check_tcp_port | TCP connect latency (ms) |
| check_proc | Process memory (RSS bytes) |
| check_http | HTTP latency (ms), or a negative error code |
| check_cert_expiry_days | Days until TLS expiry |
Mandatory Labels
| Label | Required | Meaning |
|---|---|---|
| source | Yes | Origin of the check |
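One way the mandatory label check could look is sketched below. The function name `missing_labels` and the `REQUIRED_LABELS` constant are hypothetical; only the `source` requirement comes from the table above.

```python
REQUIRED_LABELS = ("source",)  # taken from the Mandatory Labels table


def missing_labels(payload, required=REQUIRED_LABELS):
    """Return the required labels that are absent or empty in a push payload."""
    labels = payload.get("labels")
    if not isinstance(labels, dict):
        # No usable label mapping at all: every required label is missing.
        return list(required)
    return [name for name in required if not labels.get(name)]
```

An empty list means the payload passes; anything else can be reported back to the sender.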
HTTP Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /push | POST | Accept metric JSON |
| /metrics | GET | Prometheus scrape endpoint |
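For the `/metrics` side, the following sketch shows how one stored sample might be rendered in the Prometheus text exposition format. The helper names are assumptions; the escaping rules (backslash, double quote, newline) follow the Prometheus format itself.

```python
def escape_label_value(value):
    """Escape a label value for the Prometheus text exposition format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line, e.g. check_ping{host="a",source="b"} 12.3."""
    if not labels:
        return f"{name} {value}"
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"
```

Sorting the labels keeps the output stable between scrapes, which makes diffs and tests easier to read.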
Example Payload
{
  "metric_name": "check_ping",
  "value": 12.3,
  "expiry": 300,
  "time_label": 1,
  "labels": {
    "host": "example.com",
    "source": "scheduler"
  }
}
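A client could send such a payload to `/push` roughly as follows. This is a sketch using only the standard library; the base URL `http://localhost:8000` is a placeholder, and the `Content-Type` header is an assumption about what the exporter expects.

```python
import json
import urllib.request


def build_push_request(base_url, payload):
    """Build (but do not send) a POST request carrying one metric payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/push",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not confirmed
        method="POST",
    )


if __name__ == "__main__":
    payload = {
        "metric_name": "check_ping",
        "value": 12.3,
        "expiry": 300,
        "time_label": 1,
        "labels": {"host": "example.com", "source": "scheduler"},
    }
    request = build_push_request("http://localhost:8000", payload)  # placeholder URL
    with urllib.request.urlopen(request, timeout=5) as response:
        print(response.status)
```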
Plugins
Common Behaviour
All plugins:
- Are standalone executables
- Output structured JSON
- Support retries
- Support optional Basic Auth
- Support a no-export mode, in which the plugin only emits its JSON result and the Scheduler pushes the payload to the Exporter
- Can run locally or remotely via the JobRunner
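The retry and JSON-output behaviour shared by the plugins can be sketched as below. The helper names, the default retry count, and the delay are assumptions, and the output fields simply mirror the exporter payload rather than a confirmed plugin schema.

```python
import json
import time


def run_with_retries(check, retries=3, delay=1.0, sleep=time.sleep):
    """Call check() up to `retries` times and return the first successful result."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return check()
        except Exception as exc:  # a real plugin would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(delay)
    raise last_error


def to_json_result(metric_name, value, labels, expiry=300):
    """Serialize one check result; field names mirror the exporter payload (assumed)."""
    return json.dumps(
        {"metric_name": metric_name, "value": value, "expiry": expiry, "labels": labels}
    )
```

In a no-export run, a plugin built this way would print the JSON string and leave the push to whoever invoked it.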
Plugin Summary
| Plugin | Function |
|---|---|
| check-ping | ICMP reachability |
| check-port | TCP connection latency |
| check-proc | Process memory usage |
| check-http | HTTP latency and content checks |
| check-cert | TLS certificate expiry |
check-ping
- Uses system ping
- Measures round-trip time
- Retries on failure
check-ping --host google.de --expiry 300 --time-metric
check-port
- TCP connect test
- Measures latency
- Supports arbitrary ports
check-port --host mail.example.com --port 587
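The measurement behind a TCP connect test can be approximated in a few lines. This is a sketch, not the plugin's source; the function name and the 5-second default timeout are assumptions.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=5.0):
    """Return the time needed to complete a TCP handshake, in milliseconds.

    Raises OSError (including timeouts) if the connection cannot be made.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the connection is closed as soon as it is established
    return (time.perf_counter() - start) * 1000.0
```

Note that the measured time includes DNS resolution when `host` is a name rather than an IP address.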
check-proc
- Uses psutil
- Reports RSS memory
- Fails if process is missing
check-proc --name dockerd
check-http
- Measures HTTP latency
- Optional regex content search
- Optional redirect handling
Return values:
| Value | Meaning |
|---|---|
| >0 | Latency in ms |
| -2 | Pattern not found |
| -3xx | Redirect blocked |
check-http -w https://example.com -p "Welcome"
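The return-value table above can be read as a small decision function. The sketch below assumes that `-3xx` means the negated redirect status code (for example `-301`); that interpretation, the function name, and its parameters are not confirmed by the plugin itself.

```python
import re

PATTERN_NOT_FOUND = -2  # from the return-value table


def http_check_value(status, latency_ms, body, pattern=None, follow_redirects=False):
    """Map an HTTP response to the single numeric value reported by the check."""
    if 300 <= status < 400 and not follow_redirects:
        return -status  # assumed meaning of "-3xx": negated status code
    if pattern is not None and re.search(pattern, body) is None:
        return PATTERN_NOT_FOUND
    return latency_ms
```

Encoding failures as negative numbers keeps the metric a single gauge, at the cost of requiring alert rules to treat values below zero as errors rather than latencies.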
check-cert
- Opens TLS connection
- Reads certificate notAfter field
- Returns remaining days
check-cert --host example.com
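The three steps listed for check-cert map onto Python's standard `ssl` module fairly directly. The sketch below is an illustration under assumptions: the function names, port 443, and the timeout are placeholders, and the real plugin may handle verification differently.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a TLS connection and return the certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Convert a notAfter string such as 'Jan  1 00:00:00 2030 GMT' to remaining days."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0
```

With the default context, an already expired or untrusted certificate makes the handshake fail before `notAfter` can be read, so a plugin would need to decide how to report that case.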
JobRunner (Optional)
Purpose
JobRunner allows controlled remote execution of plugins via HTTP.
Features
- HTTP API
- Alias-based execution
- Sync or async execution
- Syslog logging
- No shell injection
- JSON output
Example Request
POST /run?alias=check-ping&args=--host%20example.com
Alias Configuration
joblist:
  - alias: check-ping
    cmd: /usr/local/bin/check-ping
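Alias-based execution without shell injection can be sketched as follows. The `JOBLIST` dictionary stands in for the parsed alias configuration, and the function names and timeout are assumptions; the key idea is that the command comes only from the alias table and the arguments are passed as a list, never through a shell.

```python
import shlex
import subprocess

# Stand-in for the parsed alias configuration (hypothetical in-memory form).
JOBLIST = {"check-ping": "/usr/local/bin/check-ping"}


def build_argv(alias, args, joblist):
    """Resolve an alias to its command and split the argument string safely."""
    if alias not in joblist:
        raise KeyError(f"unknown alias: {alias}")
    return [joblist[alias], *shlex.split(args)]


def run_alias(alias, args, joblist, timeout=30):
    """Run the aliased command without a shell and capture its output."""
    argv = build_argv(alias, args, joblist)
    return subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, shell=False
    )
```

Because no shell is involved, characters such as `;` or `|` in `args` reach the plugin as literal argument text instead of being interpreted.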
Scheduler (Optional)
Purpose
The scheduler coordinates which checks run, where, and how often.
Configuration Files
| File | Purpose |
|---|---|
| hosts.yaml | Host inventory |
| services.yaml | Service definitions |
| commands.yaml | Local command paths |
| jobrunner.yaml | Remote runners |
| exporter.yaml | Exporter targets |
Execution Logic
- Load configuration
- Replace %hostname% placeholders
- Apply random startup delay
- Execute checks at interval
- Parse JSON output
- Forward metrics to exporter
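The steps above can be sketched for a single check as follows. All names are hypothetical, the `execute` and `push` callables stand in for plugin execution and the exporter client, and bounding the random startup delay by the check interval is an assumption.

```python
import json
import random


def expand_placeholders(command, hostname):
    """Replace %hostname% in every argument of a command."""
    return [part.replace("%hostname%", hostname) for part in command]


def startup_delay(interval_seconds, rng=random):
    """Pick a random delay so checks do not all start at once (bound is assumed)."""
    return rng.uniform(0, interval_seconds)


def run_check_once(command, hostname, execute, push):
    """Run one check, parse its JSON output, and forward it to the exporter."""
    argv = expand_placeholders(command, hostname)
    payload = json.loads(execute(argv))  # execute returns the plugin's JSON text
    push(payload)                        # push sends the payload to the exporter
    return payload
```

A scheduler loop would sleep for `startup_delay(...)` once and then call `run_check_once` every interval.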
Interval Syntax
| Format | Meaning |
|---|---|
| 30s | 30 seconds |
| 5m | 5 minutes |
| 1h | 1 hour |
| 1d | 1 day |
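Parsing this syntax takes only a few lines. The sketch below covers exactly the four units in the table; the function name and the decision to reject anything else (such as `1.5h` or `90`) are assumptions.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_INTERVAL_RE = re.compile(r"^(\d+)([smhd])$")


def parse_interval(text):
    """Convert an interval such as '5m' into a number of seconds."""
    match = _INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```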
Summary
This monitoring stack provides:
- Stateless plugin execution
- Prometheus-compatible metrics
- Automatic metric expiry
- Remote execution without SSH
- Central scheduling
- Clear JSON boundaries
It is intentionally simple, predictable, and easy to reason about.