Custom Monitoring Stack

From Coolscript
Latest revision as of 21:09, 3 January 2026

About Custom Monitoring Stack

A lightweight, Prometheus-native monitoring system designed for small environments:

  • Quick to set up – requires little more than a Prometheus container
  • Custom exporter mode – exposes metrics in native Prometheus format
  • Extensible plugin framework – simple, stateless checks (ping, port, process, http, cert)
  • Highly flexible custom scheduler – orchestrates both local and remote plugin execution
  • Minimal custom job runner – executes plugins remotely on behalf of the scheduler
  • Grafana-ready by design – dashboards and alerting via Grafana and Grafana Alertmanager follow standard Prometheus workflows
  • Mobile-friendly interface – an optional mobile app integrates with Grafana Alertmanager for basic alert monitoring and management


Overview

This system provides a lightweight monitoring framework built around:

Component                     Purpose
Custom-Exporter               Receives metrics and exposes them in Prometheus format
Plugins                       Perform individual checks (ping, port, HTTP, cert, process)
Custom-JobRunner (optional)   Executes plugins remotely via HTTP
Custom-Scheduler (optional)   Orchestrates execution and intervals

Design principles:

  • Stateless checks
  • JSON-only interfaces
  • No SSH dependency
  • Automatic metric expiry
  • Prometheus-native output

Architecture

Logical Flow

The monitoring stack supports three execution models (A, B, and C), which can also be combined in one deployment.


A: Direct Execution via System Scheduler (e.g. crontab)

In this model, plugins are executed directly by the local system scheduler, typically via crontab. No Scheduler or JobRunner components are involved.

In addition, check_plugins are capable of sending metric payloads directly to the Custom Exporter over the network from remote hosts.

This capability is provided to demonstrate plugin flexibility and is not intended as the primary or recommended execution model.

Typical use cases:

  • Simple or standalone hosts
  • Minimal dependencies
  • One-off, legacy, or transitional checks

B: Local Execution Using the Scheduler

The Scheduler runs locally, orchestrates the checks, and executes plugins directly on the same system.

Typical use cases:

  • Centralized scheduling
  • Consistent intervals
  • Local-only monitoring

C: Remote Execution Using Scheduler and JobRunner

In this model, the Custom Scheduler coordinates execution and dispatches requests to one or more JobRunner instances. The JobRunner executes plugins on remote hosts and returns the results to the Scheduler for evaluation.

After processing the results, the Custom Scheduler forwards metric payloads to the Custom Exporter. The Custom Exporter is not part of the execution chain and operates independently from plugin execution.

The following diagram illustrates an alternative execution capability where check_plugins can write metrics directly to the Custom Exporter over the network.

This mode exists to demonstrate plugin capabilities only and is not intended as the preferred or recommended execution model.

Combined Local and Remote Execution

This is the preferred execution model.

As illustrated in the diagram below, the Custom Scheduler acts as the central execution authority between local plugins, remote JobRunner instances, and the Prometheus Custom Exporter.




A single Scheduler instance can:

  • Execute selected checks locally
  • Dispatch other checks to one or more remote JobRunner instances
  • Collect and evaluate all plugin results centrally
  • Forward processed results to the Custom Exporter only after evaluation

Local and remote execution paths are selected per check by scheduler configuration, not by a global operating mode.

This enables mixed deployments where local services are checked directly, while remote, firewalled, or restricted hosts are monitored via JobRunner, without duplicating exporters or schedulers.

Key characteristics

  • JobRunner appears only once in the execution chain
  • Downward flow represents execution requests
  • Upward flow represents returned plugin results
  • Custom-Scheduler receives and evaluates results
  • Custom-Exporter is triggered only after result processing
  • No direct plugin-to-exporter communication

Typical use cases:

  • Remote hosts
  • Environments without SSH access
  • Firewalled or segmented networks
  • Centralized orchestration

Custom-Exporter

Purpose

The exporter receives metrics via HTTP, stores them temporarily, and exposes them for Prometheus scraping.

Metrics automatically expire if not refreshed.

This is not a Pushgateway replacement.

Features

  • HTTP push ingestion
  • Metric TTL (expiry)
  • Optional SQLite persistence
  • Thread-safe in-memory storage
  • Mandatory label validation
  • Optional check timestamp metrics
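
The combination of metric TTL and thread-safe in-memory storage can be pictured with a small sketch. This is not the exporter's actual code: the class name `MetricStore`, its methods, and the injectable `clock` parameter are assumptions made for illustration only.

```python
import threading
import time


class MetricStore:
    """Thread-safe in-memory store whose entries expire after a per-metric TTL."""

    def __init__(self, clock=time.time):
        self._clock = clock          # injectable clock, handy for testing
        self._lock = threading.Lock()
        self._items = {}             # key -> (value, absolute expiry timestamp)

    def put(self, name, labels, value, expiry):
        # The key combines the metric name with its sorted label pairs.
        key = (name, tuple(sorted(labels.items())))
        with self._lock:
            self._items[key] = (value, self._clock() + expiry)

    def purge_expired(self):
        """Drop entries that were not refreshed in time; return how many were removed."""
        now = self._clock()
        with self._lock:
            stale = [k for k, (_, deadline) in self._items.items() if deadline <= now]
            for key in stale:
                del self._items[key]
        return len(stale)

    def snapshot(self):
        """Return a copy of the live entries as {key: value}."""
        now = self._clock()
        with self._lock:
            return {k: v for k, (v, deadline) in self._items.items() if deadline > now}
```

A background cleanup thread could call `purge_expired()` periodically, while the scrape handler reads from `snapshot()`; pushing the same metric again simply replaces the entry and extends its deadline.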

Supported Metrics

Metric                   Description
check_ping               ICMP latency (ms)
check_tcp_port           TCP connect latency (ms)
check_proc               Process memory (RSS bytes)
check_http               HTTP latency (ms), or a negative value on failure
check_cert_expiry_days   Days until TLS certificate expiry

Mandatory Labels

Label    Required   Meaning
source   Yes        Origin of the check
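
A minimal sketch of mandatory label validation, assuming the payload layout shown in the example payload. The function name `missing_labels` is hypothetical, and how the exporter actually reacts to a missing label is not documented here.

```python
REQUIRED_LABELS = ("source",)  # taken from the Mandatory Labels table


def missing_labels(payload):
    """Return the mandatory labels that are absent or empty in a push payload."""
    labels = payload.get("labels") or {}
    return [name for name in REQUIRED_LABELS if not labels.get(name)]
```

An ingestion handler could reject the payload whenever this list is non-empty.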

HTTP Endpoints

Endpoint   Method   Description
/push      POST     Accept metric JSON
/metrics   GET      Prometheus scrape endpoint
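
To make the output of /metrics more concrete, the sketch below renders a single sample in the Prometheus text exposition format. It only illustrates the format; the exporter itself is listed as using prometheus_client, and the helper names here are assumptions.

```python
def escape_label_value(value):
    # Prometheus text format escapes backslash, double quote, and newline in label values.
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}"
```

For the example payload this yields a line of the form `check_ping{host="example.com",source="scheduler"} 12.3`.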

Example Payload

{
  "metric_name": "check_ping",
  "value": 12.3,
  "expiry": 300,
  "time_label": 1,
  "labels": {
    "host": "example.com",
    "source": "scheduler"
  }
}
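
A payload like this could be assembled and sent with the Python standard library roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the `Content-Type` header is assumed to be expected, and the exporter address must be supplied by the caller. The `expiry` and `time_label` fields are copied from the example without interpreting them further.

```python
import json
import urllib.request


def build_payload(metric_name, value, host, source, expiry=300, time_label=1):
    """Assemble a push payload with the same fields as the example payload."""
    return {
        "metric_name": metric_name,
        "value": value,
        "expiry": expiry,
        "time_label": time_label,
        "labels": {"host": host, "source": source},
    }


def push_metric(base_url, payload, timeout=5):
    """POST the payload as JSON to <base_url>/push and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/push",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Usage would look like `push_metric(exporter_url, build_payload("check_ping", 12.3, "example.com", "scheduler"))`, where `exporter_url` is whatever address the exporter listens on in a given deployment.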

Security and Exposure

If the custom exporter is exposed beyond localhost or a trusted internal network, it **must** be placed behind a reverse proxy such as nginx.

Direct internet exposure of the custom exporter service is discouraged.

The reverse proxy is responsible for:

  • TLS termination
  • Authentication (Basic Auth, mTLS, or equivalent)
  • IP allowlisting
  • Rate limiting
  • Request size limits

The custom exporter itself intentionally remains simple and does not replace an edge security layer.


Custom Exporter Startup Arguments

The custom exporter is configured **exclusively via command-line arguments**. There is no static configuration file.

All runtime behaviour is derived from incoming metric payloads.

Supported Arguments

Argument   Description                                      Default
--db       Path to SQLite database for metric persistence   Disabled

Startup Load Sequence

  1. Parse CLI arguments
  2. Initialize SQLite (if enabled)
  3. Load non-expired metrics from DB
  4. Start expiry cleanup thread
  5. Start HTTP server
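
The first three steps of this sequence might look like the sketch below. Only the `--db` argument comes from the documentation; the table name, its columns, and the function names are hypothetical and chosen purely for illustration.

```python
import argparse
import sqlite3
import time


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Custom exporter (sketch)")
    # --db is the only documented argument; persistence stays off when it is omitted.
    parser.add_argument("--db", default=None, help="Path to SQLite database")
    return parser.parse_args(argv)


def load_live_metrics(conn, now=None):
    """Return rows whose expiry timestamp is still in the future."""
    now = time.time() if now is None else now
    # Hypothetical schema: metrics(name TEXT, labels TEXT, value REAL, expires_at REAL)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics "
        "(name TEXT, labels TEXT, value REAL, expires_at REAL)"
    )
    rows = conn.execute(
        "SELECT name, labels, value, expires_at FROM metrics WHERE expires_at > ?",
        (now,),
    )
    return rows.fetchall()
```

With `--db` omitted, a real implementation would skip the SQLite steps entirely and start with an empty in-memory store.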

Plugins

Common Behaviour

All plugins:

  • Are standalone executables
  • Output structured JSON
  • Support retries
  • Support optional Basic Auth
  • Support a no-export mode, leaving it to the scheduler to forward payloads
  • Can run locally or remotely via the JobRunner
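
Retry support could be implemented along these lines. This is a generic sketch, not the plugins' actual logic: the helper name, the attempt count, the delay, and the convention that a failed check returns `None` are all assumptions.

```python
import time


def with_retries(check, attempts=3, delay=1.0, sleep=time.sleep):
    """Call check() until it returns a non-None result or attempts run out."""
    for attempt in range(1, attempts + 1):
        result = check()
        if result is not None:
            return result
        if attempt < attempts:
            sleep(delay)  # pause before the next try
    return None
```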

Plugin Summary

Plugin       Function
check-ping   ICMP reachability
check-port   TCP connection latency
check-proc   Process memory usage
check-http   HTTP latency and content checks
check-cert   TLS certificate expiry

check-ping

  • Uses system ping
  • Measures round-trip time
  • Retries on failure

check-ping --host google.de --expiry 300 --time-metric

check-port

  • TCP connect test
  • Measures latency
  • Supports arbitrary ports

check-port --host mail.example.com --port 587
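
The core of a TCP connect test can be sketched with the standard library as shown below. The function name, the default timeout, and the choice to return `None` on failure are assumptions; the real plugin may report failures differently.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None
```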

check-proc

  • Uses psutil
  • Reports RSS memory
  • Fails if process is missing

check-proc --name dockerd

check-http

  • Measures HTTP latency
  • Optional regex content search
  • Optional redirect handling

Return values:

Value   Meaning
>0      Latency in ms
-2      Pattern not found
-3xx    Redirect blocked

check-http -w https://example.com -p "Welcome"
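
A consumer of the check_http metric might decode these return values as in the sketch below. Note the assumption that "-3xx" stands for the negated redirect status code (for example -301); the source does not spell this out, and the function name is hypothetical.

```python
def describe_http_value(value):
    """Translate a check_http value into a human-readable status."""
    if value > 0:
        return f"ok: {value} ms"
    if value == -2:
        return "pattern not found"
    # Assumption: "-3xx" means the negated redirect status code, e.g. -301 or -302.
    if -399 <= value <= -300:
        return f"redirect blocked (HTTP {-int(value)})"
    return "unknown"
```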

check-cert

  • Opens TLS connection
  • Reads certificate notAfter field
  • Returns remaining days

check-cert --host example.com
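
The three steps above can be sketched with the standard `ssl` module. The function names, the default port 443, and the timeout are assumptions made for illustration rather than the plugin's actual interface.

```python
import socket
import ssl
import time


def days_until(not_after, now=None):
    """Convert a certificate notAfter string into the number of days remaining."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def cert_expiry_days(host, port=443, timeout=5.0):
    """Open a TLS connection and return the days left on the peer certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return days_until(tls.getpeercert()["notAfter"])
```

Because the default context verifies the certificate, an already expired certificate would make the handshake fail in this sketch instead of yielding a negative number of days.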

Custom JobRunner (Optional)

Purpose

JobRunner allows controlled remote execution of plugins via HTTP.

Features

  • HTTP API
  • Alias-based execution
  • Sync or async execution
  • Syslog logging
  • No shell injection
  • JSON output

Example Request

POST /run?alias=check-ping&args=--host example.com
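
When such a request is built programmatically, the query parameters need to be URL-encoded, since the args value contains a space. A minimal sketch follows; the function name and the base URL in the usage note are hypothetical, and it assumes JobRunner applies standard form decoding to the query string.

```python
from urllib.parse import urlencode


def build_run_url(base_url, alias, args):
    """Build the /run URL with alias and args percent-encoded as query parameters."""
    query = urlencode({"alias": alias, "args": args})
    return f"{base_url.rstrip('/')}/run?{query}"
```

For instance, `build_run_url("http://runner.example:8080", "check-ping", "--host example.com")` encodes the space in the args value as `+`.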

Alias Configuration

joblist:
  - alias: check-ping
    cmd: /usr/local/bin/check-ping
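
Alias-based execution without shell injection can be illustrated as follows. This is a sketch of the general technique, not JobRunner's code: the `JOBLIST` dictionary stands in for the parsed YAML, and the function names and timeout are assumptions.

```python
import shlex
import subprocess

# Mirrors the joblist entry above; in practice this would be loaded from YAML.
JOBLIST = {"check-ping": "/usr/local/bin/check-ping"}


def build_argv(alias, args):
    """Map an alias to its configured command and split args without a shell."""
    if alias not in JOBLIST:
        raise KeyError(f"unknown alias: {alias}")
    return [JOBLIST[alias], *shlex.split(args)]


def run_alias(alias, args, timeout=30):
    # Passing a list (shell=False is the default) means metacharacters such as
    # ';' or '|' reach the plugin as literal arguments, never as shell syntax.
    return subprocess.run(
        build_argv(alias, args), capture_output=True, text=True, timeout=timeout
    )
```

Only commands registered under an alias can be started, and the caller never supplies the executable path itself.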

Configuration Encryption

Sensitive values in **jobrunner.yaml** and **exporter.yaml** may be stored encrypted.

If plaintext or decrypted values are detected during load, JobRunner will automatically encrypt them using **Fernet** and persist the encrypted form.

This prevents accidental long-term storage of secrets in readable form while keeping configuration management simple.


Security and Exposure

If JobRunner is accessible from outside the local host or trusted network, it **must** be placed behind a reverse proxy such as nginx.

The reverse proxy should provide:

  • TLS termination
  • Authentication
  • IP-based access control
  • Rate limiting

JobRunner deliberately avoids implementing complex security logic and assumes a protected deployment environment.


Custom Scheduler (Optional)

Purpose

The scheduler coordinates which checks run, where, and how often.

Configuration Files

File             Purpose
hosts.yaml       Host inventory
services.yaml    Service definitions
commands.yaml    Local command paths
jobrunner.yaml   Remote runners
exporter.yaml    Exporter targets

Execution Logic

  1. Load configuration
  2. Replace %hostname% placeholders
  3. Apply random startup delay
  4. Execute checks at interval
  5. Parse JSON output
  6. Forward metrics to exporter
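
Steps 2, 3, and 5 lend themselves to small helper functions. The sketch below is illustrative only: the function names are hypothetical, and the assumption that the random delay exists to spread checks out over time is an interpretation, not something the source states.

```python
import json
import random


def expand_placeholders(template, hostname):
    """Step 2: substitute the %hostname% placeholder in a command template."""
    return template.replace("%hostname%", hostname)


def startup_delay(max_seconds, rng=random):
    """Step 3: pick a random initial delay between 0 and max_seconds."""
    return rng.uniform(0, max_seconds)


def parse_plugin_output(stdout):
    """Step 5: decode the plugin's JSON output; return None if it is not valid JSON."""
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        return None
```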

Interval Syntax

Format   Meaning
30s      30 seconds
5m       5 minutes
1h       1 hour
1d       1 day
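
A parser for this syntax could look like the sketch below. The function name and the strict rejection of anything outside the four documented units are assumptions.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text):
    """Convert an interval such as '30s', '5m', '1h', or '1d' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```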

Python Module Requirements

Overview

The monitoring stack is written in Python and relies on a small set of well-known third-party modules in addition to the Python standard library.

Component   Purpose                       Python Modules
Exporter    Prometheus metrics endpoint   flask, prometheus_client, requests, cryptography
Scheduler   Job orchestration             pyyaml
JobRunner   Remote execution API          flask, pyyaml, cryptography
Plugins     Monitoring checks             requests, psutil

Custom Exporter

from flask import Flask
from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client.core import GaugeMetricFamily
import requests
import sqlite3

Scheduler

import yaml
import threading
import subprocess
import time

JobRunner

from flask import Flask, request, jsonify
from cryptography.fernet import Fernet
import yaml
import subprocess
import logging

Plugins

import requests
import psutil
import socket
import ssl
import argparse
import json
import time

Python Version

  • Python 3.8 or newer
  • Python 3.11 recommended

Minimal Installation

pip install flask prometheus_client requests pyyaml psutil cryptography

Summary of Custom Monitoring Stack

This monitoring stack provides:

  • Stateless plugin execution
  • Prometheus-compatible metrics
  • Automatic metric expiry
  • Remote execution without SSH
  • Central scheduling
  • Clear JSON boundaries
  • Automatic self-healing encryption of sensitive configuration values

It is intentionally simple, predictable, and easy to reason about.