Ollama Systemd Unit File Sample

From Coolscript

Ollama systemd override (annotated)

Resource limits

# hard RAM cap: the kernel OOM killer acts on the service if it is exceeded
MemoryMax=16G
# soft limit: memory is reclaimed and the service throttled before the hard cap
MemoryHigh=14G
# limit to the CPU time of about 4 cores
CPUQuota=400%

File descriptors

# raise the open-file limit to prevent "too many open files"
LimitNOFILE=1048576

Restart / stability

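# restart only on crashes (non-zero exit, fatal signal, timeout)
Restart=on-failure
# delay before restart, in seconds
RestartSec=3
# avoid infinite restart loops: at most 5 starts per 60 seconds
# (these two settings belong in the [Unit] section, not in [Service])
StartLimitIntervalSec=60
StartLimitBurst=5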
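
The start limit only protects against restart loops if the restarts can actually reach the burst count within the interval. The sketch below works through that arithmetic for the values above. The function name `start_limit_trips` is hypothetical, and the model is deliberately simplified: it assumes the service fails immediately on every start, so that starts are exactly RestartSec apart.

```shell
#!/bin/sh
# Hypothetical helper (not part of systemd): estimates whether a service that
# crashes immediately on every start would ever hit the start rate limit.
# Assumption: each start fails at once, so starts are RestartSec seconds apart.
start_limit_trips() {
    restart_sec=$1    # value of RestartSec=
    interval_sec=$2   # value of StartLimitIntervalSec=
    burst=$3          # value of StartLimitBurst=
    # The attempt after the last allowed start happens at burst * restart_sec.
    next_attempt=$((burst * restart_sec))
    if [ "$next_attempt" -lt "$interval_sec" ]; then
        echo "trips"
    else
        echo "never trips"
    fi
}

start_limit_trips 3 60 5     # values from this page
start_limit_trips 15 60 5    # a longer delay, for comparison
```

With the values from this page the sixth attempt falls at about 15 seconds, well inside the 60 second window, so the loop is stopped. With a 15 second delay it would fall at 75 seconds and the limit would not take effect under this model.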

IO / disk behavior

# best-effort class at priority 4 (0 = highest, 7 = lowest); these are the systemd defaults
IOSchedulingClass=best-effort
IOSchedulingPriority=4

Optional CPU pinning

# bind to specific cores (CPU indices start at 0)
CPUAffinity=0 1 2 3

Security (light sandboxing)

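# the service and its children cannot gain new privileges (e.g. via setuid binaries)
NoNewPrivileges=true
# private /tmp and /var/tmp for the service
PrivateTmp=true
# /usr, /boot, /efi and /etc become read-only for the service;
# if the models are stored under one of these paths (the stock Linux install
# is assumed to use /usr/share/ollama), allow writes there with ReadWritePaths=
ProtectSystem=full
# /home, /root and /run/user become inaccessible
ProtectHome=true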

Logging

# send stdout and stderr to the journal
StandardOutput=journal
StandardError=journal
# 0 disables log rate limiting for this service
LogRateLimitIntervalSec=0

Ollama tuning

# concurrent requests
Environment="OLLAMA_NUM_PARALLEL=2"
# avoid VRAM exhaustion
Environment="OLLAMA_MAX_LOADED_MODELS=1"
# unload the model after 5 minutes idle
Environment="OLLAMA_KEEP_ALIVE=5m"

# bind to localhost only
Environment="OLLAMA_HOST=127.0.0.1"

Minimal working example

[Service]
MemoryMax=16G
CPUQuota=400%
LimitNOFILE=1048576
Restart=on-failure
RestartSec=3
Environment="OLLAMA_NUM_PARALLEL=2"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
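
One way to put the minimal example in place is to write it as a drop-in file next to the unit. The following is a sketch under two assumptions: the unit is named ollama.service, and drop-ins live in /etc/systemd/system/ollama.service.d. The function name `write_override` is hypothetical. Running `systemctl edit ollama` and pasting the content is an equivalent interactive route.

```shell
#!/bin/sh
# Hypothetical helper: writes the minimal override into the drop-in directory
# passed as the first argument and creates the directory if needed.
write_override() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
MemoryMax=16G
CPUQuota=400%
LimitNOFILE=1048576
Restart=on-failure
RestartSec=3
Environment="OLLAMA_NUM_PARALLEL=2"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
EOF
}

# Usage (as root), assuming the unit is named ollama.service:
#   write_override /etc/systemd/system/ollama.service.d
#   systemctl daemon-reload     # make systemd re-read unit files
#   systemctl restart ollama    # apply the new limits
#   systemctl cat ollama        # show the unit together with its drop-ins
```

The systemctl calls are left as comments so that sourcing the script only defines the function and changes nothing on the system.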