Quick Setup (Under 15 Minutes)
Fastest path: Deploy Uptime Kuma on a separate VPS — one Docker command, web UI, monitors all your servers. Zero-install option: UptimeRobot free tier (50 monitors, 5-min checks). Full stack: Netdata for server metrics + Uptime Kuma for external checks. Enterprise grade: Prometheus + Grafana (but overkill for most VPS users).
What You Actually Need to Monitor
Most monitoring guides throw 50 metrics at you and let you figure out which ones matter. Here is the prioritized list based on what actually causes outages on VPS servers:
Priority 1: Things That Cause Outages
- HTTP/HTTPS availability — Is your site actually responding? This catches 80% of problems.
- Disk space — The #1 silent killer. Log files, Docker images, temp files, database WAL logs. When the disk fills up, everything breaks at once.
- SSL certificate expiry — Let's Encrypt certs expire every 90 days. Auto-renewal fails more often than you think (see our SSL guide).
Priority 2: Things That Degrade Performance
- CPU usage — Sustained >80% means your VPS is undersized or something is wrong.
- RAM usage — Consistent >90% means you are swapping to disk, which destroys performance.
- Network traffic — Unexpected spikes could mean a DDoS or a misconfigured service. Important on providers with metered bandwidth like Vultr and DigitalOcean.
Priority 3: Operational Intelligence
- Response time trends — Is your site getting slower over time?
- Error rates — How many 500 errors are you returning?
- Database performance — Slow queries, connection counts, replication lag.
Start with Priority 1. You can add everything else later. Priority 1 alone catches 90% of real-world VPS problems, and it takes 10 minutes to set up.
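For example, the SSL expiry item in Priority 1 can be checked from any machine with a few lines of shell. A sketch, assuming GNU date and the openssl CLI ("yoursite.com" is a placeholder):

```bash
#!/bin/bash
# Sketch: days until a TLS certificate expires.
# Assumes GNU date and the openssl CLI; "yoursite.com" is a placeholder.
days_until() {
  # Argument: a certificate notAfter string, e.g. "Jun  1 12:00:00 2030 GMT"
  local exp_ts now_ts
  exp_ts=$(date -d "$1" +%s)
  now_ts=$(date +%s)
  echo $(( (exp_ts - now_ts) / 86400 ))
}

check_cert() {
  # Argument: a domain; fetches the live certificate's notAfter field
  local enddate
  enddate=$(echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2)
  echo "$1: $(days_until "$enddate") days until certificate expiry"
}

# check_cert "yoursite.com"
```

Run it daily from cron and alert when the day count drops below 30, matching the thresholds discussed later in this guide.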
External Uptime Monitoring
Your uptime monitor must be external to the server it monitors. If you install monitoring on the same VPS and the VPS goes down, your monitoring goes down with it. Nobody gets alerted. The site stays down until someone notices.
Free External Monitoring Options
| Service | Free Tier | Check Interval | Alert Methods | Self-Hosted |
|---|---|---|---|---|
| UptimeRobot | 50 monitors | 5 min | Email, Slack, Webhook | No |
| Uptime Kuma | Unlimited | Custom (20s+) | 90+ integrations | Yes |
| HetrixTools | 15 monitors | 1 min | Email, Slack, Discord | No |
| Freshping | 50 monitors | 1 min | Email, Slack | No |
| Better Stack | 10 monitors | 3 min | Email, Slack, Phone | No |
For a quick start with zero setup, create an UptimeRobot account and add HTTP monitors for your domains. Five minutes of work, 50 monitors, email alerts when anything goes down. This alone puts you ahead of the roughly 80% of VPS users who have no monitoring at all.
Uptime Kuma Setup (Self-Hosted)
Uptime Kuma is my recommendation for anyone who manages more than two servers. It is open source, self-hosted, and has a polished web UI that puts many paid tools to shame. Deploy it on a separate VPS — a $3.50/mo BuyVM instance or a $5/mo Vultr instance is more than enough.
```bash
# Deploy Uptime Kuma (one command)
docker run -d --restart unless-stopped \
  -p 3001:3001 \
  -v uptime-kuma:/app/data \
  --name uptime-kuma \
  louislam/uptime-kuma:1
```
Access it at http://your-monitoring-vps:3001, create an admin account, and start adding monitors.
What to Monitor in Uptime Kuma
- HTTPS check for each domain (verifies both availability and SSL validity)
- TCP check on port 22 (SSH — catches network-level failures)
- DNS check for your domains (catches DNS resolution failures)
- Docker container check via the Docker socket (if Docker is exposed)
- Keyword check — verify that a specific string appears in the response (catches partial failures where Nginx returns 200 but serves an error page)
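The keyword check deserves emphasis because a 200 status alone proves little. A shell sketch of the same idea (the URL and keyword are placeholders; pick a string that only a healthy page contains):

```bash
#!/bin/bash
# Sketch of what a keyword check adds over a plain HTTP check.
keyword_check() {
  # Arguments: URL, keyword
  local body
  body=$(curl -fsS --max-time 10 "$1") || { echo "DOWN: request failed"; return 1; }
  if echo "$body" | grep -q "$2"; then
    echo "OK: keyword present"
  else
    # e.g. Nginx returned 200 but served an error page
    echo "DEGRADED: response received but keyword missing"
    return 2
  fi
}

# keyword_check "https://yoursite.com/" "Welcome"
```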
Alert Configuration
Uptime Kuma supports 90+ notification methods. My setup:
- Telegram for personal servers (instant push notifications to my phone)
- Slack webhook for client servers (alerts go to a #monitoring channel)
- Email as a backup for everything (in case Telegram/Slack is down)
```
# Telegram bot setup (2 minutes):
# 1. Message @BotFather on Telegram
# 2. /newbot, follow prompts, get token
# 3. Message your bot to start it
# 4. Get your chat ID: https://api.telegram.org/bot{TOKEN}/getUpdates
# 5. Add Telegram notification in Uptime Kuma with token + chat ID
```
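Before saving the notification in Uptime Kuma, you can verify the token and chat ID with a direct call to the Bot API. A sketch (TOKEN and CHAT_ID values are placeholders from the steps above):

```bash
#!/bin/bash
# Sketch: send a test message through the Telegram Bot API to confirm the
# token and chat ID work. The token and chat ID below are placeholders.
send_telegram() {
  local token="$1" chat_id="$2" text="$3"
  curl -s "https://api.telegram.org/bot${token}/sendMessage" \
    --data-urlencode "chat_id=${chat_id}" \
    --data-urlencode "text=${text}"
}

# send_telegram "123456:ABC-your-token" "123456789" "Test alert from $(hostname)"
```

If the call returns `"ok":true`, the same credentials will work in Uptime Kuma.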
Netdata for Server Metrics
Netdata is the fastest way to get a full server metrics dashboard. Install it, open the web UI, and you have real-time graphs for CPU, RAM, disk, network, processes, and hundreds of other metrics. Zero configuration needed. It runs directly on the server you want to monitor:
```bash
# Install Netdata (one-line installer)
curl https://get.netdata.cloud/kickstart.sh > /tmp/netdata-kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel
```
Access the dashboard at http://your-vps:19999. Do not expose this to the internet without authentication. Use an Nginx reverse proxy with basic auth or access it through an SSH tunnel:
```bash
# Access Netdata securely via SSH tunnel
ssh -L 19999:localhost:19999 user@your-vps
# Then open http://localhost:19999 in your browser
```
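If you choose the reverse-proxy route instead of a tunnel, the basic-auth credentials file can be generated with openssl alone. A sketch (the `admin` username, the password, and the file path are assumptions; Nginx's `auth_basic_user_file` directive then points at the file):

```bash
#!/bin/bash
# Sketch: build an htpasswd-style line for Nginx basic auth. The apr1
# (Apache MD5) scheme used here is supported by Nginx's auth_basic module.
ENTRY="admin:$(openssl passwd -apr1 'choose-a-strong-password')"
echo "$ENTRY"

# On the server, write it where Nginx can read it, e.g.:
# echo "$ENTRY" | sudo tee /etc/nginx/.htpasswd > /dev/null
```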
Netdata Alerts
Netdata includes pre-configured alerts for common problems. The most useful ones fire automatically:
- Disk space below 10%
- RAM usage above 90% for 5 minutes
- CPU usage above 85% for 10 minutes
- OOM killer activated
- Network interface errors
Forward alerts to email or Slack by editing /etc/netdata/health_alarm_notify.conf:
```bash
# /etc/netdata/health_alarm_notify.conf
SEND_SLACK="YES"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/your/webhook/url"
DEFAULT_RECIPIENT_SLACK="#monitoring"
```
Prometheus + Grafana (The Full Stack)
This is the industry-standard monitoring stack. It is powerful, flexible, and overkill for a single VPS. I deploy it when managing 3+ servers or when clients need custom dashboards. On a single server, Netdata gives you 90% of the value at 30% of the complexity.
Docker Compose Setup
```yaml
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - "127.0.0.1:9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    ports:
      - "127.0.0.1:3000:3000"
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
```
```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']
  # Add more targets for additional servers:
  # - job_name: 'remote-node'
  #   static_configs:
  #     - targets: ['remote-server-ip:9100']
```
Access Grafana at http://localhost:3000 (through SSH tunnel or Nginx proxy with authentication). Import the "Node Exporter Full" dashboard (ID: 1860) for instant server metrics visualization. The full stack uses 400-700MB of RAM. On a Contabo VPS with 8GB RAM ($6.99/mo), that is a small percentage. On a 1GB VPS, host Prometheus and Grafana elsewhere and only run node-exporter on the target server (about 20MB of RAM). See our Docker guide for container management.
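Before importing dashboards, it is worth confirming the stack is actually scraping. A quick sketch (assumes the 127.0.0.1 port bindings from the compose file above, so run it on the monitoring host itself):

```bash
#!/bin/bash
# Sketch: sanity-check the Prometheus side after `docker compose up -d`.
check_stack() {
  # Prometheus liveness endpoint
  curl -fs http://127.0.0.1:9090/-/healthy && echo "prometheus: healthy"
  # Scrape-target health: every target should report "health":"up"
  curl -fs http://127.0.0.1:9090/api/v1/targets | grep -o '"health":"[a-z]*"'
}

# check_stack
```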
Provider-Included Monitoring
Several providers include monitoring in their dashboards. It is basic but useful as a supplement:
- Hetzner: CPU, disk, network graphs plus configurable alerts. The best built-in monitoring among VPS providers I have tested.
- DigitalOcean: Droplet metrics dashboard with custom alerting rules. Can alert on CPU, memory, disk, and bandwidth thresholds.
- Kamatera: Built-in monitoring with graphs and alerts.
- Linode: Longview agent provides detailed metrics. Free for one server, paid for additional.
- Hostinger: Basic monitoring in their VPS dashboard.
- Vultr: Recently added basic metrics. Usable but minimal compared to Hetzner or DO.
Provider monitoring is useful for spot-checking but does not replace external uptime monitoring. When the provider's infrastructure has issues, their monitoring dashboard may be affected too.
Alert Configuration That Does Not Annoy You
The number one reason people disable monitoring alerts: too many false positives. A CPU spike to 95% for 30 seconds during a cron job is not an emergency. An alert that fires every time this happens trains you to ignore alerts, which means you ignore the real ones too. Here are the thresholds I use:
| Metric | Warning | Critical | Duration |
|---|---|---|---|
| Disk usage | 80% | 90% | Instant |
| CPU usage | 85% | 95% | >10 minutes |
| RAM usage | 85% | 95% | >5 minutes |
| HTTP uptime | N/A | Down | >2 checks (2-10 min) |
| SSL expiry | 30 days | 7 days | Daily check |
| Load average | 2x vCPU count | 4x vCPU count | >5 minutes |
The duration column is what prevents alert fatigue. A 30-second CPU spike does not trigger a warning because it has to stay above 85% for 10 minutes. Disk alerts fire instantly because disk full is always an emergency. SSL expiry warns you 30 days out because you want time to fix auto-renewal, not a panic at 7 days.
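The duration logic above is easy to implement in a plain cron script as a consecutive-breach counter. A sketch for the load-average row (the factor, run count, and state-file path are assumptions; run it every minute from cron):

```bash
#!/bin/bash
# Sketch: duration-aware load alert. Cron this every minute; it alerts only
# after the 1-minute load has exceeded 2x the vCPU count for 5 consecutive
# runs, so a 30-second spike never fires.
FACTOR=2
REQUIRED=5
STATE=/var/tmp/load-breach-count

VCPUS=$(nproc)
LOAD=$(awk '{print $1}' /proc/loadavg)
LIMIT=$((VCPUS * FACTOR))

COUNT=$(cat "$STATE" 2>/dev/null || echo 0)
# awk handles the float comparison; exit 0 (true) when load exceeds the limit
if awk -v l="$LOAD" -v lim="$LIMIT" 'BEGIN{exit !(l > lim)}'; then
  COUNT=$((COUNT + 1))
else
  COUNT=0
fi
echo "$COUNT" > "$STATE"

if [ "$COUNT" -ge "$REQUIRED" ]; then
  echo "ALERT: load $LOAD above $LIMIT for ${REQUIRED}+ minutes on $(hostname)"
  echo 0 > "$STATE"   # reset so the alert does not re-fire every minute
fi
```

The same counter pattern works for the CPU and RAM rows; only the metric being sampled changes.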
Custom Monitoring Scripts
Sometimes you need to monitor something no tool covers out of the box. These scripts handle the most common VPS-specific checks:
Disk Space Alert
```bash
#!/bin/bash
# /usr/local/bin/check-disk.sh
THRESHOLD=85
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

USAGE=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"Disk usage at ${USAGE}% on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi
```
Process Monitor
```bash
#!/bin/bash
# /usr/local/bin/check-processes.sh
# Alert if critical processes are not running
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

# pgrep -x matches exact process names: PostgreSQL runs as "postgres",
# the Docker daemon as "dockerd"
PROCESSES=("nginx" "postgres" "dockerd")

for proc in "${PROCESSES[@]}"; do
  if ! pgrep -x "$proc" > /dev/null; then
    curl -s -X POST -H 'Content-type: application/json' \
      -d "{\"text\":\"Process $proc is NOT running on $(hostname)!\"}" \
      "$WEBHOOK_URL"
  fi
done
```
Bandwidth Monitor (for Metered Providers)
```bash
#!/bin/bash
# /usr/local/bin/check-bandwidth.sh
# Alert when total transfer exceeds a threshold (useful for Vultr/DO/Linode).
# Note: rx_bytes/tx_bytes are cumulative since boot, not daily; for true
# per-day accounting, use vnstat instead.
INTERFACE="eth0"
THRESHOLD_GB=50
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

RX=$(cat /sys/class/net/$INTERFACE/statistics/rx_bytes)
TX=$(cat /sys/class/net/$INTERFACE/statistics/tx_bytes)
TOTAL_GB=$(echo "scale=2; ($RX + $TX) / 1073741824" | bc)

if (( $(echo "$TOTAL_GB > $THRESHOLD_GB" | bc -l) )); then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"Bandwidth: ${TOTAL_GB}GB used since boot on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi
```
```
# Schedule all checks (crontab -e)
*/5 * * * * /usr/local/bin/check-disk.sh
*/2 * * * * /usr/local/bin/check-processes.sh
0 */6 * * * /usr/local/bin/check-bandwidth.sh
```
Log Monitoring
Metrics tell you something is wrong. Logs tell you why. At minimum, watch these log files:
```bash
# Nginx errors (failed requests, upstream errors)
sudo tail -f /var/log/nginx/error.log

# System authentication (SSH login attempts, sudo usage)
sudo tail -f /var/log/auth.log

# System messages (kernel, services starting/stopping)
sudo tail -f /var/log/syslog

# Docker container logs
docker compose logs -f --tail 100
```
Automated Log Alerts
```bash
#!/bin/bash
# /usr/local/bin/check-auth-log.sh
# Alert on failed SSH login attempts (brute force detection)
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

# %e is space-padded (e.g. "Feb  3"), matching the traditional syslog
# date format; the zero-padded %d would not match single-digit days
FAILURES=$(grep "Failed password" /var/log/auth.log | \
  grep "$(date '+%b %e')" | wc -l)

if [ "$FAILURES" -gt 50 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"SSH brute force: ${FAILURES} failed attempts today on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi
```
For comprehensive log management, consider Loki + Grafana (from the Prometheus stack) or a managed service like Better Stack Logs. For most single-VPS setups, the simple scripts above cover the critical cases. See our VPS security hardening guide for additional security monitoring.
Complete Monitoring Architecture for Multi-Server Setups
Here is how I monitor a client's infrastructure: three production VPS instances plus a staging server. One monitoring VPS watches everything. Total monitoring cost: $5/mo.
Architecture Overview
```
# Monitoring server: Vultr $5/mo (1 vCPU, 1GB) in Dallas
# Runs: Uptime Kuma + custom scripts
# Monitors:
#   - Production Web: Hetzner Ashburn ($4.59/mo)
#   - Production API: Vultr New Jersey ($10/mo)
#   - Production DB:  Vultr New Jersey ($20/mo)
#   - Staging:        Contabo St. Louis ($6.99/mo)
```
```yaml
# docker-compose.yml on the monitoring server
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    ports:
      - "3001:3001"
    volumes:
      - uptime-kuma:/app/data
    restart: unless-stopped

volumes:
  uptime-kuma:
```
What I Monitor for Each Server
| Server | Uptime Kuma Checks | Custom Script Checks | Provider Monitoring |
|---|---|---|---|
| Web server | HTTPS (30s), TCP:22, SSL expiry, keyword check | Disk space, Nginx status, response time | Hetzner CPU/disk/network alerts |
| API server | HTTPS /health (30s), TCP:22, response time | Disk, process check, bandwidth | Vultr basic metrics |
| DB server | TCP:5432 (1min), TCP:22 | Disk, replication lag, connection count | Vultr basic metrics |
| Staging | HTTPS (5min), TCP:22 | Disk space only | None (Contabo has no monitoring) |
Database-Specific Monitoring
If you run PostgreSQL or MySQL on your VPS (as opposed to a managed database service), you need database-specific monitoring. These are the metrics that warn you before your database becomes the bottleneck:
```bash
#!/bin/bash
# /usr/local/bin/check-postgres.sh
# PostgreSQL health check script
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"
HOSTNAME=$(hostname)

# Check connection count (max_connections is usually 100-200)
CONN_COUNT=$(sudo -u postgres psql -t -c "SELECT count(*) FROM pg_stat_activity;" 2>/dev/null | tr -d ' ')
MAX_CONN=$(sudo -u postgres psql -t -c "SHOW max_connections;" 2>/dev/null | tr -d ' ')
if [ -n "$CONN_COUNT" ] && [ -n "$MAX_CONN" ]; then
  USAGE_PCT=$(( CONN_COUNT * 100 / MAX_CONN ))
  if [ "$USAGE_PCT" -gt 80 ]; then
    curl -s -X POST -H 'Content-type: application/json' \
      -d "{\"text\":\"DB ALERT: PostgreSQL connections at ${USAGE_PCT}% ($CONN_COUNT/$MAX_CONN) on $HOSTNAME\"}" \
      "$WEBHOOK_URL"
  fi
fi

# Check for long-running queries (>60 seconds)
LONG_QUERIES=$(sudo -u postgres psql -t -c \
  "SELECT count(*) FROM pg_stat_activity WHERE state = 'active' AND now() - query_start > interval '60 seconds';" \
  2>/dev/null | tr -d ' ')
if [ "${LONG_QUERIES:-0}" -gt 0 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"DB WARNING: $LONG_QUERIES long-running queries (>60s) on $HOSTNAME\"}" \
    "$WEBHOOK_URL"
fi

# Check database size growth ('your_database' is a placeholder)
DB_SIZE=$(sudo -u postgres psql -t -c \
  "SELECT pg_size_pretty(pg_database_size('your_database'));" 2>/dev/null | tr -d ' ')
echo "Database size: $DB_SIZE"
```
```bash
#!/bin/bash
# /usr/local/bin/check-mysql.sh
# MySQL equivalent checks
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

# Connection count (fourth field of `mysqladmin status` is the thread count)
CONN_COUNT=$(mysqladmin status 2>/dev/null | awk '{print $4}')
if [ "${CONN_COUNT:-0}" -gt 100 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"DB ALERT: MySQL connections at $CONN_COUNT on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi

# Slow query count (cumulative since server start)
SLOW_QUERIES=$(mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';" -s -N 2>/dev/null | awk '{print $2}')
echo "Slow queries: $SLOW_QUERIES"
```
Application Performance Monitoring on a Budget
Full APM tools (Datadog, New Relic) cost $25-100/server/month. On a VPS budget, here is how to get 80% of the value for free:
```bash
#!/bin/bash
# /usr/local/bin/check-response-time.sh
# Track response time trends over time
LOG_DIR="/var/log/response-times"
mkdir -p "$LOG_DIR"
DATE=$(date +%Y-%m-%d)
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

# Measure response time for key endpoints
declare -A ENDPOINTS
ENDPOINTS=(
  ["homepage"]="https://yoursite.com/"
  ["api_health"]="https://api.yoursite.com/health"
  ["dashboard"]="https://app.yoursite.com/login"
)

for name in "${!ENDPOINTS[@]}"; do
  url="${ENDPOINTS[$name]}"

  # Run 3 requests, take the median
  TIMES=()
  for i in 1 2 3; do
    t=$(curl -o /dev/null -s -w "%{time_starttransfer}" "$url")
    TIMES+=("$t")
  done

  # Sort and take the middle value
  MEDIAN=$(printf '%s\n' "${TIMES[@]}" | sort -n | sed -n '2p')
  MS=$(echo "$MEDIAN * 1000" | bc | cut -d. -f1)
  echo "$(date +%H:%M) $name ${MS}ms" >> "$LOG_DIR/$DATE.log"

  # Alert if response time exceeds threshold
  if [ "${MS:-0}" -gt 2000 ]; then
    curl -s -X POST -H 'Content-type: application/json' \
      -d "{\"text\":\"PERF ALERT: $name response time ${MS}ms (>2s) on $(hostname)\"}" \
      "$WEBHOOK_URL"
  fi
done
```
```
# Run every 10 minutes (crontab -e)
*/10 * * * * /usr/local/bin/check-response-time.sh
```
This gives you response time trend data and threshold alerts without installing any agent or paying for any service. After a week of data collection, you will know your normal response time baselines and can set meaningful alert thresholds. For more advanced performance monitoring, see our VPS performance tuning guide.
Monitoring Cost Comparison
The total cost of a monitoring setup varies dramatically depending on your approach:
| Approach | Monthly Cost | Monitors | Best For |
|---|---|---|---|
| UptimeRobot free + scripts | $0 | 50 uptime + custom | Single VPS, hobby projects |
| Uptime Kuma on RackNerd ($1.49) | $1.49 | Unlimited | Budget-conscious, 2-5 servers |
| Uptime Kuma on Vultr ($5) | $5.00 | Unlimited | Reliable monitoring, 5-20 servers |
| Netdata Cloud (free tier) | $0 | 5 nodes | Single team, server metrics |
| Prometheus + Grafana (self-hosted) | $5-10 (VPS) | Unlimited | 10+ servers, custom dashboards |
| Better Stack (paid) | $24 | Unlimited | Teams, incident management |
| Datadog (full APM) | $90+ | Per-host pricing | Enterprise, deep APM |
For most VPS users, Uptime Kuma on a $5 Vultr instance is the sweet spot. It costs less than a single month of any paid monitoring service and gives you unlimited monitors with full control. If you are on a strict budget, a $1.49 RackNerd VPS runs Uptime Kuma perfectly well — just accept that your monitoring server might occasionally be less reliable than a Vultr or Hetzner instance.
Monitoring Checklist for a New VPS
Here is the exact sequence I follow when setting up monitoring for a new VPS, in order of priority. The entire process takes about 15 minutes:
- Minute 0-2: Add HTTPS monitor in Uptime Kuma (or UptimeRobot) for the primary domain. Set 30-second interval, Telegram/Slack notification.
- Minute 2-3: Add TCP port 22 monitor (SSH availability check).
- Minute 3-5: Add SSL certificate expiry monitor (Uptime Kuma has this built in).
- Minute 5-8: Deploy disk space monitoring script via cron (from the Custom Scripts section above).
- Minute 8-10: Deploy process monitoring script for critical services (nginx, postgres, docker).
- Minute 10-12: If on a metered bandwidth provider (Vultr, DigitalOcean, Linode), add bandwidth monitoring script.
- Minute 12-15: Verify all alerts fire correctly by testing (stop Nginx briefly, fill a temp file to trigger disk alert).
That last step — actually testing your alerts — is the most important one. I have seen people set up monitoring and discover months later that their Slack webhook URL was wrong and no alerts were ever delivered. Test on day one.
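The disk test can be scripted safely with fallocate. A sketch (the file size is an assumption; on a real server use something like 5G, or whatever pushes / past the 85% threshold from check-disk.sh):

```bash
#!/bin/bash
# Sketch: trigger the disk alert deliberately, then clean up.
# SIZE is an assumption; pick one large enough to cross your threshold.
SIZE="${1:-1M}"
TESTFILE=/tmp/alert-test.bin

fallocate -l "$SIZE" "$TESTFILE"
BYTES=$(stat -c %s "$TESTFILE")
echo "wrote $BYTES bytes to $TESTFILE"

# /usr/local/bin/check-disk.sh   # run the check while the file exists

rm -f "$TESTFILE"                # always clean up, alert or not
```

For the uptime and process alerts, stopping Nginx for two check intervals (`sudo systemctl stop nginx`, wait, `sudo systemctl start nginx`) confirms the whole notification path end to end.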
Need a Monitoring VPS?
Your uptime monitor should live on a separate VPS from the servers it watches. Any of the cheap options mentioned above (BuyVM, RackNerd, or a $5 Vultr instance) runs Uptime Kuma perfectly well.
Frequently Asked Questions
What is the best free VPS monitoring tool?
Uptime Kuma for external uptime monitoring — self-hosted, unlimited monitors, 90+ alert integrations. Netdata for server metrics — zero-config real-time dashboards. UptimeRobot for a managed service with zero setup (50 monitors free). For a full stack, Prometheus + Grafana is the industry standard but requires more setup.
Should I host monitoring on the same VPS I am monitoring?
No. If the VPS goes down, your monitoring dies too. Host uptime monitoring on a separate server — a $3.50/mo BuyVM or $5/mo Vultr instance is enough. Resource monitoring agents (Netdata, node-exporter) on the same server are fine because they send data externally. The rule: uptime checks must be external.
Which VPS providers include built-in monitoring?
Hetzner: CPU/disk/network graphs plus alerting (best built-in). DigitalOcean: custom alert rules. Kamatera: dashboard monitoring. Linode: Longview agent. Hostinger: basic graphs. No monitoring: Contabo, RackNerd, InterServer.
How much RAM does a monitoring stack use?
Uptime Kuma: ~100MB. Netdata: ~150-200MB. Prometheus + Grafana: 400-700MB total. On a 1GB VPS, only Uptime Kuma or basic scripts fit comfortably. On 2GB+, Netdata works well. Host heavy monitoring stacks on a separate server.
What should I monitor on my VPS?
Priority 1: HTTP/HTTPS uptime, SSL expiry, disk space. Priority 2: CPU, RAM, network traffic. Priority 3: database performance, error rates, response times. Start with Priority 1 — it catches 90% of real problems and takes 10 minutes to set up.
How do I get alerted when my VPS goes down?
External uptime monitor (Uptime Kuma, UptimeRobot) checking every 1-5 minutes. Notifications via Telegram (fastest for personal), Slack (best for teams), or email (most reliable). The monitoring must be external to the server being monitored.
Is Netdata better than Prometheus for VPS monitoring?
Different tools. Netdata: zero-config, real-time, great for single servers, 150-200MB RAM. Prometheus + Grafana: requires setup, scales to many servers, custom dashboards, 400-700MB. Single VPS: Netdata. Multiple servers: Prometheus. Both are free and open source.