VPS Monitoring Setup — Know Your Server Is Down Before Your Users Do

Last month, one of my VPS instances ran out of disk space at 2:17 AM. A log file grew to 18GB because I forgot to configure logrotate on a new service. The site went down. Nginx could not write to its error log, which meant it could not serve requests, which meant the WordPress site hosted on that server returned 500 errors to anyone who visited. My monitoring caught it at 2:18 AM. By 2:22 AM, I had cleared the log file from my phone and the site was back up. Total downtime: 5 minutes, and only 3 of those were me fumbling with a mobile SSH client.

Without monitoring, I would have discovered the problem the next morning — when the client called. That is the entire value proposition of server monitoring distilled into one real example. The disk space alert cost me $3.50/month (Uptime Kuma running on its own budget VPS) and saved me a client relationship. Here is how to set up the same system.

Quick Setup (Under 15 Minutes)

  • Fastest path: Deploy Uptime Kuma on a separate VPS — one Docker command, web UI, monitors all your servers.
  • Zero-install option: UptimeRobot free tier (50 monitors, 5-minute checks).
  • Full stack: Netdata for server metrics + Uptime Kuma for external checks.
  • Enterprise grade: Prometheus + Grafana (overkill for most VPS users).

What You Actually Need to Monitor

Most monitoring guides throw 50 metrics at you and let you figure out which ones matter. Here is the prioritized list based on what actually causes outages on VPS servers:

Priority 1: Things That Cause Outages

  • HTTP/HTTPS availability — Is your site actually responding? This catches 80% of problems.
  • Disk space — The #1 silent killer. Log files, Docker images, temp files, database WAL logs. When the disk fills up, everything breaks at once.
  • SSL certificate expiry — Let's Encrypt certs expire every 90 days. Auto-renewal fails more often than you think (see our SSL guide).
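
The expiry math behind an SSL check is simple enough to script yourself if you want a standalone safety net. A sketch (the domain is a placeholder, and the script degrades gracefully when the host is unreachable; Uptime Kuma and most hosted monitors do this for you):

```bash
#!/bin/bash
# check-ssl-days.sh -- hypothetical standalone days-to-expiry check
DOMAIN="example.com"   # placeholder: your real domain
WARN_DAYS=30

# notAfter date from the live certificate; empty if the host is unreachable
ENDDATE=$(echo | openssl s_client -servername "$DOMAIN" -connect "$DOMAIN:443" 2>/dev/null \
  | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)

if [ -z "$ENDDATE" ]; then
  echo "Could not fetch certificate for $DOMAIN"
else
  # seconds until expiry, converted to whole days
  DAYS_LEFT=$(( ($(date -d "$ENDDATE" +%s) - $(date +%s)) / 86400 ))
  echo "$DOMAIN: $DAYS_LEFT days until expiry"
  if [ "$DAYS_LEFT" -lt "$WARN_DAYS" ]; then echo "WARN: renew soon"; fi
fi
```

Cron it daily and pipe the WARN line to your webhook of choice.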

Priority 2: Things That Degrade Performance

  • CPU usage — Sustained >80% means your VPS is undersized or something is wrong.
  • RAM usage — Consistent >90% means you are swapping to disk, which destroys performance.
  • Network traffic — Unexpected spikes could mean a DDoS or a misconfigured service. Important on providers with metered bandwidth like Vultr and DigitalOcean.

Priority 3: Operational Intelligence

  • Response time trends — Is your site getting slower over time?
  • Error rates — How many 500 errors are you returning?
  • Database performance — Slow queries, connection counts, replication lag.

Start with Priority 1. You can add everything else later. Priority 1 alone catches 90% of real-world VPS problems, and it takes 10 minutes to set up.

External Uptime Monitoring

Your uptime monitor must be external to the server it monitors. If you install monitoring on the same VPS and the VPS goes down, your monitoring goes down with it. Nobody gets alerted. The site stays down until someone notices.

Free External Monitoring Options

Service       | Free Tier   | Check Interval | Alert Methods         | Self-Hosted
UptimeRobot   | 50 monitors | 5 min          | Email, Slack, Webhook | No
Uptime Kuma   | Unlimited   | Custom (20s+)  | 90+ integrations      | Yes
Hetrix Tools  | 15 monitors | 1 min          | Email, Slack, Discord | No
Freshping     | 50 monitors | 1 min          | Email, Slack          | No
Better Stack  | 10 monitors | 3 min          | Email, Slack, Phone   | No

For a quick start with zero setup, create an UptimeRobot account and add HTTP monitors for your domains. Five minutes of work, 50 monitors, email alerts when anything goes down. This alone puts you ahead of the majority of VPS users who have no monitoring at all.

Uptime Kuma Setup (Self-Hosted)

Uptime Kuma is my recommendation for anyone who manages more than two servers. It is open source, self-hosted, and has a polished web UI that puts many paid tools to shame. Deploy it on a separate VPS — a $3.50/mo BuyVM instance or a $5/mo Vultr instance is more than enough.

# Deploy Uptime Kuma (one command)
docker run -d --restart unless-stopped \
  -p 3001:3001 \
  -v uptime-kuma:/app/data \
  --name uptime-kuma \
  louislam/uptime-kuma:1

Access it at http://your-monitoring-vps:3001, create an admin account, and start adding monitors.

What to Monitor in Uptime Kuma

  • HTTPS check for each domain (verifies both availability and SSL validity)
  • TCP check on port 22 (SSH — catches network-level failures)
  • DNS check for your domains (catches DNS resolution failures)
  • Docker container check via the Docker socket (if Docker is exposed)
  • Keyword check — verify that a specific string appears in the response (catches partial failures where Nginx returns 200 but serves an error page)
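
The keyword check is the one most people skip, so it is worth seeing how little it takes. Conceptually Uptime Kuma runs something like this on every interval (a manual sketch; the URL and keyword are placeholders):

```bash
#!/bin/bash
# manual keyword check -- the idea behind Uptime Kuma's keyword monitor
URL="https://example.com/"   # placeholder: your site
KEYWORD="Example Domain"     # a string only a healthy page contains

BODY=$(curl -s --max-time 10 "$URL")
if echo "$BODY" | grep -q "$KEYWORD"; then
  echo "OK: keyword present"
else
  echo "DOWN: keyword missing or request failed"
fi
```

Pick a keyword from your page footer or a health endpoint, not from an error template — otherwise the check passes exactly when it should fail.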

Alert Configuration

Uptime Kuma supports 90+ notification methods. My setup:

  • Telegram for personal servers (instant push notifications to my phone)
  • Slack webhook for client servers (alerts go to a #monitoring channel)
  • Email as a backup for everything (in case Telegram/Slack is down)

# Telegram bot setup (2 minutes):
# 1. Message @BotFather on Telegram
# 2. /newbot, follow prompts, get token
# 3. Message your bot to start it
# 4. Get your chat ID: https://api.telegram.org/bot{TOKEN}/getUpdates
# 5. Add Telegram notification in Uptime Kuma with token + chat ID
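
Before wiring the token into Uptime Kuma, it may be worth confirming the bot delivers with a direct call to the Bot API's sendMessage endpoint (a sketch; TELEGRAM_TOKEN and TELEGRAM_CHAT_ID are environment variables you supply):

```bash
#!/bin/bash
# smoke-test a Telegram bot before trusting it with real alerts
TOKEN="${TELEGRAM_TOKEN:-}"       # from @BotFather
CHAT_ID="${TELEGRAM_CHAT_ID:-}"   # from the getUpdates call above
API_URL="https://api.telegram.org/bot${TOKEN}/sendMessage"

if [ -z "$TOKEN" ] || [ -z "$CHAT_ID" ]; then
  echo "Set TELEGRAM_TOKEN and TELEGRAM_CHAT_ID first"
else
  # the Bot API accepts form-encoded chat_id and text parameters
  curl -s --max-time 10 -X POST "$API_URL" \
    -d chat_id="$CHAT_ID" \
    -d text="Test alert from $(hostname)"
fi
```

If the message lands on your phone, the same token and chat ID will work in Uptime Kuma.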

Netdata for Server Metrics

Netdata is the fastest way to get a full server metrics dashboard. Install it, open the web UI, and you have real-time graphs for CPU, RAM, disk, network, processes, and hundreds of other metrics. Zero configuration needed. It runs directly on the server you want to monitor:

# Install Netdata (one-line installer)
curl https://get.netdata.cloud/kickstart.sh > /tmp/netdata-kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel

Access the dashboard at http://your-vps:19999. Do not expose this to the internet without authentication. Use an Nginx reverse proxy with basic auth or access it through an SSH tunnel:

# Access Netdata securely via SSH tunnel
ssh -L 19999:localhost:19999 user@your-vps
# Then open http://localhost:19999 in your browser
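
If the tunnel is inconvenient for daily use, the reverse-proxy route can be sketched like this (the server name and htpasswd path are placeholders; add TLS before exposing it publicly):

```nginx
# /etc/nginx/sites-available/netdata -- basic-auth proxy in front of Netdata
server {
    listen 80;
    server_name netdata.example.com;   # placeholder hostname

    auth_basic "Netdata";
    auth_basic_user_file /etc/nginx/.htpasswd;   # create with: htpasswd -c /etc/nginx/.htpasswd admin

    location / {
        proxy_pass http://127.0.0.1:19999;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```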

Netdata Alerts

Netdata includes pre-configured alerts for common problems. The most useful ones fire automatically:

  • Disk space below 10%
  • RAM usage above 90% for 5 minutes
  • CPU usage above 85% for 10 minutes
  • OOM killer activated
  • Network interface errors

Forward alerts to email or Slack by editing /etc/netdata/health_alarm_notify.conf:

# /etc/netdata/health_alarm_notify.conf
SEND_SLACK="YES"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/your/webhook/url"
DEFAULT_RECIPIENT_SLACK="#monitoring"
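
The thresholds themselves live in /etc/netdata/health.d/. A rough sketch of the format, with the field names following Netdata's health configuration syntax — compare against the stock files shipped with your installed version before copying anything:

```conf
# /etc/netdata/health.d/disk_space.conf (sketch, not a verbatim stock file)
 template: disk_space_usage
       on: disk.space
     calc: $used * 100 / ($avail + $used)
    units: %
    every: 1m
     warn: $this > 80
     crit: $this > 90
     info: disk space utilization
```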

Prometheus + Grafana (The Full Stack)

This is the industry-standard monitoring stack. It is powerful, flexible, and overkill for a single VPS. I deploy it when managing 3+ servers or when clients need custom dashboards. On a single server, Netdata gives you 90% of the value at 30% of the complexity.

Docker Compose Setup

# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - "127.0.0.1:9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    ports:
      - "127.0.0.1:3000:3000"
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  # Add more targets for additional servers
  # - job_name: 'remote-node'
  #   static_configs:
  #     - targets: ['remote-server-ip:9100']

Access Grafana at http://localhost:3000 (through SSH tunnel or Nginx proxy with authentication). Import the "Node Exporter Full" dashboard (ID: 1860) for instant server metrics visualization. The full stack uses 400-700MB of RAM. On a Contabo VPS with 8GB RAM ($6.99/mo), that is a small percentage. On a 1GB VPS, host Prometheus and Grafana elsewhere and only run node-exporter on the target server (about 20MB of RAM). See our Docker guide for container management.
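
Once the stack is up, a quick way to confirm node-exporter data is flowing is to query Prometheus's HTTP API directly (a sketch; assumes the compose file above is running on the same host, and the mountpoint label matches your filesystem):

```bash
#!/bin/bash
# ask Prometheus for current root-filesystem usage via its HTTP API
PROM="http://127.0.0.1:9090"
QUERY='100 - (node_filesystem_avail_bytes{mountpoint="/"} * 100 / node_filesystem_size_bytes{mountpoint="/"})'

# /api/v1/query accepts a form-encoded POST; empty result means no connection
RESULT=$(curl -s --max-time 5 "$PROM/api/v1/query" --data-urlencode "query=$QUERY")
if [ -z "$RESULT" ]; then
  echo "Prometheus not reachable at $PROM -- is the stack running?"
else
  echo "$RESULT"
fi
```

A JSON payload with a non-empty result array means Grafana dashboards will have data to draw.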

Provider-Included Monitoring

Several providers include monitoring in their dashboards. It is basic but useful as a supplement:

  • Hetzner: CPU, disk, network graphs plus configurable alerts. The best built-in monitoring among VPS providers I have tested.
  • DigitalOcean: Droplet metrics dashboard with custom alerting rules. Can alert on CPU, memory, disk, and bandwidth thresholds.
  • Kamatera: Built-in monitoring with graphs and alerts.
  • Linode: Longview agent provides detailed metrics. Free for one server, paid for additional.
  • Hostinger: Basic monitoring in their VPS dashboard.
  • Vultr: Recently added basic metrics. Usable but minimal compared to Hetzner or DO.

Provider monitoring is useful for spot-checking but does not replace external uptime monitoring. When the provider's infrastructure has issues, their monitoring dashboard may be affected too.

Alert Configuration That Does Not Annoy You

The number one reason people disable monitoring alerts: too many false positives. A CPU spike to 95% for 30 seconds during a cron job is not an emergency. An alert that fires every time this happens trains you to ignore alerts, which means you ignore the real ones too. Here are the thresholds I use:

Metric       | Warning       | Critical      | Duration
Disk usage   | 80%           | 90%           | Instant
CPU usage    | 85%           | 95%           | >10 minutes
RAM usage    | 85%           | 95%           | >5 minutes
HTTP uptime  | N/A           | Down          | >2 checks (2-10 min)
SSL expiry   | 30 days       | 7 days        | Daily check
Load average | 2x vCPU count | 4x vCPU count | >5 minutes

The duration column is what prevents alert fatigue. A 30-second CPU spike does not trigger a warning because it has to stay above 85% for 10 minutes. Disk alerts fire instantly because disk full is always an emergency. SSL expiry warns you 30 days out because you want time to fix auto-renewal, not a panic at 7 days.
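
The load-average row translates into a short script. A sketch that reads the 5-minute average so one-off spikes are ignored, in the spirit of the duration column:

```bash
#!/bin/bash
# load-average check against the 2x/4x-vCPU thresholds above
CORES=$(nproc)
LOAD5=$(awk '{print $2}' /proc/loadavg)   # 5-minute load average
WARN=$(( CORES * 2 ))
CRIT=$(( CORES * 4 ))

# awk handles the float-vs-integer comparison
LEVEL=$(awk -v l="$LOAD5" -v w="$WARN" -v c="$CRIT" \
  'BEGIN { s = "OK"; if (l > w) s = "WARNING"; if (l > c) s = "CRITICAL"; print s }')
echo "$LEVEL: 5-min load $LOAD5 on $CORES vCPUs (warn >$WARN, crit >$CRIT)"
```

Swap the echo for the same webhook curl used in the other scripts and add it to the crontab.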

Custom Monitoring Scripts

Sometimes you need to monitor something no tool covers out of the box. These scripts handle the most common VPS-specific checks:

Disk Space Alert

#!/bin/bash
# /usr/local/bin/check-disk.sh
THRESHOLD=85
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

USAGE=$(df -h / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$USAGE" -gt "$THRESHOLD" ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"Disk usage at ${USAGE}% on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi

Process Monitor

#!/bin/bash
# /usr/local/bin/check-processes.sh
# Alert if critical processes are not running

WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"
PROCESSES=("nginx" "postgresql" "docker")

for proc in "${PROCESSES[@]}"; do
  if ! pgrep -x "$proc" > /dev/null; then
    curl -s -X POST -H 'Content-type: application/json' \
      -d "{\"text\":\"Process $proc is NOT running on $(hostname)!\"}" \
      "$WEBHOOK_URL"
  fi
done

Bandwidth Monitor (for Metered Providers)

#!/bin/bash
# /usr/local/bin/check-bandwidth.sh
# Alert if bandwidth since the last check exceeds threshold (useful for Vultr/DO/Linode)
# Note: the /sys counters are cumulative since boot, so track a delta between runs

INTERFACE="eth0"
THRESHOLD_GB=50
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"
STATE_FILE="/var/tmp/bandwidth-total.last"

RX=$(cat /sys/class/net/$INTERFACE/statistics/rx_bytes)
TX=$(cat /sys/class/net/$INTERFACE/statistics/tx_bytes)
TOTAL=$((RX + TX))

LAST=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
echo "$TOTAL" > "$STATE_FILE"

DELTA_GB=$(echo "scale=2; ($TOTAL - $LAST) / 1073741824" | bc)
if (( $(echo "$DELTA_GB > $THRESHOLD_GB" | bc -l) )); then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"Bandwidth: ${DELTA_GB}GB used since last check on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi

# Schedule all checks (crontab -e)
*/5 * * * * /usr/local/bin/check-disk.sh
*/2 * * * * /usr/local/bin/check-processes.sh
0 */6 * * * /usr/local/bin/check-bandwidth.sh

Log Monitoring

Metrics tell you something is wrong. Logs tell you why. At minimum, watch these log files:

# Nginx errors (failed requests, upstream errors)
sudo tail -f /var/log/nginx/error.log

# System authentication (SSH login attempts, sudo usage)
sudo tail -f /var/log/auth.log

# System messages (kernel, services starting/stopping)
sudo tail -f /var/log/syslog

# Docker container logs
docker compose logs -f --tail 100

Automated Log Alerts

#!/bin/bash
# /usr/local/bin/check-auth-log.sh
# Alert on failed SSH login attempts (brute force detection)

WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"
# matches classic syslog dates like "Oct  5" (%e pads single digits with a space);
# adjust the pattern if your distro writes ISO timestamps to auth.log
FAILURES=$(grep "Failed password" /var/log/auth.log | \
  grep "$(date '+%b %e')" | wc -l)

if [ "$FAILURES" -gt 50 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"SSH brute force: ${FAILURES} failed attempts today on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi

For comprehensive log management, consider Loki + Grafana (from the Prometheus stack) or a managed service like Better Stack Logs. For most single-VPS setups, the simple scripts above cover the critical cases. See our VPS security hardening guide for additional security monitoring.

Complete Monitoring Architecture for Multi-Server Setups

Here is how I monitor a client's infrastructure: three production VPS instances plus a staging server. One monitoring VPS watches everything. Total monitoring cost: $5/mo.

Architecture Overview

# Monitoring server: Vultr $5/mo (1 vCPU, 1GB) in Dallas
# Runs: Uptime Kuma + custom scripts
# Monitors:
#   - Production Web: Hetzner Ashburn ($4.59/mo)
#   - Production API: Vultr New Jersey ($10/mo)
#   - Production DB:  Vultr New Jersey ($20/mo)
#   - Staging:        Contabo St. Louis ($6.99/mo)

# Docker Compose on monitoring server
# docker-compose.yml
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    ports:
      - "3001:3001"
    volumes:
      - uptime-kuma:/app/data
    restart: unless-stopped

volumes:
  uptime-kuma:

What I Monitor for Each Server

Server     | Uptime Kuma Checks                             | Custom Script Checks                     | Provider Monitoring
Web server | HTTPS (30s), TCP:22, SSL expiry, keyword check | Disk space, Nginx status, response time  | Hetzner CPU/disk/network alerts
API server | HTTPS /health (30s), TCP:22, response time     | Disk, process check, bandwidth           | Vultr basic metrics
DB server  | TCP:5432 (1 min), TCP:22                       | Disk, replication lag, connection count  | Vultr basic metrics
Staging    | HTTPS (5 min), TCP:22                          | Disk space only                          | None (Contabo has no monitoring)

Database-Specific Monitoring

If you run PostgreSQL or MySQL on your VPS (as opposed to a managed database service), you need database-specific monitoring. These are the metrics that warn you before your database becomes the bottleneck:

#!/bin/bash
# /usr/local/bin/check-postgres.sh
# PostgreSQL health check script

WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"
HOSTNAME=$(hostname)

# Check connection count (max is usually 100-200)
CONN_COUNT=$(sudo -u postgres psql -t -c "SELECT count(*) FROM pg_stat_activity;" 2>/dev/null | tr -d ' ')
MAX_CONN=$(sudo -u postgres psql -t -c "SHOW max_connections;" 2>/dev/null | tr -d ' ')

if [ -n "$CONN_COUNT" ] && [ -n "$MAX_CONN" ]; then
  USAGE_PCT=$(( CONN_COUNT * 100 / MAX_CONN ))
  if [ "$USAGE_PCT" -gt 80 ]; then
    curl -s -X POST -H 'Content-type: application/json' \
      -d "{\"text\":\"DB ALERT: PostgreSQL connections at ${USAGE_PCT}% ($CONN_COUNT/$MAX_CONN) on $HOSTNAME\"}" \
      "$WEBHOOK_URL"
  fi
fi

# Check for long-running queries (>60 seconds)
LONG_QUERIES=$(sudo -u postgres psql -t -c \
  "SELECT count(*) FROM pg_stat_activity WHERE state = 'active' AND now() - query_start > interval '60 seconds';" \
  2>/dev/null | tr -d ' ')

if [ "${LONG_QUERIES:-0}" -gt 0 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"DB WARNING: $LONG_QUERIES long-running queries (>60s) on $HOSTNAME\"}" \
    "$WEBHOOK_URL"
fi

# Check database size growth
DB_SIZE=$(sudo -u postgres psql -t -c \
  "SELECT pg_size_pretty(pg_database_size('your_database'));" 2>/dev/null | tr -d ' ')
echo "Database size: $DB_SIZE"

#!/bin/bash
# /usr/local/bin/check-mysql.sh
# MySQL equivalent checks

WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

# Connection count
CONN_COUNT=$(mysqladmin status 2>/dev/null | awk '{print $4}')
if [ "${CONN_COUNT:-0}" -gt 100 ]; then
  curl -s -X POST -H 'Content-type: application/json' \
    -d "{\"text\":\"DB ALERT: MySQL connections at $CONN_COUNT on $(hostname)\"}" \
    "$WEBHOOK_URL"
fi

# Slow query count (last hour)
SLOW_QUERIES=$(mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';" -s -N 2>/dev/null | awk '{print $2}')
echo "Slow queries: $SLOW_QUERIES"

Application Performance Monitoring on a Budget

Full APM tools (Datadog, New Relic) cost $25-100/server/month. On a VPS budget, here is how to get 80% of the value for free:

#!/bin/bash
# /usr/local/bin/check-response-time.sh
# Track response time trends over time

LOG_DIR="/var/log/response-times"
mkdir -p "$LOG_DIR"
DATE=$(date +%Y-%m-%d)
WEBHOOK_URL="https://hooks.slack.com/services/your/webhook"

# Measure response time for key endpoints
declare -A ENDPOINTS
ENDPOINTS=(
  ["homepage"]="https://yoursite.com/"
  ["api_health"]="https://api.yoursite.com/health"
  ["dashboard"]="https://app.yoursite.com/login"
)

for name in "${!ENDPOINTS[@]}"; do
  url="${ENDPOINTS[$name]}"
  # Run 3 requests, take the median
  TIMES=()
  for i in 1 2 3; do
    t=$(curl -o /dev/null -s -w "%{time_starttransfer}" "$url")
    TIMES+=("$t")
  done

  # Sort and take middle value
  MEDIAN=$(printf '%s\n' "${TIMES[@]}" | sort -n | sed -n '2p')
  MS=$(echo "$MEDIAN * 1000" | bc | cut -d. -f1)

  echo "$(date +%H:%M) $name ${MS}ms" >> "$LOG_DIR/$DATE.log"

  # Alert if response time exceeds threshold
  if [ "${MS:-0}" -gt 2000 ]; then
    curl -s -X POST -H 'Content-type: application/json' \
      -d "{\"text\":\"PERF ALERT: $name response time ${MS}ms (>2s) on $(hostname)\"}" \
      "$WEBHOOK_URL"
  fi
done

# Run every 10 minutes (crontab -e)
*/10 * * * * /usr/local/bin/check-response-time.sh

This gives you response time trend data and threshold alerts without installing any agent or paying for any service. After a week of data collection, you will know your normal response time baselines and can set meaningful alert thresholds. For more advanced performance monitoring, see our VPS performance tuning guide.

Monitoring Cost Comparison

The total cost of a monitoring setup varies dramatically depending on your approach:

Approach                           | Monthly Cost | Monitors           | Best For
UptimeRobot free + scripts         | $0           | 50 uptime + custom | Single VPS, hobby projects
Uptime Kuma on RackNerd ($1.49)    | $1.49        | Unlimited          | Budget-conscious, 2-5 servers
Uptime Kuma on Vultr ($5)          | $5.00        | Unlimited          | Reliable monitoring, 5-20 servers
Netdata Cloud (free tier)          | $0           | 5 nodes            | Single team, server metrics
Prometheus + Grafana (self-hosted) | $5-10 (VPS)  | Unlimited          | 10+ servers, custom dashboards
Better Stack (paid)                | $24          | Unlimited          | Teams, incident management
Datadog (full APM)                 | $90+         | Per-host pricing   | Enterprise, deep APM

For most VPS users, Uptime Kuma on a $5 Vultr instance is the sweet spot. It costs less than a single month of any paid monitoring service and gives you unlimited monitors with full control. If you are on a strict budget, a $1.49 RackNerd VPS runs Uptime Kuma perfectly well — just accept that your monitoring server might occasionally be less reliable than a Vultr or Hetzner instance.

Monitoring Checklist for a New VPS

Here is the exact sequence I follow when setting up monitoring for a new VPS, in order of priority. The entire process takes about 15 minutes:

  1. Minute 0-2: Add HTTPS monitor in Uptime Kuma (or UptimeRobot) for the primary domain. Set 30-second interval, Telegram/Slack notification.
  2. Minute 2-3: Add TCP port 22 monitor (SSH availability check).
  3. Minute 3-5: Add SSL certificate expiry monitor (Uptime Kuma has this built in).
  4. Minute 5-8: Deploy disk space monitoring script via cron (from the Custom Scripts section above).
  5. Minute 8-10: Deploy process monitoring script for critical services (nginx, postgres, docker).
  6. Minute 10-12: If on a metered bandwidth provider (Vultr, DigitalOcean, Linode), add bandwidth monitoring script.
  7. Minute 12-15: Verify all alerts fire correctly by testing (stop Nginx briefly, fill a temp file to trigger disk alert).

That last step — actually testing your alerts — is the most important one. I have seen people set up monitoring and discover months later that their Slack webhook URL was wrong and no alerts were ever delivered. Test on day one.
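
A low-effort way to run that first test is to hit the webhook directly and check the HTTP status (a sketch; supply your real webhook via the SLACK_WEBHOOK_URL environment variable):

```bash
#!/bin/bash
# verify a Slack webhook actually delivers before trusting it with alerts
WEBHOOK_URL="${SLACK_WEBHOOK_URL:-}"

if [ -z "$WEBHOOK_URL" ]; then
  echo "Set SLACK_WEBHOOK_URL first"
else
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" --max-time 10 -X POST \
    -H 'Content-type: application/json' \
    -d '{"text":"Test alert from monitoring setup -- please ignore"}' \
    "$WEBHOOK_URL")
  if [ "$HTTP_CODE" = "200" ]; then
    echo "Webhook OK"
  else
    echo "Webhook FAILED (HTTP $HTTP_CODE)"
  fi
fi
```

Anything other than "Webhook OK" on day one means your alerts were never going to arrive.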

Need a Monitoring VPS?

Your uptime monitor should live on a separate VPS from the servers it watches. These cheap options work perfectly for running Uptime Kuma:

Vultr ($5/mo) → RackNerd ($1.49/mo) → Cheap VPS Under $5

Frequently Asked Questions

What is the best free VPS monitoring tool?

Uptime Kuma for external uptime monitoring — self-hosted, unlimited monitors, 90+ alert integrations. Netdata for server metrics — zero-config real-time dashboards. UptimeRobot for a managed service with zero setup (50 monitors free). For a full stack, Prometheus + Grafana is the industry standard but requires more setup.

Should I host monitoring on the same VPS I am monitoring?

No. If the VPS goes down, your monitoring dies too. Host uptime monitoring on a separate server — a $3.50/mo BuyVM or $5/mo Vultr instance is enough. Resource monitoring agents (Netdata, node-exporter) on the same server are fine because they send data externally. The rule: uptime checks must be external.

Which VPS providers include built-in monitoring?

Hetzner: CPU/disk/network graphs plus alerting (best built-in). DigitalOcean: custom alert rules. Kamatera: dashboard monitoring. Linode: Longview agent. Hostinger: basic graphs. No monitoring: Contabo, RackNerd, InterServer.

How much RAM does a monitoring stack use?

Uptime Kuma: ~100MB. Netdata: ~150-200MB. Prometheus + Grafana: 400-700MB total. On a 1GB VPS, only Uptime Kuma or basic scripts fit comfortably. On 2GB+, Netdata works well. Host heavy monitoring stacks on a separate server.

What should I monitor on my VPS?

Priority 1: HTTP/HTTPS uptime, SSL expiry, disk space. Priority 2: CPU, RAM, network traffic. Priority 3: database performance, error rates, response times. Start with Priority 1 — it catches 90% of real problems and takes 10 minutes to set up.

How do I get alerted when my VPS goes down?

External uptime monitor (Uptime Kuma, UptimeRobot) checking every 1-5 minutes. Notifications via Telegram (fastest for personal), Slack (best for teams), or email (most reliable). The monitoring must be external to the server being monitored.

Is Netdata better than Prometheus for VPS monitoring?

Different tools. Netdata: zero-config, real-time, great for single servers, 150-200MB RAM. Prometheus + Grafana: requires setup, scales to many servers, custom dashboards, 400-700MB. Single VPS: Netdata. Multiple servers: Prometheus. Both are free and open source.

Alex Chen — Senior Systems Engineer

I monitor 40+ VPS instances across seven providers. The Uptime Kuma instance described in this guide has been running for two years with 99.99% uptime itself, alerting me to 47 real incidents that would have otherwise gone unnoticed. Learn more about our testing methodology →