# swarmprom

Swarmprom is a starter kit for Docker Swarm monitoring with [Prometheus](https://prometheus.io/),
[Grafana](http://grafana.org/),
[cAdvisor](https://github.com/google/cadvisor),
[Node Exporter](https://github.com/prometheus/node_exporter),
[Alert Manager](https://github.com/prometheus/alertmanager)
and [Unsee](https://github.com/cloudflare/unsee).

## Install

Clone this repository and run the monitoring stack:

```bash
$ git clone https://github.com/stefanprodan/swarmprom.git
$ cd swarmprom

ADMIN_USER=admin \
ADMIN_PASSWORD=admin \
SLACK_URL=https://hooks.slack.com/services/TOKEN \
SLACK_CHANNEL=devops-alerts \
SLACK_USER=alertmanager \
docker stack deploy -c docker-compose.yml mon
```

Prerequisites:

* Docker CE 17.09.0-ce or Docker EE 17.06.2-ee-3
* Swarm cluster with one manager and a worker node
* Docker engine experimental mode enabled and the metrics address set to `0.0.0.0:9323`

Services:

* prometheus (metrics database) `http://<swarm-ip>:9090`
* grafana (visualize metrics) `http://<swarm-ip>:3000`
* node-exporter (host metrics collector)
* cadvisor (container metrics collector)
* dockerd-exporter (Docker daemon metrics collector, requires the Docker experimental `metrics-addr` to be enabled)
* alertmanager (alerts dispatcher) `http://<swarm-ip>:9093`
* unsee (alert manager dashboard) `http://<swarm-ip>:9094`
* caddy (reverse proxy and basic auth provider for prometheus, alertmanager and unsee)

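Once the stack is up you can run a quick smoke test. A minimal sketch, assuming the stack was deployed as `mon` with the default `admin`/`admin` credentials and `<swarm-ip>` replaced by one of your node IPs:

```bash
# Every service should show the expected replica count in the REPLICAS column
docker stack services mon

# Prometheus serves its own metrics; caddy protects the port with basic auth
curl -su admin:admin http://<swarm-ip>:9090/metrics | head

# Grafana should answer on port 3000 with its login page
curl -s -o /dev/null -w "%{http_code}\n" http://<swarm-ip>:3000/login
```
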
## Alternative install with Traefik and HTTPS

If you have a Docker Swarm cluster with a global Traefik set up as described in [DockerSwarm.rocks](https://dockerswarm.rocks), you can deploy Swarmprom integrated with that global Traefik proxy.

This way, each Swarmprom service will have its own domain, and each of them will be served using HTTPS, with certificates generated (and renewed) automatically.

### Prerequisites

These instructions assume you already have Traefik set up following the guide above, in short:

* With automatic HTTPS certificate generation.
* A Docker Swarm network `traefik-public`.
* Filtering to only serve containers with the label `traefik.constraint-label=traefik-public`.

### Instructions

* Clone this repository and enter the directory:

```bash
$ git clone https://github.com/stefanprodan/swarmprom.git
$ cd swarmprom
```

* Set and export an `ADMIN_USER` environment variable:

```bash
export ADMIN_USER=admin
```

* Set and export an `ADMIN_PASSWORD` environment variable:

```bash
export ADMIN_PASSWORD=changethis
```

* Set and export a hashed version of the `ADMIN_PASSWORD` using `openssl`; it will be used by Traefik's HTTP Basic Auth for most of the services:

```bash
export HASHED_PASSWORD=$(openssl passwd -apr1 $ADMIN_PASSWORD)
```

* You can check the contents with:

```bash
echo $HASHED_PASSWORD
```

it will look like:

```
$apr1$89eqM5Ro$CxaFELthUKV21DpI3UTQO.
```

* Create and export an environment variable `DOMAIN`, e.g.:

```bash
export DOMAIN=example.com
```

and make sure that the following sub-domains point to your Docker Swarm cluster IPs:

* `grafana.example.com`
* `alertmanager.example.com`
* `unsee.example.com`
* `prometheus.example.com`

(and replace `example.com` with your actual domain).

**Note**: You can also use a subdomain, like `swarmprom.example.com`. Just make sure that the subdomains point to (at least one of) your cluster IPs. Or set up a wildcard subdomain (`*`). A quick way to verify the records is shown below.

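A minimal check, assuming the `dig` utility is installed and `example.com` stands in for your domain:

```bash
# Each record should resolve to (at least one of) your cluster IPs
for sub in grafana alertmanager unsee prometheus; do
    dig +short "$sub.example.com"
done
```
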
* If you are using Slack and want to integrate it, set the following environment variables:

```bash
export SLACK_URL=https://hooks.slack.com/services/TOKEN
export SLACK_CHANNEL=devops-alerts
export SLACK_USER=alertmanager
```

**Note**: by using `export` when declaring all the environment variables above, the next command will be able to use them.

* Deploy the Traefik version of the stack:

```bash
docker stack deploy -c docker-compose.traefik.yml swarmprom
```

To test it, go to each URL:

* `https://grafana.example.com`
* `https://alertmanager.example.com`
* `https://unsee.example.com`
* `https://prometheus.example.com`

## Setup Grafana

Navigate to `http://<swarm-ip>:3000` and log in with user ***admin*** and password ***admin***.
You can change the credentials in the compose file or
by supplying the `ADMIN_USER` and `ADMIN_PASSWORD` environment variables at stack deploy.

Swarmprom Grafana is preconfigured with the dashboards below and Prometheus as the default data source:

* Name: Prometheus
* Type: Prometheus
* Url: http://prometheus:9090
* Access: proxy

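You can confirm the data source is in place through Grafana's HTTP API. A quick sketch, assuming the default credentials:

```bash
# Lists the configured data sources; expect an entry named "Prometheus"
curl -su admin:admin http://<swarm-ip>:3000/api/datasources
```
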
After you log in, click on the Home drop-down in the upper left corner and you'll see the dashboards there.

***Docker Swarm Nodes Dashboard***

*(screenshot: Docker Swarm Nodes dashboard)*

URL: `http://<swarm-ip>:3000/dashboard/db/docker-swarm-nodes`

This dashboard shows key metrics for monitoring the resource usage of your Swarm nodes and can be filtered by node ID:

* Cluster up-time, number of nodes, number of CPUs, CPU idle gauge
* System load average graph, CPU usage graph by node
* Total memory, available memory gauge, total disk space and available storage gauge
* Memory usage graph by node (used and cached)
* I/O usage graph (read and write Bps)
* IOPS usage (read and write operations per second) and CPU IOWait
* Running containers graph by Swarm service and node
* Network usage graph (inbound Bps, outbound Bps)
* Nodes list (instance, node ID, node name)

***Docker Swarm Services Dashboard***

*(screenshot: Docker Swarm Services dashboard)*

URL: `http://<swarm-ip>:3000/dashboard/db/docker-swarm-services`

This dashboard shows key metrics for monitoring the resource usage of your Swarm stacks and services, and can be filtered by node ID:

* Number of nodes, stacks, services and running containers
* Swarm tasks graph by service name
* Health check graph (total health checks and failed checks)
* CPU usage graph by service and by container (top 10)
* Memory usage graph by service and by container (top 10)
* Network usage graph by service (received and transmitted)
* Cluster network traffic and IOPS graphs
* Docker engine container and network actions by node
* Docker engine list (version, node ID, OS, kernel, graph driver)

***Prometheus Stats Dashboard***

*(screenshot: Prometheus Stats dashboard)*

URL: `http://<swarm-ip>:3000/dashboard/db/prometheus`

* Uptime, local storage memory chunks and series
* CPU usage graph
* Memory usage graph
* Chunks to persist and persistence urgency graphs
* Chunks ops and checkpoint duration graphs
* Target scrapes, rule evaluation duration, samples ingested rate and scrape duration graphs

## Prometheus service discovery

In order to collect metrics from Swarm nodes you need to deploy the exporters on each server.
Using global services you don't have to manually deploy the exporters. When you scale up your
cluster, Swarm will launch a cAdvisor, node-exporter and dockerd-exporter instance on the newly created nodes.
All you need is an automated way for Prometheus to reach these instances.

Running Prometheus on the same overlay network as the exporter services allows you to use DNS service
discovery. Using the exporters' service names, you can configure DNS discovery:

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100
  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'
        type: 'A'
        port: 8080
  - job_name: 'dockerd-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.dockerd-exporter'
        type: 'A'
        port: 9323
```

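You can inspect what such a lookup returns from inside the running Prometheus task. A rough sketch, assuming the stack is named `mon` and the image is BusyBox-based (as the upstream `prom/prometheus` image is), so `nslookup` is available:

```bash
# Find the Prometheus container running on this node
CID=$(docker ps -qf name=mon_prometheus)

# Each running node-exporter task should show up as one A record
docker exec $CID nslookup tasks.node-exporter
```
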
When Prometheus runs the DNS lookup, Docker Swarm will return a list of IPs for each task.
Using these IPs, Prometheus will bypass the Swarm load-balancer and will be able to scrape each exporter
instance.

The problem with this approach is that you will not be able to tell which exporter runs on which node.
Your Swarm nodes' real IPs are different from the exporters' IPs, since the exporters' IPs are dynamically
assigned by Docker and are part of the overlay network.
Swarm doesn't provide any records for the tasks DNS besides the overlay IP.
If Swarm provided SRV records with the node's hostname or IP, you could re-label the source
and overwrite the overlay IP with the real IP.

In order to tell which host a node-exporter instance is running on, I had to create a prom file inside
the node-exporter container containing the hostname and the Docker Swarm node ID.

When a node-exporter container starts, `node-meta.prom` is generated with the following content:

```bash
"node_meta{node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1"
```

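A sketch of what such an entrypoint could look like; the file paths and variable names follow the compose snippet below, but the exact script shipped in the image may differ:

```bash
#!/bin/sh -e
# Read the host's name from the file mounted in by the compose snippet below
NODE_NAME=$(cat /etc/nodename)

# Write the node_meta metric where the textfile collector will pick it up
echo "node_meta{node_id=\"$NODE_ID\", node_name=\"$NODE_NAME\"} 1" \
  > /etc/node-exporter/node-meta.prom

# Hand off to node-exporter with whatever flags were passed to the container
exec /bin/node_exporter "$@"
```
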
The node ID value is supplied via `{{.Node.ID}}` and the node name is extracted from the `/etc/hostname`
file that is mounted inside the node-exporter container.

```yaml
node-exporter:
  image: stefanprodan/swarmprom-node-exporter
  environment:
    - NODE_ID={{.Node.ID}}
  volumes:
    - /etc/hostname:/etc/nodename
  command:
    - '-collector.textfile.directory=/etc/node-exporter/'
```

Using the textfile collector directory flag, you can instruct node-exporter to collect the `node_meta` metric.
Now that you have a metric containing the Docker Swarm node ID and name, you can use it in PromQL queries.

Let's say you want to find the available memory on each node. Normally you would write something like this:

```
sum(node_memory_MemAvailable) by (instance)

{instance="10.0.0.5:9100"} 889450496
{instance="10.0.0.13:9100"} 1404162048
{instance="10.0.0.15:9100"} 1406574592
```

The above result is not very helpful since you can't tell what Swarm node is behind the instance IP.
So let's write that query taking into account the `node_meta` metric:

```
sum(node_memory_MemAvailable * on(instance) group_left(node_id, node_name) node_meta) by (node_id, node_name)

{node_id="wrdvtftteo0uaekmdq4dxrn14",node_name="swarm-manager-1"} 889450496
{node_id="moggm3uaq8tax9ptr1if89pi7",node_name="swarm-worker-1"} 1404162048
{node_id="vkdfx99mm5u4xl2drqhnwtnsv",node_name="swarm-worker-2"} 1406574592
```

This is much better. Instead of overlay IPs, now I can see the actual Docker Swarm node IDs and hostnames. Knowing the hostname of your nodes is useful for alerting as well.

You can define an alert for when available memory drops below 10%. You will also receive the hostname in the alert message
and not some overlay IP that you can't correlate to an infrastructure item.

Maybe you are wondering why you need the node ID if you have the hostname. The node ID will help you match
node-exporter instances to cAdvisor instances. All metrics exported by cAdvisor have a label named `container_label_com_docker_swarm_node_id`,
and this label can be used to filter container metrics by Swarm node.

Let's write a query to find out how many containers are running on a Swarm node.
Knowing all the node IDs from the `node_meta` metric, you can define a filter with them in Grafana.
Assuming the filter is `$node_id`, the container count query should look like this:

```
count(rate(container_last_seen{container_label_com_docker_swarm_node_id=~"$node_id"}[5m]))
```

Another use case for the node ID is filtering the metrics provided by the Docker engine daemon.
The Docker engine doesn't have a label with the node ID attached to every metric, but there is a `swarm_node_info`
metric that has this label. If you want to find out the number of failed health checks on a Swarm node,
you would write a query like this:

```
sum(engine_daemon_health_checks_failed_total * on(instance) group_left(node_id) swarm_node_info{node_id=~"$node_id"})
```

For now the engine metrics are still experimental. If you want to use dockerd-exporter you have to enable
the experimental features and set the metrics address to `0.0.0.0:9323`.

If you are running Docker with systemd, create or edit the
`/etc/systemd/system/docker.service.d/docker.conf` file like so:

```
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd \
  --storage-driver=overlay2 \
  --dns 8.8.4.4 --dns 8.8.8.8 \
  --experimental=true \
  --metrics-addr 0.0.0.0:9323
```

Apply the config changes with `systemctl daemon-reload && systemctl restart docker` and
check that the `docker_gwbridge` IP address is 172.18.0.1:

```bash
ip -o addr show docker_gwbridge
```

Replace 172.18.0.1 with your `docker_gwbridge` address in the compose file:

```yaml
dockerd-exporter:
  image: stefanprodan/caddy
  environment:
    - DOCKER_GWBRIDGE_IP=172.18.0.1
```

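You can verify that the engine is actually listening on the gwbridge address. A quick check, assuming the default 172.18.0.1:

```bash
# The daemon metrics endpoint should return engine_daemon_* series
curl -s http://172.18.0.1:9323/metrics | grep engine_daemon | head
```
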
Collecting Docker Swarm metrics with Prometheus is not a smooth process, and
because of `group_left`, queries tend to become more complex.
In the future I hope Swarm DNS will contain SRV records for the hostname, and that Docker engine
metrics will expose container metrics, replacing cAdvisor altogether.

## Configure Prometheus

I've set the Prometheus retention period to 24h; you can change this value in the
compose file or via the `PROMETHEUS_RETENTION` environment variable.

```yaml
prometheus:
  image: stefanprodan/swarmprom-prometheus
  command:
    - '-storage.tsdb.retention=24h'
  deploy:
    resources:
      limits:
        memory: 2048M
      reservations:
        memory: 1024M
```

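For example, to redeploy with a two-day retention window (assuming your compose file wires `PROMETHEUS_RETENTION` into the command shown above):

```bash
PROMETHEUS_RETENTION=48h \
ADMIN_USER=admin \
ADMIN_PASSWORD=admin \
docker stack deploy -c docker-compose.yml mon
```
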
When using host volumes you should ensure that Prometheus doesn't get scheduled on different nodes. You can
pin the Prometheus service to a specific host with placement constraints.

```yaml
prometheus:
  image: stefanprodan/swarmprom-prometheus
  volumes:
    - prometheus:/prometheus
  deploy:
    mode: replicated
    replicas: 1
    placement:
      constraints:
        - node.labels.monitoring.role == prometheus
```

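The constraint above matches a custom node label, which you can set on the target node with the standard Docker CLI (replace `swarm-manager-1` with your node's name):

```bash
docker node update --label-add monitoring.role=prometheus swarm-manager-1
```
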
## Configure alerting

The swarmprom Prometheus service comes with the following alert rules:

***Swarm Node CPU Usage***

Alerts when a node's CPU usage goes over 80% for five minutes.

```
ALERT node_cpu_usage
  IF 100 - (avg(irate(node_cpu{mode="idle"}[1m]) * on(instance) group_left(node_name) node_meta * 100) by (node_name)) > 80
  FOR 5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
      summary = "CPU alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} CPU usage is at {{ humanize $value}}%.",
  }
```

***Swarm Node Memory Alert***

Alerts when a node's memory usage goes over 80% for five minutes.

```
ALERT node_memory_usage
  IF sum(((node_memory_MemTotal - node_memory_MemAvailable) / node_memory_MemTotal) * on(instance) group_left(node_name) node_meta * 100) by (node_name) > 80
  FOR 5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
      summary = "Memory alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} memory usage is at {{ humanize $value}}%.",
  }
```

***Swarm Node Disk Alert***

Alerts when a node's storage usage goes over 85% for five minutes.

```
ALERT node_disk_usage
  IF ((node_filesystem_size{mountpoint="/rootfs"} - node_filesystem_free{mountpoint="/rootfs"}) * 100 / node_filesystem_size{mountpoint="/rootfs"}) * on(instance) group_left(node_name) node_meta > 85
  FOR 5m
  LABELS      { severity="warning" }
  ANNOTATIONS {
      summary = "Disk alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} disk usage is at {{ humanize $value}}%.",
  }
```

***Swarm Node Disk Fill Rate Alert***

Alerts when a node's storage is predicted to run out of free space within six hours.

```
ALERT node_disk_fill_rate_6h
  IF predict_linear(node_filesystem_free{mountpoint="/rootfs"}[1h], 6*3600) * on(instance) group_left(node_name) node_meta < 0
  FOR 1h
  LABELS      { severity="critical" }
  ANNOTATIONS {
      summary = "Disk fill alert for Swarm node '{{ $labels.node_name }}'",
      description = "Swarm node {{ $labels.node_name }} disk is going to fill up in 6h.",
  }
```

You can add alerts to the
[swarm_node](https://github.com/stefanprodan/swarmprom/blob/master/prometheus/rules/swarm_node.rules)
and [swarm_task](https://github.com/stefanprodan/swarmprom/blob/master/prometheus/rules/swarm_task.rules)
files and rerun stack deploy to update them. Because these files are mounted inside the Prometheus
container at run time as [Docker configs](https://docs.docker.com/engine/swarm/configs/),
you don't have to bundle them with the image.

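A hypothetical example in the same rule syntax, alerting when a node-exporter target disappears (the rule name and threshold are illustrative, not part of swarmprom):

```
ALERT node_exporter_down
  IF up{job="node-exporter"} == 0
  FOR 5m
  LABELS      { severity="critical" }
  ANNOTATIONS {
      summary = "node-exporter target down",
      description = "Prometheus could not scrape {{ $labels.instance }} for 5 minutes.",
  }
```
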
The Alertmanager swarmprom image is configured with the Slack receiver.
In order to receive alerts on Slack you have to provide the Slack API URL,
username and channel via environment variables:

```yaml
alertmanager:
  image: stefanprodan/swarmprom-alertmanager
  environment:
    - SLACK_URL=${SLACK_URL}
    - SLACK_CHANNEL=${SLACK_CHANNEL}
    - SLACK_USER=${SLACK_USER}
```

You can install the `stress` package with apt and test out the CPU alert; you should receive something like this:

*(screenshot: Slack alert notification sent by Alertmanager)*

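A quick way to trigger the alert, assuming a Debian/Ubuntu node; run this on one of the Swarm nodes and let it exceed the five-minute alert window:

```bash
sudo apt-get install -y stress
# Load four CPU cores for ten minutes (adjust --cpu to your core count)
stress --cpu 4 --timeout 600
```
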
Cloudflare has made a great dashboard for managing alerts.
Unsee can aggregate alerts from multiple Alertmanager instances, running either in HA mode or separately.
You can access unsee at `http://<swarm-ip>:9094` using the admin user/password set at stack deploy:

*(screenshot: Unsee dashboard)*

## Monitoring applications and backend services

You can extend swarmprom with special-purpose exporters for services like MongoDB, PostgreSQL, Kafka or
Redis, and you can also instrument your own applications using the Prometheus client libraries.

In order to scrape other services you need to attach them to the `mon_net` network so Prometheus
can reach them, or you can attach the `mon_prometheus` service to the networks where your services are running.

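For an already-running service, the attachment can be done with a service update. A sketch, assuming a hypothetical service named `mongo-exporter`:

```bash
# Connect the exporter service to the monitoring overlay network
docker service update --network-add mon_net mongo-exporter
```
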
Once your services are reachable by Prometheus you can add the DNS name and port of those services to the
Prometheus config using the `JOBS` environment variable:

```yaml
prometheus:
  image: stefanprodan/swarmprom-prometheus
  environment:
    - JOBS=mongo-exporter:9216 kafka-exporter:9216 redis-exporter:9216
```

## Monitoring production systems

The swarmprom project is meant as a starting point in developing your own monitoring solution. Before running this
in production you should consider building and publishing your own Prometheus, node exporter and alert manager
images. Docker Swarm doesn't play well with locally built images, so the first step would be to set up a secure Docker
registry that your Swarm has access to and push the images there. Your CI system should assign version tags to each
image. Don't rely on the latest tag for continuous deployments: Prometheus will soon reach v2 and the data store
will not be backwards compatible with v1.x.

Another thing you should consider is having redundancy for Prometheus and alert manager.
You could run them as a service with two replicas pinned on different nodes, or even better,
use a service like Weave Cloud Cortex to ship your metrics outside of your current setup.
You can use Weave Cloud not only as a backup of your
metrics database but also to define alerts and use it as a data source for your Grafana dashboards.
Having the alerting and monitoring system hosted on a platform other than your production one
is good practice that will allow you to react quickly and efficiently when a major disaster strikes.

Swarmprom comes with built-in [Weave Cloud](https://www.weave.works/product/cloud/) integration;
all you need to do is run the weave-compose stack with your Weave service token:

```bash
TOKEN=<WEAVE-TOKEN> \
ADMIN_USER=admin \
ADMIN_PASSWORD=admin \
docker stack deploy -c weave-compose.yml mon
```

This will deploy Weave Scope and Prometheus with Weave Cortex as a remote write target.
The local retention is set to 24h, so even if your internet connection drops you won't lose data,
as Prometheus will retry pushing data to Weave Cloud when the connection is up again.

You can define alerts and notification routes in Weave Cloud in the same way you would with alert manager.

To use Grafana with Weave Cloud you have to reconfigure the Prometheus data source like this:

* Name: Prometheus
* Type: Prometheus
* Url: https://cloud.weave.works/api/prom
* Access: proxy
* Basic auth: use your service token as the password; the user value is ignored

Weave Scope automatically generates a map of your application, enabling you to intuitively understand,
monitor, and control your microservices-based application.
You can view metrics, tags and metadata of the running processes, containers and hosts.
Scope offers remote access to the Swarm's nodes and containers, making it easy to diagnose issues in real time.

*(screenshots: Weave Scope)*