Some Arvados services publish Prometheus/OpenMetrics-compatible metrics at /metrics. Metrics can help you understand how components perform under load, find performance bottlenecks, and detect and diagnose problems.
To access metrics endpoints, services must be configured with a management token. When accessing a metrics endpoint, prefix the management token with "Bearer " and supply it in the Authorization request header.
curl -sfH "Authorization: Bearer your_management_token_goes_here" "https://0.0.0.0:25107/metrics"
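The curl command above can also be sketched in Python with only the standard library. This is a minimal illustration, not part of Arvados: the helper names `build_metrics_request` and `fetch_metrics` are hypothetical, and the URL and token are whatever your own cluster uses.

```python
import urllib.request


def build_metrics_request(url: str, management_token: str) -> urllib.request.Request:
    """Build a GET request that carries the management token as a bearer token."""
    request = urllib.request.Request(url)
    # The token is prefixed with "Bearer " and sent in the Authorization header.
    request.add_header("Authorization", "Bearer " + management_token)
    return request


def fetch_metrics(url: str, management_token: str, timeout: float = 10.0) -> str:
    """Fetch the plain text metrics export from a service (hypothetical helper)."""
    request = build_metrics_request(url, management_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics("https://0.0.0.0:25107/metrics", "your_management_token_goes_here")` would then return the same text that curl prints, assuming the service is reachable and its TLS certificate is trusted by the client.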
The plain text export format includes “help” messages with a description of each reported metric.
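In the Prometheus text format, each help message appears on a line starting with `# HELP`, followed by the metric name and its description. As a rough sketch, the following Python collects those descriptions into a dictionary. The function name `parse_help_lines` and the metric names in the sample are made up for illustration, and the parser does not unescape backslash sequences in descriptions.

```python
def parse_help_lines(metrics_text: str) -> dict:
    """Map each metric name to the description from its '# HELP' line."""
    descriptions = {}
    prefix = "# HELP "
    for line in metrics_text.splitlines():
        if not line.startswith(prefix):
            continue
        # Line format: "# HELP <metric_name> <description>"
        parts = line[len(prefix):].split(" ", 1)
        descriptions[parts[0]] = parts[1] if len(parts) > 1 else ""
    return descriptions


# Hypothetical sample; real metric names depend on the service.
SAMPLE = """\
# HELP example_requests_total Total number of requests handled.
# TYPE example_requests_total counter
example_requests_total 42
# HELP example_queue_depth Current number of queued items.
# TYPE example_queue_depth gauge
example_queue_depth 3
"""

for name, text in parse_help_lines(SAMPLE).items():
    print(f"{name}: {text}")
```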
When configuring Prometheus, use a bearer_token or bearer_token_file option to authenticate requests.
scrape_configs:
  - job_name: keepstore
    bearer_token: your_management_token_goes_here
    static_configs:
      - targets:
          - "keep0.ClusterID.example.com:25107"
| Component | Metrics endpoint |
|---|---|
| arvados-api-server | |
| arvados-controller | ✓ |
| arvados-dispatch-cloud | ✓ |
| arvados-git-httpd | |
| arvados-node-manager | |
| arvados-ws | |
| composer | |
| keepproxy | |
| keepstore | ✓ |
| keep-balance | ✓ |
| keep-web | ✓ |
| sso-provider | |
| workbench1 | |
| workbench2 | |
The node manager does not export Prometheus-style metrics, but its /status.json endpoint provides a snapshot of its internal status as of the most recent wishlist update.
curl -sfH "Authorization: Bearer your_management_token_goes_here" "http://0.0.0.0:8989/status.json"
| Attribute | Type | Description |
|---|---|---|
| nodes_booting | int | Number of nodes in booting state |
| nodes_unpaired | int | Number of nodes in unpaired state |
| nodes_busy | int | Number of nodes in busy state |
| nodes_idle | int | Number of nodes in idle state |
| nodes_fail | int | Number of nodes in fail state |
| nodes_down | int | Number of nodes in down state |
| nodes_shutdown | int | Number of nodes in shutdown state |
| nodes_wish | int | Number of nodes in the current wishlist |
| node_quota | int | Current node count ceiling due to cloud quota limits |
| config_max_nodes | int | Configured max node count |
{
  "actor_exceptions": 0,
  "idle_times": {
    "compute1": 0,
    "compute3": 0,
    "compute2": 0,
    "compute4": 0
  },
  "create_node_errors": 0,
  "destroy_node_errors": 0,
  "nodes_idle": 0,
  "config_max_nodes": 8,
  "list_nodes_errors": 0,
  "node_quota": 8,
  "Version": "1.1.4.20180719160944",
  "nodes_wish": 0,
  "nodes_unpaired": 0,
  "nodes_busy": 4,
  "boot_failures": 0
}
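The sample response above omits some attributes listed in the table (for example `nodes_booting`), so a client reading /status.json should probably tolerate missing keys. The following Python is a minimal sketch of that idea. The function name `read_node_counts` is hypothetical, and reporting a missing attribute as `None` is an assumption made for this example, not documented node manager behavior.

```python
import json

# Attributes listed in the table above.
DOCUMENTED_ATTRIBUTES = (
    "nodes_booting", "nodes_unpaired", "nodes_busy", "nodes_idle",
    "nodes_fail", "nodes_down", "nodes_shutdown", "nodes_wish",
    "node_quota", "config_max_nodes",
)


def read_node_counts(status_json: str) -> dict:
    """Return the documented attributes, using None for any that are absent."""
    status = json.loads(status_json)
    return {name: status.get(name) for name in DOCUMENTED_ATTRIBUTES}


# Trimmed copy of the sample response shown above.
SAMPLE_STATUS = """
{
  "nodes_idle": 0,
  "config_max_nodes": 8,
  "node_quota": 8,
  "nodes_wish": 0,
  "nodes_unpaired": 0,
  "nodes_busy": 4
}
"""

for name, value in read_node_counts(SAMPLE_STATUS).items():
    print(f"{name}: {value}")
```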
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.