Some Arvados services publish Prometheus/OpenMetrics-compatible metrics at /metrics. Metrics can help you understand how components perform under load, find performance bottlenecks, and detect and diagnose problems.
To access metrics endpoints, services must be configured with a management token. When accessing a metrics endpoint, prefix the management token with "Bearer " and supply it in the Authorization request header.
curl -sfH "Authorization: Bearer your_management_token_goes_here" "https://0.0.0.0:25107/metrics"
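The same authenticated request can be made from a script. The following is a minimal sketch using only the Python standard library; the function names `build_metrics_request` and `fetch_metrics` are hypothetical helpers introduced for illustration, and the token and address are the same placeholders used in the curl command.

```python
import urllib.request


def build_metrics_request(base_url: str, management_token: str) -> urllib.request.Request:
    """Build a GET request for <base_url>/metrics carrying the management token.

    Hypothetical helper: the token is prefixed with "Bearer " and sent in the
    Authorization header, as described above.
    """
    url = base_url.rstrip("/") + "/metrics"
    headers = {"Authorization": "Bearer " + management_token}
    return urllib.request.Request(url, headers=headers)


def fetch_metrics(base_url: str, management_token: str, timeout: float = 10.0) -> str:
    """Fetch the plain text metrics export and return it as a string.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, which is
    roughly comparable to curl's -f flag.
    """
    request = build_metrics_request(base_url, management_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_metrics("https://0.0.0.0:25107", "your_management_token_goes_here")` would perform the request; nothing is sent until that function is called.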
The plain text export format includes “help” messages with a description of each reported metric.
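In the Prometheus text format, these help messages appear as comment lines of the form `# HELP <metric_name> <description>`. As a rough sketch, they can be collected into a dictionary to see what a service reports. The function name `parse_help_lines` and the metric names in `SAMPLE` are made up for illustration and are not actual Arvados metrics.

```python
def parse_help_lines(exposition_text: str) -> dict:
    """Map each metric name to its "# HELP" description.

    Lines that are not HELP comments (samples, "# TYPE" lines, blanks)
    are ignored.
    """
    descriptions = {}
    for line in exposition_text.splitlines():
        # Split into at most four fields: "#", "HELP", name, description.
        parts = line.split(None, 3)
        if len(parts) >= 3 and parts[0] == "#" and parts[1] == "HELP":
            descriptions[parts[2]] = parts[3].strip() if len(parts) == 4 else ""
    return descriptions


# Hypothetical export text; real services report different metric names.
SAMPLE = """\
# HELP example_requests_total Hypothetical count of requests served.
# TYPE example_requests_total counter
example_requests_total 42
# HELP example_queue_length Hypothetical number of queued items.
# TYPE example_queue_length gauge
example_queue_length 3
"""

HELP_BY_METRIC = parse_help_lines(SAMPLE)
```

Here `HELP_BY_METRIC` holds one entry per metric, keyed by metric name.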
When configuring Prometheus, use a bearer_token or bearer_token_file option to authenticate requests.
scrape_configs:
  - job_name: keepstore
    bearer_token: your_management_token_goes_here
    static_configs:
      - targets:
          - "keep0.ClusterID.example.com:25107"
Component | Metrics endpoint |
---|---|
arvados-api-server | |
arvados-controller | ✓ |
arvados-dispatch-cloud | ✓ |
arvados-git-httpd | |
arvados-node-manager | |
arvados-ws | |
composer | |
keepproxy | |
keepstore | ✓ |
keep-balance | ✓ |
keep-web | ✓ |
sso-provider | |
workbench1 | |
workbench2 | |
The node manager does not export Prometheus-style metrics, but its /status.json endpoint provides a snapshot of internal status at the time of the most recent wishlist update.
curl -sfH "Authorization: Bearer your_management_token_goes_here" "http://0.0.0.0:8989/status.json"
Attribute | Type | Description |
---|---|---|
nodes_booting | int | Number of nodes in booting state |
nodes_unpaired | int | Number of nodes in unpaired state |
nodes_busy | int | Number of nodes in busy state |
nodes_idle | int | Number of nodes in idle state |
nodes_fail | int | Number of nodes in fail state |
nodes_down | int | Number of nodes in down state |
nodes_shutdown | int | Number of nodes in shutdown state |
nodes_wish | int | Number of nodes in the current wishlist |
node_quota | int | Current node count ceiling due to cloud quota limits |
config_max_nodes | int | Configured max node count |
{
  "actor_exceptions": 0,
  "idle_times": {
    "compute1": 0,
    "compute3": 0,
    "compute2": 0,
    "compute4": 0
  },
  "create_node_errors": 0,
  "destroy_node_errors": 0,
  "nodes_idle": 0,
  "config_max_nodes": 8,
  "list_nodes_errors": 0,
  "node_quota": 8,
  "Version": "1.1.4.20180719160944",
  "nodes_wish": 0,
  "nodes_unpaired": 0,
  "nodes_busy": 4,
  "boot_failures": 0
}
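A status snapshot like this one can be summarized with a short script. The sketch below tallies the per-state node counts listed in the attribute table. The function name `summarize_status` is hypothetical, and treating a state attribute that is absent from the response as zero is an assumption made here, not documented behavior. The embedded JSON is a subset of the sample response.

```python
import json

# Per-state counters from the attribute table.
NODE_STATE_KEYS = (
    "nodes_booting",
    "nodes_unpaired",
    "nodes_busy",
    "nodes_idle",
    "nodes_fail",
    "nodes_down",
    "nodes_shutdown",
)


def summarize_status(status: dict) -> dict:
    """Return per-state node counts, their total, and the configured limits.

    Assumption: a state key missing from the response is counted as 0.
    """
    by_state = {key: status.get(key, 0) for key in NODE_STATE_KEYS}
    return {
        "by_state": by_state,
        "total": sum(by_state.values()),
        "nodes_wish": status.get("nodes_wish"),
        "node_quota": status.get("node_quota"),
        "config_max_nodes": status.get("config_max_nodes"),
    }


# Subset of the sample /status.json response.
SAMPLE_STATUS = json.loads("""
{
  "nodes_idle": 0,
  "config_max_nodes": 8,
  "node_quota": 8,
  "nodes_wish": 0,
  "nodes_unpaired": 0,
  "nodes_busy": 4
}
""")

SUMMARY = summarize_status(SAMPLE_STATUS)
```

For the sample data, `SUMMARY["total"]` is 4, all of it from `nodes_busy`, against a `config_max_nodes` of 8.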
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.