Health checks

Health check endpoints are found at /_health/ping on many Arvados services. A health check offers a simple way to determine whether a service is reachable, and lets the service self-report any problems. This makes it suitable for integration with operational alerting systems.

To access health check endpoints, services must be configured with a management token.
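A minimal Go sketch of building a ping request follows. It assumes the management token is sent as a Bearer token in the Authorization header. The address http://api:8000 and the function name pingRequest are hypothetical, and the token xyzzy is the example value used in the configuration on this page.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// pingRequest builds a GET request for a service's /_health/ping endpoint.
// Assumption: the management token is passed as a Bearer token in the
// Authorization header.
func pingRequest(baseURL, token string) (*http.Request, error) {
	url := strings.TrimSuffix(baseURL, "/") + "/_health/ping"
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Hypothetical service address and the example token "xyzzy".
	req, err := pingRequest("http://api:8000", "xyzzy")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("Authorization"))
}
```

Passing the request to http.DefaultClient.Do sends it; the response body is the JSON object described in the following paragraph.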

Health check endpoints return a JSON object with the field health. This has a value of either OK or ERROR. On error, it may also include a field error with additional information. Examples:

{
  "health": "OK"
}

{
  "health": "ERROR",
  "error": "Inverted polarity in the warp core"
}
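The response format can be checked with a short Go sketch that decodes the body and turns anything other than OK into a Go error. The names healthResponse and checkHealth are hypothetical, and treating an unrecognized health value as a failure is a design choice of this sketch, not documented behavior.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// healthResponse mirrors the JSON object returned by a health check endpoint.
type healthResponse struct {
	Health string `json:"health"`          // "OK" or "ERROR"
	Error  string `json:"error,omitempty"` // optional detail, present on error
}

// checkHealth decodes a health check body and returns nil only when the
// service reported "OK".
func checkHealth(body []byte) error {
	var hr healthResponse
	if err := json.Unmarshal(body, &hr); err != nil {
		return fmt.Errorf("malformed health response: %w", err)
	}
	switch hr.Health {
	case "OK":
		return nil
	case "ERROR":
		if hr.Error == "" {
			return errors.New("service reported ERROR")
		}
		return fmt.Errorf("service reported ERROR: %s", hr.Error)
	default:
		// Unknown values are treated as failures (sketch's own choice).
		return fmt.Errorf("unexpected health value %q", hr.Health)
	}
}

func main() {
	fmt.Println(checkHealth([]byte(`{"health": "OK"}`)))
	fmt.Println(checkHealth([]byte(`{"health": "ERROR", "error": "Inverted polarity in the warp core"}`)))
}
```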

Healthcheck aggregator

The service arvados-health performs health checks on all configured services and returns a single value of OK or ERROR for the entire cluster. It exposes the endpoint /_health/all.
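A single cluster-wide value can be thought of as a reduction over the per-service results: the cluster is OK only when every service reports OK. The Go sketch below illustrates that rule under this assumption. It is not the arvados-health implementation; the function name aggregate, the service labels, and the wording of the error field are hypothetical. A caller would record a service that cannot be reached as ERROR before aggregating.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// healthResponse mirrors the per-service JSON object.
type healthResponse struct {
	Health string `json:"health"`
	Error  string `json:"error,omitempty"`
}

// aggregate reduces per-service results to one cluster-wide value: OK only
// if every service reported OK (an empty map also yields OK). Failing
// services are listed in sorted order so the output is deterministic.
func aggregate(results map[string]healthResponse) healthResponse {
	var failed []string
	for svc, r := range results {
		if r.Health != "OK" {
			failed = append(failed, svc)
		}
	}
	if len(failed) == 0 {
		return healthResponse{Health: "OK"}
	}
	sort.Strings(failed)
	return healthResponse{Health: "ERROR", Error: "failing: " + strings.Join(failed, ", ")}
}

func main() {
	res := aggregate(map[string]healthResponse{
		"arvados-controller@api": {Health: "OK"},
		"keepstore@keep0":        {Health: "ERROR", Error: "disk full"},
	})
	fmt.Println(res.Health, res.Error)
}
```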

The healthcheck aggregator uses the NodeProfile section of the cluster-wide arvados.yml configuration file. Here is an example.

Clusters:
  # The cluster UUID prefix
  zzzzz:
    ManagementToken: xyzzy
    NodeProfile:
      # For each node, the profile name corresponds to a
      # locally-resolvable hostname, and describes which Arvados
      # services are available on that machine.
      api:
        arvados-controller:
          Listen: :8000
        arvados-api-server:
          Listen: :8001
      manage:
        arvados-node-manager:
          Listen: :8002
      workbench:
        arvados-workbench:
          Listen: :8003
        arvados-ws:
          Listen: :8004
      keep:
        keep-web:
          Listen: :8005
        keepproxy:
          Listen: :8006
        keep-balance:
          Listen: :9005
      keep0:
        keepstore:
          Listen: :25107
      keep1:
        keepstore:
          Listen: :25107


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.