The arvados-client diagnostics
command exercises basic cluster functionality, and identifies some common installation and configuration problems. Especially after upgrading or reconfiguring Arvados or server/network infrastructure, it can be the quickest way to identify problems.
On a server node, it is easiest to run the diagnostics command with system privileges. The word sudo
here instructs the arvados-client
command to load Controller.ExternalURL
and SystemRootToken
from /etc/arvados/config.yml
and use those credentials to run tests with system privileges.
When run this way, diagnostics will also include health checks.
# arvados-client sudo diagnostics
On any node (server node, shell node, or a workstation outside the system network), you can also run diagnostics by setting the usual ARVADOS_API_HOST
and ARVADOS_API_TOKEN
environment variables. Typically this is done with a regular user account.
$ export ARVADOS_API_HOST=zzzzz.arvadosapi.com $ export ARVADOS_API_TOKEN=xxxxxxxxxx $ arvados-client diagnostics
The diagnostics output indicates whether its client connection is categorized by the server as internal or external. If you run diagnostics automatically with cron or a monitoring tool, you can use the -internal-client
or -external-client
flag to specify how you expect the client to be categorized, and the test will fail otherwise. Example:
# arvados-client sudo diagnostics -internal-client
[...]
--- cut here --- error summary ---
ERROR 60: checking internal/external client detection (11 ms): expecting internal=true external=false, but found internal=false external=true
By default, the diagnostics
command builds a custom Docker image containing a copy of its own binary, and uses that image to run diagnostic checks from inside an Arvados container. This can help detect problems like lack of network connectivity between containers and Arvados cluster services.
The default approach works well if the client host (i.e., the host where you invoke arvados-client diagnostics
) meets certain conditions:
docker build
and docker save
).arvados-client
binary and the custom-built Docker image are compatible with the compute instances).The following options provide flexibility in case the default approach is not suitable.
-priority=0
skips the container-running part of the diagnostics suite.-docker-image="hello-world"
uses a tiny “hello world” image that is already embedded in the arvados-client
binary. This works even if the client host does not have any docker tools installed, and it minimizes the data transferred during the diagnostics suite. It provides less test coverage than the default option, but it will at least check that it is possible to run a container on the cluster.-docker-image=X
(where X
is a Docker image name or a portable data hash) uses a Docker image that has already been uploaded to your Arvados cluster using arv keep docker
. In this case the diagnostics tool will run a container with the command echo {timestamp}
.-docker-image-from=NAME
builds a custom Docker image on the fly as described above, but using the specified image as a base instead of the default debian:slim-stable
image. Note that the build recipe runs commands like apt-get install [...] libfuse2 ca-certificates
so only Debian-based base images are supported. For more flexibility, use one of the above -docker-image=...
options.-timeout=2m
extends the time limit for each HTTP request made by the diagnostics suite, including the process of uploading a custom-built Docker image, to 2 minutes (the default HTTP request timeout is 10 seconds, and the default upload time limit is either the HTTP timeout or 1 minute, whichever is longer).
# arvados-client sudo diagnostics
INFO 5: running health check (same as `arvados-server check`)
INFO 10: getting discovery document from https://zzzzz.arvadosapi.com/discovery/v1/apis/arvados/v1/rest
INFO 20: getting exported config from https://zzzzz.arvadosapi.com/arvados/v1/config
INFO 30: getting current user record
INFO 40: connecting to service endpoint https://keep.zzzzz.arvadosapi.com/
INFO 41: connecting to service endpoint https://*.collections.zzzzz.arvadosapi.com/
INFO 42: connecting to service endpoint https://download.zzzzz.arvadosapi.com/
INFO 43: connecting to service endpoint wss://ws.zzzzz.arvadosapi.com/websocket
INFO 44: connecting to service endpoint https://workbench.zzzzz.arvadosapi.com/
INFO 45: connecting to service endpoint https://workbench2.zzzzz.arvadosapi.com/
INFO 50: checking CORS headers at https://zzzzz.arvadosapi.com/
INFO 51: checking CORS headers at https://keep.zzzzz.arvadosapi.com/d41d8cd98f00b204e9800998ecf8427e+0
INFO 52: checking CORS headers at https://download.zzzzz.arvadosapi.com/
INFO 60: checking internal/external client detection
INFO 61: reading+writing via keep service at https://keep.zzzzz.arvadosapi.com:443/
INFO 80: finding/creating "scratch area for diagnostics" project
INFO 90: creating temporary collection
INFO 100: uploading file via webdav
INFO 110: checking WebDAV ExternalURL wildcard (https://*.collections.zzzzz.arvadosapi.com/)
INFO 120: downloading from webdav (https://d41d8cd98f00b204e9800998ecf8427e-0.collections.zzzzz.arvadosapi.com/foo)
INFO 121: downloading from webdav (https://d41d8cd98f00b204e9800998ecf8427e-0.collections.zzzzz.arvadosapi.com/sha256:feb5d9fea6a5e9606aa995e879d862b825965ba48de054caab5ef356dc6b3412.tar)
INFO 122: downloading from webdav (https://download.zzzzz.arvadosapi.com/c=d41d8cd98f00b204e9800998ecf8427e+0/_/foo)
INFO 123: downloading from webdav (https://download.zzzzz.arvadosapi.com/c=d41d8cd98f00b204e9800998ecf8427e+0/_/sha256:feb5d9fea6a5e9606aa995e879d862b825965ba48de054caab5ef356dc6b3412.tar)
INFO 124: downloading from webdav (https://a15a27cbc1c7d2d4a0d9e02529aaec7e-128.collections.zzzzz.arvadosapi.com/sha256:feb5d9fea6a5e9606aa995e879d862b825965ba48de054caab5ef356dc6b3412.tar)
INFO 125: downloading from webdav (https://download.zzzzz.arvadosapi.com/c=zzzzz-4zz18-twitqma8mbvwydy/_/sha256:feb5d9fea6a5e9606aa995e879d862b825965ba48de054caab5ef356dc6b3412.tar)
INFO 130: getting list of virtual machines
INFO 150: connecting to webshell service
INFO 160: running a container
INFO ... container request submitted, waiting up to 10m for container to run
INFO 9990: deleting temporary collection
The content of this documentation is licensed under the
Creative
Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the
Apache License, Version 2.0.