This page describes the requirements for a compute node in a Slurm or LSF cluster that will run containers dispatched by crunch-dispatch-slurm or arvados-dispatch-lsf. If you are installing a cloud cluster, refer to Build a cloud compute node image.
These instructions apply when Containers.RuntimeEngine is set to docker. If it is set to singularity, refer to Set up a compute node with Singularity instead.
This page describes how to configure a compute node so that it can be used to run containers dispatched by Arvados on a static cluster. These steps must be performed on every compute node.
See Set up Docker.
If you want to use NVIDIA GPUs, install the CUDA toolkit.
You must also install the NVIDIA Container Toolkit:
DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
    sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
    sudo tee /etc/apt/sources.list.d/libnvidia-container.list
sudo apt-get update
sudo apt-get install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
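To make the DIST value concrete, here is a minimal sketch that wraps the same os-release lookup in a function. The function name repo_dist and its optional file argument are hypothetical, added only so the lookup can be tried against a sample file before touching the package sources.

```shell
# Hypothetical helper: print the distribution string used in the
# libnvidia-container repository URL (ID immediately followed by VERSION_ID).
# The optional argument points it at a sample file instead of the real
# /etc/os-release.
repo_dist() {
    os_release="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into
    # the calling shell.
    ( . "$os_release" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# Example (uncomment to run):
# repo_dist
```

On Ubuntu 22.04, for instance, this should print ubuntu22.04, which is the path component substituted for $DIST in the repository URL.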
FUSE must be configured with the user_allow_other option enabled for Crunch to set up Keep mounts that are readable by containers. Install this file as /etc/fuse.conf:
# Allow non-root users to specify the 'allow_other' or 'allow_root'
# mount options.
user_allow_other
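After installing the file, it can be worth confirming that the option is present as an active line rather than a commented-out one. The following is a minimal sketch; the function name fuse_allow_other_enabled is hypothetical and is not part of Arvados or FUSE.

```shell
# Hypothetical helper: succeed if user_allow_other appears as an
# active (uncommented) line in the given FUSE configuration file.
fuse_allow_other_enabled() {
    conf="${1:-/etc/fuse.conf}"
    # Match the option alone on a line, allowing surrounding whitespace.
    grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "$conf"
}

# Example (uncomment to run on a compute node):
# fuse_allow_other_enabled && echo "user_allow_other is enabled"
```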
The arvados-docker-cleaner program removes least recently used Docker images as needed to keep disk usage below a configured limit.
Create a file /etc/arvados/docker-cleaner/docker-cleaner.json in an editor, with the following contents.
{
"Quota": "10G",
"RemoveStoppedContainers": "always"
}
Choosing a quota: Most deployments will want a quota of at least 10G. From there, a larger quota can reduce compute overhead by avoiding repeated reloads of the same Docker image, but leaves less space for other files on the same storage (usually Docker volumes). Make sure the quota is less than the total space available for Docker images.
This configuration also removes all containers as soon as they exit, as if they were run with docker run --rm. If you need to debug or inspect containers after they stop, temporarily stop arvados-docker-cleaner or configure it with "RemoveStoppedContainers":"never".
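If you switch between the normal and debugging configurations often, generating the file from a small script avoids hand-editing mistakes. The following is a minimal sketch; the function name write_cleaner_config and its argument order are hypothetical, and it writes only the two keys shown on this page.

```shell
# Hypothetical helper: write a docker-cleaner configuration file.
#   $1  destination path
#   $2  quota, for example 10G
#   $3  RemoveStoppedContainers policy (defaults to "always")
write_cleaner_config() {
    dest="$1"
    quota="$2"
    policy="${3:-always}"
    # The here-document expands $quota and $policy and keeps the
    # double quotes that JSON requires.
    cat > "$dest" <<EOF
{
    "Quota": "$quota",
    "RemoveStoppedContainers": "$policy"
}
EOF
}

# Example (run as root; writes the debugging variant):
# write_cleaner_config /etc/arvados/docker-cleaner/docker-cleaner.json 10G never
```

If the service is already running when the file changes, it presumably needs a restart to pick up the new settings.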
On Red Hat-based distributions:
# dnf install python-arvados-fuse crunch-run arvados-docker-cleaner
On Debian-based distributions:
# apt-get install python-arvados-fuse crunch-run arvados-docker-cleaner
# systemctl enable --now arvados-docker-cleaner
# systemctl status arvados-docker-cleaner
[...]
If systemctl status indicates it is not running, use journalctl to check logs for errors:
# journalctl -n12 --unit arvados-docker-cleaner
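The status check and log lookup above can be combined into one step, which is convenient when repeating the procedure on every compute node. The following is a minimal sketch; the function name check_unit is hypothetical, and it relies only on systemctl is-active and the journalctl invocation shown above.

```shell
# Hypothetical helper: report whether a systemd unit is active and,
# if it is not, print its most recent journal entries.
check_unit() {
    unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is not running; recent log entries:" >&2
        journalctl -n12 --unit "$unit"
        return 1
    fi
}

# Example (run as root on a compute node):
# check_unit arvados-docker-cleaner
```

The function returns a non-zero status when the unit is not active, so it can also be used as a condition in a provisioning script.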
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.