Install a compute node

Install dependencies

First, add the appropriate package repository for your distribution.

Note:

On CentOS and RHEL, these packages require a more recent version from Software Collections. The Software Collection will be installed automatically as long as Software Collections are enabled on your system.

To enable Software Collections on CentOS, run:

~$ sudo yum install centos-release-scl scl-utils

To enable Software Collections on RHEL:

~$ sudo yum-config-manager --enable rhel-server-rhscl-7-rpms

See also section 2.1 of Red Hat’s Installation chapter.

On Red Hat-based systems:

~$ echo 'exclude=python2-llfuse' | sudo tee -a /etc/yum.conf
~$ sudo yum install perl python-virtualenv fuse python-arvados-python-client python-arvados-fuse crunchrunner crunchstat arvados-docker-cleaner iptables ca-certificates

On Debian-based systems:

~$ sudo apt-get install perl python-virtualenv fuse python-arvados-python-client python-arvados-fuse crunchrunner crunchstat arvados-docker-cleaner iptables ca-certificates

Install Docker

Compute nodes must have Docker installed to run containers. This requires a relatively recent version of Linux (at least upstream version 3.10, or a distribution version with the appropriate patches backported). Follow the Docker Engine installation documentation for your distribution.

For Debian-based systems, the Arvados package repository includes a backported docker.io package with a known-good version you can install.
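
As a quick pre-flight check, the sketch below compares a kernel release string against the 3.10 minimum mentioned above. The helper name version_at_least is hypothetical and not part of Arvados or Docker. A failing result is not conclusive on distributions that backport the needed patches to an older kernel.

```shell
# version_at_least RELEASE MAJOR MINOR
# Succeeds if RELEASE (for example the output of `uname -r`) is at
# least MAJOR.MINOR. Hypothetical helper, for illustration only.
version_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if version_at_least "$(uname -r)" 3 10; then
    echo "kernel $(uname -r) meets the 3.10 minimum"
else
    echo "kernel $(uname -r) is older than 3.10; check for backported patches"
fi
```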

Configure the Docker daemon

Crunch runs Docker containers with relatively little configuration. You may need to start the Docker daemon with specific options to make sure these jobs run smoothly in your environment. This section highlights options that are useful to most installations. Refer to the Docker daemon reference for complete information about all available options.

The best way to configure these options varies by distribution.

  • If you’re using our backported docker.io package, you can list these options in the DOCKER_OPTS setting in /etc/default/docker.io.
  • If you’re using another Debian-based package, you can list these options in the DOCKER_OPTS setting in /etc/default/docker.
  • On Red Hat-based distributions, you can list these options in the other_args setting in /etc/sysconfig/docker.

Default ulimits

Docker containers inherit ulimits from the Docker daemon. However, the ulimits for a single Unix daemon may not accommodate a long-running Crunch job. You may want to increase default limits for compute containers by passing --default-ulimit options to the Docker daemon. For example, to allow containers to open 10,000 files, set --default-ulimit nofile=10000:10000.

DNS

Your containers must be able to resolve the hostname of your API server and any hostnames returned in Keep service records. If these names are not in public DNS records, you may need to specify a DNS resolver for the containers by setting the --dns address to an IP address of an appropriate nameserver. You may specify this option more than once to use multiple nameservers.
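
Putting the two options together, a DOCKER_OPTS value might be assembled as in the sketch below. The nameserver addresses 10.0.0.2 and 10.0.0.3 are placeholders; substitute resolvers that can answer for your API server and Keep hostnames. On Red Hat-based distributions the same string would go in other_args instead.

```shell
# Placeholder nameserver addresses; replace with your own resolvers.
dns_servers="10.0.0.2 10.0.0.3"

# Start with the ulimit example from above, then add one --dns per server.
DOCKER_OPTS="--default-ulimit nofile=10000:10000"
for server in $dns_servers; do
    DOCKER_OPTS="$DOCKER_OPTS --dns $server"
done

# Print the line as it would appear in the defaults file.
printf 'DOCKER_OPTS="%s"\n' "$DOCKER_OPTS"
```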

Configure Linux cgroups accounting

Linux can report what compute resources are used by processes in a specific cgroup or Docker container. Crunch can use these reports to share that information with users running compute work. This can help pipeline authors debug and optimize their workflows.

To enable cgroups accounting, you must boot Linux with the command line parameters cgroup_enable=memory swapaccount=1.

On Debian-based systems, open the file /etc/default/grub in an editor. Find where the string GRUB_CMDLINE_LINUX is set. Add cgroup_enable=memory swapaccount=1 to that string. Save the file and exit the editor. Then run:

~$ sudo update-grub

On Red Hat-based systems, run:

~$ sudo grubby --update-kernel=ALL --args='cgroup_enable=memory swapaccount=1'

Finally, reboot the system to make these changes effective.
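
After the reboot, one way to confirm that the parameters took effect is to look for them in /proc/cmdline, which holds the command line of the running kernel. The following is a minimal sketch; the helper name has_cgroup_accounting is hypothetical.

```shell
# has_cgroup_accounting [FILE]
# Succeeds if both boot parameters appear as whole words in FILE,
# which defaults to /proc/cmdline (the running kernel's command line).
has_cgroup_accounting() {
    cmdline_file=${1:-/proc/cmdline}
    for param in cgroup_enable=memory swapaccount=1; do
        # Split the command line into one parameter per line, then
        # look for an exact match.
        tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param" || return 1
    done
}

if has_cgroup_accounting; then
    echo "cgroups accounting parameters are active"
else
    echo "cgroups accounting parameters are missing from the kernel command line"
fi
```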

Create a project for Docker images

Here we create a default project for the standard Arvados Docker images, and give all users read access to it. The project is owned by the system user.

~$ uuid_prefix=`arv --format=uuid user current | cut -d- -f1`
~$ all_users_group_uuid="$uuid_prefix-j7d0g-fffffffffffffff"
~$ project_uuid=`arv --format=uuid group create --group "{\"owner_uuid\":\"$uuid_prefix-tpzed-000000000000000\", \"group_class\":\"project\", \"name\":\"Arvados Standard Docker Images\"}"`
~$ echo "Arvados project uuid is '$project_uuid'"
~$ read -rd $'\000' newlink <<EOF; arv link create --link "$newlink"
{
 "tail_uuid":"$all_users_group_uuid",
 "head_uuid":"$project_uuid",
 "link_class":"permission",
 "name":"can_read"
}
EOF
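
To make the UUID handling above concrete, this sketch repeats the derivation steps with a made-up user UUID in place of the arv call, so it runs without a cluster. The zzzzz prefix and the sample UUID are placeholders, and the variable system_user_uuid is introduced here only for illustration.

```shell
# Placeholder standing in for the output of `arv --format=uuid user current`.
sample_user_uuid="zzzzz-tpzed-0123456789abcde"

# The cluster prefix is everything before the first hyphen.
uuid_prefix=$(printf '%s\n' "$sample_user_uuid" | cut -d- -f1)

# Derived the same way as in the commands above.
all_users_group_uuid="$uuid_prefix-j7d0g-fffffffffffffff"
system_user_uuid="$uuid_prefix-tpzed-000000000000000"

echo "prefix:          $uuid_prefix"
echo "all users group: $all_users_group_uuid"
echo "system user:     $system_user_uuid"
```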

Download and tag the latest arvados/jobs docker image

To start workflows from Workbench, there must be a Docker image tagged arvados/jobs:latest. The following command downloads the latest arvados/jobs image from Docker Hub, loads it into Keep, and tags it as ‘latest’. In this example, $project_uuid should be the UUID of the “Arvados Standard Docker Images” project.

~$ arv-keepdocker --pull arvados/jobs latest --project-uuid $project_uuid

If the image needs to be downloaded from Docker Hub, the command can take a few minutes to complete, depending on available network bandwidth.

Set up SLURM

Install SLURM following the same process you used to install the Crunch dispatcher.

Copy configuration files from the dispatcher (API server)

The slurm.conf and /etc/munge/munge.key files need to be identical across the dispatcher and all compute nodes. Copy the files you created in the Install the Crunch dispatcher step to this compute node.
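
One way to confirm that the copies are identical is to compare checksums on each host. This is a sketch, not part of the Arvados tooling: the helper name config_checksum is hypothetical, and because the location of slurm.conf depends on how SLURM was packaged for your distribution, file paths are passed in as arguments.

```shell
# config_checksum FILE...
# Prints "<sha256>  <basename>" for each file so that output from
# different hosts can be compared line by line.
config_checksum() {
    for file in "$@"; do
        if [ ! -r "$file" ]; then
            echo "cannot read $file" >&2
            return 1
        fi
        sum=$(sha256sum "$file" | cut -d' ' -f1)
        printf '%s  %s\n' "$sum" "$(basename "$file")"
    done
}
```

Run the function as root with the paths to slurm.conf and /etc/munge/munge.key on the dispatcher and on this compute node. The two outputs should match exactly.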

Configure FUSE

FUSE must be configured with the user_allow_other option enabled for Crunch to set up Keep mounts that are readable by containers. Install this file as /etc/fuse.conf:

# Set the maximum number of FUSE mounts allowed to non-root users.
# The default is 1000.
#
#mount_max = 1000

# Allow non-root users to specify the 'allow_other' or 'allow_root'
# mount options.
#
user_allow_other
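
To check that the option is actually active (uncommented) after installing the file, a small test like the following sketch can help. The helper name fuse_allows_other is hypothetical.

```shell
# fuse_allows_other [FILE]
# Succeeds if FILE (default /etc/fuse.conf) contains an uncommented
# user_allow_other line.
fuse_allows_other() {
    grep -Eq '^[[:space:]]*user_allow_other[[:space:]]*$' "${1:-/etc/fuse.conf}"
}

if fuse_allows_other 2>/dev/null; then
    echo "user_allow_other is enabled"
else
    echo "user_allow_other is not enabled in /etc/fuse.conf"
fi
```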

Configure the Docker cleaner

The arvados-docker-cleaner program removes least recently used Docker images as needed to keep disk usage below a configured limit.

Note:

This also removes all containers as soon as they exit, as if they were run with docker run --rm. If you need to debug or inspect containers after they stop, temporarily stop arvados-docker-cleaner or configure it with "RemoveStoppedContainers":"never".

Create a file /etc/arvados/docker-cleaner/docker-cleaner.json in an editor, with the following contents.

{
    "Quota": "10G",
    "RemoveStoppedContainers": "always"
}

Choosing a quota: Most deployments will want a quota of at least 10G. Beyond that, a larger quota can reduce compute overhead by avoiding repeated reloads of the same Docker image, but it leaves less space for other files on the same storage (usually Docker volumes). Make sure the quota is less than the total space available for Docker images.
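
As a rough sanity check on a chosen quota, the sketch below compares it with the total size of the filesystem that holds Docker's data. It rests on several assumptions: Docker uses its default data directory, /var/lib/docker; the quota is a whole number of gigabytes; and "G" is treated as GiB. The helper name quota_fits is hypothetical.

```shell
# quota_fits QUOTA_GB [DIR]
# Succeeds if QUOTA_GB (treated as GiB, an assumption) is smaller than
# the total size of the filesystem containing DIR.
# DIR defaults to /var/lib/docker, Docker's default data directory.
quota_fits() {
    quota_gb=$1
    dir=${2:-/var/lib/docker}
    # df -P prints one line per filesystem; column 2 is the size in KiB.
    total_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $2 }')
    [ -n "$total_kb" ] || return 1
    [ $((quota_gb * 1024 * 1024)) -lt "$total_kb" ]
}

if quota_fits 10; then
    echo "a 10G quota fits on the Docker filesystem"
else
    echo "a 10G quota does not fit, or /var/lib/docker was not found"
fi
```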

Restart the service after updating the configuration file.

~$ sudo systemctl restart arvados-docker-cleaner

If you are using a different daemon supervisor, or if you want to test the daemon in a terminal window, run arvados-docker-cleaner. Run arvados-docker-cleaner --help for more configuration options.

Add a Crunch user account

Create a Crunch user account, and add it to the fuse and docker groups so it can use those tools:

~$ sudo useradd --groups fuse,docker crunch

The crunch user should have the same UID, GID, and home directory across all compute nodes and the dispatcher (API server).
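
To compare the account across hosts, it can help to print the three values in one line and check that every host reports the same string. The sketch below assumes getent is available, as it is on most glibc-based systems; the helper name account_summary is hypothetical.

```shell
# account_summary USER
# Prints "uid:gid:home" for USER so the values can be compared across hosts.
account_summary() {
    user=$1
    uid=$(id -u "$user") || return 1
    gid=$(id -g "$user") || return 1
    # Field 6 of the passwd entry is the home directory.
    home=$(getent passwd "$user" | cut -d: -f6)
    printf '%s:%s:%s\n' "$uid" "$gid" "$home"
}

account_summary crunch 2>/dev/null || echo "no crunch account on this host"
```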

Tell the API server about this compute node

Load your API superuser token on the compute node:


~$ HISTIGNORE=$HISTIGNORE:'export ARVADOS_API_TOKEN=*'
~$ export ARVADOS_API_TOKEN=@your-superuser-token@
~$ export ARVADOS_API_HOST=@uuid_prefix.your.domain@
~$ unset ARVADOS_API_HOST_INSECURE

Then execute this script to create a compute node object, and set up a cron job to have the compute node ping the API server every five minutes:


#!/bin/bash
set -e
if ! test -f /root/node.json ; then
    python - <<EOF
import arvados, json, socket
fqdn = socket.getfqdn()
hostname, _, domain = fqdn.partition('.')
node = arvados.api('v1').nodes().create(body={'hostname': hostname, 'domain': domain}).execute()
with open('/root/node.json', 'w') as node_file:
    json.dump(node, node_file, indent=2)
EOF

    # Make sure /dev/fuse permissions are correct (the device appears after fuse is loaded)
    chmod 1660 /dev/fuse && chgrp fuse /dev/fuse
fi

UUID=`grep \"uuid\" /root/node.json  |cut -f4 -d\"`
PING_SECRET=`grep \"ping_secret\" /root/node.json  |cut -f4 -d\"`

if ! test -f /etc/cron.d/node_ping ; then
    echo "*/5 * * * * root /usr/bin/curl -k -d ping_secret=$PING_SECRET https://$ARVADOS_API_HOST/arvados/v1/nodes/$UUID/ping" > /etc/cron.d/node_ping
fi

/usr/bin/curl -k -d ping_secret=$PING_SECRET "https://$ARVADOS_API_HOST/arvados/v1/nodes/$UUID/ping?ping_secret=$PING_SECRET"

And remove your token from the environment:


~$ unset ARVADOS_API_TOKEN
~$ unset ARVADOS_API_HOST
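
The registration script above pulls the node UUID and ping secret out of /root/node.json with grep and cut. That works because json.dump with indent=2 writes one key per line. The sketch below runs the same extraction on a made-up record so the mechanics are visible. Every value in the sample is a placeholder.

```shell
# Placeholder node record; real values come from the API server.
node_json=$(mktemp)
cat > "$node_json" <<'EOF'
{
  "hostname": "compute0",
  "ping_secret": "examplepingsecret",
  "uuid": "zzzzz-7ekkf-0123456789abcde"
}
EOF

# Same extraction as the script: take the line containing the quoted
# key, split it on double quotes, and keep the fourth field (the value).
UUID=$(grep \"uuid\" "$node_json" | cut -f4 -d\")
PING_SECRET=$(grep \"ping_secret\" "$node_json" | cut -f4 -d\")

echo "node UUID:   $UUID"
echo "ping secret: $PING_SECRET"
rm -f "$node_json"
```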



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.