Build a cloud compute node image

Note:

arvados-dispatch-cloud is only relevant for cloud installations. Skip this section if you are installing an on premises cluster that will spool jobs to Slurm or LSF.

This page describes how to build a compute node image that can be used to run containers dispatched by Arvados in the cloud.

Prerequisites
Fully automated build with Packer and Ansible
Partially automated build with Ansible
Manual build

Prerequisites

Create and configure an SSH keypair

arvados-dispatch-cloud communicates with the compute nodes via SSH. To do this securely, an SSH keypair is needed. The key type must be RSA or ED25519 to work with Amazon EC2. Generate an ED25519 keypair with no passphrase:

~$ ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_dispatcher
Generating public/private ed25519 key pair.
Your identification has been saved in /home/user/.ssh/id_dispatcher.
Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
The key fingerprint is:
[...]

After you do this, the contents of the private key in ~/.ssh/id_dispatcher need to be stored in your cluster configuration file under Containers.DispatchPrivateKey.

The public key at ~/.ssh/id_dispatcher.pub will need to be authorized to access instances booted from the image. Keep this file; our Ansible playbook will read it to set this up for you.

Get the Arvados source

Compute node templates are only available in the Arvados source tree. Clone a copy of the Arvados source for the version of Arvados you’re using in a directory convenient for you:

~$ git clone --depth=1 --branch=main git://git.arvados.org/arvados.git ~/arvados

Install Ansible

Option 1. Install Ansible with pipx

The pipx tool is packaged in many of our supported distributions. You can install it on Debian/Ubuntu by running:

# apt install pipx

Or install it on Red Hat/AlmaLinux/Rocky Linux by running:

# dnf install pipx

Note:

If the pipx package is not found, it is not available for your distribution. Instead install Ansible with virtualenv and pip.

After pipx is installed, install Ansible by running:

$ arvados/tools/ansible/install-ansible.sh
  installed package ansible-core 2.15.13, installed using Python 3.11.2
  These apps are now globally available
    - ansible
    - ansible-config
    - ansible-connection
    - ansible-console
    - ansible-doc
    - ansible-galaxy
    - ansible-inventory
    - ansible-playbook
    - ansible-pull
    - ansible-test
    - ansible-vault
done! ✨ 🌟 ✨

[…]

Ansible successfully installed!

If this script reports the final success message, skip the next section.

Option 2. Install Ansible in a virtualenv

This method works on all of our supported distributions, but requires you to configure a lot of paths manually. Install Python and virtualenv on Debian/Ubuntu by running:

# apt install python3-venv

Or install it on Red Hat/AlmaLinux/Rocky Linux by running:

# dnf install python3

Next, set up a virtualenv. If you want to install this somewhere other than ~/arvados-ansible, you may change that path each time it appears.

$ arvados/tools/ansible/install-ansible.sh ~/arvados-ansible
Collecting ansible-core~=2.15.13
[…]

Ansible successfully installed!

Finally, add all the Ansible tools to your executable path. If you keep personal executables somewhere other than ~/.local/bin, you may change that path.

$ ln -st ~/.local/bin ~/arvados-ansible/bin/ansible*

Alternatively, you may reconfigure your shell to add $HOME/arvados-ansible/bin to the end of your $PATH variable.

Install Packer and the Ansible plugin

We provide Packer templates that can automatically create a compute instance, configure it with Ansible, shut it down, and create a cloud image from the result. Install Packer following their instructions. After you do, install Packer’s Ansible provisioner by running:

~$ packer plugins install github.com/hashicorp/ansible

Fully automated build with Packer and Ansible

After you have both tools installed, you can configure both with information about your Arvados cluster and cloud environment and then run a fully automated build.

Write Ansible settings for the compute node

In the tools/compute-images directory of your Arvados source checkout, copy host_config.example.yml to host_config.yml. Edit host_config.yml with information about how your compute nodes should be set up following the instructions in the comments.

Set up Packer for your cloud

You need to provide different configuration to Packer depending on which cloud you’re deploying Arvados in.

AWS

Install Packer’s AWS builder by running:

~$ packer plugins install github.com/hashicorp/amazon

In the tools/compute-images directory of your Arvados source checkout, copy aws_config.example.json to aws_config.json. Fill in values for the configuration settings as follows:

If you already have AWS credentials configured that Packer can use to create and manage an EC2 instance, set aws_profile to the name of those credentials in your configuration. Otherwise, set aws_access_key and aws_secret_key with information from an API token with those permissions.
Set aws_region, vpc_id, and subnet_id with identifiers for the network where Packer should create the EC2 instance.
Set aws_source_ami to the AMI of the base image that should be booted and used as the base for your compute node image. Set ssh_user to the name of administrator account that is used on that image.
Set aws_volume_gb to the size of of the image you want to create in GB. The default 20 should be sufficient for most installs. You may increase this if you’re using a custom source AMI with more software pre-installed.
Set arvados_cluster to the same five-alphanumeric identifier used under Clusters in your Arvados cluster configuration.
If you installed Ansible to a nonstandard location, set ansible_command to the absolute path of ansible-playbook. For example, if you installed Ansible in a virtualenv at ~/ansible, set ansible_command to "{{env `HOME`}}/ansible/bin/ansible-playbook".

When you finish writing your configuration, run Packer.

Azure

Install Packer’s Azure builder by running:

~$ packer plugins install github.com/hashicorp/azure

In the tools/compute-images directory of your Arvados source checkout, copy azure_config.example.json to azure_config.json. Fill in values for the configuration settings as follows:

The settings load credentials from Azure’s standard environment variables. As long as you have these environment variables set in the shell before you run Packer, they will be loaded as normal. Alternatively, you can set them directly in the configuration file. These secrets can be generated from the Azure portal, or with the CLI using a command like:
```
~$ az ad sp create-for-rbac --name Packer --password ...
```
Set location and resource_group with identifiers for where Packer should create the cloud instance.
Set image_sku to the identifier of the base image that should be booted and used as the base for your compute node image. Set ssh_user to the name of administrator account you want to use on that image.
Set ssh_private_key_file to the path with the private key you generated earlier for the dispatcher to use. For example, "{{env `HOME`}}/.ssh/id_dispatcher".
Set arvados_cluster to the same five-alphanumeric identifier used under Clusters in your Arvados cluster configuration.
If you installed Ansible to a nonstandard location, set ansible_command to the absolute path of ansible-playbook. For example, if you installed Ansible in a virtualenv at ~/ansible, set ansible_command to "{{env `HOME`}}/ansible/bin/ansible-playbook".

When you finish writing your configuration, run Packer.

Run Packer

In the tools/compute-images directory of your Arvados source checkout, run Packer with your configuration and the template appropriate for your cloud. For example, to build an image on AWS, run:

arvados/tools/compute-images$ packer build -var-file=aws_config.json aws_template.json

To build an image on Azure, replace both instances of aws with azure, and run that command.

Note:

If packer build fails early with ok=0, changed=0, failed=1, and a message like this:

TASK [Gathering Facts] *********************************************************
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}

PLAY RECAP *********************************************************************
default : ok=0  changed=0  unreachable=0  failed=1  skipped=0  rescued=0  ignored=0

This might mean the version of scp on your computer is trying to use new protocol features that doesn’t work with the older SSH server on the cloud image. You can work around this by running:

$ export ANSIBLE_SCP_EXTRA_ARGS="'-O'"

Then rerun your full packer build command from the same shell.

If the build succeeds, it will report the identifier of your image at the end of the process. For example, when you build an AWS image, it will look like this:

==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-east-1: ami-012345abcdef56789

That identifier can now be set as CloudVMs.ImageID in your cluster configuration. You do not need to run any other compute node build process on this page; continue to installing the cloud dispatcher.

Partially automated build with Ansible

If Arvados does not include a template for your cloud, or you do not have permission to run Packer, you can run the Ansible playbook by itself. This can set up a base Debian or Ubuntu system with all the software and configuration necessary to do Arvados compute work. After it’s done, you can manually snapshot the node and create a cloud image from it.

Write Ansible settings for the compute node

Write an Ansible inventory

The compute node playbook runs on a host named default. In the tools/compute-images directory of your Arvados source checkout, write a file named inventory.ini with information about how to connect to this node via SSH. It should be one line like this:

# Example inventory.ini for an Arvados compute node
default ansible_host=192.0.2.9 ansible_user=admin

ansible_host can be the running node’s hostname or IP address. You need to be able to reach this host from the system where you’re running Ansible.
ansible_user names the user account that Ansible should use for the SSH connection. It needs to have permission to use sudo on the running node.

You can add other Ansible configuration options like ansible_port to your inventory if needed. Refer to the Ansible inventory documentation for details.

Run Ansible

If you installed Ansible inside a virtualenv, activate that virtualenv now. Then, in the tools/compute-images directory of your Arvados source checkout, run ansible-playbook with your inventory and configuration:

arvados/tools/compute-images$ ansible-playbook --ask-become-pass --inventory=inventory.ini --extra-vars=@host_config.yml ../ansible/build-compute-image.yml

You’ll be prompted with BECOME password:. Enter the password for the ansible_user you defined in the inventory to use sudo on the running node.

Note:

If ansible-playbook fails early with ok=0, changed=0, failed=1, and a message like this:

TASK [Gathering Facts] *********************************************************
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}

PLAY RECAP *********************************************************************
default : ok=0  changed=0  unreachable=0  failed=1  skipped=0  rescued=0  ignored=0

This might mean the version of scp on your computer is trying to use new protocol features that doesn’t work with the older SSH server on the cloud image. You can work around this by running:

$ export ANSIBLE_SCP_EXTRA_ARGS="'-O'"

Then rerun your full ansible-playbook command from the same shell.

If it succeeds, Ansible should report a “PLAY RECAP” with failed=0:

PLAY RECAP *********************************************************************
default : ok=41  changed=37  unreachable=0  failed=0  skipped=5  rescued=0  ignored=0

Your node is now ready to run Arvados compute work. You can snapshot the node, create an image from it, and set that image as CloudVMs.ImageID in your Arvados cluster configuration. The details of that process are cloud-specific and out of scope for this documentation. You do not need to run any other compute node build process on this page; continue to installing the cloud dispatcher.

Manual build

If you cannot run Ansible, you can create a cloud instance, manually set it up to be a compute node, and then create an image from it. The details of this process depend on which distribution you use on the cloud instance and which cloud you use; all these variations are out of scope for this documentation. These are the requirements:

Except on Azure, the SSH public key you generated previously must be an authorized key for the user that Crunch is configured to use. For example, if your cluster’s CloudVMs.DriverParameters.AdminUsername setting is crunch, then the dispatcher’s public key should be listed in ~crunch/.ssh/authorized_keys in the image. This user must also be allowed to use sudo without a password unless the user is root.
(On Azure, the dispatcher makes additional calls to automatically set up and authorize the user, making these steps unnecessary.)
SSH needs to be running and reachable by arvados-dispatch-cloud on the port named by CloudVMs.SSHPort in your cluster’s configuration file (default 22).
Install the python3-arvados-fuse package. Enable the user_allow_other option in /etc/fuse.conf.
Install either Docker or Singularity as appropriate based on the Containers.RuntimeEngine setting in your cluster’s configuration file. If you install Docker, you may also want to install and set up the arvados-docker-cleaner package to conserve space on long-running instances, but it’s not strictly required.
All available scratch space should be made available under /tmp.

Previous: Configure webshell Next: Install the cloud dispatcher