arvados-dispatch-cloud is only relevant for cloud installations. Skip this section if you are installing an on-premises cluster that will spool jobs to Slurm or LSF.
This page describes how to build a compute node image that can be used to run containers dispatched by Arvados in the cloud.
arvados-dispatch-cloud communicates with the compute nodes via SSH. To do this securely, an SSH keypair is needed. Generate an SSH keypair with no passphrase:
~$ ssh-keygen -N '' -f ~/.ssh/id_dispatcher
Generating public/private rsa key pair.
Your identification has been saved in /home/user/.ssh/id_dispatcher.
Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
The key fingerprint is:
[...]
After you do this, the contents of the private key in ~/.ssh/id_dispatcher need to be stored in your cluster configuration file under Containers.DispatchPrivateKey. The public key at ~/.ssh/id_dispatcher.pub will need to be authorized to access instances booted from the image. Keep this file; our Ansible playbook will read it to set this up for you.
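For reference, a minimal sketch of how the private key is typically embedded in the cluster configuration file; the cluster ID "xxxxx" and the key body are placeholders:

Clusters:
  xxxxx:
    Containers:
      DispatchPrivateKey: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        [...]
        -----END OPENSSH PRIVATE KEY-----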
Compute node templates are only available in the Arvados source tree. Clone a copy of the Arvados source for the version of Arvados you’re using in a directory convenient for you:
~$ git clone --depth=1 --branch=main git://git.arvados.org/arvados.git ~/arvados
We provide an Ansible playbook that can run on a Debian or Ubuntu system to set up a node with all the software and configuration necessary to do Arvados compute work. Install Ansible following their instructions. It is okay to install Ansible to a nonstandard location; you can point the rest of the automation at that location.
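For example, one common way to do this (a sketch, assuming you want Ansible in a virtualenv at ~/ansible, as used in the examples later on this page) is:

~$ python3 -m venv ~/ansible
~$ ~/ansible/bin/pip install ansible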
We provide Packer templates that can automatically create a compute instance, configure it with Ansible, shut it down, and create a cloud image from the result. Install Packer following their instructions. After you do, install Packer’s Ansible provisioner by running:
~$ packer plugins install github.com/hashicorp/ansible
After you have both tools installed, you can configure both with information about your Arvados cluster and cloud environment and then run a fully automated build.
In the tools/compute-images directory of your Arvados source checkout, copy host_config.example.yml to host_config.yml. Edit host_config.yml with information about how your compute nodes should be set up, following the instructions in the comments.
You need to provide different configuration to Packer depending on which cloud you’re deploying Arvados in.
Install Packer’s AWS builder by running:
~$ packer plugins install github.com/hashicorp/amazon
In the tools/compute-images directory of your Arvados source checkout, copy aws_config.example.json to aws_config.json. Fill in values for the configuration settings as follows (a sample filled-in file appears after this list):
* If you already have AWS credentials configured with permission to create EC2 instances and images, set aws_profile to the name of those credentials in your configuration. Otherwise, set aws_access_key and aws_secret_key with information from an API token with those permissions.
* Set aws_region, vpc_id, and subnet_id with identifiers for the network where Packer should create the EC2 instance.
* Set aws_source_ami to the AMI of the base image that should be booted and used as the base for your compute node image. Set ssh_user to the name of the administrator account that is used on that image.
* Set arvados_cluster to the same five-character alphanumeric identifier used under Clusters in your Arvados cluster configuration.
* If you installed Ansible to a nonstandard location, set ansible_command to the absolute path of ansible-playbook. For example, if you installed Ansible in a virtualenv at ~/ansible, set ansible_command to "{{env `HOME`}}/ansible/bin/ansible-playbook".

When you finish writing your configuration, run Packer.
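For reference, a filled-in aws_config.json might look roughly like this; every value below is a placeholder, and only the settings discussed above are shown:

{
  "aws_profile": "default",
  "aws_region": "us-east-1",
  "vpc_id": "vpc-0123456789abcdef0",
  "subnet_id": "subnet-0123456789abcdef0",
  "aws_source_ami": "ami-0abcdef1234567890",
  "ssh_user": "admin",
  "arvados_cluster": "xxxxx",
  "ansible_command": "{{env `HOME`}}/ansible/bin/ansible-playbook"
}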
Install Packer’s Azure builder by running:
~$ packer plugins install github.com/hashicorp/azure
In the tools/compute-images directory of your Arvados source checkout, copy azure_config.example.json to azure_config.json. Fill in values for the configuration settings as follows (a sample filled-in file appears after this list):
* Set the credentials for an Azure service principal that Packer can use to create and manage cloud resources. If you need to create one, you can do so by running:
  ~$ az ad sp create-for-rbac --name Packer --password ...
* Set location and resource_group with identifiers for where Packer should create the cloud instance.
* Set image_sku to the identifier of the base image that should be booted and used as the base for your compute node image. Set ssh_user to the name of the administrator account you want to use on that image.
* Set ssh_private_key_file to the path with the private key you generated earlier for the dispatcher to use. For example, "{{env `HOME`}}/.ssh/id_dispatcher".
* Set arvados_cluster to the same five-character alphanumeric identifier used under Clusters in your Arvados cluster configuration.
* If you installed Ansible to a nonstandard location, set ansible_command to the absolute path of ansible-playbook. For example, if you installed Ansible in a virtualenv at ~/ansible, set ansible_command to "{{env `HOME`}}/ansible/bin/ansible-playbook".

When you finish writing your configuration, run Packer.
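For reference, a filled-in azure_config.json might look roughly like this; every value below is a placeholder, only the settings discussed above are shown, and your azure_config.example.json will likely include additional credential settings for your service principal:

{
  "location": "eastus",
  "resource_group": "arvados-compute",
  "image_sku": "22_04-lts",
  "ssh_user": "packer",
  "ssh_private_key_file": "{{env `HOME`}}/.ssh/id_dispatcher",
  "arvados_cluster": "xxxxx",
  "ansible_command": "{{env `HOME`}}/ansible/bin/ansible-playbook"
}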
In the tools/compute-images directory of your Arvados source checkout, run Packer with your configuration and the template appropriate for your cloud. For example, to build an image on AWS, run:
arvados/tools/compute-images$ packer build -var-file=aws_config.json aws_template.json
To build an image on Azure, replace both instances of aws with azure, and run that command.
If packer build fails early with ok=0, changed=0, failed=1, and a message like this:
TASK [Gathering Facts] *********************************************************
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}
PLAY RECAP *********************************************************************
default : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
This might mean the version of scp on your computer is trying to use new protocol features that don't work with the older SSH server on the cloud image. You can work around this by running:
$ export ANSIBLE_SCP_EXTRA_ARGS="'-O'"
Then rerun your full packer build command from the same shell.
If the build succeeds, it will report the identifier of your image at the end of the process. For example, when you build an AWS image, it will look like this:
==> Builds finished. The artifacts of successful builds are:
--> amazon-ebs: AMIs were created:
us-east-1: ami-012345abcdef56789
That identifier can now be set as CloudVMs.ImageID in your cluster configuration. You do not need to run any other compute node build process on this page; continue to installing the cloud dispatcher.
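For example, a sketch of that setting in the cluster configuration file; the cluster ID "xxxxx" is a placeholder, and you should substitute your own image identifier:

Clusters:
  xxxxx:
    Containers:
      CloudVMs:
        ImageID: ami-012345abcdef56789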
If Arvados does not include a template for your cloud, or you do not have permission to run Packer, you can run the Ansible playbook by itself. This can set up a base Debian or Ubuntu system with all the software and configuration necessary to do Arvados compute work. After it’s done, you can manually snapshot the node and create a cloud image from it.
In the tools/compute-images directory of your Arvados source checkout, copy host_config.example.yml to host_config.yml. Edit host_config.yml with information about how your compute nodes should be set up, following the instructions in the comments. Note that you must set arvados_cluster_id in this file since you are not running Packer.
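For example, a minimal excerpt of host_config.yml showing just that setting; "xxxxx" is a placeholder for your five-character cluster identifier:

arvados_cluster_id: "xxxxx"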
The compute node playbook runs on a host named default. In the tools/compute-images directory of your Arvados source checkout, write a file named inventory.ini with information about how to connect to this node via SSH. It should be one line like this:
# Example inventory.ini for an Arvados compute node
default ansible_host=192.0.2.9 ansible_user=admin
* ansible_host can be the running node's hostname or IP address. You need to be able to reach this host from the system where you're running Ansible.
* ansible_user names the user account that Ansible should use for the SSH connection. It needs to have permission to use sudo on the running node.

You can add other Ansible configuration options like ansible_port to your inventory if needed. Refer to the Ansible inventory documentation for details.
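For example, a hypothetical inventory line for a node that listens on a nonstandard SSH port:

default ansible_host=192.0.2.9 ansible_user=admin ansible_port=2222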
If you installed Ansible inside a virtualenv, activate that virtualenv now. Then, in the tools/compute-images directory of your Arvados source checkout, run ansible-playbook with your inventory and configuration:
arvados/tools/compute-images$ ansible-playbook --ask-become-pass --inventory=inventory.ini --extra-vars=@host_config.yml
You'll be prompted with BECOME password:. Enter the password of the ansible_user you defined in the inventory so Ansible can use sudo on the running node.
If ansible-playbook fails early with ok=0, changed=0, failed=1, and a message like this:
TASK [Gathering Facts] *********************************************************
fatal: [default]: FAILED! => {"msg": "failed to transfer file to /home/you/.ansible/tmp/ansible-local-1821271ym6nh1cw/tmp2kyfkhy4 /home/admin/.ansible/tmp/ansible-tmp-1732380360.0917368-1821275-172216075852170/AnsiballZ_setup.py:\n\n"}
PLAY RECAP *********************************************************************
default : ok=0 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
This might mean the version of scp on your computer is trying to use new protocol features that don't work with the older SSH server on the cloud image. You can work around this by running:
$ export ANSIBLE_SCP_EXTRA_ARGS="'-O'"
Then rerun your full ansible-playbook command from the same shell.
If it succeeds, Ansible should report a "PLAY RECAP" with failed=0:
PLAY RECAP *********************************************************************
default : ok=41 changed=37 unreachable=0 failed=0 skipped=5 rescued=0 ignored=0
Your node is now ready to run Arvados compute work. You can snapshot the node, create an image from it, and set that image as CloudVMs.ImageID in your Arvados cluster configuration. The details of that process are cloud-specific and out of scope for this documentation. You do not need to run any other compute node build process on this page; continue to installing the cloud dispatcher.
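As one illustration, on AWS you could create an AMI from the stopped instance with the AWS CLI; the instance ID and image name below are hypothetical:

~$ aws ec2 create-image --instance-id i-0123456789abcdef0 --name arvados-compute-node-image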
If you cannot run Ansible, you can create a cloud instance, manually set it up to be a compute node, and then create an image from it. The details of this process depend on which distribution you use on the cloud instance and which cloud you use; all these variations are out of scope for this documentation. These are the requirements:
* The dispatcher's public key must be an authorized key for the administrator account it connects as (a minimal sketch of this setup appears after this list). For example, if your cluster's CloudVMs.DriverParameters.AdminUsername setting is crunch, then the dispatcher's public key should be listed in ~crunch/.ssh/authorized_keys in the image. This user must also be allowed to use sudo without a password unless the user is root.
* The SSH server must accept connections from arvados-dispatch-cloud on the port named by CloudVMs.SSHPort in your cluster's configuration file (default 22).
* Install the python3-arvados-fuse package. Enable the user_allow_other option in /etc/fuse.conf.
* Install Docker or Singularity to match the Containers.RuntimeEngine setting in your cluster's configuration file. If you install Docker, you may also want to install and set up the arvados-docker-cleaner package to conserve space on long-running instances, but it's not strictly required.
* All available scratch space should be made available under /tmp.
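As a minimal sketch of the first requirement, assuming AdminUsername is crunch and that you have copied id_dispatcher.pub onto the instance, you might run the following as root while preparing the image:

# Create the administrator account the dispatcher will connect as
useradd --create-home crunch
# Authorize the dispatcher's public key for that account
install -d -m 0700 -o crunch -g crunch ~crunch/.ssh
install -m 0600 -o crunch -g crunch id_dispatcher.pub ~crunch/.ssh/authorized_keys
# Allow passwordless sudo for that account
echo 'crunch ALL=(ALL) NOPASSWD: ALL' >/etc/sudoers.d/91-crunch
chmod 0440 /etc/sudoers.d/91-crunch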