arvados-dispatch-cloud is only relevant for cloud installations. Skip this section if you are installing an on-premises cluster that will spool jobs to Slurm or LSF.
This page describes how to build a compute node image that can be used to run containers dispatched by Arvados in the cloud.
Packer templates for AWS and Azure are provided with Arvados. To use them, the following are needed:
arvados-dispatch-cloud communicates with the compute nodes via SSH. To do this securely, an SSH keypair is needed.
Generate an SSH keypair with no passphrase. The private key needs to be stored in the cluster configuration file (see Containers/DispatchPrivateKey) for use by arvados-dispatch-cloud, as described in the next section. The public key will be baked into the compute node images; see the cloud-specific documentation below.
~$ ssh-keygen -N '' -f ~/.ssh/id_dispatcher
Generating public/private rsa key pair.
Your identification has been saved in /home/user/.ssh/id_dispatcher.
Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
The key fingerprint is:
[...]
~$ cat ~/.ssh/id_dispatcher
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEAqXoCzcOBkFQ7w4dvXf9B++1ctgZRqEbgRYL3SstuMV4oawks
ttUuxJycDdsPmeYcHsKo8vsEZpN6iYsX6ZZzhkO5nEayUTU8sBjmg1ZCTo4QqKXr
...
oFyAjVoexx0RBcH6BveTfQtJKbktP1qBO4mXo2dP0cacuZEtlAqW9Eb06Pvaw/D9
foktmqOY8MyctzFgXBpGTxPliGjqo8OkrOyQP2g+FL7v+Km31Xs61P8=
-----END RSA PRIVATE KEY-----
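Because the public key baked into the image has to correspond to the private key stored in the cluster configuration, it can be worth checking the pair before building. The following is a minimal sketch, not part of Arvados: the function name keys_match and the scratch directory are illustrative assumptions. It relies on ssh-keygen -y, which prints the public key derived from a private key file.

```shell
# keys_match PRIVATE_KEY_FILE PUBLIC_KEY_FILE
# Hypothetical helper (not shipped with Arvados): succeeds when the public
# key file holds the key that ssh-keygen derives from the private key.
keys_match() {
    # Compare only the key type and key data; the trailing comment may differ.
    derived=$(ssh-keygen -y -f "$1" | awk '{print $1, $2}') || return 1
    stored=$(awk '{print $1, $2}' "$2") || return 1
    [ -n "$derived" ] && [ "$derived" = "$stored" ]
}

# Demonstration in a scratch directory so that ~/.ssh is left untouched.
demo_dir=$(mktemp -d)
ssh-keygen -q -N '' -f "$demo_dir/id_dispatcher"
if keys_match "$demo_dir/id_dispatcher" "$demo_dir/id_dispatcher.pub"; then
    echo "public key matches private key"
else
    echo "public key does NOT match private key" >&2
fi
```

With a real keypair, the same function could be called as keys_match ~/.ssh/id_dispatcher ~/.ssh/id_dispatcher.pub.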
Arvados comes with a build script to automate the creation of a suitable compute node image (see The build script below). It is provided as a convenience. It is also possible to create a compute node image via other means. These are the requirements:
For AWS: the SSH public key for arvados-dispatch-cloud (the one that corresponds with Containers.DispatchPrivateKey in the Arvados config file) needs to go into ~/.ssh/authorized_keys for the SSH user you want arvados-dispatch-cloud to use (cf. CloudVMs.DriverParameters.AdminUsername in the Arvados config file), and that user needs to be able to sudo without a password prompt, unless you use `root`, in which case sudo is not used.
For Azure: arvados-dispatch-cloud automatically extracts the SSH public key from the value of Containers.DispatchPrivateKey and uses an API call to create the user specified in CloudVMs.DriverParameters.AdminUsername with that SSH public key and password-less sudo enabled.
SSH needs to be running and reachable by arvados-dispatch-cloud on port 22 (or a custom port, see CloudVMs.SSHPort in the Arvados config file).
The python3-arvados-fuse package needs to be installed.
Docker or Singularity needs to be installed (cf. Containers.RuntimeEngine in the Arvados config file).
The necessary files are located in the arvados/tools/compute-images directory in the source tree. A build script is provided to generate the image. The --help argument lists all available options:
~$ ./build.sh --help
build.sh: Build cloud images for arvados-dispatch-cloud
Syntax:
build.sh [options]
Options:
--json-file <path>
Path to the packer json file (required)
--arvados-cluster-id <xxxxx>
The ID of the Arvados cluster, e.g. zzzzz(required)
--aws-profile <profile>
AWS profile to use (valid profile from ~/.aws/config (optional)
--aws-secrets-file <path>
AWS secrets file which will be sourced from this script (optional)
When building for AWS, either an AWS profile or an AWS secrets file
must be provided.
--aws-source-ami <ami-xxxxxxxxxxxxxxxxx>
The AMI to use as base for building the images (required if building for AWS)
--aws-region <region> (default: us-east-1)
The AWS region to use for building the images
--aws-vpc-id <vpc-id>
VPC id for AWS, if not specified packer will derive from the subnet id or pick the default one.
--aws-subnet-id <subnet-xxxxxxxxxxxxxxxxx>
Subnet id for AWS, if not specified packer will pick the default one for the VPC.
--aws-ebs-autoscale
Install the AWS EBS autoscaler daemon (default: do not install the AWS EBS autoscaler).
--aws-associate-public-ip <true|false>
Associate a public IP address with the node used for building the compute image.
Required when the machine running packer can not reach the node used for building
the compute image via its private IP. (default: true if building for AWS)
Note: if the subnet has "Auto-assign public IPv4 address" enabled, disabling this
flag will have no effect.
--aws-ena-support <true|false>
Enable enhanced networking (default: true if building for AWS)
--gcp-project-id <project-id>
GCP project id (required if building for GCP)
--gcp-account-file <path>
GCP account file (required if building for GCP)
--gcp-zone <zone> (default: us-central1-f)
GCP zone
--azure-secrets-file <patch>
Azure secrets file which will be sourced from this script (required if building for Azure)
--azure-resource-group <resouce-group>
Azure resource group (required if building for Azure)
--azure-location <location>
Azure location, e.g. centralus, eastus, westeurope (required if building for Azure)
--azure-sku <sku> (required if building for Azure, e.g. 16.04-LTS)
Azure SKU image to use
--ssh_user <user> (default: packer)
The user packer will use to log into the image
--workdir <path> (default: /tmp)
The directory where data files are staged and setup scripts are run
--resolver <resolver_IP>
The dns resolver for the machine (default: host's network provided)
--reposuffix <suffix>
Set this to "-dev" to track the unstable/dev Arvados repositories
--pin-packages, --no-pin-packages
These flags determine whether or not to configure apt pins for Arvados
and third-party packages it depends on. By default packages are pinned
unless you set `--reposuffix -dev`.
--public-key-file <path>
Path to the public key file that a-d-c will use to log into the compute node (required)
--mksquashfs-mem (default: 256M)
Only relevant when using Singularity. This is the amount of memory mksquashfs is allowed to use.
--nvidia-gpu-support
Install all the necessary tooling for Nvidia GPU support (default: do not install Nvidia GPU support)
--debug
Output debug information (default: no debug output is printed)
The following sections highlight common configuration settings with more information.
Compute nodes must be able to resolve the hostnames of the API server and any keepstore servers to your internal IP addresses. If you are on AWS and using Route 53 for your DNS, the default resolver configuration can be used with no extra options.
You can also run your own internal DNS resolver. In that case, the IP address of the resolver should be passed as the value for the --resolver argument to the build script.
As a third option, the services could be hardcoded into an /etc/hosts file. For example:
10.20.30.40 ClusterID.example.com
10.20.30.41 keep1.ClusterID.example.com
10.20.30.42 keep2.ClusterID.example.com
Adding these lines to the /etc/hosts file in the compute node image could be done with a small change to the Packer template and the scripts/base.sh script, which will be left as an exercise for the reader.
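As a starting point for that exercise, the sketch below shows one way a provisioning step might append host entries without duplicating them on repeated runs. It is only an illustration: the function name add_host_entry is hypothetical, nothing here is taken from scripts/base.sh, and the demonstration writes to a scratch file rather than the real /etc/hosts.

```shell
# add_host_entry HOSTS_FILE IP NAME
# Hypothetical helper: appends "IP NAME" to HOSTS_FILE unless that exact
# line is already present, so running it repeatedly is harmless.
add_host_entry() {
    hosts_file=$1
    entry="$2 $3"
    if ! grep -qxF -- "$entry" "$hosts_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$hosts_file"
    fi
}

# Demonstration against a scratch file instead of /etc/hosts.
scratch_hosts=$(mktemp)
add_host_entry "$scratch_hosts" 10.20.30.40 ClusterID.example.com
add_host_entry "$scratch_hosts" 10.20.30.41 keep1.ClusterID.example.com
add_host_entry "$scratch_hosts" 10.20.30.40 ClusterID.example.com  # already present, no change
cat "$scratch_hosts"
```

In an image build, the first argument would be /etc/hosts and the step would need to run with root privileges.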
If you plan on using instance types with NVIDIA GPUs, add --nvidia-gpu-support to the build command line. Arvados uses the same compute image for both GPU and non-GPU instance types. The GPU tooling is ignored when using the image with a non-GPU instance type.
This section is only relevant when using Singularity. Skip this section when using Docker.
Docker images are converted on the fly by mksquashfs, which can consume a considerable amount of RAM. The RAM usage of mksquashfs can be restricted in /etc/singularity/singularity.conf with a line like mksquashfs mem = 256M. The amount of memory made available for mksquashfs should be configured lower than the smallest amount of memory requested by a container on the cluster to avoid the conversion being killed for using too much memory. The default memory allocation in CWL is 256M, so that is also a good choice for the mksquashfs mem setting.
The desired amount of memory to make available for mksquashfs can be configured with the --mksquashfs-mem argument to the build script. It defaults to 256M.
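For images built by other means, the mksquashfs mem line has to be written into the Singularity configuration by hand or by a provisioning step. The sketch below shows one possible way to do that. The function name set_mksquashfs_mem is hypothetical, the build script may handle this differently, and the demonstration edits a scratch file rather than /etc/singularity/singularity.conf.

```shell
# set_mksquashfs_mem CONF_FILE VALUE
# Hypothetical helper: rewrites every existing (possibly commented-out)
# "mksquashfs mem" line to the given value, or appends one if none exists.
set_mksquashfs_mem() {
    conf=$1
    mem=$2
    if grep -q '^[# ]*mksquashfs mem' "$conf"; then
        # -i.bak is accepted by both GNU and BSD sed; the backup is removed.
        sed -i.bak "s/^[# ]*mksquashfs mem.*/mksquashfs mem = $mem/" "$conf"
        rm -f "$conf.bak"
    else
        printf 'mksquashfs mem = %s\n' "$mem" >> "$conf"
    fi
}

# Demonstration against a scratch file with made-up contents.
scratch_conf=$(mktemp)
printf '%s\n' '# placeholder for other settings' '# mksquashfs mem = 1G' > "$scratch_conf"
set_mksquashfs_mem "$scratch_conf" 256M
grep 'mksquashfs mem' "$scratch_conf"
```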
By default, unless your image uses development packages with --reposuffix -dev, the build script configures apt with version pins for Arvados and third-party packages it depends on. This ensures that everything apt installs has been tested and is known to work with your version of Arvados. The version pins have some flexibility to allow apt to install security updates and other changes that are unlikely to interfere with Arvados.
You can explicitly control whether or not these version pins are configured with the --pin-packages and --no-pin-packages flags. You should only need to use these flags if you are doing development work on the compute image and specifically want to test compatibility with different versions.
For ClusterID, fill in your cluster ID.
AWSProfile is the name of an AWS profile in your credentials file (~/.aws/credentials) listing the aws_access_key_id and aws_secret_access_key to use.
The AMI is the identifier for the base image to be used. Current AMIs are maintained by Debian and Ubuntu.
The VPC and Subnet should be configured for where you want the compute image to be generated and stored.
ArvadosDispatchCloudPublicKeyPath should be replaced with the path to the SSH public key file generated in Create an SSH keypair, above.
~$ ./build.sh --json-file arvados-images-aws.json \
--arvados-cluster-id ClusterID \
--aws-profile AWSProfile \
--aws-source-ami AMI \
--aws-vpc-id VPC \
--aws-subnet-id Subnet \
--ssh_user admin \
--public-key-file ArvadosDispatchCloudPublicKeyPath
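A mistyped AWSProfile value is an easy mistake to make before a long build. The following sketch checks that a profile section exists in a credentials file. The function name aws_profile_exists and the profile name packer-build are illustrative assumptions, and the demonstration uses a scratch file with placeholder values instead of ~/.aws/credentials.

```shell
# aws_profile_exists CREDENTIALS_FILE PROFILE
# Hypothetical helper: succeeds when the file contains a "[PROFILE]"
# section header on a line of its own.
aws_profile_exists() {
    grep -qxF -- "[$2]" "$1"
}

# Demonstration against a scratch credentials file with placeholder values.
scratch_credentials=$(mktemp)
printf '%s\n' \
    '[packer-build]' \
    'aws_access_key_id = EXAMPLEKEYID' \
    'aws_secret_access_key = EXAMPLESECRET' > "$scratch_credentials"

if aws_profile_exists "$scratch_credentials" packer-build; then
    echo "profile found"
else
    echo "profile not found" >&2
fi
```

Against a real setup, the first argument would be ~/.aws/credentials and the second the value passed to --aws-profile.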
Arvados supports AWS EBS autoscaler. This feature automatically expands the scratch space on the compute node on demand by 200 GB at a time, up to 5 TB.
If you want to add the daemon to your images, add the --aws-ebs-autoscale flag to the build script.
The AWS EBS autoscaler daemon will be installed with this configuration:
{
  "mountpoint": "/tmp",
  "filesystem": "lvm.ext4",
  "lvm": {
    "volume_group": "autoscale_vg",
    "logical_volume": "autoscale_lv"
  },
  "volume": {
    "type": "gp3",
    "iops": 3000,
    "encrypted": 1
  },
  "detection_interval": 2,
  "limits": {
    "max_ebs_volume_size": 1500,
    "max_logical_volume_size": 8000,
    "max_ebs_volume_count": 16
  },
  "logging": {
    "log_file": "/var/log/ebs-autoscale.log",
    "log_interval": 300
  }
}
Changing the ebs-autoscale configuration is left as an exercise for the reader.
This feature also requires a few Arvados configuration changes, described in EBS Autoscale configuration.
~$ ./build.sh --json-file arvados-images-azure.json \
--arvados-cluster-id ClusterID \
--azure-resource-group ResourceGroup \
--azure-location AzureRegion \
--azure-sku AzureSKU \
--azure-secrets-file AzureSecretsFilePath \
--resolver ResolverIP \
--public-key-file ArvadosDispatchCloudPublicKeyPath
For ClusterID, fill in your cluster ID. The ResourceGroup and AzureRegion (e.g. ‘eastus2’) should be configured for where you want the compute image to be generated and stored. The AzureSKU is the SKU of the base image to be used, e.g. ‘18.04-LTS’ for Ubuntu 18.04.
AzureSecretsFilePath should be replaced with the path to a shell script that loads the Azure secrets with sufficient permissions to create the image. The file would look like this:
export ARM_CLIENT_ID=...
export ARM_CLIENT_SECRET=...
export ARM_SUBSCRIPTION_ID=...
export ARM_TENANT_ID=...
These secrets can be generated from the Azure portal, or with the Azure CLI using a command like this:
~$ az ad sp create-for-rbac --name Packer --password ...
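Since the secrets file is sourced by the build script, a missing or empty variable may only surface partway through a build. The sketch below is one way to check the file beforehand. The function name check_azure_secrets is hypothetical, and the demonstration uses a scratch file with dummy values.

```shell
# check_azure_secrets SECRETS_FILE
# Hypothetical helper: sources the file in a subshell (so nothing leaks
# into the calling shell) and fails if any of the four variables shown
# above is unset or empty. Pass a path containing a slash, because "."
# searches PATH for bare file names.
check_azure_secrets() (
    # Ignore any values inherited from the caller's environment.
    unset ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_SUBSCRIPTION_ID ARM_TENANT_ID
    . "$1" || exit 1
    for name in ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_SUBSCRIPTION_ID ARM_TENANT_ID; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing or empty: $name" >&2
            exit 1
        fi
    done
)

# Demonstration with a scratch file holding dummy values.
scratch_secrets=$(mktemp)
printf 'export %s=dummy\n' \
    ARM_CLIENT_ID ARM_CLIENT_SECRET ARM_SUBSCRIPTION_ID ARM_TENANT_ID > "$scratch_secrets"
if check_azure_secrets "$scratch_secrets"; then
    echo "all Azure variables are set"
fi
```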
ArvadosDispatchCloudPublicKeyPath should be replaced with the path to the SSH public key file generated in Create an SSH keypair, above.
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.