arvados-dispatch-cloud is only relevant for cloud installations. Skip this section if you are installing an on-premises cluster that will spool jobs to Slurm or LSF.
This page describes how to build a compute node image that can be used to run containers dispatched by Arvados in the cloud.
Packer templates for AWS and Azure are provided with Arvados. Using them requires Packer and credentials for the target cloud account.
arvados-dispatch-cloud communicates with the compute nodes via SSH. To do this securely, an SSH keypair is needed.
Generate an SSH keypair with no passphrase. The private key needs to be stored in the cluster configuration file (see Containers/DispatchPrivateKey) for use by arvados-dispatch-cloud, as described in the next section. The public key will be baked into the compute node images; see the cloud-specific documentation below.
~$ ssh-keygen -N '' -f ~/.ssh/id_dispatcher
Generating public/private rsa key pair.
Your identification has been saved in /home/user/.ssh/id_dispatcher.
Your public key has been saved in /home/user/.ssh/id_dispatcher.pub.
The key fingerprint is:
[...]
~$ cat ~/.ssh/id_dispatcher
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEAqXoCzcOBkFQ7w4dvXf9B++1ctgZRqEbgRYL3SstuMV4oawks
ttUuxJycDdsPmeYcHsKo8vsEZpN6iYsX6ZZzhkO5nEayUTU8sBjmg1ZCTo4QqKXr
...
oFyAjVoexx0RBcH6BveTfQtJKbktP1qBO4mXo2dP0cacuZEtlAqW9Eb06Pvaw/D9
foktmqOY8MyctzFgXBpGTxPliGjqo8OkrOyQP2g+FL7v+Km31Xs61P8=
-----END RSA PRIVATE KEY-----
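For reference, here is a minimal sketch of how the generated private key could be stored in the cluster configuration file. The Clusters/ClusterID nesting is assumed from the standard config.yml layout and the key material is abbreviated; the next installation section covers this step in detail.
# Sketch only: paste the full contents of ~/.ssh/id_dispatcher here.
Clusters:
  ClusterID:
    Containers:
      DispatchPrivateKey: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEpQIBAAKCAQEAqXoCzcOBkFQ7w4dvXf9B++1ctgZRqEbgRYL3SstuMV4oawks
        [...]
        -----END RSA PRIVATE KEY-----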
Arvados comes with a build script to automate the creation of a suitable compute node image (see The build script below). It is provided as a convenience. It is also possible to create a compute node image via other means. These are the requirements:
- The SSH public key for arvados-dispatch-cloud (the one that corresponds with Containers.DispatchPrivateKey in the Arvados config file) needs to go into ~/.ssh/authorized_keys for the SSH user you want arvados-dispatch-cloud to use (cf. CloudVMs.DriverParameters.AdminUsername in the Arvados config file), and that user needs to be able to sudo without a password prompt, unless you use `root`, in which case sudo is not used. Alternatively, arvados-dispatch-cloud can automatically extract the SSH public key from the value of Containers.DispatchPrivateKey and use an API call to create the user specified in CloudVMs.DriverParameters.AdminUsername with that SSH public key and password-less sudo enabled.
- SSH needs to be running and reachable by arvados-dispatch-cloud on port 22 (or a custom port, see CloudVMs.SSHPort in the Arvados config file).
- The python3-arvados-fuse package needs to be installed.
- Docker or Singularity needs to be installed (cf. Containers.RuntimeEngine in the Arvados config file).
The necessary files are located in the arvados/tools/compute-images directory in the source tree. A build script is provided to generate the image. The --help argument lists all available options:
~$ ./build.sh --help
build.sh: Build cloud images for arvados-dispatch-cloud
Syntax:
build.sh [options]
Options:
--json-file (required)
Path to the packer json file
--arvados-cluster-id (required)
The ID of the Arvados cluster, e.g. zzzzz
--aws-profile (default: false)
AWS profile to use (valid profile from ~/.aws/config
--aws-secrets-file (default: false, required if building for AWS)
AWS secrets file which will be sourced from this script
--aws-source-ami (default: false, required if building for AWS)
The AMI to use as base for building the images
--aws-region (default: us-east-1)
The AWS region to use for building the images
--aws-vpc-id (optional)
VPC id for AWS, otherwise packer will pick the default one
--aws-subnet-id
Subnet id for AWS otherwise packer will pick the default one for the VPC
--aws-ebs-autoscale (default: false)
Install the AWS EBS autoscaler daemon.
--gcp-project-id (default: false, required if building for GCP)
GCP project id
--gcp-account-file (default: false, required if building for GCP)
GCP account file
--gcp-zone (default: us-central1-f)
GCP zone
--azure-secrets-file (default: false, required if building for Azure)
Azure secrets file which will be sourced from this script
--azure-resource-group (default: false, required if building for Azure)
Azure resource group
--azure-location (default: false, required if building for Azure)
Azure location, e.g. centralus, eastus, westeurope
--azure-sku (default: unset, required if building for Azure, e.g. 16.04-LTS)
Azure SKU image to use
--ssh_user (default: packer)
The user packer will use to log into the image
--resolver (default: host's network provided)
The dns resolver for the machine
--reposuffix (default: unset)
Set this to "-dev" to track the unstable/dev Arvados repositories
--public-key-file (required)
Path to the public key file that a-d-c will use to log into the compute node
--mksquashfs-mem (default: 256M)
Only relevant when using Singularity. This is the amount of memory mksquashfs is allowed to use.
--debug
Output debug information (default: false)
Compute nodes must be able to resolve the hostnames of the API server and any keepstore servers to your internal IP addresses. You can do this by running an internal DNS resolver. The IP address of the resolver should be passed as the value for the --resolver argument to the build script.
Alternatively, the services could be hardcoded into an /etc/hosts file. For example:
10.20.30.40 ClusterID.example.com
10.20.30.41 keep1.ClusterID.example.com
10.20.30.42 keep2.ClusterID.example.com
Adding these lines to the /etc/hosts file in the compute node image could be done with a small change to the Packer template and the scripts/base.sh script, which will be left as an exercise for the reader.
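As a hypothetical starting point, a fragment like the following could be appended to scripts/base.sh; the addresses and hostnames are placeholders and must match your cluster:
# Sketch only: pin internal service addresses in the image (placeholder values).
sudo tee -a /etc/hosts >/dev/null <<'EOF'
10.20.30.40 ClusterID.example.com
10.20.30.41 keep1.ClusterID.example.com
10.20.30.42 keep2.ClusterID.example.com
EOF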
This section is only relevant when using Singularity. Skip this section when using Docker.
Docker images are converted on the fly by mksquashfs, which can consume a considerable amount of RAM. The RAM usage of mksquashfs can be restricted in /etc/singularity/singularity.conf with a line like mksquashfs mem = 256M. The amount of memory made available for mksquashfs should be configured lower than the smallest amount of memory requested by a container on the cluster, to avoid the conversion being killed for using too much memory. The default memory allocation in CWL is 256M, so that is also a good choice for the mksquashfs mem setting.
The desired amount of memory to make available for mksquashfs can be configured in an argument to the build script. It defaults to 256M.
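For example, on a cluster whose smallest container memory request is well above the default, the limit could be raised by adding the option to the build invocations shown below (the value here is hypothetical):
~$ ./build.sh [...] --mksquashfs-mem 512M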
To build an AWS image:
~$ ./build.sh --json-file arvados-images-aws.json \
--arvados-cluster-id ClusterID \
--aws-profile AWSProfile \
--aws-source-ami AMI \
--aws-vpc-id VPC \
--aws-subnet-id Subnet \
--ssh_user admin \
--resolver ResolverIP \
--public-key-file ArvadosDispatchCloudPublicKeyPath
For ClusterID, fill in your cluster ID. The VPC and Subnet should be configured for where you want the compute image to be generated and stored. The AMI is the identifier for the base image to be used. Current AMIs are maintained by Debian and Ubuntu.
AWSProfile should be replaced with the name of an AWS profile with sufficient permissions to create the image.
ArvadosDispatchCloudPublicKeyPath should be replaced with the path to the SSH public key file generated in Create an SSH keypair, above.
If you want to add the AWS EBS autoscaler daemon to your images, add the --aws-ebs-autoscale flag to the build script. Doing so will make the compute image scratch space scale automatically as needed.
The AWS EBS autoscaler daemon will be installed with this configuration:
{
"mountpoint": "/tmp",
"filesystem": "lvm.ext4",
"lvm": {
"volume_group": "autoscale_vg",
"logical_volume": "autoscale_lv"
},
"volume": {
"type": "gp3",
"iops": 3000,
"encrypted": 1
},
"detection_interval": 2,
"limits": {
"max_ebs_volume_size": 1500,
"max_logical_volume_size": 8000,
"max_ebs_volume_count": 16
},
"logging": {
"log_file": "/var/log/ebs-autoscale.log",
"log_interval": 300
}
}
Changing the configuration is left as an exercise for the reader.
Using this feature also requires a few Arvados configuration changes in config.yml:
The Containers/InstanceTypes list should be modified so that all AddedScratch lines are removed, and the IncludedScratch value should be set to a (fictional) high number. This way, the scratch space requirements will be met by all the defined instance types. For example:
InstanceTypes:
  c5large:
    ProviderType: c5.large
    VCPUs: 2
    RAM: 4GiB
    IncludedScratch: 16TB
    Price: 0.085
  m5large:
    ProviderType: m5.large
    VCPUs: 2
    RAM: 8GiB
    IncludedScratch: 16TB
    Price: 0.096
...
You will also need to create an IAM role in AWS with these permissions:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:AttachVolume",
"ec2:DescribeVolumeStatus",
"ec2:DescribeVolumes",
"ec2:DescribeTags",
"ec2:ModifyInstanceAttribute",
"ec2:DescribeVolumeAttribute",
"ec2:CreateVolume",
"ec2:DeleteVolume",
"ec2:CreateTags"
],
"Resource": "*"
}
]
}
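One way to create the role is with the AWS CLI, as sketched below; the role, policy, and file names are hypothetical, and ec2-trust-policy.json is assumed to contain a standard trust policy that lets ec2.amazonaws.com assume the role:
# Hypothetical names; adjust to your own conventions.
~$ aws iam create-role --role-name ebs-autoscale-role \
     --assume-role-policy-document file://ec2-trust-policy.json
~$ aws iam put-role-policy --role-name ebs-autoscale-role \
     --policy-name ebs-autoscale-policy \
     --policy-document file://ebs-autoscale-policy.json
~$ aws iam create-instance-profile --instance-profile-name ebs-autoscale-role
~$ aws iam add-role-to-instance-profile \
     --instance-profile-name ebs-autoscale-role --role-name ebs-autoscale-role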
Then, in config.yml set Containers/CloudVMs/DriverParameters/IAMInstanceProfile to the name of the IAM role. This will make arvados-dispatch-cloud pass an IAMInstanceProfile to the compute nodes as they start up, giving them sufficient permissions to attach and grow EBS volumes.
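A minimal sketch of that setting in config.yml, assuming the standard Clusters/ClusterID nesting and the hypothetical instance profile name from the example above:
Clusters:
  ClusterID:
    Containers:
      CloudVMs:
        DriverParameters:
          IAMInstanceProfile: ebs-autoscale-role  # hypothetical name from the sketch above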
To build an Azure image:
~$ ./build.sh --json-file arvados-images-azure.json \
--arvados-cluster-id ClusterID \
--azure-resource-group ResourceGroup \
--azure-location AzureRegion \
--azure-sku AzureSKU \
--azure-secrets-file AzureSecretsFilePath \
--resolver ResolverIP \
--public-key-file ArvadosDispatchCloudPublicKeyPath
For ClusterID, fill in your cluster ID. The ResourceGroup and AzureRegion (e.g. ‘eastus2’) should be configured for where you want the compute image to be generated and stored. The AzureSKU is the SKU of the base image to be used, e.g. ‘18.04-LTS’ for Ubuntu 18.04.
AzureSecretsFilePath should be replaced with the path to a shell script that loads the Azure secrets with sufficient permissions to create the image. The file would look like this:
export ARM_CLIENT_ID=...
export ARM_CLIENT_SECRET=...
export ARM_SUBSCRIPTION_ID=...
export ARM_TENANT_ID=...
These secrets can be generated from the Azure portal, or with the cli using a command like this:
~$ az ad sp create-for-rbac --name Packer --password ...
ArvadosDispatchCloudPublicKeyPath should be replaced with the path to the SSH public key file generated in Create an SSH keypair, above.