Single host Arvados

  1. Limitations of the single host install
  2. Prerequisites and planning
  3. Download the installer
  4. Edit local.params
  5. Choose the SSL configuration
    1. Using a self-signed certificate
    2. Using a Let’s Encrypt certificate
    3. Bring your own certificate
  6. Configure your authentication provider
  7. Further customization of the installation
  8. Begin installation
  9. Install the CA root certificate
  10. Confirm the cluster is working
  11. Initial user and login
  12. After the installation

Limitations of the single host install

NOTE: The single host installation is a good choice for evaluating Arvados, but it is not recommended for production use.

Using the default configuration, the single host install has scaling limitations compared to a production multi-host install:

  • It uses the local disk for Keep storage (under the /var/lib/arvados directory).
  • It uses the crunch-dispatch-local dispatcher, which has a limit of eight concurrent jobs.
  • Because jobs and Arvados services all run on the same machine, they will compete for CPU/RAM resources.

Prerequisites and planning

Cluster ID and base domain

Choose a 5-character cluster identifier that will represent the cluster. Here are guidelines on choosing a cluster identifier. Only lowercase letters and digits 0-9 are allowed. Examples will use xarv1 or ${CLUSTER}; you should substitute the cluster id you have selected.
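As a quick sanity check, the rule above (exactly five characters, lowercase letters and digits only) can be tested in the shell before you go further. This is a minimal sketch; the function name valid_cluster_id is made up for illustration and is not part of the installer.

```shell
# Return 0 if $1 is exactly five characters, each a lowercase letter or digit.
# valid_cluster_id is a hypothetical helper, not an Arvados command.
valid_cluster_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{5}$'
}

CLUSTER=xarv1
if valid_cluster_id "$CLUSTER"; then
  echo "cluster id '$CLUSTER' looks valid"
else
  echo "cluster id '$CLUSTER' must be 5 lowercase letters or digits" >&2
fi
```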

Determine if you will use a single hostname, or multiple hostnames.

  • Single hostname is simpler to set up and can even be used without a hostname at all, just a bare IP address.
  • Multiple hostnames is more similar to the recommended production configuration and may make it easier to migrate to a multi-host production configuration in the future, but it is more complicated because it requires adding a number of DNS entries.

If you are using multiple hostnames, determine the base domain for the cluster. This will be referred to as ${DOMAIN}.

For example, if CLUSTER is xarv1 and DOMAIN is example.com, then controller.${CLUSTER}.${DOMAIN} means controller.xarv1.example.com.

Machine specification

You will need a dedicated (virtual) machine for your Arvados server with at least 2 cores and 8 GiB of RAM (4+ cores / 16+ GiB recommended if you are running workflows) running a supported Linux distribution:

Supported Linux Distributions
CentOS 7
Debian 11 (“bullseye”)
Debian 10 (“buster”)
Ubuntu 20.04 (“focal”)
Ubuntu 18.04 (“bionic”)

Arvados packages are published for current Debian releases (until the EOL date), current Ubuntu LTS releases (until the end of standard support), and the latest version of CentOS.

Note: if you want to try out Arvados inside a Docker container, use Arvbox. The package-based install method uses systemd to manage services; lightweight container images generally lack an init system and other tools that the installer requires.

The single host install stores user data in a PostgreSQL database (usually found under /var/lib/postgresql) and as Keep blocks that are stored as files under /var/lib/arvados/.
Arvados logs are also kept in /var/log and /var/www/arvados-api/shared/log. Accordingly, you should ensure that the disk partition containing /var has adequate storage for your planned usage. We suggest starting with at least 50 GiB of free space.
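To see whether the partition holding /var meets the suggested 50 GiB, something like the following can be run on the target machine. It is only a sketch: free_gib and has_free_space are illustrative names, and it assumes a df that supports the POSIX -P and -k options.

```shell
# Print free space, in whole GiB, on the filesystem that holds $1 (default /var).
# df -Pk reports available space in 1024-byte blocks in column 4.
free_gib() {
  df -Pk "${1:-/var}" 2>/dev/null | awk 'NR == 2 { printf "%d\n", $4 / 1048576 }'
}

# Return 0 if the filesystem holding $1 has at least $2 GiB free.
has_free_space() {
  avail=$(free_gib "$1")
  [ "${avail:-0}" -ge "$2" ]
}

if has_free_space /var 50; then
  echo "/var has at least 50 GiB free"
else
  echo "/var has less than 50 GiB free" >&2
fi
```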

DNS hostnames for each service (multi-hostname only)

If you are using a single hostname for all services (they will be distinguished by listening port), you can skip this section.

If you are using the multi-hostname configuration, you will need a DNS entry for each service. If you are using “bring-your-own” TLS certificates, your certificate will need to include all of these hostnames.

In the default configuration these are:

  1. controller.${CLUSTER}.${DOMAIN}
  2. ws.${CLUSTER}.${DOMAIN}
  3. keep0.${CLUSTER}.${DOMAIN}
  4. keep1.${CLUSTER}.${DOMAIN}
  5. keep.${CLUSTER}.${DOMAIN}
  6. download.${CLUSTER}.${DOMAIN}
  7. *.collections.${CLUSTER}.${DOMAIN} (important: this must be a wildcard DNS entry that resolves to the keepweb service)
  8. workbench.${CLUSTER}.${DOMAIN}
  9. workbench2.${CLUSTER}.${DOMAIN}
  10. webshell.${CLUSTER}.${DOMAIN}
  11. shell.${CLUSTER}.${DOMAIN}

This is described in more detail in DNS entries and TLS certificates.
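The list above can be expanded for your own cluster with a short loop, which is handy when creating DNS records or checking certificate coverage. This is a sketch under assumptions: arvados_hostnames and check_dns are made-up helper names, the "probe" label used to test the wildcard is arbitrary, and getent is assumed to be available (it is on glibc-based Linux systems).

```shell
# Print the default service hostnames for cluster id $1 and base domain $2.
# The wildcard entry is printed literally as "*.collections...".
arvados_hostnames() {
  for svc in controller ws keep0 keep1 keep download '*.collections' \
             workbench workbench2 webshell shell; do
    printf '%s.%s.%s\n' "$svc" "$1" "$2"
  done
}

# Report names that do not resolve. The wildcard is probed by replacing
# the leading "*" with the arbitrary label "probe".
check_dns() {
  arvados_hostnames "$1" "$2" | sed 's/^\*/probe/' | while IFS= read -r name; do
    getent hosts "$name" >/dev/null || echo "does not resolve: $name"
  done
}

arvados_hostnames xarv1 example.com
```

Running check_dns xarv1 example.com afterwards prints only the names that fail to resolve.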

Additional prerequisites

  1. root or passwordless sudo access on the account where you are doing the install.
    This usually means adding the account to the sudo group and having a rule like the following in /etc/sudoers.d/arvados_passwordless, which allows members of group sudo to execute any command without entering a password:
    %sudo ALL=(ALL:ALL) NOPASSWD:ALL
  2. git installed on the machine
  3. Port 443 reachable by clients
  4. For the single-host install, ports 8800-8805 also need to be reachable from your client (configurable in local.params, see below)
  5. When using Let’s Encrypt, port 80 needs to be reachable from everywhere on the internet
  6. When bringing your own certificate, you need TLS certificate(s) covering the hostname(s) used by Arvados
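A few of these prerequisites can be checked from the shell. The helpers below are illustrative only (the names are not part of Arvados): has_passwordless_sudo relies on sudo -n, which fails instead of prompting when a password would be required, and required_ports simply lists the default single-host ports named above so you can compare them against your firewall rules.

```shell
# Return 0 if the current account can run sudo without a password prompt.
has_passwordless_sudo() {
  sudo -n true 2>/dev/null
}

# Return 0 if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Ports clients must reach for a single-host install with default settings:
# 443 plus 8800-8805 (the range is configurable in local.params).
required_ports() {
  echo 443
  seq 8800 8805
}

have_cmd git || echo "git is not installed" >&2
required_ports
```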

Download the installer

This is a package-based installation method; however, the installation script is currently distributed in source form via git. We recommend checking out the git tree on your local workstation, not directly on the target(s) where you want to install and run Arvados.

git clone https://github.com/arvados/arvados.git
cd arvados
git checkout 2.5-release
cd tools/salt-install

The installer.sh and provision.sh scripts will help you deploy Arvados by preparing your environment to be able to run the installer, then running it. The actual installer is located in the arvados-formula git repository and will be cloned while the provision.sh script runs. The installer is built using Saltstack, and provision.sh performs the install using masterless mode.

Initialize the installer

Replace “xarv1” with the cluster id you selected earlier.

This creates a git repository in ~/setup-arvados-xarv1. installer.sh records all the configuration changes you make in it, and uses git push to synchronize configuration edits if you have multiple nodes.

Important! Once you have initialized the installer directory, all further commands must be run with ~/setup-arvados-${CLUSTER} as the current working directory.

Using Terraform (AWS specific)

If you are going to use Terraform to set up the infrastructure on AWS, you first need to install the Terraform CLI and the AWS CLI tool. Then you can initialize the installer.

CLUSTER=xarv1
./installer.sh initialize ~/setup-arvados-${CLUSTER} single_host_single_hostname single_host/single_hostname 
cd ~/setup-arvados-${CLUSTER}

Without Terraform

CLUSTER=xarv1
./installer.sh initialize ~/setup-arvados-${CLUSTER} single_host_single_hostname single_host/single_hostname
cd ~/setup-arvados-${CLUSTER}

If you are using the multiple hostnames configuration, substitute ‘multiple_hostnames’ for each occurrence of ‘single_hostname’ in the command above.
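To make the substitution concrete, the sketch below prints the initialize command for either layout without running it. installer_init_cmd is a hypothetical helper written for this page; the multiple_hostnames form is derived purely from the substitution rule stated above.

```shell
# Print (do not run) the initialize command for the chosen layout.
# $1: cluster id   $2: single_hostname or multiple_hostnames
installer_init_cmd() {
  case "$2" in
    single_hostname|multiple_hostnames) ;;
    *) echo "unknown layout: $2" >&2; return 1 ;;
  esac
  printf './installer.sh initialize ~/setup-arvados-%s single_host_%s single_host/%s\n' \
    "$1" "$2" "$2"
}

installer_init_cmd xarv1 multiple_hostnames
```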

Edit local.params

This can be found wherever you choose to initialize the install files (~/setup-arvados-xarv1 in these examples).

  1. Set CLUSTER to the 5-character cluster identifier (e.g. “xarv1”)
  2. Set DOMAIN to the base DNS domain of the environment, e.g. “example.com”
  3. Single hostname only: set IP_INT to the host’s IP address.
  4. Single hostname only: set HOSTNAME_EXT to the hostname that users will use to connect.
  5. Set INITIAL_USER_EMAIL to your email address, as you will be the first admin user of the system.
  6. Set each KEY / TOKEN to a random string
    Here’s an easy way to create five random tokens:
    for i in 1 2 3 4 5; do
      tr -dc A-Za-z0-9 </dev/urandom | head -c 32 ; echo ''
    done
    
  7. Set DATABASE_PASSWORD to a random string
    Important! If this contains any non-alphanumeric characters, in particular ampersand (‘&’), it is necessary to add backslash quoting.
    For example, if the password is Lq&MZ<V']d?j
    then, with the special characters backslash-quoted, it should appear like this in local.params:
    DATABASE_PASSWORD="Lq\&MZ\<V\'\]d\?j"
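If you would rather not quote a password by hand, a one-line sed filter can add the backslashes. This is a sketch that assumes quoting every non-alphanumeric character is acceptable, which is what the example above does; quote_for_local_params is an illustrative name, not an installer command.

```shell
# Backslash-quote every character that is not an ASCII letter or digit,
# reproducing the quoting style of the DATABASE_PASSWORD example.
quote_for_local_params() {
  printf '%s\n' "$1" | sed 's/[^A-Za-z0-9]/\\&/g'
}

quote_for_local_params "Lq&MZ<V']d?j"
# prints: Lq\&MZ\<V\'\]d\?j
```

Alternatively, generating the password with the same tr -dc A-Za-z0-9 command used for the tokens avoids the need for quoting altogether.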

Choose the SSL configuration (SSL_MODE)

Arvados requires an SSL certificate to work correctly. This installer supports these options:

  • self-signed: let the installer create a self-signed certificate
  • lets-encrypt: automatically obtain and install an SSL certificate for your hostname
  • bring-your-own: supply your own certificate in the `certs` directory

Using a self-signed certificate

In the default configuration, this installer uses self-signed certificate(s):

SSL_MODE="self-signed"

This works everywhere and does not require that you have a domain name. However, after installation, users will need to install the self-signed root certificate in the browser.

Using a Let’s Encrypt certificate

To automatically get a valid certificate via Let’s Encrypt, change the configuration like this:

SSL_MODE="lets-encrypt"

This requires that you have a “real” hostname that you control. The hostname for your Arvados cluster must be defined in HOSTNAME_EXT and resolve to the public IP address of your Arvados instance, so that Let’s Encrypt can validate the domain name ownership and issue the certificate.

When using AWS, EC2 instances can have a default hostname that ends with amazonaws.com. Let’s Encrypt has a blacklist of domain names for which it will not issue certificates, and that blacklist includes the amazonaws.com domain, which means the default hostname cannot be used to get a certificate from Let’s Encrypt.
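The two constraints above (a real hostname rather than a bare IP address, and not an amazonaws.com name) can be screened before running the installer. The check below is a rough sketch: lets_encrypt_hostname_ok is a made-up name, it only looks at the shape of the string, and it does not verify DNS resolution or consult the actual Let’s Encrypt blacklist.

```shell
# Return 0 if $1 looks usable as HOSTNAME_EXT with SSL_MODE="lets-encrypt".
lets_encrypt_hostname_ok() {
  case "$1" in
    ''|*[!A-Za-z0-9.-]*) return 1 ;;           # empty, or not hostname characters
    amazonaws.com|*.amazonaws.com) return 1 ;; # default EC2 names are refused
    *[!0-9.]*) return 0 ;;                     # has a letter or hyphen: a hostname
    *) return 1 ;;                             # digits and dots only: bare IPv4 address
  esac
}

if lets_encrypt_hostname_ok my-arvados.example.net; then
  echo "hostname looks suitable for Let's Encrypt"
fi
```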

Bring your own certificate

To supply your own certificate, change the configuration like this:

SSL_MODE="bring-your-own"

Copy your certificate files to the directory specified by the variable CUSTOM_CERTS_DIR, where the provision script will look for them. The certificate and its key must be stored in files named after HOSTNAME_EXT. For example, if HOSTNAME_EXT is defined as my-arvados.example.net, the script will look for

${CUSTOM_CERTS_DIR}/my-arvados.example.net.crt
${CUSTOM_CERTS_DIR}/my-arvados.example.net.key

All certificate files will be used by nginx. You may need to include intermediate certificates in your certificate file. See the nginx documentation for more details.
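Before deploying, it can help to confirm that both files are where the provision script expects them and that the key belongs to the certificate. The helpers below are a sketch, not part of the installer: the names are illustrative, and cert_matches_key assumes the openssl command-line tool is installed (it compares the public key extracted from each file).

```shell
# Return 0 only if <dir>/<hostname>.crt and <dir>/<hostname>.key are readable.
# $1: the CUSTOM_CERTS_DIR value   $2: the HOSTNAME_EXT value
cert_files_present() {
  for f in "$1/$2.crt" "$1/$2.key"; do
    [ -r "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
}

# Return 0 if the private key in $2 matches the certificate in $1.
cert_matches_key() {
  crt_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 1
  key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 1
  [ -n "$crt_pub" ] && [ "$crt_pub" = "$key_pub" ]
}

# Example usage (paths follow the naming convention described above):
#   cert_files_present "$CUSTOM_CERTS_DIR" my-arvados.example.net
#   cert_matches_key "$CUSTOM_CERTS_DIR/my-arvados.example.net.crt" \
#                    "$CUSTOM_CERTS_DIR/my-arvados.example.net.key"
```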

Configure your authentication provider (optional, recommended)

By default, the installer will use the “Test” provider, which is a list of usernames and cleartext passwords stored in the Arvados config file. This is a low-security configuration, and you are strongly advised to configure one of the other supported authentication methods.

Further customization of the installation (optional)

If you want to customize the behavior of Arvados, this may require editing the Saltstack pillars and states files found in local_config_dir. In particular, local_config_dir/pillars/arvados.sls contains the template (in the arvados.cluster section) used to produce the Arvados configuration file. Consult the Configuration reference for a comprehensive list of configuration keys.

Any extra Salt “state” files you add under local_config_dir/states will be added to the Salt run and applied to the hosts.

Begin installation

At this point, you are ready to run the installer script in deploy mode, which will perform the entire Arvados installation.

Run this in the ~/setup-arvados-xarv1 directory:

./installer.sh deploy

Install the CA root certificate (SSL_MODE=self-signed only)

If you are not using self-signed certificates (you selected SSL_MODE=lets-encrypt or SSL_MODE=bring-your-own), skip this section.

Arvados uses SSL to encrypt communications. The web interface uses AJAX which will silently fail if the certificate is not valid or signed by an unknown Certification Authority.

For this reason, the installer has the option to create its own root certificate to authorize Arvados services. The installer script will leave a copy of the generated CA’s certificate (something like xarv1.example.com-arvados-snakeoil-ca.crt) in the script’s directory so you can add it to your workstation.

Web Browser

Installing the root certificate into your web browser will prevent security errors when accessing Arvados services with your web browser.

Chrome

  1. Go to “Settings → Privacy and Security → Security → Manage Certificates” or enter chrome://settings/certificates in the URL bar.
  2. Click on the “Authorities” tab (it is not selected by default)
  3. Click on the “Import” button
  4. Choose xarv1.example.com-arvados-snakeoil-ca.crt
  5. Tick the checkbox next to “Trust this certificate for identifying websites”
  6. Hit OK
  7. The certificate should appear in the list of Authorities under “Arvados”

Firefox

  1. Go to “Preferences → Privacy & Security” or enter about:preferences#privacy in the URL bar
  2. Scroll down to the Certificates section
  3. Click on the button “View Certificates…”.
  4. Make sure the “Authorities” tab is selected
  5. Press the “Import…” button.
  6. Choose xarv1.example.com-arvados-snakeoil-ca.crt
  7. Tick the checkbox next to “Trust this CA to identify websites”
  8. Hit OK
  9. The certificate should appear in the list of Authorities under “Arvados”

Other browsers (Safari, etc)

The process will be similar to that of Chrome and Firefox, but the exact user interface will be different. If you can’t figure it out, try searching for “how do I install a custom certificate authority in (my browser)”.

Installation on Linux OS certificate storage

To access your Arvados instance using command line clients (such as arv-get and arv-put) without security errors, install the certificate into the OS certificate storage.

Debian/Ubuntu

Important: the certificate file added to ca-certificates must have the extension .crt or it won’t be recognized.

cp xarv1.example.com-arvados-snakeoil-ca.crt /usr/local/share/ca-certificates/arvados-snakeoil-ca.crt
/usr/sbin/update-ca-certificates

CentOS

cp xarv1.example.com-arvados-snakeoil-ca.crt /etc/pki/ca-trust/source/anchors/
/usr/bin/update-ca-trust
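If you are scripting this step for several workstations, the two variants above can be selected from the ID field of /etc/os-release. The function below is only a sketch with a made-up name; it prints the destination directory and refresh command taken from the instructions above rather than running anything.

```shell
# Print "<trust directory> <refresh command>" for a distribution ID
# as found in the ID= field of /etc/os-release.
ca_install_plan() {
  case "$1" in
    debian|ubuntu)
      echo "/usr/local/share/ca-certificates /usr/sbin/update-ca-certificates" ;;
    centos)
      echo "/etc/pki/ca-trust/source/anchors /usr/bin/update-ca-trust" ;;
    *)
      echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# Example usage on the workstation:
#   . /etc/os-release && ca_install_plan "$ID"
ca_install_plan debian
```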

Confirm the cluster is working

When everything has finished, you can run the diagnostics. This requires the `arvados-client` package:

apt-get install arvados-client

Depending on where you are running the installer, you need to provide -internal-client or -external-client.

If you are running the diagnostics on the same machine where you installed Arvados, you want -internal-client.

You are an “external client” if you are running the diagnostics from your workstation outside of the private network.

./installer.sh diagnostics (-internal-client|-external-client)

Debugging issues

The installer records log files for each deployment.

Most service logs go to /var/log/syslog.

The logs for Rails API server and for Workbench can be found in

/var/www/arvados-api/current/log/production.log
and
/var/www/arvados-workbench/current/log/production.log

on the appropriate instances.

Workbench 2 is a client-side JavaScript application. If you are having trouble loading Workbench 2, check the browser’s developer console (this can be found in “Tools → Developer Tools”).

Iterating on config changes

You can iterate on the config and maintain the cluster by making changes to local.params and local_config_dir and running installer.sh deploy again.

Common problems and solutions

PG::UndefinedTable: ERROR: relation “api_clients” does not exist

The arvados-api-server package sets up the database as a post-install script. If the database host or password wasn’t set correctly (or quoted correctly) at the time that package is installed, it won’t be able to set up the database.

This will manifest as an error like this:

#<ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR:  relation \"api_clients\" does not exist

If this happens, you need to

  1. correct the database information
  2. run ./installer.sh deploy to update the configuration
  3. Log in to the server, then run this command to re-run the post-install script, which will set up the database:
    dpkg-reconfigure arvados-api-server
  4. Run ./installer.sh deploy again to synchronize everything, so that the install steps that need to contact the API server run successfully.
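To confirm whether you are hitting this particular problem, you can search the Rails API server log for the error text. A minimal sketch follows; count_undefined_table_errors is an illustrative name, and the log path in the usage comment is the one given in the Debugging issues section.

```shell
# Print the number of lines in log file $1 that mention the missing-table error.
# grep -c prints 0 (and returns non-zero) when nothing matches.
count_undefined_table_errors() {
  grep -c 'PG::UndefinedTable' "$1"
}

# Example usage on the server:
#   count_undefined_table_errors /var/www/arvados-api/current/log/production.log
```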

Initial user and login

At this point you should be able to log into the Arvados cluster. The initial URL for the single hostname install will use the hostname or IP address you put in HOSTNAME_EXT:

https://${HOSTNAME_EXT}

For the multi-hostname install, it will be:

https://workbench.${CLUSTER}.${DOMAIN}

If you did not configure a different authentication provider, you will be using the “Test” provider, and the provision script creates an initial user for testing purposes. This user is configured as administrator of the newly created cluster. It uses the values of INITIAL_USER and INITIAL_USER_PASSWORD from the local.params file.

If you did configure a different authentication provider, the first user to log in will automatically be given Arvados admin privileges.

After the installation

As part of the operation of installer.sh, it automatically creates a git repository with your configuration templates. You should retain this repository but be aware that it contains sensitive information (passwords and tokens used by the Arvados services).
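Because the repository holds passwords and tokens, one reasonable precaution is to make it readable only by your own account. This is a sketch of that idea, not something the installer does for you; restrict_config_access is a made-up name.

```shell
# Remove all group and other permissions from directory $1 and its contents.
restrict_config_access() {
  chmod -R go-rwx "$1"
}

# Example usage:
#   restrict_config_access ~/setup-arvados-xarv1
```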

As described in Iterating on config changes, you may use installer.sh deploy to re-run Salt to deploy configuration changes and upgrades. However, be aware that the configuration templates created for you by installer.sh are a snapshot that is not automatically kept up to date.

When deploying upgrades, consult the Arvados upgrade notes to see if changes need to be made to the configuration file template in local_config_dir/pillars/arvados.sls. To specify the version to upgrade to, set the VERSION parameter in local.params.

See also Maintenance and upgrading for more information.



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.