NOTE: The single host installation is a good choice for evaluating Arvados, but it is not recommended for production use.
Using the default configuration, the single host install has scaling limitations compared to a production multi-host install:
- Keep blocks are stored on the local disk (under the /var/lib/arvados directory).
- It uses the crunch-dispatch-local dispatcher, which has a limit of eight concurrent jobs.

Choose a 5-character cluster identifier that will represent the cluster. Here are guidelines on choosing a cluster identifier. Only lowercase letters and digits 0-9 are allowed. Examples will use xarv1 or ${CLUSTER}; you should substitute the cluster id you have selected.
Determine if you will use a single hostname, or multiple hostnames.
If you are using multiple hostnames, determine the base domain for the cluster. This will be referred to as ${DOMAIN}.
For example, if CLUSTER is xarv1 and DOMAIN is example.com, then controller.${CLUSTER}.${DOMAIN} means controller.xarv1.example.com.
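The naming rules above can be checked with a short shell sketch. The function name valid_cluster_id is hypothetical and not part of the Arvados installer; it only enforces the stated rule of exactly five lowercase letters or digits.

```shell
# Hypothetical helper (not part of the installer): succeeds only if the
# argument is exactly 5 characters, each a lowercase letter or a digit.
# The body runs in a subshell so that LC_ALL=C does not leak to the caller.
valid_cluster_id() (
    LC_ALL=C
    case "$1" in
        [a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]) true ;;
        *) false ;;
    esac
)

# Example values used throughout this page.
CLUSTER=xarv1
DOMAIN=example.com

if valid_cluster_id "$CLUSTER"; then
    # With the example values this expands to controller.xarv1.example.com
    echo "controller.${CLUSTER}.${DOMAIN}"
else
    echo "invalid cluster id: ${CLUSTER}" >&2
fi
```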
You will need a dedicated (virtual) machine for your Arvados server with at least 2 cores and 8 GiB of RAM (4+ cores / 16+ GiB recommended if you are running workflows) running a supported Linux distribution:
Supported Linux Distributions |
---|
CentOS 7 |
Debian 11 (“bullseye”) |
Debian 10 (“buster”) |
Ubuntu 20.04 (“focal”) |
Ubuntu 18.04 (“bionic”) |
Arvados packages are published for current Debian releases (until the EOL date), current Ubuntu LTS releases (until the end of standard support), and the latest version of CentOS.
Note: if you want to try out Arvados inside a Docker container, use Arvbox. The package-based install method uses systemd to manage services; lightweight container images generally lack an init system and other tools that the installer requires.
The single host install stores user data in a PostgreSQL database (usually found under /var/lib/postgresql) and as Keep blocks that are stored as files under /var/lib/arvados/.
Arvados logs are also kept in /var/log and /var/www/arvados-api/shared/log. Accordingly, you should ensure that the disk partition containing /var has adequate storage for your planned usage. We suggest starting with at least 50 GiB of free space.
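One way to check the free-space suggestion before installing is sketched below. The helper name has_free_gib is hypothetical; it relies only on the POSIX df output format.

```shell
# Hypothetical helper: succeeds if the filesystem holding path $1
# has at least $2 GiB available.
has_free_gib() {
    path=$1
    need_gib=$2
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
    # the fourth column of the data line is the available space.
    avail_kib=$(df -Pk "$path" | awk 'NR==2 {print $4}')
    [ "$avail_kib" -ge $((need_gib * 1024 * 1024)) ]
}
```

For example, `has_free_gib /var 50 || echo "less than 50 GiB free under /var"` prints a warning when the partition is too small.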
If you are using a single hostname for all services (they will be distinguished by listening port), you can skip this section.
If you are using the multi-hostname configuration, you will need a DNS entry for each service. If you are using “bring-your-own” TLS certificates, your certificate will need to include all of these hostnames.
In the default configuration these are:
controller.${CLUSTER}.${DOMAIN}
ws.${CLUSTER}.${DOMAIN}
keep0.${CLUSTER}.${DOMAIN}
keep1.${CLUSTER}.${DOMAIN}
keep.${CLUSTER}.${DOMAIN}
download.${CLUSTER}.${DOMAIN}
*.collections.${CLUSTER}.${DOMAIN} (important: this must be a wildcard DNS entry, resolving to the keepweb service)
workbench.${CLUSTER}.${DOMAIN}
workbench2.${CLUSTER}.${DOMAIN}
webshell.${CLUSTER}.${DOMAIN}
shell.${CLUSTER}.${DOMAIN}
This is described in more detail in DNS entries and TLS certificates.
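The default hostnames listed above can be generated from CLUSTER and DOMAIN, which is handy for checking DNS before you deploy. This is a sketch only: the function names are hypothetical, and the wildcard entry is left out because it cannot be looked up directly (pick any label under collections.${CLUSTER}.${DOMAIN} to test it).

```shell
# Hypothetical helper: print the non-wildcard hostnames of the default
# multi-hostname configuration, one per line.
arvados_hostnames() {
    cluster=$1
    domain=$2
    for svc in controller ws keep0 keep1 keep download \
               workbench workbench2 webshell shell; do
        echo "${svc}.${cluster}.${domain}"
    done
}

# Hypothetical helper: report every hostname that does not resolve.
# getent is assumed to be available (it ships with glibc).
check_dns() {
    arvados_hostnames "$1" "$2" | while read -r host; do
        getent hosts "$host" >/dev/null || echo "unresolved: $host"
    done
}
```

Running `check_dns xarv1 example.com` prints nothing when every name resolves.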
- sudo access on the account where you are doing the install. This typically means being a member of the sudo group and having a rule like this in /etc/sudoers.d/arvados_passwordless that allows members of group sudo to execute any command without entering a password:
%sudo ALL=(ALL:ALL) NOPASSWD:ALL
- git installed on the machine
local.params, see below)

This is a package-based installation method; however, the installation script is currently distributed in source form via git. We recommend checking out the git tree on your local workstation, not directly on the target(s) where you want to install and run Arvados.
git clone https://github.com/arvados/arvados.git
cd arvados
git checkout 2.5-release
cd tools/salt-install
The installer.sh and provision.sh scripts will help you deploy Arvados by preparing your environment to be able to run the installer, then running it. The actual installer is located in the arvados-formula git repository and will be cloned while the provision.sh script runs. The installer is built using Saltstack, and provision.sh performs the install using masterless mode.
Replace “xarv1” with the cluster id you selected earlier.
This creates a git repository in ~/setup-arvados-xarv1. The installer.sh script will record all the configuration changes you make, and uses git push to synchronize configuration edits if you have multiple nodes.
Important! Once you have initialized the installer directory, all further commands must be run with ~/setup-arvados-${CLUSTER} as the current working directory.
If you are going to use Terraform to set up the infrastructure on AWS, you first need to install the Terraform CLI and the AWS CLI tool. Then you can initialize the installer.
CLUSTER=xarv1
./installer.sh initialize ~/setup-arvados-${CLUSTER} single_host_single_hostname single_host/single_hostname
cd ~/setup-arvados-${CLUSTER}
If you are using the multiple hostname configuration, substitute “multiple_hostnames” where it says “single_hostname” in the command above.
local.params
This can be found wherever you choose to initialize the install files (~/setup-arvados-xarv1 in these examples).
1. Set CLUSTER to the 5-character cluster identifier (e.g. “xarv1”).
2. Set DOMAIN to the base DNS domain of the environment, e.g. “example.com”.
3. Set IP_INT to the host’s IP address.
4. Set HOSTNAME_EXT to the hostname that users will use to connect.
5. Set INITIAL_USER_EMAIL to your email address, as you will be the first admin user of the system.
6. Set each KEY / TOKEN to a random string. For example, this prints five random strings:
for i in 1 2 3 4 5; do
tr -dc A-Za-z0-9 </dev/urandom | head -c 32 ; echo ''
done
7. Set DATABASE_PASSWORD to a random string. Special characters need backslash quoting. For example, the password Lq&MZ<V']d?j is written as:
DATABASE_PASSWORD="Lq\&MZ\<V\'\]d\?j"
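The quoted form of the example password can be produced mechanically. The sketch below is an assumption about what is sufficient rather than the installer's documented rule: it puts a backslash in front of every non-alphanumeric character, which reproduces the quoted example above but may escape more than is strictly required. The function name quote_param is hypothetical.

```shell
# Hypothetical helper: backslash-quote every character that is not an
# ASCII letter or digit. Assumption: over-quoting is harmless here.
quote_param() {
    # In the sed replacement, \\ is a literal backslash and & is the match.
    printf '%s' "$1" | LC_ALL=C sed 's/[^A-Za-z0-9]/\\&/g'
}

# Prints the DATABASE_PASSWORD line for the example password.
printf 'DATABASE_PASSWORD="%s"\n' "$(quote_param "Lq&MZ<V']d?j")"
```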
Arvados requires an SSL certificate to work correctly. This installer supports these options:
- self-signed: let the installer create a self-signed certificate
- lets-encrypt: automatically obtain and install an SSL certificate for your hostname
- bring-your-own: supply your own certificate in the `certs` directory

In the default configuration, this installer uses self-signed certificate(s):
SSL_MODE="self-signed"
This works everywhere and does not require that you have a domain name. However, after installation, users will need to install the self-signed root certificate in the browser.
To automatically get a valid certificate via Let’s Encrypt, change the configuration like this:
SSL_MODE="lets-encrypt"
This requires that you have a “real” hostname that you control. The hostname for your Arvados cluster must be defined in HOSTNAME_EXT and resolve to the public IP address of your Arvados instance, so that Let’s Encrypt can validate the domain name ownership and issue the certificate.
When using AWS, EC2 instances can have a default hostname that ends with amazonaws.com. Let’s Encrypt has a blacklist of domain names for which it will not issue certificates, and that blacklist includes the amazonaws.com domain, which means the default hostname cannot be used to get a certificate from Let’s Encrypt.
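A small guard can catch the amazonaws.com case before you switch to lets-encrypt. This sketch only covers the one domain mentioned above; Let’s Encrypt may refuse other names as well. The function name is hypothetical.

```shell
# Hypothetical helper: fails for hostnames under amazonaws.com, which
# cannot be used to obtain a Let's Encrypt certificate.
lets_encrypt_hostname_ok() {
    case "$1" in
        amazonaws.com|*.amazonaws.com) false ;;
        *) true ;;
    esac
}

# Example value; substitute the HOSTNAME_EXT from your local.params.
HOSTNAME_EXT=my-arvados.example.net
if ! lets_encrypt_hostname_ok "$HOSTNAME_EXT"; then
    echo "${HOSTNAME_EXT} cannot be used with SSL_MODE=lets-encrypt" >&2
fi
```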
To supply your own certificate, change the configuration like this:
SSL_MODE="bring-your-own"
Copy your certificate files to the directory specified with the variable CUSTOM_CERTS_DIR. The provision script will find them there. The certificate and its key need to be copied to files named after HOSTNAME_EXT. For example, if HOSTNAME_EXT is defined as my-arvados.example.net, the script will look for
${CUSTOM_CERTS_DIR}/my-arvados.example.net.crt
${CUSTOM_CERTS_DIR}/my-arvados.example.net.key
All certificate files will be used by nginx. You may need to include intermediate certificates in your certificate file. See the nginx documentation for more details.
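Before deploying with bring-your-own, you may want to confirm that both files are in place under the expected names. The helper below is a hypothetical sketch; it checks only that the files exist and are not empty, not that the certificate and key match.

```shell
# Hypothetical helper: verify that <dir>/<hostname>.crt and
# <dir>/<hostname>.key both exist and are non-empty.
check_custom_certs() {
    certs_dir=$1
    hostname_ext=$2
    missing=0
    for ext in crt key; do
        f="${certs_dir}/${hostname_ext}.${ext}"
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_custom_certs "$CUSTOM_CERTS_DIR" my-arvados.example.net`.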
By default, the installer will use the “Test” provider, which is a list of usernames and cleartext passwords stored in the Arvados config file. This is a low-security configuration and you are strongly advised to configure one of the other supported authentication methods.
If you want to customize the behavior of Arvados, this may require editing the Saltstack pillars and states files found in local_config_dir. In particular, local_config_dir/pillars/arvados.sls contains the template (in the arvados.cluster section) used to produce the Arvados configuration file. Consult the Configuration reference for a comprehensive list of configuration keys.
Any extra Salt “state” files you add under local_config_dir/states will be added to the Salt run and applied to the hosts.
At this point, you are ready to run the installer script in deploy mode that will conduct all of the Arvados installation.
Run this in the ~/setup-arvados-xarv1 directory:
./installer.sh deploy
If you are not using self-signed certificates (you selected SSL_MODE=lets-encrypt or SSL_MODE=bring-your-own), skip this section.
Arvados uses SSL to encrypt communications. The web interface uses AJAX which will silently fail if the certificate is not valid or signed by an unknown Certification Authority.
For this reason, the installer has the option to create its own root certificate to authorize Arvados services. The installer script will leave a copy of the generated CA’s certificate (something like xarv1.example.com-arvados-snakeoil-ca.crt) in the script’s directory so you can add it to your workstation.
Installing the root certificate into your web browser will prevent security errors when accessing Arvados services with your web browser.
Chrome: enter chrome://settings/certificates in the URL bar, then import xarv1.example.com-arvados-snakeoil-ca.crt.
Firefox: enter about:preferences#privacy in the URL bar, then import xarv1.example.com-arvados-snakeoil-ca.crt.
The process will be similar to that of Chrome and Firefox, but the exact user interface will be different. If you can’t figure it out, try searching for “how do I install a custom certificate authority in (my browser)”.
To access your Arvados instance using command line clients (such as arv-get
and arv-put
) without security errors, install the certificate into the OS certificate storage.
Important: the certificate file added to ca-certificates must have the extension .crt or it won’t be recognized.
Debian and Ubuntu:
cp xarv1.example.com-arvados-snakeoil-ca.crt /usr/local/share/ca-certificates/arvados-snakeoil-ca.crt
/usr/sbin/update-ca-certificates
CentOS:
cp xarv1.example.com-arvados-snakeoil-ca.crt /etc/pki/ca-trust/source/anchors/
/usr/bin/update-ca-trust
When everything has finished, you can run the diagnostics. This requires the `arvados-client` package:
apt-get install arvados-client
Depending on where you are running the installer, you need to provide -internal-client or -external-client.
If you are running the diagnostics on the same machine where you installed Arvados, you want -internal-client.
You are an “external client” if you are running the diagnostics from your workstation outside of the private network.
./installer.sh diagnostics (-internal-client|-external-client)
The installer records log files for each deployment.
Most service logs go to /var/log/syslog.
The logs for Rails API server and for Workbench can be found in
/var/www/arvados-api/current/log/production.log
and
/var/www/arvados-workbench/current/log/production.log
on the appropriate instances.
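When debugging, it can help to pull recent database errors out of the Rails log. The helper below is a hypothetical sketch that searches for the PG::UndefinedTable error discussed later on this page; adjust the pattern to whatever you are looking for.

```shell
# Hypothetical helper: print the last N (default 5) lines of a log file
# that mention PG::UndefinedTable.
recent_db_errors() {
    log_file=$1
    max_lines=${2:-5}
    grep 'PG::UndefinedTable' "$log_file" | tail -n "$max_lines"
}
```

For example: `recent_db_errors /var/www/arvados-api/current/log/production.log`.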
Workbench 2 is a client-side Javascript application. If you are having trouble loading Workbench 2, check the browser’s developer console (this can be found in “Tools → Developer Tools”).
You can iterate on the config and maintain the cluster by making changes to local.params and local_config_dir and running installer.sh deploy again.
The arvados-api-server package sets up the database as a post-install script. If the database host or password wasn’t set correctly (or quoted correctly) at the time that package is installed, it won’t be able to set up the database.
This will manifest as an error like this:
#<ActiveRecord::StatementInvalid: PG::UndefinedTable: ERROR: relation \"api_clients\" does not exist
If this happens, you need to
1. Correct the database host or password.
2. Run ./installer.sh deploy to update the configuration.
3. Run dpkg-reconfigure arvados-api-server to re-run the post-install script that sets up the database.
4. Run ./installer.sh deploy again to synchronize everything, and so that the install steps that need to contact the API server are run successfully.

At this point you should be able to log into the Arvados cluster. The initial URL for the single hostname install will use the hostname or IP address you put in HOSTNAME_EXT:
https://${HOSTNAME_EXT}
For the multi-hostname install, it will be:
https://workbench.${CLUSTER}.${DOMAIN}
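The two URL forms can be derived from the same variables you set in local.params. This is a minimal sketch with a hypothetical function name; the mode names mirror the single_hostname and multiple_hostnames configurations used earlier, and the values are the examples from this page.

```shell
# Example values; substitute the ones from your local.params.
CLUSTER=xarv1
DOMAIN=example.com
HOSTNAME_EXT=my-arvados.example.net

# Hypothetical helper: print the initial login URL for the given mode.
initial_url() {
    case "$1" in
        single_hostname)    echo "https://${HOSTNAME_EXT}" ;;
        multiple_hostnames) echo "https://workbench.${CLUSTER}.${DOMAIN}" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

initial_url single_hostname
initial_url multiple_hostnames
```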
If you did not configure a different authentication provider you will be using the “Test” provider, and the provision script creates an initial user for testing purposes. This user is configured as administrator of the newly created cluster. It uses the values of INITIAL_USER and INITIAL_USER_PASSWORD from the local.params file.
If you did configure a different authentication provider, the first user to log in will automatically be given Arvados admin privileges.
As part of the operation of installer.sh, it automatically creates a git repository with your configuration templates. You should retain this repository but be aware that it contains sensitive information (passwords and tokens used by the Arvados services).
As described in Iterating on config changes you may use installer.sh deploy to re-run Salt to deploy configuration changes and upgrades. However, be aware that the configuration templates created for you by installer.sh are a snapshot that is not automatically kept up to date.
When deploying upgrades, consult the Arvados upgrade notes to see if changes need to be made to the configuration file template in local_config_dir/pillars/arvados.sls. To specify the version to upgrade to, set the VERSION parameter in local.params.
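Setting a parameter such as VERSION can be scripted. The sketch below assumes local.params holds plain KEY="value" lines, as in the SSL_MODE example earlier; the function name and the version string 2.5.0 are placeholders. It does not handle values containing | or & characters.

```shell
# Hypothetical helper: set KEY="value" in a params file, replacing an
# existing KEY= line or appending a new one.
set_param() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing assignment in place.
        sed "s|^${key}=.*|${key}=\"${value}\"|" "$file" > "$tmp"
    else
        # No assignment yet: keep the file and append one.
        cat "$file" > "$tmp"
        printf '%s="%s"\n' "$key" "$value" >> "$tmp"
    fi
    # Copy back with cat so the original file's permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, `set_param local.params VERSION 2.5.0` followed by `./installer.sh deploy`.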
See also Maintenance and upgrading for more information.
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.