Arvados can be configured to run containers on an HPC cluster using Slurm or LSF, as an alternative to dispatching to cloud VMs.
In this configuration, the appropriate Arvados dispatcher service (crunch-dispatch-slurm or arvados-dispatch-lsf) picks up each container as it appears in the Arvados queue and submits a short shell script as a batch job to the HPC job queue. The shell script executes the crunch-run container supervisor, which retrieves the container specification from the Arvados controller, starts an arv-mount process, runs the container using docker exec or singularity exec, and sends updates (logs, outputs, exit code, etc.) back to the Arvados controller.
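The dispatch step described above can be sketched roughly as follows. This is an illustration only, not the dispatchers' actual code: the function names, the script contents, the way crunch-run is invoked, and the container UUID are all assumptions. The only scheduler options used are sbatch's --job-name and bsub's -J, and both commands read the job script from standard input when no script file is given.

```python
import shlex


def build_batch_script(container_uuid, crunch_run="crunch-run"):
    """Return a short shell script that hands control to the container supervisor.

    Assumption: the supervisor is started with the container UUID as its only
    argument. The real dispatchers may pass additional options.
    """
    return "#!/bin/sh\nexec {} {}\n".format(
        shlex.quote(crunch_run), shlex.quote(container_uuid)
    )


def build_submit_command(container_uuid, scheduler):
    """Return the argv for submitting a job named after the container."""
    if scheduler == "slurm":
        return ["sbatch", "--job-name=" + container_uuid]
    if scheduler == "lsf":
        return ["bsub", "-J", container_uuid]
    raise ValueError("unsupported scheduler: " + scheduler)


def dispatch(queued_uuids, scheduler, submit):
    """Submit one batch job per queued container.

    `submit` is a hypothetical callable taking (argv, script). In a real
    setting it could be something like
    subprocess.run(argv, input=script, text=True, check=True).
    """
    for uuid in queued_uuids:
        submit(build_submit_command(uuid, scheduler), build_batch_script(uuid))
```

Passing the submit function in as a parameter keeps the sketch runnable on a machine without Slurm or LSF installed, since a test can substitute a function that just records what would have been submitted.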
The crunch-run program runs a gateway server to facilitate the “container shell” feature. However, depending on the site’s network topology, the Arvados controller may not be able to connect directly to the compute node where a given crunch-run process is running.
Instead, in the HPC configuration, crunch-run connects to the Arvados controller at startup and sets up a multiplexed tunnel, allowing the controller process to connect to crunch-run’s gateway server without initiating a connection to the compute node, or even knowing the compute node’s IP address.
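To illustrate the direction of the connection, here is a minimal sketch in which the compute-node side dials out and the controller side then sends requests back over that same connection. It is a toy model under stated assumptions, not the protocol Arvados uses: the line-based framing, the numeric stream IDs, and the function names are invented for the example, and the requests are handled one at a time rather than concurrently.

```python
import socket
import threading


def compute_node(controller_addr):
    """Stand-in for crunch-run: dial out, then answer requests on that connection."""
    with socket.create_connection(controller_addr) as tunnel:
        reader = tunnel.makefile("rb")
        for line in reader:
            # Each request line is "<stream id> <payload>"; echo the ID back
            # so the controller can match replies to logical streams.
            stream_id, _, payload = line.rstrip(b"\n").partition(b" ")
            tunnel.sendall(stream_id + b" gateway saw: " + payload + b"\n")
        reader.close()


def demo():
    """Stand-in for the controller: accept the tunnel, then send requests through it."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)

    node = threading.Thread(
        target=compute_node, args=(listener.getsockname(),), daemon=True
    )
    node.start()

    # The controller never initiates a connection to the compute node and
    # never needs its address; it only accepts the inbound tunnel.
    conn, _ = listener.accept()
    reader = conn.makefile("rb")
    replies = []
    for stream_id, message in [(1, b"ls"), (2, b"pwd")]:
        conn.sendall(b"%d %s\n" % (stream_id, message))
        replies.append(reader.readline().rstrip(b"\n").decode())

    reader.close()
    conn.close()
    listener.close()
    node.join(timeout=2)
    return replies
```

Calling demo() returns ["1 gateway saw: ls", "2 gateway saw: pwd"]. A real multiplexer would interleave many streams concurrently over the one connection; the stream IDs here only hint at that idea.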
This means that when a client requests a container shell connection, the traffic goes through two or three servers:
The API.MaxConcurrentRequests configuration should not be set too low, or the long-lived tunnel connections can starve other clients.
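As a rough way to reason about "too low", the following sketch estimates a floor for the setting. It is a rule of thumb rather than official sizing guidance, and it rests on an assumption not stated above: that each running container holds one long-lived tunnel connection that counts against the limit. The function names and the example numbers are hypothetical.

```python
def min_concurrent_requests(expected_running_containers, other_request_budget):
    """Estimate a lower bound for the concurrent request limit.

    Assumption: one tunnel slot per running container, plus a budget of
    slots reserved for ordinary API clients.
    """
    if expected_running_containers < 0 or other_request_budget < 0:
        raise ValueError("counts must be non-negative")
    return expected_running_containers + other_request_budget


def tunnel_share(limit, expected_running_containers):
    """Fraction of the limit that tunnels would occupy, capped at 1.0."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return min(1.0, expected_running_containers / limit)
```

For example, min_concurrent_requests(200, 64) gives 264, and tunnel_share(64, 200) gives 1.0, meaning a limit of 64 would leave no slots for other clients under these assumptions.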
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.