Customizing Crunch environment using Docker

This page describes how to customize the runtime environment (e.g., the programs, libraries, and other dependencies needed to run a job) that a crunch script will be run in using Docker. Docker is a tool for building and running containers that isolate applications from other applications running on the same node. For detailed information about Docker, see the Docker User Guide.

This page will demonstrate how to:

  1. Fetch the arvados/jobs Docker image
  2. Manually install additional software into the container
  3. Create a new custom image
  4. Upload that image to Arvados for use by Crunch jobs
  5. Share your image with others


This tutorial assumes that you have installed the Arvados Command line SDK and Python SDK on your workstation and have a working environment.

You also need ensure that Docker is installed, the Docker daemon is running, and you have permission to access Docker. You can test this by running docker version. If you receive a permission denied error, your user account may need to be added to the docker group. If you have root access, you can add yourself to the docker group using $ sudo addgroup $USER docker then log out and log back in again; otherwise consult your local sysadmin.

Fetch a starting image

The easiest way to begin is to start from the “arvados/jobs” image which already has the Arvados SDK installed along with other configuration required for use with Crunch.

Download the latest “arvados/jobs” image from the Docker registry:

$ docker pull arvados/jobs:latest
Pulling repository arvados/jobs
3132168f2acb: Download complete
a42b7f2c59b6: Download complete
e5afdf26a7ae: Download complete
5cae48636278: Download complete
7a4f91b70558: Download complete
a04a275c1fd6: Download complete
c433ff206a22: Download complete
b2e539b45f96: Download complete
073b2581c6be: Download complete
593915af19dc: Download complete
32260b35005e: Download complete
6e5b860c1cde: Download complete
95f0bfb43d4d: Download complete
c7fd77eedb96: Download complete
0d7685aafd00: Download complete

Install new packages

Next, enter the container using docker run, providing the arvados/jobs image and the program you want to run (in this case the bash shell).

$ docker run --interactive --tty --user root arvados/jobs /bin/bash

Next, update the package list using apt-get update.

root@fbf1d0f529d5:/# apt-get update
Hit jessie/updates InRelease
Ign jessie InRelease
Ign jessie InRelease
Hit jessie Release.gpg
Get:1 jessie/updates/main amd64 Packages [431 kB]
Hit jessie Release
Hit jessie-updates InRelease
Get:2 jessie/main amd64 Packages [257 kB]
Get:3 jessie-updates/main amd64 Packages [17.6 kB]
Hit jessie Release.gpg
Hit jessie Release
Get:4 jessie/main amd64 Packages [9049 kB]
Fetched 9755 kB in 2s (3894 kB/s)
Reading package lists... Done

In this example, we will install the “R” statistical language Debian package “r-base-core”. Use apt-get install:

root@fbf1d0f529d5:/# apt-get install r-base-core
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  libxxf86vm1 make patch r-base-core r-base-dev r-cran-boot r-cran-class
  r-cran-cluster r-cran-codetools r-cran-foreign r-cran-kernsmooth
  r-cran-lattice r-cran-mass r-cran-matrix r-cran-mgcv r-cran-nlme r-cran-nnet
  r-cran-rpart r-cran-spatial r-cran-survival r-doc-html r-recommended
Suggested packages:
The following NEW packages will be installed:
  libxxf86vm1 make patch r-base-core r-base-dev r-cran-boot r-cran-class
  r-cran-cluster r-cran-codetools r-cran-foreign r-cran-kernsmooth
  r-cran-lattice r-cran-mass r-cran-matrix r-cran-mgcv r-cran-nlme r-cran-nnet
  r-cran-rpart r-cran-spatial r-cran-survival r-doc-html r-recommended
0 upgraded, 203 newly installed, 0 to remove and 39 not upgraded.
Need to get 124 MB of archives.
After this operation, 334 MB of additional disk space will be used.
Do you want to continue [Y/n]? y
Get:130 jessie/main r-cran-cluster amd64 1.15.3-1 [475 kB]
Get:131 jessie/main r-base-dev all 3.1.1-1 [4018 B]
Get:132 jessie/main r-cran-boot all 1.3-13-1 [571 kB]
Get:133 jessie/main r-cran-codetools all 0.2-9-1 [45.7 kB]
Get:134 jessie/main r-cran-rpart amd64 4.1-8-1 [862 kB]
Get:135 jessie/main r-cran-foreign amd64 0.8.61-1 [213 kB]
Fetched 124 MB in 52s (2380 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Unpacking r-base-core (3.1.1-1+b2) ...
Selecting previously unselected package r-base-dev.
Preparing to unpack .../r-base-dev_3.1.1-1_all.deb ...
Unpacking r-base-dev (3.1.1-1) ...
Selecting previously unselected package r-cran-boot.
Preparing to unpack .../r-cran-boot_1.3-13-1_all.deb ...
Unpacking r-cran-boot (1.3-13-1) ...
Selecting previously unselected package r-cran-mass.
Setting up r-base-core (3.1.1-1+b2) ...

Creating config file /etc/R/Renviron with new version
Setting up r-base-dev (3.1.1-1) ...
Setting up r-cran-boot (1.3-13-1) ...
Setting up r-cran-mass (7.3-34-1) ...
Setting up r-cran-class (7.3-11-1) ...

Now we can verify that “R” is installed:

root@fbf1d0f529d5:/# R

R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


Note that you are not limited to installing Debian packages. You may compile programs or libraries from source and install them, edit systemwide configuration files, use other package managers such as pip or gem, and perform any other customization necessary to run your program.

Create a new image

We’re now ready to create a new Docker image. First, quit the container, then use docker commit to create a new image from the stopped container. The container id can be found in the default hostname of the container displayed in the prompt, in this case fbf1d0f529d5:

root@fbf1d0f529d5:/# exit
$ docker commit fbf1d0f529d5 arvados/jobs-with-r
$ docker images
$ docker images |head
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
arvados/jobs-with-r   latest              2818853ff9f9        9 seconds ago       703.1 MB
arvados/jobs          latest              12b9f859d48c        4 days ago          362 MB

Upload your image

Finally, we are ready to upload the new Docker image to Arvados. Use arv-keepdocker with the image repository name to upload the image. Without arguments, arv-keepdocker will print out the list of Docker images in Arvados that are available to you.

$ arv-keepdocker arvados/jobs-with-r
703M / 703M 100.0%
Collection saved as 'Docker image arvados/jobs-with-r:latest 2818853ff9f9'
$ arv-keepdocker
REPOSITORY                      TAG         IMAGE ID      COLLECTION                     CREATED
arvados/jobs-with-r             latest      2818853ff9f9  qr1hi-4zz18-abcdefghijklmno    Tue Jan 17 20:35:53 2017

You are now able to specify the runtime environment for your program using DockerRequirement in your workflow:

    dockerPull: arvados/jobs-with-r

Share Docker images

Docker images are subject to normal Arvados permissions. If wish to share your Docker image with others (or wish to share a pipeline template that uses your Docker image) you will need to use arv-keepdocker with the --project-uuid option to upload the image to a shared project.

$ arv-keepdocker --project-uuid qr1hi-j7d0g-xxxxxxxxxxxxxxx arvados/jobs-with-r

Previous: Arvados CWL Extensions Next: Linking alternate login accounts

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.