This page describes how to use Docker to customize the runtime environment (e.g., the programs, libraries, and other dependencies needed to run a job) in which a Crunch script runs. Docker is a tool for building and running containers that isolate applications from other applications running on the same node. For detailed information about Docker, see the Docker User Guide.
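This page demonstrates how to fetch the arvados/jobs Docker image, install additional software in a container, create a new image from that container, upload the image to Arvados, and share it with others.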
This tutorial assumes that you have installed the Arvados Command line SDK and Python SDK on your workstation and have a working environment.
You also need to ensure that Docker is installed, the Docker daemon is running, and you have permission to access Docker. You can test this by running docker version. If you receive a permission denied error, your user account may need to be added to the docker group. If you have root access, you can add yourself to the docker group using
$ sudo addgroup $USER docker
then log out and log back in again; otherwise consult your local sysadmin.
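The access check described above can also be scripted. The following is a minimal sketch, not part of the Arvados or Docker tooling: the function name check_docker_access is hypothetical. It assumes only that docker version exits with a non-zero status when the client is missing or cannot reach the daemon, and that id -nG lists the groups of the current session.

```shell
#!/bin/sh
# Hypothetical helper (not part of Arvados or Docker): report whether the
# current user can talk to the Docker daemon.
check_docker_access() {
    # "docker version" fails if the client is missing or cannot reach the
    # daemon (for example, on a permission denied error).
    if docker version >/dev/null 2>&1; then
        echo "ok: Docker is reachable"
        return 0
    fi
    # Distinguish a missing group membership from other causes.
    if id -nG | tr ' ' '\n' | grep -qx docker; then
        echo "error: you are in the docker group, but Docker is not reachable"
    else
        echo "error: Docker is not reachable and you are not in the docker group"
    fi
    return 1
}
```

Calling check_docker_access before the steps below gives an early, readable failure instead of a permission error partway through.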
The easiest way to begin is to start from the “arvados/jobs” image which already has the Arvados SDK installed along with other configuration required for use with Crunch.
Download the latest “arvados/jobs” image from the Docker registry:
$ docker pull arvados/jobs:latest
Pulling repository arvados/jobs
3132168f2acb: Download complete
a42b7f2c59b6: Download complete
e5afdf26a7ae: Download complete
5cae48636278: Download complete
7a4f91b70558: Download complete
a04a275c1fd6: Download complete
c433ff206a22: Download complete
b2e539b45f96: Download complete
073b2581c6be: Download complete
593915af19dc: Download complete
32260b35005e: Download complete
6e5b860c1cde: Download complete
95f0bfb43d4d: Download complete
c7fd77eedb96: Download complete
0d7685aafd00: Download complete
Next, enter the container using docker run, providing the arvados/jobs image and the program you want to run (in this case the bash shell).
$ docker run --interactive --tty --user root arvados/jobs /bin/bash
root@fbf1d0f529d5:/#
Next, update the package list using apt-get update.
root@fbf1d0f529d5:/# apt-get update
Get:2 http://apt.arvados.org stretch-dev InRelease [3260 B]
Get:1 http://security-cdn.debian.org/debian-security stretch/updates InRelease [94.3 kB]
Ign:3 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Get:4 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease [91.0 kB]
Get:5 http://apt.arvados.org stretch-dev/main amd64 Packages [208 kB]
Get:6 http://cdn-fastly.deb.debian.org/debian stretch Release [118 kB]
Get:7 http://security-cdn.debian.org/debian-security stretch/updates/main amd64 Packages [499 kB]
Get:8 http://cdn-fastly.deb.debian.org/debian stretch Release.gpg [2434 B]
Get:9 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 Packages.diff/Index [10.6 kB]
Get:10 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 Packages 2019-07-08-0821.07.pdiff [445 B]
Get:10 http://cdn-fastly.deb.debian.org/debian stretch-updates/main amd64 Packages 2019-07-08-0821.07.pdiff [445 B]
Fetched 1026 kB in 0s (1384 kB/s)
Reading package lists... Done
In this example, we will install the “R” statistical language from the Debian package “r-base-core”. Use apt-get install:
root@fbf1d0f529d5:/# apt-get install r-base-core
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
[...]
done.
Now we can verify that “R” is installed:
root@fbf1d0f529d5:/# R
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Note that you are not limited to installing Debian packages. You may compile programs or libraries from source and install them, edit systemwide configuration files, use other package managers such as gem, and perform any other customization necessary to run your program.
We’re now ready to create a new Docker image. First, quit the container, then use docker commit to create a new image from the stopped container. The container id can be found in the default hostname of the container displayed in the prompt, in this case fbf1d0f529d5:
root@fbf1d0f529d5:/# exit
$ docker commit fbf1d0f529d5 arvados/jobs-with-r
sha256:2818853ff9f9af5d7f77979803baac9c4710790ad2b84c1a754b02728fdff205
$ docker images |head
REPOSITORY            TAG       IMAGE ID       CREATED         SIZE
arvados/jobs-with-r   latest    2818853ff9f9   9 seconds ago   703.1 MB
arvados/jobs          latest    12b9f859d48c   4 days ago      362 MB
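The interactive session above can also be scripted. The sketch below is one possible non-interactive equivalent, not the documented Arvados procedure: the function name build_custom_image and the temporary container name are hypothetical, and it assumes the base image provides apt-get, as arvados/jobs does in this example.

```shell
#!/bin/sh
# Hypothetical helper: install Debian packages into a container started from
# a base image, then commit the stopped container as a new image.
# Usage: build_custom_image BASE_IMAGE NEW_IMAGE PACKAGE...
build_custom_image() {
    base_image=$1
    new_image=$2
    shift 2
    # Temporary container name (an arbitrary choice for this sketch).
    container="custom-image-build-$$"

    # Run the same apt-get steps as the interactive session, as root.
    # The package names are passed as positional parameters of the inner shell.
    docker run --name "$container" --user root "$base_image" \
        sh -c 'apt-get update && apt-get install -y "$@"' sh "$@" || {
        # Installation failed: remove the container and report the failure.
        docker rm "$container" >/dev/null 2>&1
        return 1
    }

    # Create the new image from the stopped container, then clean up.
    docker commit "$container" "$new_image"
    status=$?
    docker rm "$container" >/dev/null
    return "$status"
}
```

For example, build_custom_image arvados/jobs arvados/jobs-with-r r-base-core would repeat the steps shown above, and docker run --rm arvados/jobs-with-r R --version is one way to confirm that R is present in the resulting image.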
Finally, we are ready to upload the new Docker image to Arvados. Use arv-keepdocker with the image repository name to upload the image. Without arguments, arv-keepdocker will print out the list of Docker images in Arvados that are available to you.
$ arv-keepdocker arvados/jobs-with-r
703M / 703M 100.0%
Collection saved as 'Docker image arvados/jobs-with-r:latest 2818853ff9f9'
qr1hi-4zz18-abcdefghijklmno
$ arv-keepdocker
REPOSITORY            TAG       IMAGE ID       COLLECTION                    CREATED
arvados/jobs-with-r   latest    2818853ff9f9   qr1hi-4zz18-abcdefghijklmno   Tue Jan 17 20:35:53 2017
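When scripting uploads, it can be useful to look up the collection that holds a given image. The sketch below parses the listing printed by arv-keepdocker without arguments. It assumes the column layout shown in the sample output above (REPOSITORY, TAG, IMAGE ID, COLLECTION, CREATED), which may differ between Arvados versions; the function name keep_collection_for_image is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID of the collection holding
# REPOSITORY:TAG, based on the listing from "arv-keepdocker".
# Usage: keep_collection_for_image REPOSITORY [TAG]
keep_collection_for_image() {
    repo=$1
    tag=${2:-latest}
    # Assumed columns: 1 = repository, 2 = tag, 3 = image id, 4 = collection.
    # The exit status is non-zero when no matching row is found.
    arv-keepdocker | awk -v repo="$repo" -v tag="$tag" '
        $1 == repo && $2 == tag { print $4; found = 1 }
        END { exit !found }'
}
```

For example, keep_collection_for_image arvados/jobs-with-r prints the collection UUID if the image has been uploaded, and returns a non-zero status otherwise.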
You are now able to specify the runtime environment for your program using DockerRequirement in your workflow:
hints:
  DockerRequirement:
    dockerPull: arvados/jobs-with-r
Docker images are subject to normal Arvados permissions. If you wish to share your Docker image with others (or wish to share a pipeline template that uses your Docker image), you will need to use arv-keepdocker with the --project-uuid option to upload the image to a shared project.
$ arv-keepdocker arvados/jobs-with-r --project-uuid qr1hi-j7d0g-xxxxxxxxxxxxxxx
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.