This page describes how to use Docker to set up the runtime environment (i.e., the programs, libraries, and other dependencies needed to run a job) for a workflow step. Docker is a tool for building and running containers that isolate applications from other applications running on the same node. For detailed information about Docker, see the Docker User Guide.
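The sections below cover creating a custom Docker image, uploading and sharing it in Arvados, and finding pre-built bioinformatics images.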
This tutorial assumes that you have installed the Arvados command-line SDK and Python SDK on your workstation and have a working environment.
You also need to ensure that Docker is installed, the Docker daemon is running, and you have permission to access Docker. You can test this by running:
$ docker version
If you receive a permission denied error, your user account may need to be added to the docker group. If you have root access, you can add yourself to the docker group using:
$ sudo addgroup $USER docker
Then log out and log back in again; otherwise, consult your local sysadmin.
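If you want to script this check (for example, at the top of a setup script), a minimal shell sketch might look like the following. The function name check_docker and its three status words are our own invention, not part of Docker or Arvados. It relies only on the exit status of docker version, which fails both when the daemon is not running and when you lack permission to reach it.

```shell
# Hypothetical helper: report whether Docker is usable by the current user.
# Prints one of: ok, not-installed, no-access.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    # The docker client binary is not on PATH.
    echo "not-installed"
  elif docker version >/dev/null 2>&1; then
    # The client reached the daemon successfully.
    echo "ok"
  else
    # Daemon not running, or permission denied on the Docker socket
    # (often fixed by joining the docker group and logging in again).
    echo "no-access"
  fi
}

check_docker
```

A result of no-access after you have joined the docker group usually means you have not yet started a new login session.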
This example shows how to create a Docker image that includes R (the r-base-core package).
First, create a new directory called docker-example-r-base:
$ mkdir docker-example-r-base
$ cd docker-example-r-base
In that directory, create a file called Dockerfile with the following contents:
FROM ubuntu:bionic
RUN apt-get update && apt-get -yq --no-install-recommends install r-base-core
The “RUN” command is executed inside the container and can be any shell command line. You are not limited to installing Debian packages. You may compile programs or libraries from source and install them, edit systemwide configuration files, use other package managers such as pip or gem, and perform any other customization necessary to run your program.
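As an illustration of those other customizations, the sketch below writes a variant Dockerfile into a separate directory that also installs pip and a Python package on top of R. The directory name docker-example-r-pip and the choice of the requests package are placeholders, not requirements; substitute whatever your program needs. We have not built this particular image, and on an older base such as ubuntu:bionic you may need to pin package versions.

```shell
# Hypothetical variant of the example image, written to its own directory
# so the original docker-example-r-base/Dockerfile is left untouched.
mkdir -p docker-example-r-pip
cat > docker-example-r-pip/Dockerfile <<'EOF'
FROM ubuntu:bionic
# Debian/Ubuntu packages: R as before, plus pip for Python 3.
RUN apt-get update && \
    apt-get -yq --no-install-recommends install r-base-core python3-pip && \
    rm -rf /var/lib/apt/lists/*
# A second package manager: install a Python library with pip.
# "requests" is only a placeholder; replace it with your own dependencies.
RUN pip3 install --no-cache-dir requests
EOF
```

From the parent directory, you would build it the same way as the main example, e.g. docker build -t docker-example-r-pip docker-example-r-pip.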
You can also visit the Docker tutorial for more information and examples.
You should add your Dockerfiles to the same source control repository as the workflows that use them.
We’re now ready to create a new Docker image. Use docker build to create a new image from the Dockerfile.
docker-example-r-base$ docker build -t docker-example-r-base .
Now we can verify that “R” is installed:
$ docker run -ti docker-example-r-base
root@57ec8f8b2663:/# R
R version 3.4.4 (2018-03-15) -- "Someone to Lean On"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Finally, we are ready to upload the new Docker image to Arvados. Use arv-keepdocker with the image repository name to upload the image. Without arguments, arv-keepdocker will print out the list of Docker images in Arvados that are available to you.
$ arv-keepdocker docker-example-r-base
2020-06-29 13:48:19 arvados.arv_put[769] INFO: Creating new cache file at /home/peter/.cache/arvados/arv-put/39ddb51ebf6c5fcb3d713b5969466967
206M / 206M 100.0% 2020-06-29 13:48:21 arvados.arv_put[769] INFO:
2020-06-29 13:48:21 arvados.arv_put[769] INFO: Collection saved as 'Docker image docker-example-r-base:latest sha256:edd10'
zzzzz-4zz18-0tayximqcyb6uf8
$ arv-keepdocker images
REPOSITORY TAG IMAGE ID COLLECTION CREATED
docker-example-r-base latest sha256:edd10 zzzzz-4zz18-0tayximqcyb6uf8 Mon Jun 29 17:46:16 2020
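If you need the collection UUID in a script (for example, to pass to other Arvados tools), you can pull it out of this listing. This sketch assumes the columns are separated by whitespace exactly as shown above and that the image ID contains no spaces; image_collection is a hypothetical helper name, not an Arvados command.

```shell
# Hypothetical helper: read `arv-keepdocker images` output on stdin and
# print the COLLECTION column for the given repository and tag.
image_collection() {
  awk -v repo="$1" -v tag="${2:-latest}" \
    'NR > 1 && $1 == repo && $2 == tag { print $4 }'
}

# Example (requires a configured Arvados client):
#   arv-keepdocker images | image_collection docker-example-r-base latest
```

The header row is skipped with NR > 1 because its "IMAGE ID" heading spans two fields.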
You are now able to specify the runtime environment for your program using DockerRequirement in your workflow:
hints:
  DockerRequirement:
    dockerPull: docker-example-r-base
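For context, a complete, minimal CWL tool description using this hint might look like the following. The file name run-rscript.cwl, the input name script, and the output file name are hypothetical; the tool simply runs Rscript on an R script you supply, inside the docker-example-r-base image. It is written as a shell here-document so it can be pasted directly into a terminal.

```shell
# Write a minimal, hypothetical CWL CommandLineTool that runs Rscript
# inside the docker-example-r-base image uploaded above.
cat > run-rscript.cwl <<'EOF'
cwlVersion: v1.0
class: CommandLineTool
baseCommand: Rscript
hints:
  DockerRequirement:
    dockerPull: docker-example-r-base
inputs:
  script:
    type: File
    inputBinding:
      position: 1
stdout: rscript-output.txt
outputs:
  output:
    type: stdout
EOF
```

You could then run it with something like arvados-cwl-runner run-rscript.cwl --script analysis.R, where analysis.R stands in for your own script.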
Docker images are subject to normal Arvados permissions. If you wish to share your Docker image with others, use arv-keepdocker with the --project-uuid option to add the image to a shared project and ensure that metadata is set correctly.
$ arv-keepdocker docker-example-r-base --project-uuid zzzzz-j7d0g-xxxxxxxxxxxxxxx
In addition to creating your own containers, there are a number of resources where you can find bioinformatics tools already wrapped in container images:
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.