Running an Arvados pipeline

Note:

This section assumes the legacy Jobs API is available. Some newer installations have already disabled the Jobs API in favor of the Containers API.

If the Jobs API is not available, use the Common Workflow Language instead.

This tutorial demonstrates how to use the command line to run the same pipeline as described in running a pipeline using Workbench.

Note:

This tutorial assumes that you are logged into an Arvados VM instance (instructions for Webshell or Unix or Windows) or you have installed the Arvados FUSE Driver and Python SDK on your workstation and have a working environment.

Note:

This tutorial assumes you are using the default Arvados instance, qr1hi. If you are using a different instance, replace qr1hi with your instance. See Accessing Arvados Workbench for more details.

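When you use the command line, you must use Arvados unique identifiers to refer to objects. The identifiers in this example correspond to the following Arvados objects:

qr1hi-p5p6p-itzkwxblfermlwv: the pipeline template (passed to --template)
2463fa9efeb75e099685528b3b9071e0+438: the reference collection (bwa-mem::reference_collection)
3229739b505d2b878b62aed09895a55a+142: the sample collection (bwa-mem::sample)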

Use arv pipeline run to run the pipeline, supplying the inputs to the bwa-mem component on the command line:

~$ arv pipeline run --run-pipeline-here --template qr1hi-p5p6p-itzkwxblfermlwv bwa-mem::reference_collection=2463fa9efeb75e099685528b3b9071e0+438 bwa-mem::sample=3229739b505d2b878b62aed09895a55a+142

2014-07-25 18:05:26 +0000 -- pipeline_instance qr1hi-d1hrv-d14trje19pna7f2
bwa-mem qr1hi-8i9sb-67n1qvsronmd2z6 queued 2014-07-25T18:05:25Z

2014-07-25 18:05:36 +0000 -- pipeline_instance qr1hi-d1hrv-d14trje19pna7f2
bwa-mem qr1hi-8i9sb-67n1qvsronmd2z6 {:done=>0, :running=>1, :failed=>0, :todo=>0}

2014-07-25 18:05:46 +0000 -- pipeline_instance qr1hi-d1hrv-d14trje19pna7f2
bwa-mem qr1hi-8i9sb-67n1qvsronmd2z6 49bae1066f4ebce72e2587a3efa61c7d+88

This instantiates your pipeline and displays periodic status reports in your terminal window. The new pipeline instance will also show up on the Workbench Dashboard.

arv pipeline run submits a job for each pipeline component as soon as the component’s inputs are known (i.e., any dependencies are satisfied). It terminates when there is no work left to do: either all components are satisfied and all jobs have completed successfully, or one or more jobs have failed, which makes it unproductive to submit any further jobs.
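The termination rule can be illustrated with a small shell sketch. This is not how arv pipeline run is implemented; the helper name no_work_left and the lowercase state words are hypothetical, and the rule is simplified to "stop on any failure, or when every component is complete".

```shell
# Hypothetical helper: succeed (exit status 0) when no work is left,
# given one state word per pipeline component.
# The state words used here ("complete", "failed", and anything else)
# are illustrative, not the exact values reported by Arvados.
no_work_left() {
  all_complete=yes
  for state in "$@"; do
    case "$state" in
      failed)   return 0 ;;          # a failed job makes further submissions unproductive
      complete) ;;                   # this component is satisfied
      *)        all_complete=no ;;   # queued, running, or waiting on inputs
    esac
  done
  [ "$all_complete" = yes ]
}
```

For example, `no_work_left complete running` fails because one component is still running, while `no_work_left complete complete` and `no_work_left queued failed` both succeed.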

The Keep locator of the bwa-mem component’s output is available from the last status report shown above:

~$ arv keep ls -s 49bae1066f4ebce72e2587a3efa61c7d+88
     29226 ./HWI-ST1027_129_D0THKACXX.1_1.sam
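If the status reports have been saved to a file, the output locator can be picked out with standard tools rather than copied by hand. The sketch below assumes the reports look like the ones shown above and that the final report contains the output locator; the function name last_locator and the log file name are hypothetical.

```shell
# Print the last Keep locator (32 hex digits, "+", a size) found in a
# saved status log. Job and pipeline UUIDs and the "+0000" timestamps
# do not match this pattern, so only collection locators are returned.
last_locator() {
  grep -oE '[0-9a-f]{32}\+[0-9]+' "$1" | tail -n 1
}

# Example use (pipeline.log is a hypothetical file holding the reports):
#   arv keep ls -s "$(last_locator pipeline.log)"
```

If the pipeline failed, the last locator in the log may not be an output at all, so check the final report before relying on the result.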

Re-using existing jobs and outputs

When satisfying a pipeline component that is not marked as nondeterministic in the pipeline template, arv pipeline run checks for a previously submitted job that satisfies the component’s requirements. If such a job is found, arv pipeline run uses the existing job rather than submitting a new one. Usually this is a safe way to conserve time and compute resources. In some cases it’s desirable to re-run jobs with identical specifications (e.g., to demonstrate that a job or entire pipeline thought to be repeatable is in fact repeatable). For such cases, job re-use features can be disabled entirely by passing the --no-reuse flag to the arv pipeline run command.
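As a sketch, the tutorial command can be repeated with job re-use disabled. The template and input identifiers are copied from the command earlier on this page; the position of --no-reuse among the other options is an assumption, and the ARV_CMD variable is a hypothetical convenience for previewing the command line.

```shell
# Identifiers copied from the tutorial command above.
TEMPLATE=qr1hi-p5p6p-itzkwxblfermlwv
REFERENCE=2463fa9efeb75e099685528b3b9071e0+438
SAMPLE=3229739b505d2b878b62aed09895a55a+142

# Re-run the pipeline without re-using earlier jobs.
# ARV_CMD is a hypothetical override: set ARV_CMD=echo to print the
# arguments instead of submitting anything.
run_without_reuse() {
  "${ARV_CMD:-arv}" pipeline run --run-pipeline-here --no-reuse \
    --template "$TEMPLATE" \
    "bwa-mem::reference_collection=$REFERENCE" \
    "bwa-mem::sample=$SAMPLE"
}
```

Calling run_without_reuse with ARV_CMD=echo shows the arguments that would be passed; calling it with ARV_CMD unset runs arv itself.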
