Running a workflow using Workbench

A “workflow” (sometimes called a “pipeline” in other systems) is a sequence of steps that apply various programs or tools to transform input data to output data. Workflows are the principal means of performing computation with Arvados. This tutorial demonstrates how to run a single-stage workflow that takes a small data set of paired-end reads from a sample exome in FASTQ format and aligns them to Chromosome 19 using the bwa mem tool, producing a Sequence Alignment/Map (SAM) file. This tutorial will introduce the following Arvados features:

  • How to create a new process from an existing workflow.
  • How to browse and select input data for the workflow and submit the process to run on the Arvados cluster.
  • How to access your process results.
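
For orientation, the single alignment step that this workflow wraps can be sketched outside of Arvados as a plain bwa mem invocation. The sketch below is not the content of bwa-mem.cwl; it only assembles a typical paired-end command line. The input names read_p1 and read_p2 come from this tutorial, while the "reference" key, the file names, the thread count, and the helper build_bwa_mem_command are assumptions made for illustration. It also assumes the reference has already been indexed with bwa index.

```python
import shlex


def build_bwa_mem_command(reference, read_p1, read_p2, threads=1):
    """Return the argument list for a paired-end `bwa mem` alignment.

    `bwa mem` writes SAM output to standard output, so a caller would
    normally redirect it to a file.
    """
    return ["bwa", "mem", "-t", str(threads), reference, read_p1, read_p2]


# Hypothetical inputs: only the names read_p1 and read_p2 appear in the
# tutorial; the "reference" key and all file names are placeholders.
inputs = {
    "reference": "chr19.fa",
    "read_p1": "sample_1.fastq",
    "read_p2": "sample_2.fastq",
}

command = build_bwa_mem_command(**inputs)

# Show the equivalent shell command, with SAM output redirected to a file.
print(shlex.join(command) + " > aligned.sam")
```

When the workflow runs on the cluster, Arvados supplies the selected input files to the tool, so none of this needs to be typed by hand; the sketch only shows what kind of command the process is expected to execute.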

Steps

  1. Start from the Workbench Dashboard. You can access the Dashboard by clicking on Dashboard in the upper left corner of any Workbench page.
  2. Click on the Run a process… button. This will open a dialog box titled Choose a pipeline or workflow to run.
  3. In the search box, type in bwa-mem.cwl.
  4. Select bwa-mem.cwl and click the Next: choose inputs button. This will create a new process in your Home project and will open it. You can now supply the inputs for the process. Please note that all required inputs are populated with default values and you can change them if you prefer.
  5. For example, let’s see how to set the read pair inputs read_p1 and read_p2 for this workflow. Click the Choose button beneath the read_p1 header. This will open a dialog box titled Choose a file.
  6. In the file dialog, click on the Home menu and then select All Projects.
  7. Enter HWI-ST1027 into the search box. You will see one or more collections. Click on HWI-ST1027_129_D0THKACXX for CWL tutorial.
  8. The right-hand panel will list two files. Click on the first one, ending in “_1”, and click the OK button.
  9. Repeat steps 5-8 to set read_p2, this time selecting the second file, ending in “_2”.
  10. Scroll to the bottom of the “Inputs” panel and click on the Run button. The page updates to show you that the process has been submitted to run on the Arvados cluster.
  11. Once the process starts running, you can track the progress by watching log messages from the component(s). This page refreshes automatically. You will see a complete label when the process completes successfully.
  12. Click on the Output link to see the results of the process. This will load a new page listing the output files from this process. You’ll see the output SAM file from the alignment tool under the Files tab.
  13. Click on the download button to the right of the SAM file to download your results.
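
After downloading, a quick sanity check of the SAM file can be done with a few lines of Python. The sketch below relies only on the SAM text format itself: header lines begin with “@”, alignment records are tab-separated, the second field is the bitwise FLAG, and bit 0x4 marks an unmapped read. The function name summarize_sam and the sample records are made up for illustration and are not output from this tutorial’s workflow.

```python
import io


def summarize_sam(lines):
    """Count header lines, mapped records, and unmapped records in SAM text."""
    counts = {"header": 0, "mapped": 0, "unmapped": 0}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            counts["header"] += 1
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # FLAG is the second mandatory field
        if flag & 0x4:  # bit 0x4 set means the read is unmapped
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts


# Synthetic example records (not real tutorial output).
example = io.StringIO(
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:19\tLN:5000\n"
    "read1\t99\t19\t1000\t60\t4M\t=\t1200\t204\tACGT\tIIII\n"
    "read1\t147\t19\t1200\t60\t4M\t=\t1000\t-204\tACGT\tIIII\n"
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
)

summary = summarize_sam(example)
print(summary)
```

To run it on the real result, replace the synthetic example with an open file handle, for instance `with open("your_output.sam") as handle: print(summarize_sam(handle))`, where the file name is whatever the downloaded output is called.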


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.