Running a workflow using Workbench

A “workflow” (sometimes called a “pipeline” in other systems) is a sequence of steps that apply various programs or tools to transform input data into output data. Workflows are the principal means of performing computation with Arvados. This tutorial demonstrates how to run a single-stage workflow that takes a small data set of paired-end reads from a sample exome in FASTQ format and aligns them to chromosome 19 using the bwa mem tool, producing a Sequence Alignment/Map (SAM) file. This tutorial introduces the following Arvados features:

  • How to create a new process from an existing workflow.
  • How to browse and select input data for the workflow and submit the process to run on the Arvados cluster.
  • How to access your process results.

Steps

  1. Start from the Workbench Dashboard. You can access the Dashboard by clicking on Dashboard in the upper left corner of any Workbench page.
  2. Click on the Run a process… button. This will open a dialog box titled Choose a pipeline or workflow to run.
  3. In the search box, type in Tutorial bwa mem cwl.
  4. Select Tutorial bwa mem cwl and click the Next: choose inputs button. This creates a new process in your Home project and opens it. You can now supply the inputs for the process. All required inputs are prepopulated with default values, which you can change if you prefer.
  5. For example, to change the “reference” parameter for this workflow, click the Choose button beneath the “reference” parameter header. This opens a dialog box titled Choose a dataset for “reference” parameter for cwl-runner in bwa-mem.cwl component.
  6. Open the Home menu and select All Projects. Search for and select Tutorial chromosome 19 reference. You will then see a list of files. Select 19-fasta.bwt and click the OK button.
  7. Repeat the previous two steps to set the “read_p1” and “read_p2” parameters for the cwl-runner script in the bwa-mem.cwl component.
  8. Click on the Run button. The page updates to show you that the process has been submitted to run on the Arvados cluster.
  9. After the process starts running, you can track its progress by watching the log messages from the component(s). This page refreshes automatically. You will see a Complete label when the process finishes successfully.
  10. Click on the Output link to see the results of the process. This will load a new page listing the output files from this process. You’ll see the output SAM file from the alignment tool under the Files tab.
  11. Click on the download button to the right of the SAM file to download your results.
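Once the SAM file is downloaded, a quick sanity check is to count how many reads aligned. The following is a minimal sketch, not an Arvados tool: it assumes only the standard SAM text layout (header lines start with “@”, records are tab-separated, and FLAG is the second column), and the script name and file path in the usage comment are hypothetical. It counts each read once via its primary alignment, using the SAM FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary).

```python
import sys
from typing import Iterable

# SAM FLAG bits, as defined in the SAM format specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def summarize_sam(lines: Iterable[str]) -> dict:
    """Count primary records in SAM text and how many are mapped vs unmapped."""
    counts = {"primary": 0, "mapped": 0, "unmapped": 0}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the 2nd mandatory column
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, via its primary alignment only
        counts["primary"] += 1
        if flag & FLAG_UNMAPPED:
            counts["unmapped"] += 1
        else:
            counts["mapped"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python summarize_sam.py path/to/downloaded_output.sam
    with open(sys.argv[1]) as handle:
        print(summarize_sam(handle))
```

Reading the file line by line keeps memory use flat even for large alignments; for anything beyond a spot check, a dedicated tool such as samtools is the more robust choice.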

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.