Running a workflow using Workbench
A “workflow” (sometimes called a “pipeline” in other systems) is a sequence of steps that apply various programs or tools to transform input data into output data. Workflows are the principal means of performing computation with Arvados. This tutorial demonstrates how to run a single-stage workflow that takes a small data set of paired-end reads from a sample exome in FASTQ format and aligns them to Chromosome 19 using the bwa mem tool, producing a Sequence Alignment/Map (SAM) file. This tutorial will introduce the following Arvados features:
- How to create a new process from an existing workflow.
- How to browse and select input data for the workflow and submit the process to run on the Arvados cluster.
- How to access your process results.
Steps
- Click on the + NEW button in the top-left.
- In the pop-up menu, select Run a workflow. This will open the Run Process panel in the Workbench.
- In the search field under Choose a workflow, type in bwa-mem.cwl.
- Select bwa-mem.cwl in the search results, and click the NEXT button. This creates a new process in one of your Home Projects and opens it. To choose the project in which the workflow will run, click on the input line below “Project where the workflow will run”, and in the pop-up dialog box, choose a project under your Home Projects.
- You can now supply the inputs for the process. All required inputs are pre-populated with default values, which you can change if you prefer.
- For example, let’s see how to set the read pair inputs read_p1 and read_p2 for this workflow. Click on the input line under the read_p1 header. This will open a dialog box titled Choose a file.
- Enter the search terms user guide resources into the Search for a Project field on the left. You will see one or more collections in the search results appearing below and, among them, the one with the exact title User guide resources. Your goal is to locate the file HWI-ST1027_129_D0THKACXX.1_1.fastq.
- You can either locate the file manually, by clicking on the triangles ▶ to the left of each item to expand it (projects and the collections under them) until you find the file, or filter the search results using the Filter Collections list in Projects field, for example, with a term like “HWI-ST1027”.
- Either way, you will find the file HWI-ST1027_129_D0THKACXX.1_1.fastq in the search results. Click on it, and then the OK button in the bottom-right.
- Click on the input line under the read_p2 header and repeat the previous three steps to set its value, this time selecting the file whose name ends in “_2.fastq”.
- Scroll to the bottom of the “Inputs” panel and click on the RUN WORKFLOW button. The page updates to show you that the process has been queued to run on the Arvados cluster.
- Once the process starts running, you can track its progress by watching the log messages from the component(s) (scroll down to the Logs panel). The page refreshes automatically, and you can also click on the REFRESH button at the top of the page. You will see a Completed label when the process completes successfully.
- The output of the workflow can be found by following the link “Output from bwa-mem.cwl” under the heading Output collection in the main or DETAILS panel, or in the OUTPUTS panel further down. Click on the Output from bwa-mem.cwl link to see the detailed results from the workflow run. This will lead you to a page that lists the metadata of the outputs, and you’ll see the output SAM file there, in the FILES panel.
- To download your results, simply click on the SAM file name.
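Once the SAM file is downloaded, a quick sanity check can confirm that it contains alignments. The following is a minimal sketch in plain Python, not part of Arvados or bwa: the function name `summarize_sam` and the embedded sample text are hypothetical and made up for illustration. It relies only on the SAM format itself, where header lines start with `@`, fields are tab-separated, the second field is the FLAG, the third is the reference name, and FLAG bit 0x4 marks an unmapped record.

```python
import io
from collections import Counter

# Made-up SAM text for illustration only; this is NOT output from the tutorial run.
# The reference name "19" and its length are hypothetical placeholders.
SAMPLE_SAM = (
    "@HD\tVN:1.6\tSO:unsorted\n"
    "@SQ\tSN:19\tLN:100000\n"
    "read1\t99\t19\t1000\t60\t5M\t=\t1200\t205\tACGTA\tIIIII\n"
    "read1\t147\t19\t1200\t60\t5M\t=\t1000\t-205\tTTGCA\tIIIII\n"
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tGGGGG\tIIIII\n"
    "read2\t141\t*\t0\t0\t*\t*\t0\t0\tCCCCC\tIIIII\n"
)

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def summarize_sam(lines):
    """Count header lines and mapped/unmapped alignment records.

    `lines` is any iterable of text lines, such as an open file.
    Counts are per record, so secondary or supplementary alignments
    of the same read are each counted separately.
    """
    header_lines = 0
    mapped = 0
    unmapped = 0
    per_reference = Counter()

    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("@"):
            header_lines += 1
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # second column is the bitwise FLAG
        if flag & UNMAPPED_FLAG:
            unmapped += 1
        else:
            mapped += 1
            per_reference[fields[2]] += 1  # third column is the reference name

    return {
        "header_lines": header_lines,
        "mapped": mapped,
        "unmapped": unmapped,
        "per_reference": dict(per_reference),
    }


if __name__ == "__main__":
    print(summarize_sam(io.StringIO(SAMPLE_SAM)))
```

To run the same check on your own result, pass an open file instead of the sample text, for example `with open(path) as f: summarize_sam(f)`, where `path` is wherever you saved the downloaded SAM file.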