Developing workflows with CWL

The Common Workflow Language is a multi-vendor open standard for describing analysis tools and workflows that are portable across a variety of platforms. CWL is the primary way to develop and run workflows for Arvados. Arvados supports versions v1.0 , v1.1 and v1.2 of the CWL standard.


This tutorial assumes that you have access to the Arvados command line tools and have set the API token and confirmed a working environment. .

Developing workflows

For an introduction and and detailed documentation about writing CWL, see the CWL User Guide and the CWL Specification .

See Writing Portable High-Performance Workflows and Arvados CWL Extensions for additional information about using CWL on Arvados.

See Repositories of CWL Tools and Workflows for links to repositories of existing tools for reuse.

See Software for working with CWL for links to software tools to help create CWL documents.

Using cwltool

When developing a workflow, it is often helpful to run it on the local host to avoid the overhead of submitting to the cluster. To execute a workflow only on the local host (without submitting jobs to an Arvados cluster) you can use the cwltool command. Note that when using cwltool you must have the input data accessible on the local file system using either arv-mount or arv-get to fetch the data from Keep.

~/arvados/doc/user/cwl/bwa-mem$ arv-get 2463fa9efeb75e099685528b3b9071e0+438/ .
156 MiB / 156 MiB 100.0%
~/arvados/doc/user/cwl/bwa-mem$ arv-get ae480c5099b81e17267b7445e35b4bc7+180/ .
23 MiB / 23 MiB 100.0%
~/arvados/doc/user/cwl/bwa-mem$ cwltool bwa-mem-input.yml bwa-mem-input-local.yml
cwltool 1.0.20160629140624
[job bwa-mem.cwl] /home/example/arvados/doc/user/cwl/bwa-mem$ docker \
    run \
    -i \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.ann:/var/lib/cwl/job979368791_bwa-mem/19.fasta.ann:ro \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq:/var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq:ro \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/ \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.amb:/var/lib/cwl/job979368791_bwa-mem/19.fasta.amb:ro \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.pac:/var/lib/cwl/job979368791_bwa-mem/19.fasta.pac:ro \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq:/var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq:ro \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem/19.fasta.bwt:/var/lib/cwl/job979368791_bwa-mem/19.fasta.bwt:ro \
    --volume=/home/example/arvados/doc/user/cwl/bwa-mem:/var/spool/cwl:rw \
    --volume=/tmp/tmpgzyou9:/tmp:rw \
    --workdir=/var/spool/cwl \
    --read-only=true \
    --log-driver=none \
    --user=1001 \
    --rm \
    --env=TMPDIR=/tmp \
    --env=HOME=/var/spool/cwl \
    biodckr/bwa \
    bwa \
    mem \
    -t \
    1 \
    -R \
    '@RG	ID:arvados_tutorial	PL:illumina	SM:HWI-ST1027_129' \
    /var/lib/cwl/job979368791_bwa-mem/19.fasta \
    /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq \
    /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq > /home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 100000 sequences (10000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 4745, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (154, 181, 214)
[M::mem_pestat] low and high boundaries for computing mean and (34, 334)
[M::mem_pestat] mean and (185.63, 44.88)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 394)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 100000 reads in 9.848 CPU sec, 9.864 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem -t 1 -R @RG	ID:arvados_tutorial	PL:illumina	SM:HWI-ST1027_129 /var/lib/cwl/job979368791_bwa-mem/19.fasta /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.fastq /var/lib/cwl/job979368791_bwa-mem/HWI-ST1027_129_D0THKACXX.1_2.fastq
[main] Real time: 10.061 sec; CPU: 10.032 sec
Final process status is success
    "aligned_sam": {
        "size": 30738959,
        "path": "/home/example/arvados/doc/user/cwl/bwa-mem/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0c668cca45fef02397bb5302880526d300ee4dac",
        "class": "File"

If you get the error JavascriptException: Long-running script killed after 20 seconds. this may be due to the Dockerized Node.js engine taking too long to start. You may address this by installing Node.js locally (run apt-get install nodejs on Debian or Ubuntu) or by specifying a longer timeout with the --eval-timeout option. For example, run the workflow with cwltool --eval-timeout=40 for a 40-second timeout.

Previous: arvados-cwl-runner options Next: Working with Docker images

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.