Starting a Workflow at the Command Line

The Common Workflow Language (CWL) is a multi-vendor open standard for describing analysis tools and workflows that are portable across a variety of platforms. CWL is the primary way to develop and run workflows for Arvados. Arvados supports versions v1.0, v1.1, and v1.2 of the CWL standard.

Note:

This tutorial assumes that you have access to the Arvados command line tools, have set the API token, and have confirmed a working environment.

This tutorial will demonstrate how to submit a workflow at the command line using arvados-cwl-runner.

  1. Get the tutorial files
  2. Submitting a workflow to an Arvados cluster
  3. Registering a workflow to use in Workbench
  4. Make a workflow file directly executable

Get the tutorial files

The tutorial files are located in the documentation section of the Arvados source repository, which can be found on git.arvados.org or GitHub:

~$ git clone https://git.arvados.org/arvados.git
~$ cd arvados/doc/user/cwl/bwa-mem

The tutorial data is hosted on https://playground.arvados.org (also referred to by the identifier pirca). If you are using a different Arvados instance, you may need to copy the data to your own instance. One way to do this is with arv-copy (this requires signing up for a free playground.arvados.org account).

~$ arv-copy --src pirca --dst settings 2463fa9efeb75e099685528b3b9071e0+438
~$ arv-copy --src pirca --dst settings ae480c5099b81e17267b7445e35b4bc7+180

If you do not wish to create an account on https://playground.arvados.org, you may download the files anonymously and upload them to your local Arvados instance:

https://collections.pirca.arvadosapi.com/c=2463fa9efeb75e099685528b3b9071e0+438/

https://collections.pirca.arvadosapi.com/c=ae480c5099b81e17267b7445e35b4bc7+180/

Submitting a workflow to an Arvados cluster

Submit a workflow and wait for results

Use arvados-cwl-runner to submit CWL workflows to Arvados. After submitting the workflow, arvados-cwl-runner waits for it to complete and prints the final result to standard output.

Note: Once submitted, the workflow runs entirely on Arvados, so it will continue to run even if you log out. However, if you interrupt arvados-cwl-runner with Ctrl-C, it will cancel the workflow.

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
    "aligned_sam": {
        "location": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File",
        "size": 30738986
    }
}
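
When scripting around arvados-cwl-runner, the JSON object printed to standard output can be parsed to find where each output was stored in Keep. The following is a minimal sketch and not part of the Arvados tools: the function name output_locations is hypothetical, and the sample string is the result shown above. It accepts either a "location" or a "path" field, on the assumption that either spelling may appear depending on the runner version.

```python
import json


def output_locations(result_json):
    """Return {output name: Keep reference} for every File in the result."""
    result = json.loads(result_json)
    locations = {}
    for name, value in result.items():
        if isinstance(value, dict) and value.get("class") == "File":
            # Assumption: the reference is stored under "location" or "path".
            locations[name] = value.get("location") or value.get("path")
    return locations


# The result object printed by the run shown above.
SAMPLE_RESULT = """
{
    "aligned_sam": {
        "location": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File",
        "size": 30738986
    }
}
"""

print(output_locations(SAMPLE_RESULT))
```

Only top-level File outputs are handled here; arrays and directories would need additional cases.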

Referencing files

When running a workflow on an Arvados cluster, the input files must be stored in Keep. There are several ways this can happen.

A URI reference to Keep uses the keep: scheme followed by either the portable data hash or UUID of the collection and then the location of the file inside the collection. For example, keep:2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt or keep:zzzzz-4zz18-zzzzzzzzzzzzzzz/19.fasta.bwt.
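
As an illustration of that structure, the sketch below splits a keep: reference into its collection identifier and the file path inside the collection. It is not an Arvados API: the function name parse_keep_uri is hypothetical, and the two regular expressions are assumptions based on the shape of the examples above (32 hexadecimal digits plus a size for a portable data hash, and a UUID containing the 4zz18 segment for a collection).

```python
import re

# Assumed identifier shapes, inferred from the examples in this tutorial.
PDH = r"[0-9a-f]{32}\+\d+"
UUID = r"[a-z0-9]{5}-4zz18-[a-z0-9]{15}"
KEEP_URI = re.compile(rf"^keep:(?P<collection>{PDH}|{UUID})/(?P<path>.+)$")


def parse_keep_uri(uri):
    """Return (kind, collection identifier, path inside the collection)."""
    match = KEEP_URI.match(uri)
    if match is None:
        raise ValueError("not a recognized keep: reference: " + uri)
    collection = match.group("collection")
    kind = "uuid" if "-4zz18-" in collection else "portable_data_hash"
    return kind, collection, match.group("path")


print(parse_keep_uri("keep:2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt"))
```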

If you reference a file in arv-mount, such as /home/example/keep/by_id/2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt, then arvados-cwl-runner will automatically determine the appropriate Keep URI reference.
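
The mapping from such a path to a Keep reference can be pictured with the following sketch. It only illustrates the correspondence and does not describe how arvados-cwl-runner is implemented. The function name mount_path_to_keep_uri is hypothetical, the mount point must be supplied by the caller, and only paths under the by_id directory are handled.

```python
from pathlib import PurePosixPath


def mount_path_to_keep_uri(path, mount_point):
    """Translate <mount_point>/by_id/<collection>/<file> into keep:<collection>/<file>."""
    by_id = PurePosixPath(mount_point) / "by_id"
    # relative_to raises ValueError if the path is not under by_id.
    relative = PurePosixPath(path).relative_to(by_id)
    collection, *rest = relative.parts
    if not rest:
        raise ValueError("path names a collection, not a file inside it")
    return "keep:" + collection + "/" + "/".join(rest)


print(mount_path_to_keep_uri(
    "/home/example/keep/by_id/2463fa9efeb75e099685528b3b9071e0+438/19.fasta.bwt",
    "/home/example/keep",
))
```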

If you reference a local file which is not in arv-mount, then arvados-cwl-runner will upload the file to Keep and use the Keep URI reference from the upload.

You can also execute CWL files that have been uploaded to Keep:


~/arvados/doc/user/cwl/bwa-mem$ arv-put --portable-data-hash --name "bwa-mem.cwl" bwa-mem.cwl
2020-08-20 13:40:02 arvados.arv_put[12976] INFO: Collection saved as 'bwa-mem.cwl'
f141fc27e7cfa7f7b6d208df5e0ee01b+59
~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner keep:f141fc27e7cfa7f7b6d208df5e0ee01b+59/bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
    "aligned_sam": {
        "location": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File",
        "size": 30738986
    }
}

Note: uploading a workflow file to Keep is not the same as registering the workflow for use in Workbench. See Registering a workflow to use in Workbench below.

Work reuse

Workflows submitted with arvados-cwl-runner take advantage of Arvados job reuse. If you submit a workflow that is identical to one that has run before, Arvados short-circuits the execution and returns the result of the previous run. This also applies to individual workflow steps. For example, in a two-step workflow where the first step has run before, the results of the first step are reused and only the new second step is executed. You can disable this behavior with --disable-reuse.
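
The idea behind reuse can be sketched as a cache keyed on a step's definition and inputs. This is a conceptual illustration only and makes no claim about how Arvados decides that two runs are identical: the class ReuseCache, its method names, and the step definitions are all hypothetical.

```python
import hashlib
import json


class ReuseCache:
    """Toy model of step reuse: identical definition + inputs gives a cached result."""

    def __init__(self):
        self._results = {}
        self.executed = []  # names of steps that actually ran

    def run(self, name, definition, inputs, func, enable_reuse=True):
        # Hash a canonical serialization of what the step is and what it is given.
        payload = json.dumps({"definition": definition, "inputs": inputs}, sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if enable_reuse and key in self._results:
            return self._results[key]
        self.executed.append(name)
        result = func(inputs)
        self._results[key] = result
        return result


cache = ReuseCache()
align = lambda inputs: "aligned(" + inputs["reads"] + ")"
count = lambda inputs: "count(" + inputs["sam"] + ")"

# First submission: a workflow with only the alignment step.
cache.run("align", "bwa mem", {"reads": "reads.fastq"}, align)

# Second submission: the same first step plus a new second step.
sam = cache.run("align", "bwa mem", {"reads": "reads.fastq"}, align)
cache.run("count", "wc -l", {"sam": sam}, count)

print(cache.executed)
```

In this model the second submission executes only the new step, and passing enable_reuse=False plays the role of --disable-reuse by forcing the step to run again.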

Docker images

Docker images referenced by the workflow must be uploaded to Arvados. This requires docker to be installed and usable by the user running arvados-cwl-runner. If the image is not present in the local Docker instance, arvados-cwl-runner will first attempt to pull the image using docker pull, then upload it.

If there is already a Docker image in Arvados with the same name, arvados-cwl-runner will use the existing image. In this case, the submitter will not use Docker.

The --match-submitter-images option checks the id of the image in the local Docker instance and compares it to the id of the image already in Arvados with the same name and tag. If they differ, arvados-cwl-runner chooses the image matching the local image id, uploading it if necessary. This is helpful for development: if you locally rebuild the image with the ‘latest’ tag, --match-submitter-images ensures that the newer version is used.
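
The selection rules described in this section can be summarized as a small decision function. This is a sketch of the documented behavior under simplifying assumptions, not the actual implementation: the function choose_image, its parameters, and the image names and ids are hypothetical, and picking the first stored image with a matching name is an assumption.

```python
def choose_image(name_tag, local_id, arvados_images, match_submitter_images=False):
    """Return (image id to use, whether an upload is needed).

    arvados_images is a list of (name:tag, image id) pairs already in Arvados.
    local_id is the id of the image in the local Docker instance.
    """
    same_name = [image_id for stored, image_id in arvados_images if stored == name_tag]
    if not match_submitter_images:
        if same_name:
            # Default: an existing image with the same name is used as-is.
            return same_name[0], False
        return local_id, True
    # With --match-submitter-images the local image id decides.
    stored_ids = {image_id for _, image_id in arvados_images}
    return local_id, local_id not in stored_ids


stored = [("example/bwa:latest", "sha256:old")]
print(choose_image("example/bwa:latest", "sha256:new", stored))
print(choose_image("example/bwa:latest", "sha256:new", stored, match_submitter_images=True))
```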

Command line options

See arvados-cwl-runner options

Registering a workflow to use in Workbench

Use --create-workflow to register a CWL workflow with Arvados. Registering a workflow lets you share it with other Arvados users and run it either by clicking the Run a process… button on the Workbench Dashboard or by UUID on the command line.

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner --create-workflow bwa-mem.cwl
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-07-01 12:21:01 arvados.arv-run[15796] INFO: Upload local files: "bwa-mem.cwl"
2016-07-01 12:21:01 arvados.arv-run[15796] INFO: Uploaded to zzzzz-4zz18-7e0hedrmkuyoei3
2016-07-01 12:21:01 arvados.cwl-runner[15796] INFO: Created template zzzzz-p5p6p-rjleou1dwr167v5
zzzzz-p5p6p-rjleou1dwr167v5

You can provide a partial input file to set default values for the workflow input parameters. You can also use the --name option to set the name of the workflow:

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner --name "My workflow with defaults" --create-workflow bwa-mem.cwl bwa-mem-template.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-07-01 14:09:50 arvados.arv-run[3730] INFO: Upload local files: "bwa-mem.cwl"
2016-07-01 14:09:50 arvados.arv-run[3730] INFO: Uploaded to zzzzz-4zz18-0f91qkovk4ml18o
2016-07-01 14:09:50 arvados.cwl-runner[3730] INFO: Created template zzzzz-p5p6p-0deqe6nuuyqns2i
zzzzz-p5p6p-0deqe6nuuyqns2i

Running registered workflows at the command line

You can run a registered workflow at the command line by its UUID:

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner pirca-7fd4e-3nqbw08vtjl8ybz --help
INFO /home/peter/work/scripts/venv3/bin/arvados-cwl-runner 2.1.0.dev20200814195416, arvados-python-client 2.1.0.dev20200814195416, cwltool 3.0.20200807132242
INFO Resolved 'pirca-7fd4e-3nqbw08vtjl8ybz' to 'arvwf:pirca-7fd4e-3nqbw08vtjl8ybz#main'
usage: pirca-7fd4e-3nqbw08vtjl8ybz [-h] [--PL PL] [--group_id GROUP_ID]
                                   [--read_p1 READ_P1] [--read_p2 READ_P2]
                                   [--reference REFERENCE]
                                   [--sample_id SAMPLE_ID]
                                   [job_order]

positional arguments:
  job_order             Job input json file

optional arguments:
  -h, --help            show this help message and exit
  --PL PL
  --group_id GROUP_ID
  --read_p1 READ_P1     The reads, in fastq format.
  --read_p2 READ_P2     For mate paired reads, the second file (optional).
  --reference REFERENCE
                        The index files produced by `bwa index`
  --sample_id SAMPLE_ID
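
Because each workflow input appears as a command line option, a script can assemble the invocation from a dictionary of inputs. The sketch below only builds and prints the command; it does not run it. The function name registered_workflow_command and the input values are hypothetical, while the option names are taken from the help text above.

```python
import shlex


def registered_workflow_command(workflow_uuid, inputs):
    """Build the argument list for running a registered workflow by UUID."""
    argv = ["arvados-cwl-runner", workflow_uuid]
    for name, value in inputs.items():
        # Each workflow input becomes a --name value pair.
        argv.extend(["--" + name, str(value)])
    return argv


command = registered_workflow_command(
    "pirca-7fd4e-3nqbw08vtjl8ybz",
    {"sample_id": "sample1", "read_p1": "reads_1.fastq"},  # hypothetical values
)
print(shlex.join(command))
```

Keeping the command as a list makes it suitable for passing to subprocess.run without shell quoting concerns.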

Make a workflow file directly executable

You can make a workflow file directly executable by adding the following line to the top of the file (cwl-runner should be an alias for arvados-cwl-runner):

#!/usr/bin/env cwl-runner

~/arvados/doc/user/cwl/bwa-mem$ ./bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
    "aligned_sam": {
        "path": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File",
        "size": 30738986
    }
}

You can even make an input file directly executable the same way with the following two lines at the top:

#!/usr/bin/env cwl-runner
cwl:tool: bwa-mem.cwl

~/arvados/doc/user/cwl/bwa-mem$ ./bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
    "aligned_sam": {
        "path": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File",
        "size": 30738986
    }
}
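
For a file to be run this way it also needs execute permission. The following sketch automates both steps for a workflow file or an input file: it prepends the header lines if the file has no interpreter line yet, then sets the executable bits. The function name make_directly_executable and its tool parameter are hypothetical helpers, not part of Arvados.

```python
import os
import stat

SHEBANG = "#!/usr/bin/env cwl-runner\n"


def make_directly_executable(path, tool=None):
    """Prepend the cwl-runner header if missing and mark the file executable.

    Pass tool (for example "bwa-mem.cwl") when path is an input file, so that
    a cwl:tool line naming the workflow is added after the interpreter line.
    """
    with open(path) as f:
        body = f.read()
    if not body.startswith("#!"):
        header = SHEBANG
        if tool is not None:
            header += "cwl:tool: " + tool + "\n"
        with open(path, "w") as f:
            f.write(header + body)
    # Add execute permission for user, group, and others.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Calling make_directly_executable("bwa-mem.cwl") covers the workflow file, and make_directly_executable("bwa-mem-input.yml", tool="bwa-mem.cwl") covers the input file. Calling it again on a file that already starts with an interpreter line leaves the contents unchanged.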

Setting up arvados-cwl-runner

See Arvados CWL Runner



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.