arvados-cwl-runner options

  1. Command line options
  2. Specify workflow and output names
  3. Submit a workflow without waiting for the result
  4. Control a workflow locally
  5. Automatically delete intermediate outputs
  6. Run workflow on a remote federated cluster
  7. Using preemptible (spot) instances
  8. Use CUDA GPU instances

Command line options

The following command line options are available for arvados-cwl-runner:

Option Description
--basedir BASEDIR Base directory used to resolve relative references in the input. Defaults to the directory of the input object file, or to the current directory if inputs are piped or provided on the command line.
--eval-timeout EVAL_TIMEOUT Time to wait for a JavaScript expression to evaluate before giving an error, default 20s.
--print-dot Print workflow visualization in graphviz format and exit
--version Print version and exit
--validate Validate CWL document only.
--verbose Default logging
--quiet Only print warnings and errors.
--debug Print even more logging
--metrics Print timing metrics
--tool-help Print command line help for tool
--enable-reuse Enable container reuse (default)
--disable-reuse Disable container reuse
--project-uuid UUID Project that will own the workflow containers. If not provided, they go to the home project.
--output-name OUTPUT_NAME Name to use for collection that stores the final output.
--output-tags OUTPUT_TAGS Tags for the final output collection separated by commas, e.g., '--output-tags tag0,tag1,tag2'.
--ignore-docker-for-reuse Ignore Docker image version when deciding whether to reuse past containers.
--submit Submit workflow to run on Arvados.
--local Run workflow on local host (submits containers to Arvados).
--create-template (Deprecated) synonym for --create-workflow.
--create-workflow Register an Arvados workflow that can be run from Workbench
--update-workflow UUID Update an existing Arvados workflow with the given UUID.
--wait After submitting workflow runner, wait for completion.
--no-wait Submit workflow runner and exit.
--log-timestamps Prefix logging lines with timestamp
--no-log-timestamps No timestamp on logging lines
--compute-checksum Compute checksum of contents while collecting outputs
--submit-runner-ram SUBMIT_RUNNER_RAM RAM (in MiB) required for the workflow runner job (default 1024)
--submit-runner-image SUBMIT_RUNNER_IMAGE Docker image for workflow runner job
--always-submit-runner When invoked with --submit --wait, always submit a runner to manage the workflow, even when only running a single CommandLineTool
--match-submitter-images Where Arvados has more than one Docker image of the same name, use image from the Docker instance on the submitting node.
--submit-request-uuid UUID Update and commit to supplied container request instead of creating a new one.
--submit-runner-cluster CLUSTER_ID Submit workflow runner to a remote cluster
--collection-cache-size COLLECTION_CACHE_SIZE Collection cache size (in MiB, default 256).
--name NAME Name to use for workflow execution instance.
--on-error {stop,continue} Desired workflow behavior when a step fails. One of ‘stop’ (do not submit any more steps) or ‘continue’ (may submit other steps that are not downstream from the error). Default is ‘continue’.
--enable-dev Enable loading and running development versions of the CWL standards.
--storage-classes STORAGE_CLASSES Specify comma separated list of storage classes to be used when saving final workflow output to Keep.
--intermediate-storage-classes INTERMEDIATE_STORAGE_CLASSES Specify comma separated list of storage classes to be used when saving intermediate workflow output to Keep.
--intermediate-output-ttl N If N > 0, intermediate output collections will be trashed N seconds after creation. Default is 0 (don’t trash).
--priority PRIORITY Workflow priority (range 1..1000, higher has precedence over lower)
--thread-count THREAD_COUNT Number of threads to use for job submit and output collection.
--http-timeout HTTP_TIMEOUT API request timeout in seconds. Default is 300 seconds (5 minutes).
--defer-downloads When submitting a workflow, defer downloading HTTP URLs to workflow launch instead of downloading to Keep before submit.
--varying-url-params VARYING_URL_PARAMS A comma separated list of URL query parameters that should be ignored when storing HTTP URLs in Keep.
--prefer-cached-downloads If an HTTP URL is found in Keep, skip the upstream URL freshness check (will not notice if the upstream has changed, but also will not error if the upstream is unavailable).
--enable-preemptible Use preemptible instances. Control individual steps with arv:UsePreemptible hint.
--disable-preemptible Don’t use preemptible instances.
--copy-deps Copy dependencies into the destination project.
--no-copy-deps Leave dependencies where they are.
--skip-schemas Skip loading of schemas
--trash-intermediate Immediately trash intermediate outputs on workflow success.
--no-trash-intermediate Do not trash intermediate outputs (default).
--enable-usage-report Create usage_report.html with a summary of each step’s resource usage.
--disable-usage-report Disable usage report.
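
Several options in the table accept only a restricted set of values. As a minimal sketch, a wrapper script could check them before submitting. The function name build_submit_cmd is hypothetical and not part of arvados-cwl-runner, and the sketch prints the command line instead of running it; the flags and allowed values are the ones listed above.

```shell
# Hypothetical helper (not part of arvados-cwl-runner): validate --priority
# and --on-error, then print the command line rather than executing it.
build_submit_cmd() {
    priority=$1; on_error=$2; shift 2

    # --priority must be an integer in the range 1..1000.
    case "$priority" in
        ''|*[!0-9]*) echo "priority must be an integer" >&2; return 1 ;;
    esac
    if [ "$priority" -lt 1 ] || [ "$priority" -gt 1000 ]; then
        echo "priority must be in 1..1000" >&2; return 1
    fi

    # --on-error accepts only 'stop' or 'continue'.
    case "$on_error" in
        stop|continue) ;;
        *) echo "on-error must be stop or continue" >&2; return 1 ;;
    esac

    # Remaining arguments are the workflow file and its input file.
    echo "arvados-cwl-runner --priority $priority --on-error $on_error $*"
}

build_submit_cmd 500 stop bwa-mem.cwl bwa-mem-input.yml
```

Printing the command first makes it easy to review before pasting it into a terminal or replacing the final echo with a real invocation.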

Specify workflow and output names

Use the --name and --output-name options to specify the name of the workflow and name of the output collection.

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner --name "Example bwa run" --output-name "Example bwa output" bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Upload local files: "bwa-mem.cwl"
2016-06-30 14:56:36 arvados.arv-run[27002] INFO: Uploaded to zzzzz-4zz18-h7ljh5u76760ww2
2016-06-30 14:56:40 arvados.cwl-runner[27002] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
2016-06-30 14:56:41 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Running
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-fm2n3b1w0l6bskg) is Complete
2016-06-30 14:57:12 arvados.cwl-runner[27002] INFO: Overall process status is success
{
    "aligned_sam": {
        "path": "keep:54325254b226664960de07b3b9482349+154/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File",
        "size": 30738986
    }
}
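
The --output-tags option from the table above can be combined with these naming options. The following sketch joins a list of tags with commas, as --output-tags expects, and prints the resulting command line without running it. The helper name join_tags and the tag values are made up for illustration.

```shell
# Hypothetical helper: join all arguments with commas. The function body runs
# in a subshell, so the IFS change does not leak into the calling shell.
join_tags() (
    IFS=,
    printf '%s\n' "$*"   # "$*" joins the arguments with the first character of IFS
)

tags=$(join_tags bwa example 2016-06)   # example tag values, not from a real run

# Print the command line for review instead of executing it.
printf 'arvados-cwl-runner --name "%s" --output-name "%s" --output-tags %s %s %s\n' \
    "Example bwa run" "Example bwa output" "$tags" bwa-mem.cwl bwa-mem-input.yml
```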

Submit a workflow without waiting for the result

To submit a workflow and exit immediately, use the --no-wait option. This submits the workflow to Arvados, prints the UUID of the submitted job to standard output, and exits.

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner --no-wait bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-06-30 15:07:52 arvados.arv-run[12480] INFO: Upload local files: "bwa-mem.cwl"
2016-06-30 15:07:52 arvados.arv-run[12480] INFO: Uploaded to zzzzz-4zz18-eqnfwrow8aysa9q
2016-06-30 15:07:52 arvados.cwl-runner[12480] INFO: Submitted job zzzzz-8i9sb-fm2n3b1w0l6bskg
zzzzz-8i9sb-fm2n3b1w0l6bskg
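
Because the UUID is written to standard output, a script can capture it for later use. The sketch below assumes that the log lines go to standard error and that the UUID is the last line on standard output. The function name submit_no_wait and the RUNNER variable (which lets a different executable stand in for arvados-cwl-runner) are hypothetical.

```shell
# Hypothetical wrapper: submit with --no-wait and return only the UUID.
# Assumption: log lines go to stderr; "tail -n 1" guards against any extra
# lines that might still reach stdout.
submit_no_wait() {
    # "$@" is the workflow file followed by its input file.
    "${RUNNER:-arvados-cwl-runner}" --no-wait "$@" | tail -n 1
}

# Usage (requires a configured Arvados client, so it is left commented out):
#   uuid=$(submit_no_wait bwa-mem.cwl bwa-mem-input.yml)
#   echo "Submitted as $uuid"
```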

Control a workflow locally

To run a workflow with local control, use --local. The host where you run arvados-cwl-runner is then responsible for submitting containers, but the containers themselves still run on the Arvados cluster. With --local, if you interrupt arvados-cwl-runner or log out, the workflow is terminated.

~/arvados/doc/user/cwl/bwa-mem$ arvados-cwl-runner --local bwa-mem.cwl bwa-mem-input.yml
arvados-cwl-runner 1.0.20160628195002, arvados-python-client 0.1.20160616015107, cwltool 1.0.20160629140624
2016-07-01 10:05:19 arvados.cwl-runner[16290] INFO: Pipeline instance zzzzz-d1hrv-92wcu6ldtio74r4
2016-07-01 10:05:28 arvados.cwl-runner[16290] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-2nzzfbuf9zjrj4g) is Queued
2016-07-01 10:05:29 arvados.cwl-runner[16290] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-2nzzfbuf9zjrj4g) is Running
2016-07-01 10:05:45 arvados.cwl-runner[16290] INFO: Job bwa-mem.cwl (zzzzz-8i9sb-2nzzfbuf9zjrj4g) is Complete
2016-07-01 10:05:46 arvados.cwl-runner[16290] INFO: Overall process status is success
{
    "aligned_sam": {
        "size": 30738986,
        "path": "keep:15f56bad0aaa7364819bf14ca2a27c63+88/HWI-ST1027_129_D0THKACXX.1_1.sam",
        "checksum": "sha1$0dc46a3126d0b5d4ce213b5f0e86e2d05a54755a",
        "class": "File"
    }
}

Automatically delete intermediate outputs

Use the --intermediate-output-ttl and --trash-intermediate options to specify how long intermediate outputs should be kept (in seconds) and whether to trash them immediately upon successful workflow completion.

Temporary collections are trashed N seconds after creation, where N is the value of --intermediate-output-ttl. A value of zero (the default) means intermediate outputs are retained indefinitely.

Note: arvados-cwl-runner currently does not take workflow dependencies into account when setting the TTL on an intermediate output collection. If the TTL is too short, a collection can be trashed before the downstream steps that consume it have started. The recommended minimum value for the TTL is the expected duration of the entire workflow.

Using --trash-intermediate without --intermediate-output-ttl means that intermediate files will be trashed on successful completion, but will remain on workflow failure.

Using --intermediate-output-ttl without --trash-intermediate means that intermediate files will be trashed only after the TTL expires (regardless of workflow success or failure).
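
As a minimal sketch of combining the two options, the snippet below converts an estimated workflow duration into seconds and prints a command line that uses it as the TTL. The six-hour estimate is an assumed figure for illustration; substitute your own expected duration for the whole workflow.

```shell
expected_hours=6                         # assumed estimate for the whole workflow
ttl_seconds=$((expected_hours * 3600))   # --intermediate-output-ttl takes seconds

# Print the command line for review instead of executing it.
echo "arvados-cwl-runner --intermediate-output-ttl $ttl_seconds --trash-intermediate bwa-mem.cwl bwa-mem-input.yml"
```

With both options set, intermediate outputs are trashed immediately on success and after the TTL expires on failure.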

Run workflow on a remote federated cluster

By default, the workflow runner will run on the local (home) cluster. Using --submit-runner-cluster you can specify that the runner should be submitted to a remote federated cluster. When doing this, --project-uuid should specify a project on that cluster. Steps making up the workflow will be submitted to the remote federated cluster by default, but the behavior of arv:ClusterTarget is unchanged. Note: when using this option, any resources that need to be uploaded in order to run the workflow (such as files or Docker images) will be uploaded to the local (home) cluster, and streamed to the federated cluster on demand.
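
The sketch below prints a command line for a remote submission after checking that the project UUID starts with the target cluster ID, relying on the convention that an Arvados UUID begins with the ID of the cluster that owns it. The function name remote_submit_cmd, the cluster ID clsr2, and the project UUID are hypothetical.

```shell
# Hypothetical helper: refuse to build the command when the project UUID does
# not belong to the target cluster, then print (not run) the command line.
remote_submit_cmd() {
    cluster_id=$1; project_uuid=$2; shift 2

    case "$project_uuid" in
        "$cluster_id"-*) ;;   # UUID prefix matches the cluster ID
        *) echo "project $project_uuid is not on cluster $cluster_id" >&2
           return 1 ;;
    esac

    echo "arvados-cwl-runner --submit-runner-cluster $cluster_id --project-uuid $project_uuid $*"
}

# clsr2 and the project UUID below are made-up example values.
remote_submit_cmd clsr2 clsr2-j7d0g-0123456789abcde bwa-mem.cwl bwa-mem-input.yml
```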

Using preemptible (spot) instances

Preemptible instances typically offer lower cost computation with a tradeoff of lower service guarantees. If a compute node is preempted, Arvados will restart the computation on a new instance.

If the sitewide configuration Containers.AlwaysUsePreemptibleInstances is true, workflow steps will always select preemptible instances, regardless of user option.

If Containers.AlwaysUsePreemptibleInstances is false, you can request preemptible instances for a specific run with the arvados-cwl-runner --enable-preemptible option.

Within the workflow, you can control whether individual steps should be preemptible with the arv:UsePreemptible hint.

If a workflow requests preemptible instances with arv:UsePreemptible, but you do not want to use preemptible instances, you can override it for a specific run with the arvados-cwl-runner --disable-preemptible option.
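
For scripts that launch runs, the per-run choice can be reduced to picking one of the two flags. This is only a sketch: the helper name preemptible_flag is hypothetical, and the command line is printed rather than executed.

```shell
# Hypothetical helper: map a yes/no choice to the per-run command line flag.
preemptible_flag() {
    case "$1" in
        yes) echo "--enable-preemptible" ;;
        no)  echo "--disable-preemptible" ;;
        *)   echo "usage: preemptible_flag yes|no" >&2; return 1 ;;
    esac
}

# Print a command line that opts this run out of preemptible instances.
echo "arvados-cwl-runner $(preemptible_flag no) bwa-mem.cwl bwa-mem-input.yml"
```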

Use CUDA GPU instances

See cwltool:CUDARequirement.



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.