Arvados CWL Extensions

Arvados provides several extensions to CWL for workflow optimization, site-specific configuration, and to enable access the Arvados API.

To use Arvados CWL extensions, add the following $namespaces section at the top of your CWL file:

  arv: ""
  cwltool: ""

Arvados extensions must go into the hints section, for example:

  arv:RunInSingleContainer: {}
    keep_cache: 123456
    keep_output_dir: local_output_dir
    partition: dev_partition
  arv:APIRequirement: {}
    loadListing: shallow_listing


Indicates that a subworkflow should run in a single container and not be scheduled as separate steps.


Set Arvados-specific runtime hints.

Field Type Description
keep_cache int Size of file data buffer for Keep mount in MiB. Default is 256 MiB. Increase this to reduce cache thrashing in situations such as accessing multiple large (64+ MiB) files at the same time, or performing random access on a large file.
outputDirType enum Preferred backing store for output staging. If not specified, the system may choose which one to use. One of local_output_dir or keep_output_dir

local_output_dir: Use regular file system local to the compute node. There must be sufficient local scratch space to store entire output; specify this with outdirMin of ResourceRequirement. Files are batch uploaded to Keep when the process completes. Most compatible, but upload step can be time consuming for very large files.

keep_output_dir: Use writable Keep mount. Files are streamed to Keep as they are written. Does not consume local scratch space, but does consume RAM for output buffers (up to 192 MiB per file simultaneously open for writing.) Best suited to processes which produce sequential output of large files (non-sequential writes may produced fragmented file manifests). Supports regular files and directories, does not support special files such as symlinks, hard links, named pipes, named sockets, or device nodes.|


Select preferred compute partitions on which to run jobs.

Field Type Description
partition string or array of strings


Indicates that process wants to access to the Arvados API. Will be granted limited network access and have ARVADOS_API_HOST and ARVADOS_API_TOKEN set in the environment.


In CWL v1.0 documents, the default behavior for Directory objects is to recursively expand the listing for access by parameter references an expressions. For directory trees containing many files, this can be expensive in both time and memory usage. Use cwltool:LoadListingRequirement to change the behavior for expansion of directory listings in the workflow runner.

Field Type Description
loadListing string One of no_listing, shallow_listing, or deep_listing

no_listing: Do not expand directory listing at all. The listing field on the Directory object will be undefined.

shallow_listing: Only expand the first level of directory listing. The listing field on the toplevel Directory object will contain the directory contents, however listing will not be defined on subdirectories.

deep_listing: Recursively expand all levels of directory listing. The listing field will be provided on the toplevel object and all subdirectories.

Previous: Best Practices for writing CWL Next: Customizing Crunch environment using Docker

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.