Getting Started with CWL: Glossary

Key Points

  • Common Workflow Language (CWL) is a standard for describing data analysis workflows.

  • We will use a bioinformatics RNA-seq analysis as an example workflow, but the training does not require in-depth knowledge of biology.

  • After completing this training, you should be able to begin writing workflows for your own analysis, and know where to learn more.

Create a Workflow by Composing Tools
  • CWL documents are written using a syntax called YAML.

  • The key components of the workflow are: the header, the inputs, the steps, and the outputs.
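
As a rough illustration of those four components, a minimal workflow might look like the sketch below. The step name, the parameter names, and the file fastqc.cwl are hypothetical placeholders, not taken from the training material.

```yaml
# Header: declares the CWL version and the kind of document.
cwlVersion: v1.2
class: Workflow

# Inputs: values the user supplies when running the workflow.
inputs:
  fq: File

# Steps: each step runs a tool and wires its inputs to workflow inputs
# or to the outputs of other steps.
steps:
  fastqc:
    run: fastqc.cwl          # hypothetical tool wrapper file
    in:
      reads_file: fq         # step input "reads_file" takes workflow input "fq"
    out: [html_file]

# Outputs: results collected from steps and returned to the user.
outputs:
  qc_html:
    type: File
    outputSource: fastqc/html_file
```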

Running and Debugging a Workflow
  • The input parameter file is a YAML file with values for each input parameter.

  • A common reason a workflow step fails is insufficient RAM.

  • Use ResourceRequirement to set the amount of RAM to be allocated to the job.

  • Output parameter values are printed as JSON to standard output at the end of the run.
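
A small sketch of both ideas follows, assuming a workflow with a single File input named fq. The file names, the step name, and the RAM figure are illustrative assumptions only. First, an input parameter file:

```yaml
# inputs.yaml (hypothetical file name): one entry per workflow input parameter
fq:
  class: File
  location: sample1.fastq    # hypothetical path to an input file
```

Second, a fragment of a workflow in which one step requests more memory:

```yaml
steps:
  align:                     # hypothetical step name
    run: align.cwl           # hypothetical tool wrapper file
    requirements:
      ResourceRequirement:
        ramMin: 9000         # minimum RAM to reserve, in mebibytes (MiB)
    in:
      fq: fq
    out: [bam]
```

With the reference runner, a run would typically be started with something like cwltool followed by the workflow file and the input parameter file.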

Writing a Tool Wrapper
  • The key components of a command line tool wrapper are the header, inputs, baseCommand, arguments, and outputs.

  • Like workflows, CommandLineTools have inputs and outputs.

  • Use baseCommand and arguments to provide the program to run and the command line arguments to run it with.

  • Use glob to capture output files and assign them to output parameters.

  • Use DockerRequirement to supply the name of the Docker image that contains the software to run.
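
The pieces above could fit together roughly as in this sketch of a wrapper for the fastqc program. The parameter names and the Docker image name are hypothetical; substitute an image that actually contains the tool.

```yaml
cwlVersion: v1.2
class: CommandLineTool

inputs:
  reads_file:
    type: File
    inputBinding:
      position: 1            # appended to the command line after the arguments

# The program to run, followed by fixed command line arguments.
baseCommand: fastqc
arguments: ["--outdir", "."]

outputs:
  html_file:
    type: File
    outputBinding:
      glob: "*_fastqc.html"  # capture the report written to the output directory

hints:
  DockerRequirement:
    dockerPull: example/fastqc:latest   # hypothetical image name
```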

Analyzing Multiple Samples
  • Separate the part of the workflow that you want to run multiple times into a subworkflow.

  • Use a scatter step to run the subworkflow over a list of inputs.

  • The result of a scatter is an array, which can be used in a combine step to get a single result.
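
A sketch of this pattern, under the assumption that the per-sample work lives in a subworkflow file and that a separate tool merges the results. Both file names and all parameter names below are hypothetical.

```yaml
cwlVersion: v1.2
class: Workflow

requirements:
  ScatterFeatureRequirement: {}        # needed to use "scatter"
  SubworkflowFeatureRequirement: {}    # needed to run a workflow as a step

inputs:
  fq_files: File[]                     # one file per sample

steps:
  per_sample:
    run: per-sample.cwl                # hypothetical subworkflow
    scatter: fq                        # run once for each element of "fq"
    in:
      fq: fq_files
    out: [counts]                      # becomes an array, one entry per sample

  combine:
    run: combine.cwl                   # hypothetical tool that merges results
    in:
      count_files: per_sample/counts   # receives the whole array
    out: [combined]

outputs:
  combined_counts:
    type: File
    outputSource: combine/combined
```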

Dynamic Workflow Behavior
  • CWL expressions allow you to use custom logic to determine input parameter values.

  • CWL ExpressionTool can be used to reshape data, such as declaring directories that should contain output files.
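
For instance, an ExpressionTool that gathers a list of files into a named directory might be sketched as follows. The input and output names are hypothetical.

```yaml
cwlVersion: v1.2
class: ExpressionTool

requirements:
  InlineJavascriptRequirement: {}      # allows the JavaScript block below

inputs:
  files: File[]
  dir_name: string

outputs:
  dir: Directory

# Returns an object keyed by output name; no external program is run.
expression: |
  ${
    return {"dir": {"class": "Directory",
                    "basename": inputs.dir_name,
                    "listing": inputs.files}};
  }
```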

Resources for further learning
  • Learn more advanced techniques from the CWL user guide, by asking questions on the CWL forum and chat channel, and by reading the specification.

Supplement: Creating Docker Images for Workflows
  • Docker images contain the initial state of the filesystem for a container.

  • Docker images are made up of layers.

  • Dockerfiles consist of a series of commands to install software into the container.