Scripts provided by Arvados

Note:

Arvados pipeline templates are deprecated. The recommended way to develop new workflows for Arvados is using the Common Workflow Language.

Several crunch scripts are included with Arvados in the /crunch_scripts directory. They are intended to provide examples and starting points for writing your own scripts.

bwa-aln

Run the bwa aligner on a set of paired-end fastq files, producing a BAM file for each pair.

| Parameter | Description | Example |
|---|---|---|
| bwa_tbz | Collection with the bwa source distribution. | 8b6e2c4916133e1d859c9e812861ce13+70 |
| samtools_tgz | Collection with the samtools source distribution. | c777e23cf13e5d5906abfdc08d84bfdb+74 |
| input | Collection with fastq reads (pairs of *_1.fastq.gz and *_2.fastq.gz). | d0136bc494c21f79fc1b6a390561e6cb+2778 |
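
The input naming convention (pairs of *_1.fastq.gz and *_2.fastq.gz) can be illustrated with a minimal Python sketch. This is not the logic of the bwa-aln script itself; the function name `pair_fastq` and the rule of skipping unpaired files are assumptions made for illustration.

```python
import re

# Hypothetical helper (not part of Arvados): group file names that follow the
# *_1.fastq.gz / *_2.fastq.gz convention into (sample, read1, read2) tuples.
FASTQ_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq\.gz$")


def pair_fastq(filenames):
    reads = {}
    for name in filenames:
        match = FASTQ_PATTERN.match(name)
        if match is None:
            continue  # not a paired-end fastq file name; skip it
        reads.setdefault(match.group("sample"), {})[match.group("read")] = name

    pairs = []
    for sample in sorted(reads):
        mates = reads[sample]
        # Assumption: a sample missing one of its two mates is skipped.
        if "1" in mates and "2" in mates:
            pairs.append((sample, mates["1"], mates["2"]))
    return pairs
```

Each returned tuple corresponds to one pair of reads, which matches the "one BAM file for each pair" behaviour described above.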

bwa-index

Generate an index of a fasta reference genome suitable for use by bwa-aln.

| Parameter | Description | Example |
|---|---|---|
| bwa_tbz | Collection with the bwa source distribution. | 8b6e2c4916133e1d859c9e812861ce13+70 |
| input | Collection with reference data (*.fasta.gz, *.fasta.fai.gz, *.dict.gz). | c361dbf46ee3397b0958802b346e9b5a+925 |
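
Before submitting a job, it can help to confirm that a reference collection contains all three kinds of file listed above. The following is a rough sketch only: `missing_reference_files` is a hypothetical helper, and it works on a plain list of file names rather than on an Arvados collection object.

```python
# File name suffixes listed in the parameter description above.
REQUIRED_SUFFIXES = (".fasta.gz", ".fasta.fai.gz", ".dict.gz")


def missing_reference_files(filenames):
    """Return the required suffixes that no file name in the list ends with."""
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in filenames)
    ]
```

An empty result means every expected kind of reference file is present.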

picard-gatk2-prep

Using the FixMateInformation, SortSam, ReorderSam, AddOrReplaceReadGroups, and BuildBamIndex modules from picard, prepare a BAM file for use with the GATK2 tools. Additionally, run picard’s CollectAlignmentSummaryMetrics module to produce a *.casm.tsv statistics file for each BAM file.

| Parameter | Description | Example |
|---|---|---|
| input | Collection containing aligned bam files. | |
| picard_zip | Collection with the picard binary distribution. | 687f74675c6a0e925dec619cc2bec25f+77 |
| reference | Collection with reference data (*.fasta.gz, *.fasta.fai.gz, *.dict.gz). | c361dbf46ee3397b0958802b346e9b5a+925 |

GATK2-realign

Run GATK’s RealignerTargetCreator and IndelRealigner modules on a set of BAM files.

| Parameter | Description | Example |
|---|---|---|
| input | Collection containing aligned bam files. | |
| picard_zip | Collection with the picard binary distribution. | 687f74675c6a0e925dec619cc2bec25f+77 |
| gatk_tbz | Collection with the GATK2 binary distribution. | 7e0a277d6d2353678a11f56bab3b13f2+87 |
| gatk_bundle | Collection with the GATK data bundle. | d237a90bae3870b3b033aea1e99de4a9+10820 |
| known_sites | List of files in the data bundle to use as GATK -known arguments. Optional. | ["dbsnp_137.b37.vcf","Mills_and_1000G_gold_standard.indels.b37.vcf"] (this is the default value) |
| regions | Collection with .bed files indicating sequencing target regions. Optional. | |
| region_padding | Corresponds to GATK --interval_padding argument. Required if a regions parameter is given. | 10 |
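
The optional and conditionally required parameters above can be summarised in a short Python sketch. It only mirrors the rules stated in the table; `normalize_realign_params` is a hypothetical helper, not a function from the GATK2-realign script, and the real script may handle these cases differently.

```python
# Default taken from the known_sites row of the table above.
DEFAULT_KNOWN_SITES = [
    "dbsnp_137.b37.vcf",
    "Mills_and_1000G_gold_standard.indels.b37.vcf",
]


def normalize_realign_params(params):
    """Return a copy of params with defaults applied and dependencies checked."""
    normalized = dict(params)
    # known_sites is optional; fall back to the documented default.
    normalized.setdefault("known_sites", list(DEFAULT_KNOWN_SITES))
    # region_padding is required whenever regions is given.
    if normalized.get("regions") and "region_padding" not in normalized:
        raise ValueError("region_padding is required when regions is given")
    return normalized
```

Checking the parameters before the job is queued surfaces a missing region_padding immediately rather than partway through a run.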

GATK2-bqsr

Run GATK’s BaseQualityScoreRecalibration module on a set of BAM files.

| Parameter | Description | Example |
|---|---|---|
| input | Collection containing bam files. | |
| gatk_tbz | Collection with the GATK2 binary distribution. | 7e0a277d6d2353678a11f56bab3b13f2+87 |
| gatk_bundle | Collection with the GATK data bundle. | d237a90bae3870b3b033aea1e99de4a9+10820 |

GATK2-merge-call

Merge a set of BAM files using picard, and run GATK’s UnifiedGenotyper module on the merged set to produce a VCF file.

| Parameter | Description | Example |
|---|---|---|
| input | Collection containing bam files. | |
| picard_zip | Collection with the picard binary distribution. | 687f74675c6a0e925dec619cc2bec25f+77 |
| gatk_tbz | Collection with the GATK2 binary distribution. | 7e0a277d6d2353678a11f56bab3b13f2+87 |
| gatk_bundle | Collection with the GATK data bundle. | d237a90bae3870b3b033aea1e99de4a9+10820 |
| regions | Collection with .bed files indicating sequencing target regions. Optional. | |
| region_padding | Corresponds to GATK --interval_padding argument. Required if a regions parameter is given. | 10 |

file-select

Copy the named files from the input collection to the output collection, and ignore all other files.

| Parameter | Description | Example |
|---|---|---|
| names | List of filenames to include in the output. | ["human_g1k_v37.fasta.gz","human_g1k_v37.fasta.fai.gz"] |
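
The selection behaviour amounts to a filter over the input file names. The sketch below shows the idea on plain Python lists; `select_files` is a hypothetical name, and the actual file-select script operates on Arvados collections rather than lists.

```python
def select_files(input_files, names):
    """Keep only the input files whose names appear in `names`, in input order."""
    wanted = set(names)
    return [filename for filename in input_files if filename in wanted]
```

For example, passing the `names` list from the table above against a reference collection listing would keep only the two fasta-related files and drop everything else.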


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.