Setting up a practice repository
We will create a new git repository and import a library of existing tool definitions that will help us build our workflow.
When using the recommended VSCode environment to develop on Arvados, start by forking the arvados-vscode-cwl-template repository.
- VSCode: On the left sidebar, choose Explorer
- Select Clone Repository and enter https://github.com/arvados/arvados-vscode-cwl-template, then click Open
- If asked “Would you like to open the cloned repository?”, choose Open
Next, import the bio-cwl-tools repository:
- VSCode: In the top menu, select Terminal → New Terminal
- This will open a terminal window in the lower part of the screen
- Run this command:
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
Downloading sample and reference data
Note
You may already have access to this collection.
You can check by going to Workbench and pasting 9178fe1b80a08a422dbe02adfd439764+925 into the search box. If you arrive at a collection page instead of a “not found” error, you do not need to perform this download step.
- Go to https://workbench2.jutro.arvadosapi.com and sign in (this will create an account)
- Go to Get an API token under the user menu
- Log into the shell node of your Arvados cluster
- On the shell node, copy the host name and token for the jutro cluster into the file ~/.config/arvados/jutro.conf, as described on the page for arv-copy
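For reference, here is a minimal sketch of that file, written from the shell node. arv-copy treats a plain --src name such as jutro as shorthand for ~/.config/arvados/jutro.conf. The host name jutro.arvadosapi.com is an assumption and YOUR_JUTRO_TOKEN is a placeholder; copy the real values from the Get an API token dialog. Note that this sketch overwrites any existing jutro.conf.

```shell
# Sketch only. ASSUMPTIONS: the jutro API host is jutro.arvadosapi.com and
# YOUR_JUTRO_TOKEN is a placeholder; paste the real values shown in the
# "Get an API token" dialog. This overwrites any existing jutro.conf.
conf_dir="$HOME/.config/arvados"
mkdir -p "$conf_dir"
cat > "$conf_dir/jutro.conf" <<'EOF'
ARVADOS_API_HOST=jutro.arvadosapi.com
ARVADOS_API_TOKEN=YOUR_JUTRO_TOKEN
EOF
chmod 600 "$conf_dir/jutro.conf"   # the token grants account access; keep it private
```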
Now, on the shell node of your Arvados cluster, use arv-copy to copy the collection:
arv-copy --src jutro 9178fe1b80a08a422dbe02adfd439764+925
Downloading or generating STAR index
Running STAR requires index files generated from the reference.
This is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate it yourself.
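To judge which option is faster for you, a rough estimate is enough. The hypothetical shell function below converts a size in gigabytes and a link speed in megabits per second into whole minutes, assuming decimal units (1 GB = 8000 megabits) and ignoring protocol overhead and server-side limits.

```shell
# Hypothetical helper: rough download time in whole minutes (rounded down).
# Assumes decimal units (1 GB = 8000 megabits) and a steady link speed;
# protocol overhead and server-side limits are ignored.
estimate_download_minutes() {
  size_gb=$1       # download size in gigabytes
  mbit_per_s=$2    # link speed in megabits per second
  seconds=$(( size_gb * 8000 / mbit_per_s ))
  echo $(( seconds / 60 ))
}

estimate_download_minutes 4 100   # prints 5  (320 seconds)
estimate_download_minutes 4 10    # prints 53 (3200 seconds)
```

At 100 Mbit/s the 4 GB index takes roughly 5 minutes; at 10 Mbit/s it takes close to an hour.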
Downloading
Note
As above, you can check by going to Workbench and pasting 02a12ce9e2707610991bd29d38796b57+2912 into the search box to see if you already have access to this collection.
Use arv-copy to copy the collection:
arv-copy --src jutro 02a12ce9e2707610991bd29d38796b57+2912
Generating
Create chr1-star-index.yaml:
InputFiles:
  - class: File
    location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: keep:9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1-hg19_genes.gtf
Overhang: 99
Generate the index with arvados-cwl-runner.
arvados-cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
Setting up a practice repository
We will create a new git repository and import a library of existing tool definitions that will help us build our workflow.
Create a new empty git repository to hold our workflow with this command:
git init rnaseq-cwl-training-exercises
Next, change into the new repository and import bio-cwl-tools with these commands:
cd rnaseq-cwl-training-exercises
git submodule add https://github.com/common-workflow-library/bio-cwl-tools.git
Downloading sample and reference data
Start from your rnaseq-cwl-training-exercises directory.
mkdir rnaseq
cd rnaseq
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/
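The --no-host and --cut-dirs=1 flags decide where the mirrored files land. As an illustration only, the hypothetical function below mimics that path rewriting for a single URL: --no-host (which GNU wget accepts as an abbreviation of --no-host-directories) drops the host name, and --cut-dirs=1 drops the first remote directory, the c=<hash> segment. It is not part of wget and ignores URL escaping. This is why, starting from the rnaseq directory, the reference files end up under rnaseq/reference_data/.

```shell
# Hypothetical helper (not part of wget): mimic where
#   wget --mirror --no-parent --no-host --cut-dirs=1 URL
# saves a file. Assumes GNU wget treats --no-host as --no-host-directories.
# URL escaping and query strings are ignored here.
local_path_for() {
  path=${1#*://}   # drop the scheme ("https://")
  path=${path#*/}  # --no-host-directories: drop the host name directory
  path=${path#*/}  # --cut-dirs=1: drop the first remote directory (c=<hash>)
  printf '%s\n' "$path"
}

local_path_for "https://download.jutro.arvadosapi.com/c=9178fe1b80a08a422dbe02adfd439764+925/reference_data/chr1.fa"
# prints: reference_data/chr1.fa
```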
Downloading or generating STAR index
Running STAR requires index files generated from the reference.
This is a rather large download (4 GB). Depending on your bandwidth, it may be faster to generate it yourself.
Downloading
mkdir hg19-chr1-STAR-index
cd hg19-chr1-STAR-index
wget --mirror --no-parent --no-host --cut-dirs=1 https://download.jutro.arvadosapi.com/c=02a12ce9e2707610991bd29d38796b57+2912/
Generating
Create chr1-star-index.yaml in the top level of your rnaseq-cwl-training-exercises directory:
InputFiles:
  - class: File
    location: rnaseq/reference_data/chr1.fa
    format: http://edamontology.org/format_1930
IndexName: 'hg19-chr1-STAR-index'
Gtf:
  class: File
  location: rnaseq/reference_data/chr1-hg19_genes.gtf
Overhang: 99
Generate the index with your local cwl-runner.
cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml
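cwl-runner resolves the relative rnaseq/... locations in chr1-star-index.yaml against the directory that contains that file, so if the download landed somewhere else the run fails with missing input files. A quick pre-flight check makes the cause obvious. The function name check_star_inputs is hypothetical, and this sketch assumes it is run from the directory holding chr1-star-index.yaml.

```shell
# Hypothetical pre-flight check (not part of cwl-runner): run it from the
# directory that holds chr1-star-index.yaml. It reports each local input named
# in that file that is missing and returns non-zero if any are absent.
check_star_inputs() {
  missing=0
  for f in rnaseq/reference_data/chr1.fa rnaseq/reference_data/chr1-hg19_genes.gtf; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical invocation, from the same directory, would then be:
check_star_inputs && cwl-runner bio-cwl-tools/STAR/STAR-Index.cwl chr1-star-index.yaml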