Downloading data

Arvados data collections can be downloaded using either Workbench or the command line tools (arv-ls and arv-get).

  1. Download using Workbench
  2. Sharing collections
  3. Download using command line tools

Download using Workbench

You can download Arvados data collections using Workbench.

Visit the Workbench Dashboard. Click the Projects dropdown menu in the top navigation bar and select your Home project. The Data collections tab lists the collections in this project.

Click the Show button next to a collection to open the collection’s page. On this page you can see the collection’s contents and download individual files.

You can now download the collection files by clicking the download button next to each file.

Sharing collections

Sharing with other Arvados users

Collections can be shared with other users on the Arvados cluster by sharing the parent project. Navigate to the parent project using the “breadcrumbs” bar, then click the Sharing tab. From there, you can choose which users or groups to share with and their level of access.

Creating a special download URL

To share a collection with users who do not have an account on your Arvados cluster, open the collection page in Workbench as described above, then click the Create sharing link button.

This creates a sharing link for the collection. Copy the link from this page and send it to the people you want to share the collection with.

Anyone with this URL can download the collection by opening it in a web browser, which presents a downloadable version of the collection.

Download using command line tools

Note:

This tutorial assumes that you have access to the Arvados command line tools, have set the API token, and have confirmed a working environment.
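Before running the commands in this section, a quick way to confirm the token is set is to inspect the environment. This is a minimal sketch assuming your credentials are supplied through the ARVADOS_API_HOST and ARVADOS_API_TOKEN environment variables; the function name check_arvados_env is hypothetical, and it only checks that the variables are non-empty, not that the token is valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Arvados client variables are unset or empty.
# Assumes credentials are supplied via ARVADOS_API_HOST and ARVADOS_API_TOKEN.
# It does not contact the cluster, so a set-but-invalid token still passes.
check_arvados_env() {
  local var missing=0
  for var in ARVADOS_API_HOST ARVADOS_API_TOKEN; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [[ -z ${!var:-} ]]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_arvados_env && arv-ls ae480c5099b81e17267b7445e35b4bc7+180
```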

You can download Arvados data collections using the command line tools arv-ls and arv-get.

Use arv-ls to view the contents of a collection:

~$ arv-ls ae480c5099b81e17267b7445e35b4bc7+180
./HWI-ST1027_129_D0THKACXX.1_1.fastq
./HWI-ST1027_129_D0THKACXX.1_2.fastq

Use -s to print file sizes, in kilobytes, rounded up:

~$ arv-ls -s ae480c5099b81e17267b7445e35b4bc7+180
     12258 ./HWI-ST1027_129_D0THKACXX.1_1.fastq
     12258 ./HWI-ST1027_129_D0THKACXX.1_2.fastq
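Because -s prints one size per file, you can total that column to estimate how much data a download will fetch before you start it. This is a minimal sketch assuming each line of arv-ls -s output has the form "<size> <path>", as in the example above; the function name total_listed_size is hypothetical, and the total is in the same kilobyte units that arv-ls reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the first column of `arv-ls -s` output read from stdin.
# awk's default field splitting ignores the leading spaces that arv-ls uses
# to right-align the sizes. "+ 0" makes empty input print 0 instead of "".
total_listed_size() {
  awk '{ total += $1 } END { print total + 0 }'
}

# Usage:
#   arv-ls -s ae480c5099b81e17267b7445e35b4bc7+180 | total_listed_size
# For the two files listed above this prints 24516.
```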

Use arv-get to download the contents of a collection and place them in the directory given as the second argument (in this example, . for the current directory):

~$ arv-get ae480c5099b81e17267b7445e35b4bc7+180/ .
23 MiB / 23 MiB 100.0%
~$ ls
HWI-ST1027_129_D0THKACXX.1_1.fastq  HWI-ST1027_129_D0THKACXX.1_2.fastq
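If you script downloads, a small wrapper can create the destination directory first and add the trailing / used in the example above when it is missing. This is a sketch under assumptions: the function name download_collection and the ARV_GET override (which lets you substitute echo for a dry run) are hypothetical, and only the two-argument arv-get form shown on this page is used.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two-argument arv-get form shown above.
# ARV_GET defaults to arv-get; set ARV_GET=echo to print the command instead.
download_collection() {
  local collection="$1" dest="${2:-.}"
  # Append a trailing "/" unless one is already present, mirroring "arv-get <pdh>/ .".
  collection="${collection%/}/"
  # Create the destination directory if needed; stop if that fails.
  mkdir -p -- "$dest" || return
  "${ARV_GET:-arv-get}" "$collection" "$dest"
}

# Usage:
#   download_collection ae480c5099b81e17267b7445e35b4bc7+180 ./reads
```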

You can also download individual files:

~$ arv-get ae480c5099b81e17267b7445e35b4bc7+180/HWI-ST1027_129_D0THKACXX.1_1.fastq .
11 MiB / 11 MiB 100.0%
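To fetch several, but not all, files you can combine the two tools: list the collection with arv-ls and pass each matching name to arv-get, as in the single-file example above. This is a sketch assuming arv-ls prints one ./path per line as shown earlier; the function names and the ARV_LS/ARV_GET overrides are hypothetical, and the pattern is a bash glob matched against the path without its leading ./.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for downloading only the files whose path matches a glob.
# ARV_LS and ARV_GET default to the real tools and can be overridden for testing.

# Read arv-ls output on stdin, drop the leading "./", print paths matching $1.
filter_listing() {
  local pattern="$1" line name
  while IFS= read -r line; do
    name="${line#./}"
    # $pattern is left unquoted on purpose so that it is treated as a glob.
    if [[ $name == $pattern ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# Download each matching file of collection $1 into directory $3 (default ".").
get_matching() {
  local collection="${1%/}" pattern="$2" dest="${3:-.}" name
  "${ARV_LS:-arv-ls}" "$collection" | filter_listing "$pattern" |
    while IFS= read -r name; do
      # Redirect stdin so arv-get cannot consume the remaining file list.
      "${ARV_GET:-arv-get}" "$collection/$name" "$dest" < /dev/null
    done
}

# Usage: fetch only the first-read files from the example collection.
#   get_matching ae480c5099b81e17267b7445e35b4bc7+180 '*_1.fastq' .
```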

Federated downloads

If your cluster is configured as part of a federation, you can also download collections hosted on other clusters (given appropriate permissions).

If you request a collection by portable data hash, Arvados searches the home cluster first, then the federated clusters.

You can also request a collection by UUID. In this case, Arvados contacts the cluster named in the UUID prefix (in this example, zzzzz):

~$ arv-get zzzzz-4zz18-fw6dnjxtkvzdewt/ .
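Which lookup rule applies depends on the kind of identifier you pass. The sketch below tells the two apart with regular expressions. The formats it assumes (a portable data hash is 32 hexadecimal digits, +, a size, and optional +hints; a UUID is a five-character cluster prefix, a five-character type code, and a fifteen-character identifier, separated by hyphens) match the examples on this page but are assumptions here, and describe_locator is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an arv-get argument and say where it is looked up.
# Assumed formats (they match the examples on this page):
#   portable data hash: <32 hex digits>+<size>[+<hint>...]
#   UUID:               <5-char cluster>-<5-char type>-<15-char id>
describe_locator() {
  local id="${1%%/*}"   # ignore a trailing "/" or "/file" part
  local pdh_re='^[0-9a-f]{32}\+[0-9]+(\+[A-Za-z0-9@_-]+)*$'
  local uuid_re='^([a-z0-9]{5})-[a-z0-9]{5}-[a-z0-9]{15}$'
  if [[ $id =~ $pdh_re ]]; then
    echo "portable data hash: home cluster first, then federated clusters"
  elif [[ $id =~ $uuid_re ]]; then
    # BASH_REMATCH[1] holds the first capture group: the cluster prefix.
    echo "UUID: cluster ${BASH_REMATCH[1]}"
  else
    echo "unrecognized identifier: $id" >&2
    return 1
  fi
}

# Usage:
#   describe_locator zzzzz-4zz18-fw6dnjxtkvzdewt/   # UUID: cluster zzzzz
```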


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.