Downloading data

Arvados data collections can be downloaded using either the arv commands or Workbench.

  1. Downloading using arv commands
  2. Downloading using Workbench
  3. Downloading a shared collection using Workbench

Downloading using arv commands

Note:

This tutorial assumes that you are logged into an Arvados VM instance (instructions for Webshell or Unix or Windows) or you have installed the Arvados FUSE Driver and Python SDK on your workstation and have a working environment.

You can download Arvados data collections using the command line tools arv-ls and arv-get.

Use arv-ls to view the contents of a collection:

~$ arv-ls c1bad4b39ca5a924e481008009d94e32+210
var-GS000016015-ASM.tsv.bz2
~$ arv-ls 887cd41e9c613463eab2f0d885c6dd96+83
alice.txt
bob.txt
carol.txt

Use -s to print file sizes rounded up to the nearest kilobyte:

~$ arv-ls -s c1bad4b39ca5a924e481008009d94e32+210
221887 var-GS000016015-ASM.tsv.bz2
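
For a collection with many files, it can help to total the sizes that arv-ls -s reports. The following is a minimal sketch, not part of the Arvados tools: total_kib is a hypothetical helper name, and it assumes each line of output starts with a numeric size followed by the file name, as in the example above.

```shell
# Hypothetical helper (not an Arvados command): sum the first column
# of `arv-ls -s` output read from standard input.
# Assumes each line looks like "<size> <file name>".
total_kib() {
    # "+ 0" makes awk print 0 rather than an empty line on empty input.
    awk '{ total += $1 } END { print total + 0 }'
}
```

It would be used by piping the listing into it, for example: arv-ls -s c1bad4b39ca5a924e481008009d94e32+210 | total_kib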

Use arv-get to download the contents of a collection and place them in the directory given as the second argument (in this example, . for the current directory):

~$ arv-get c1bad4b39ca5a924e481008009d94e32+210/ .
~$ ls var-GS000016015-ASM.tsv.bz2
var-GS000016015-ASM.tsv.bz2

You can also download individual files:

~$ arv-get 887cd41e9c613463eab2f0d885c6dd96+83/alice.txt .
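
To fetch several individual files from the same collection, the single-file command can be wrapped in a loop. This is a sketch under stated assumptions: fetch_files and the ARV_GET variable are hypothetical names introduced here for illustration, not Arvados features. ARV_GET only exists so the command can be swapped out for a dry run.

```shell
# Hypothetical helper: download selected files from one collection.
# Usage: fetch_files <collection> <destination> <file>...
# ARV_GET is an illustrative override (not an Arvados setting);
# it defaults to the real arv-get command.
fetch_files() {
    collection=$1
    dest=$2
    shift 2
    for name in "$@"; do
        # Same form as the single-file example: <collection>/<file> <destination>
        "${ARV_GET:-arv-get}" "$collection/$name" "$dest" || return 1
    done
}
```

For example, fetch_files 887cd41e9c613463eab2f0d885c6dd96+83 . alice.txt bob.txt would run arv-get once per file and stop at the first failure. Setting ARV_GET=echo beforehand prints the arguments instead of downloading anything.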

Federated downloads

If your cluster is configured to be part of a federation, you can also download collections hosted on other clusters (with appropriate permissions).

If you request a collection by portable data hash, the home cluster is searched first, then the federated clusters.

You may also request a collection by UUID. In this case, the request goes to the cluster named in the UUID prefix (in this example, qr1hi):

~$ arv-get qr1hi-4zz18-fw6dnjxtkvzdewt/ .
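
The two lookup paths can be told apart from the locator itself. The sketch below is illustrative only: locator_kind is a hypothetical helper, and the two patterns (32 hexadecimal digits, +, size for a portable data hash; three dash-separated groups of 5, 5, and 15 characters for a UUID) are assumptions based on the examples in this page.

```shell
# Hypothetical helper: report how a locator would be looked up.
# Prints "pdh" (home cluster first, then federated clusters) or
# "uuid <prefix>" (the cluster named by the prefix is contacted).
locator_kind() {
    loc=${1%%/*}    # drop a trailing slash or a /file path
    if printf '%s\n' "$loc" | grep -Eq '^[0-9a-f]{32}\+[0-9]+$'; then
        echo "pdh"
    elif printf '%s\n' "$loc" | grep -Eq '^[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}$'; then
        # The cluster ID is everything before the first dash.
        echo "uuid ${loc%%-*}"
    else
        echo "unknown"
        return 1
    fi
}
```

For example, locator_kind qr1hi-4zz18-fw6dnjxtkvzdewt/ prints uuid qr1hi, while locator_kind c1bad4b39ca5a924e481008009d94e32+210 prints pdh.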

Downloading using Workbench

You can also download Arvados data collections using the Workbench.

Visit the Workbench Dashboard. Click on the Projects dropdown menu in the top navigation menu and select your Home project. You will see the Data collections tab, which lists the collections in this project.

You can access the contents of a collection by clicking on the Show button next to the collection. This will take you to the collection’s page. Using this page you can see the collection’s contents, download individual files, and set sharing options.

You can now download the collection files by clicking on the download button next to each file.

Downloading a shared collection using Workbench

Collections can be shared to allow downloads by anonymous users.

To share a collection with anonymous users, visit the collection page using Workbench as described in the section above. Once on this page, click on the Create sharing link button.

This creates a sharing link for the collection. You can copy the sharing link from this page and share it with other users.

A user with this URL can download the collection by opening the URL in a browser, which presents a downloadable version of the collection.


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.