Uploading data

Arvados Data collections can be uploaded using either Workbench or the arv-put command line tool.

  1. Upload using Workbench
  2. Creating projects
  3. Upload using command line tool

Upload using Workbench

To upload using Workbench, visit the Workbench Dashboard. Click on the Projects dropdown menu in the top navigation menu and select your Home project or any other project of your choosing. You will see the Data collections tab for this project, which lists the collections it contains.

To upload files into a new collection, click on the Add data dropdown menu and select Upload files from my computer.


This will create a new empty collection in your chosen project and will take you to the Upload tab for that collection.

Click on the Browse… button and select the files you would like to upload. Selected files will be added to a list of files to be uploaded. When you have finished selecting files, click on the Start button to begin the upload. Workbench will show a progress bar while the files are uploaded to Arvados, and will indicate when the upload is complete.

Note: If you leave the collection page during the upload, the upload process will be aborted and you will need to upload the files again.

Note: You can also use the Upload tab to add additional files to an existing collection.

Creating projects

Files are organized into Collections, and Collections are organized by Projects.

Click on the Projects dropdown menu and select Add a new project to add a top-level project.

To create a subproject, navigate to the parent project, and click on Add a subproject.

See Sharing collections for information about sharing projects and collections with other users.

Upload using command line tool

Note:

This tutorial assumes that you have access to the Arvados command line tools, have set the API token, and have confirmed a working environment.

To upload a file to Keep using arv-put:

~$ arv-put var-GS000016015-ASM.tsv.bz2
216M / 216M 100.0%
Collection saved as ...
zzzzz-4zz18-xxxxxxxxxxxxxxx

The output value zzzzz-4zz18-xxxxxxxxxxxxxxx is the UUID of the Arvados collection that was created.
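When arv-put is called from a script, it can be useful to capture that identifier and check that it has the expected shape before using it. The following is a minimal sketch, not part of Arvados itself. It assumes that arv-put writes the collection identifier as the last line of standard output, with progress and status messages going to standard error. The helper names is_collection_uuid and upload_file, and the ARV_PUT variable, are hypothetical and exist only for this illustration.

```shell
# Command used for uploads. ARV_PUT is a hypothetical override so the
# sketch can be tried without a live Arvados cluster.
ARV_PUT="${ARV_PUT:-arv-put}"

# Succeeds if $1 looks like a collection UUID: a 5-character cluster
# prefix, the type code 4zz18, and a 15-character suffix.
is_collection_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{5}-4zz18-[a-z0-9]{15}$'
}

# Uploads the file named by $1 and prints the new collection's UUID.
# Assumption: the uploader prints the UUID as the last line of stdout.
upload_file() {
  uuid=$("$ARV_PUT" "$1" | tail -n 1)
  if ! is_collection_uuid "$uuid"; then
    echo "unexpected arv-put output: $uuid" >&2
    return 1
  fi
  printf '%s\n' "$uuid"
}
```

For example, uuid=$(upload_file var-GS000016015-ASM.tsv.bz2) would leave the identifier in the shell variable uuid, or fail if the output does not look like a collection UUID.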

Note: The file used in this example is a freely available TSV file containing variant annotations from the Personal Genome Project (participant hu599905), downloadable here. Alternatively, you can replace var-GS000016015-ASM.tsv.bz2 with the name of any file you have locally, or you could get the TSV file by downloading it from Keep.

It is also possible to upload an entire directory with arv-put:

~$ mkdir tmp
~$ echo "hello alice" > tmp/alice.txt
~$ echo "hello bob" > tmp/bob.txt
~$ echo "hello carol" > tmp/carol.txt
~$ arv-put tmp
0M / 0M 100.0%
Collection saved as ...
zzzzz-4zz18-yyyyyyyyyyyyyyy

In both examples, the arv-put command created a collection. The first collection contains the single uploaded file. The second collection contains the entire uploaded directory.

arv-put accepts quite a few optional command line arguments, which are described on the arv subcommands page.
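As an illustration of those optional arguments, the sketch below wraps arv-put so that the upload is saved under a chosen name directly in a chosen project. It assumes your installed arv-put supports the --name, --project-uuid and --no-progress options (arv-put --help will confirm this). The function name upload_named and the ARV_PUT variable are hypothetical, and the project UUID in the usage line is a placeholder.

```shell
# Command used for uploads. ARV_PUT is a hypothetical override so the
# sketch can be tried without a live Arvados cluster.
ARV_PUT="${ARV_PUT:-arv-put}"

# Uploads a file or directory as a named collection in a given project.
#   $1: path to upload
#   $2: name to give the new collection
#   $3: UUID of the destination project
upload_named() {
  "$ARV_PUT" --no-progress --name "$2" --project-uuid "$3" "$1"
}
```

For example, upload_named tmp "greetings" zzzzz-j7d0g-xxxxxxxxxxxxxxx would upload the tmp directory from the example above as a collection named greetings. Uploading straight into the destination project avoids having to move the collection afterwards in Workbench.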

Locate your collection in Workbench

Visit the Workbench Dashboard. Click on the Projects dropdown menu in the top navigation menu and select your Home project. Your newly uploaded collection should appear near the top of the Data collections tab. The collection name printed by arv-put will appear under the name column.

To move the collection to a different project, check the box at the left of the collection row. Open the Selection… menu near the top of the tab and select Move selected…. This opens a dialog box where you can select a destination project for the collection. Click a project, then click the Move button.

Click on the Show button next to the collection’s listing on a project page to go to the Workbench page for your collection. On this page, you can see the collection’s contents, download individual files, and set sharing options.



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.