Using arv-copy

Note:

This section assumes the legacy Jobs API is available. Some newer installations have already disabled the Jobs API in favor of the Containers API.

On those sites, the “copy a pipeline template” feature described below is not available. However, “copy a workflow” feature is not yet implemented.

This tutorial describes how to copy Arvados objects from one cluster to another by using arv-copy.

Note:

This tutorial assumes that you are logged into an Arvados VM instance (instructions for Webshell or Unix or Windows) or you have installed the Arvados FUSE Driver and Python SDK on your workstation and have a working environment.

arv-copy

arv-copy allows users to copy collections and pipeline templates from one cluster to another. By default, arv-copy will recursively go through a template and copy all dependencies associated with the object.

For example, let’s copy from the Arvados playground, also known as qr1hi, to dst_cluster. The names qr1hi and dst_cluster are interchangable with any cluster name. You can find the cluster name from the prefix of the uuid of the object you want to copy. For example, in qr1hi-4zz18-tci4vn4fa95w0zx, the cluster name is qr1hi.

In order to communicate with both clusters, you must create custom configuration files for each cluster. In the Arvados Workbench, click on the dropdown menu icon in the upper right corner of the top navigation menu to access the user settings menu, and click on the menu item Current token. Copy the ARVADOS_API_HOST and ARVADOS_API_TOKEN in both of your clusters. Then, create two configuration files, one for each cluster. The names of the files must have the format of uuid_prefix.conf. In our example, let’s make two files, one for qr1hi and one for dst_cluster. From your Current token page in qr1hi and dst_cluster, copy the ARVADOS_API_HOST and ARVADOS_API_TOKEN.

Copy your ARVADOS_API_HOST and ARVADOS_API_TOKEN into the config files as shown below in the shell account from which you are executing the commands. For example, the default shell you may have access to is shell.qr1hi. You can add these files in ~/.config/arvados/ in the qr1hi shell terminal.

~$ cd ~/.config/arvados
~$ echo "ARVADOS_API_HOST=qr1hi.arvadosapi.com" >> qr1hi.conf
~$ echo "ARVADOS_API_TOKEN=123456789abcdefghijkl" >> qr1hi.conf
~$ echo "ARVADOS_API_HOST=dst_cluster.arvadosapi.com" >> dst_cluster.conf
~$ echo "ARVADOS_API_TOKEN=987654321lkjihgfedcba" >> dst_cluster.conf

Now you’re ready to copy between qr1hi and dst_cluster!

How to copy a collection

First, select the uuid of the collection you want to copy from the source cluster. The uuid can be found in the collection display page in the collection summary area (top left box), or from the URL bar (the part after collections/...)

Now copy the collection from qr1hi to dst_cluster. We will use the uuid qr1hi-4zz18-tci4vn4fa95w0zx as an example. You can find this collection in the lobSTR v.3 project on playground.arvados.org.

~$ arv-copy --src qr1hi --dst dst_cluster qr1hi-4zz18-tci4vn4fa95w0zx
qr1hi-4zz18-tci4vn4fa95w0zx: 6.1M / 6.1M 100.0%
arvados.arv-copy[1234] INFO: Success: created copy with uuid dst_cluster-4zz18-8765943210cdbae

The output of arv-copy displays the uuid of the collection generated in the destination cluster. By default, the output is placed in your home project in the destination cluster. If you want to place your collection in a pre-created project, you can specify the project you want it to be in using the tag --project-uuid followed by the project uuid.

For example, this will copy the collection to project dst_cluster-j7d0g-a894213ukjhal12 in the destination cluster.

~$ arv-copy --src qr1hi --dst dst_cluster --project-uuid dst_cluster-j7d0g-a894213ukjhal12 qr1hi-4zz18-tci4vn4fa95w0zx

How to copy a pipeline template

Note:

As stated above, arv-copy is recursive by default and requires a working git repository in the destination cluster. If you do not have a repository created, you can follow the Adding a new repository page. We will use the tutorial repository created in that page as the example.


In addition, arv-copy requires git when copying to a git repository. Please make sure that git is installed and available.

We will use the uuid qr1hi-p5p6p-9pkaxt6qjnkxhhu as an example pipeline template.

~$ arv-copy --src qr1hi --dst dst_cluster --dst-git-repo $USER/tutorial qr1hi-p5p6p-9pkaxt6qjnkxhhu
To git@git.dst_cluster.arvadosapi.com:$USER/tutorial.git
 * [new branch] git_git_qr1hi_arvadosapi_com_arvados_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d -> git_git_qr1hi_arvadosapi_com_arvados_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d
arvados.arv-copy[19694] INFO: Success: created copy with uuid dst_cluster-p5p6p-rym2h5ub9m8ofwj

New branches in the destination git repo will be created for each branch used in the pipeline template. For example, if your source branch was named ac21f0d45a76294aaca0c0c0fdf06eb72d03368d, your new branch will be named git_git_qr1hi_arvadosapi_com_reponame_git_ac21f0d45a76294aaca0c0c0fdf06eb72d03368d.

By default, if you copy a pipeline template recursively, you will find that the template as well as all the dependencies are in your home project.

If you would like to copy the object without dependencies, you can use the --no-recursive tag.

For example, we can copy the same object using this tag.

~$ arv-copy --src qr1hi --dst dst_cluster --dst-git-repo $USER/tutorial --no-recursive qr1hi-p5p6p-9pkaxt6qjnkxhhu

How to copy a workflow

We will use the uuid zzzzz-7fd4e-sampleworkflow1 as an example workflow.

~$ arv-copy --src zzzzz --dst dst_cluster --dst-git-repo $USER/tutorial zzzzz-7fd4e-sampleworkflow1
zzzzz-4zz18-jidprdejysravcr: 1143M / 1143M 100.0%
2017-01-04 04:11:58 arvados.arv-copy[5906] INFO:
2017-01-04 04:11:58 arvados.arv-copy[5906] INFO: Success: created copy with uuid dst_cluster-7fd4e-ojtgpne594ubkt7

The name, description, and workflow definition from the original workflow will be used for the destination copy. In addition, any locations and docker images found in the src workflow definition will also be copied to the destination recursively.

If you would like to copy the object without dependencies, you can use the --no-recursive flag.

For example, we can copy the same object non-recursively using the following:

~$ arv-copy --src zzzzz --dst dst_cluster --dst-git-repo $USER/tutorial --no-recursive zzzzz-7fd4e-sampleworkflow1

Previous: Keep collection lifecycle Next: Using storage classes

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.