Using arv-copy

This tutorial describes how to copy Arvados objects from one cluster to another by using arv-copy.

Note:

This tutorial assumes that you have access to the Arvados command line tools, have set the API token, and have confirmed a working environment.

arv-copy

arv-copy allows users to copy collections and workflows from one cluster to another. By default, arv-copy will recursively go through the workflow and copy all dependencies associated with the object.

For example, let’s copy from the Arvados playground, also known as pirca, to dstcl. The names pirca and dstcl are interchangeable with any cluster ID. You can find the cluster ID from the prefix of the uuid of the object you want to copy. For example, in zzzzz-4zz18-tci4vn4fa95w0zx, the cluster ID is zzzzz.
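As a small illustration of the prefix rule above, the following sketch extracts the cluster ID from a uuid using shell parameter expansion. The function name cluster_id_of is hypothetical and is not part of the Arvados tools.

```shell
# Hypothetical helper (not part of Arvados): print the cluster ID,
# i.e. everything before the first "-" in an object uuid.
cluster_id_of() {
    printf '%s\n' "${1%%-*}"
}

cluster_id_of zzzzz-4zz18-tci4vn4fa95w0zx   # prints: zzzzz
```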

In order to communicate with both clusters, you must create a custom configuration file for each cluster. In the Arvados Workbench of each cluster (here, pirca and dstcl), click on the dropdown menu icon in the upper right corner of the top navigation menu to access the user settings menu, and click on the menu item Current token. Copy the ARVADOS_API_HOST and ARVADOS_API_TOKEN values shown there. Then, create two configuration files in ~/.config/arvados, one for each cluster. The file names must have the format ClusterID.conf.

The config file consists of two lines, one for ARVADOS_API_HOST and one for ARVADOS_API_TOKEN:

ARVADOS_API_HOST=zzzzz.arvadosapi.com
ARVADOS_API_TOKEN=v2/zzzzz-gj3su-xxxxxxxxxxxxxxx/123456789abcdefghijkl

Copy your ARVADOS_API_HOST and ARVADOS_API_TOKEN into the config files as shown above, in the shell account from which you are executing the commands. In our example, you need two files, ~/.config/arvados/pirca.conf and ~/.config/arvados/dstcl.conf.
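One possible way to create these files from the shell is sketched below. The function name write_cluster_conf and the ARVADOS_CONF_DIR variable are assumptions made for this example only; arv-copy itself looks in ~/.config/arvados, which is the default used here. The host and token values in the usage comment are the placeholders from the example above, not real credentials.

```shell
# Hypothetical helper (not part of Arvados): write ClusterID.conf
# containing the two settings described above.
# ARVADOS_CONF_DIR is an illustrative override for this sketch only;
# by default the file goes to ~/.config/arvados.
write_cluster_conf() {
    cluster_id=$1
    api_host=$2
    api_token=$3
    conf_dir=${ARVADOS_CONF_DIR:-$HOME/.config/arvados}
    conf_file=$conf_dir/$cluster_id.conf

    mkdir -p "$conf_dir"
    printf 'ARVADOS_API_HOST=%s\nARVADOS_API_TOKEN=%s\n' \
        "$api_host" "$api_token" > "$conf_file"
    # The file holds a secret token, so restrict it to the owner.
    chmod 600 "$conf_file"
}

# Usage with the placeholder values shown earlier:
# write_cluster_conf zzzzz zzzzz.arvadosapi.com 'v2/zzzzz-gj3su-xxxxxxxxxxxxxxx/123456789abcdefghijkl'
```

Call the function once per cluster, with the host and token copied from that cluster's Current token page.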

Now you’re ready to copy between pirca and dstcl!

How to copy a collection

First, determine the uuid or portable data hash of the collection you want to copy from the source cluster. The uuid can be found on the collection display page in the collection summary area (top left box), or in the URL bar (the part after collections/).
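To tell the two kinds of identifier apart in a script, a rough check like the one below may help. It is only a sketch: the function name identifier_kind is hypothetical, and the patterns are inferred from the examples in this tutorial (collection uuids containing 4zz18, workflow uuids containing 7fd4e, and portable data hashes made of 32 hexadecimal characters, a plus sign, and a size).

```shell
# Hypothetical helper (not part of Arvados): guess what kind of
# identifier was given, based on the formats seen in this tutorial.
identifier_kind() {
    # Portable data hash: 32 hex characters, "+", then a number.
    if printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{32}\+[0-9]+$'; then
        echo "portable data hash"
        return
    fi
    # uuid: five-character cluster ID, then a type code, then an ID.
    case $1 in
        ?????-4zz18-*) echo "collection uuid" ;;
        ?????-7fd4e-*) echo "workflow uuid" ;;
        *)             echo "unknown" ;;
    esac
}

identifier_kind jutro-4zz18-tv416l321i4r01e            # prints: collection uuid
identifier_kind 2463fa9efeb75e099685528b3b9071e0+438   # prints: portable data hash
```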

Now copy the collection from pirca to dstcl. We will use the uuid jutro-4zz18-tv416l321i4r01e as an example. You can find this collection on playground.arvados.org.

~$ arv-copy --src pirca --dst dstcl jutro-4zz18-tv416l321i4r01e
jutro-4zz18-tv416l321i4r01e: 6.1M / 6.1M 100.0%
arvados.arv-copy[1234] INFO: Success: created copy with uuid dstcl-4zz18-xxxxxxxxxxxxxxx

You can also copy by content address:

~$ arv-copy --src pirca --dst dstcl 2463fa9efeb75e099685528b3b9071e0+438
2463fa9efeb75e099685528b3b9071e0+438: 6.1M / 6.1M 100.0%
arvados.arv-copy[1234] INFO: Success: created copy with uuid dstcl-4zz18-xxxxxxxxxxxxxxx

The output of arv-copy displays the uuid of the collection created in the destination cluster. By default, the copy is placed in your home project in the destination cluster. If you want to place your collection in an existing project, you can specify that project with the option --project-uuid followed by the project uuid.

For example, this will copy the collection to project dstcl-j7d0g-a894213ukjhal12 in the destination cluster.

~$ arv-copy --src pirca --dst dstcl --project-uuid dstcl-j7d0g-a894213ukjhal12 jutro-4zz18-tv416l321i4r01e

How to copy a workflow

We will use the uuid jutro-7fd4e-mkmmq53m1ze6apx as an example workflow.

~$ arv-copy --src jutro --dst pirca --project-uuid pirca-j7d0g-ecak8knpefz8ere jutro-7fd4e-mkmmq53m1ze6apx
ae480c5099b81e17267b7445e35b4bc7+180: 23M / 23M 100.0%
2463fa9efeb75e099685528b3b9071e0+438: 156M / 156M 100.0%
jutro-4zz18-vvvqlops0a0kpdl: 94M / 94M 100.0%
2020-08-19 17:04:13 arvados.arv-copy[4789] INFO:
2020-08-19 17:04:13 arvados.arv-copy[4789] INFO: Success: created copy with uuid pirca-7fd4e-s0tw9rfbkpo2fmx

The name, description, and workflow definition from the original workflow will be used for the destination copy. In addition, any collections and Docker images referenced in the source workflow definition will also be copied to the destination.

If you would like to copy the object without dependencies, you can use the --no-recursive flag.
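As an illustration, the sketch below assembles an arv-copy command line that includes --no-recursive and prints it for review instead of running it. The function name print_copy_cmd is hypothetical; the options it emits (--src, --dst, --project-uuid, --no-recursive) are the ones described in this tutorial.

```shell
# Hypothetical helper (not part of Arvados): print, without running,
# an arv-copy command that copies OBJECT without its dependencies.
# Usage: print_copy_cmd SRC DST OBJECT [PROJECT_UUID]
print_copy_cmd() {
    src=$1
    dst=$2
    object=$3
    project=${4:-}

    set -- arv-copy --src "$src" --dst "$dst"
    if [ -n "$project" ]; then
        # Place the copy in a specific destination project.
        set -- "$@" --project-uuid "$project"
    fi
    set -- "$@" --no-recursive "$object"
    echo "$@"
}

print_copy_cmd jutro pirca jutro-7fd4e-mkmmq53m1ze6apx
# prints: arv-copy --src jutro --dst pirca --no-recursive jutro-7fd4e-mkmmq53m1ze6apx
```

Once the printed command looks right, it can be pasted into the shell to perform the copy.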



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.