Access Keep as a GNU/Linux filesystem

GNU/Linux users can use arv-mount or Gnome to mount Keep as a file system in order to access Arvados collections using traditional filesystem tools.

Note:

This tutorial assumes that you have access to the Arvados command line tools and have set the API token and confirmed a working environment. .

  1. Mounting at the command line with arv-mount
  2. Mounting in Gnome File manager

Arv-mount

arv-mount provides a file system view of Arvados Keep using File System in Userspace (FUSE). You can browse, open and read Keep entries as if they are regular files, and existing tools can access files in Keep. Data is streamed on demand. It is not necessary to download an entire file or collection to start processing.

The default mode permits browsing any collection in Arvados as a subdirectory under the mount directory. To avoid having to fetch a potentially large list of all collections, collection directories only come into existence when explicitly accessed by UUID or portable data hash. For instance, a collection may be found by its content hash in the keep/by_id directory.

~$ mkdir -p keep
~$ arv-mount keep
~$ cd keep/by_id/c1bad4b39ca5a924e481008009d94e32+210
~/keep/by_id/c1bad4b39ca5a924e481008009d94e32+210$ ls
var-GS000016015-ASM.tsv.bz2
~/keep/by_id/c1bad4b39ca5a924e481008009d94e32+210$ md5sum var-GS000016015-ASM.tsv.bz2
44b8ae3fde7a8a88d2f7ebd237625b4f  var-GS000016015-ASM.tsv.bz2
~/keep/by_id/c1bad4b39ca5a924e481008009d94e32+210$ cd ../..
~$ fusermount -u keep

The last line unmounts Keep. Subdirectories will no longer be accessible.

In the top level directory of each collection, arv-mount provides a special file called .arvados#collection that contains a JSON-formatted API record for the collection. This can be used to determine the collection’s portable_data_hash, uuid, etc. This file does not show up in ls or ls -a.

Modifying files and directories in Keep

By default, all files in the Keep mount are read only. However, arv-mount --read-write enables you to perform the following operations using normal Unix command line tools (touch, mv, rm, mkdir, rmdir) and your own programs using standard POSIX file system APIs:

  • Create, update, rename and delete individual files within collections
  • Create and delete subdirectories inside collections
  • Move files and directories within and between collections
  • Create and delete collections within a project (using mkdir and rmdir in a project directory)

Not supported:

  • Symlinks, hard links
  • Changing permissions
  • Extended attributes
  • Moving a subdirectory of a collection into a project, or moving a collection from a project into another collection

If multiple clients (separate instances of arv-mount or other arvados applications) modify the same file in the same collection within a short time interval, this may result in a conflict. In this case, the most recent commit wins, and the “loser” will be renamed to a conflict file in the form name~YYYYMMDD-HHMMSS~conflict~.

Please note this feature is in beta testing. In particular, the conflict mechanism is itself currently subject to race conditions with potential for data loss when a collection is being modified simultaneously by multiple clients. This issue will be resolved in future development.

Mounting in Gnome File manager

As an alternative to arv-mount you can also access the WebDAV mount through the Gnome File manager.

  1. Open “Files”
  2. On the left sidebar, click on “Other Locations”
  3. At the bottom of the window, enter davs://collections.ClusterID.example.com/ When prompted for credentials, enter username “arvados” and a valid Arvados token in the Password field.

Previous: Downloading data Next: Access Keep from macOS Finder

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.