collections

API endpoint base: https://pirca.arvadosapi.com/arvados/v1/collections

Object type: 4zz18

Example UUID: zzzzz-4zz18-0123456789abcde

Resource

Collections describe sets of files in terms of data blocks stored in Keep. See Keep – Content-Addressable Storage for details.

Each collection has, in addition to the Common resource fields:

Attribute Type Description Example
name string
description text
properties hash User-defined metadata, may be used in queries using subproperty filters
portable_data_hash string The MD5 sum of the manifest text stripped of block hints other than the size hint.
manifest_text text
replication_desired number Minimum storage replication level desired for each data block referenced by this collection. A value of null signifies that the site default replication level (typically 2) is desired. 2
replication_confirmed number Replication level most recently confirmed by the storage system. This field is null when a collection is first created, and is reset to null when the manifest_text changes in a way that introduces a new data block. An integer value indicates the replication level of the least replicated data block in the collection. 2, null
replication_confirmed_at datetime When replication_confirmed was confirmed. If replication_confirmed is null, this field is also null.
storage_classes_desired list An optional list of storage class names where the blocks should be saved. If not provided, the cluster’s default storage class(es) will be set. ['archival']
storage_classes_confirmed list Storage classes most recently confirmed by the storage system. This field is an empty list when a collection is first created. 'archival'], []
storage_classes_confirmed_at datetime When storage_classes_confirmed was confirmed. If storage_classes_confirmed is [], this field is null.
trash_at datetime If trash_at is non-null and in the past, this collection will be hidden from API calls. May be untrashed.
delete_at datetime If delete_at is non-null and in the past, the collection may be permanently deleted.
is_trashed boolean True if trash_at is in the past, false if not.
current_version_uuid string UUID of the collection’s current version. On new collections, it’ll be equal to the uuid attribute.
version number Version number, starting at 1 on new collections. This attribute is read-only.
preserve_version boolean When set to true on a current version, it will be persisted. When passing true as part of a bigger update call, both current and newly created versions are persisted.
file_count number The total number of files in the collection. This attribute is read-only.
file_size_total number The sum of the file sizes in the collection. This attribute is read-only.

Conditions of creating a Collection

If a new portable_data_hash is specified when creating or updating a Collection, it must match the cryptographic digest of the supplied manifest_text.

Side effects of creating a Collection

Referenced blocks are protected from garbage collection in Keep.

Data can be shared with other users via the Arvados permission model.

Methods

See Common resource methods for more information about create, delete, get, list, and update.

Required arguments are displayed in green.

Supports federated get only, which may be called with either a uuid or a portable data hash. When requesting a portable data hash which is not available on the home cluster, the query is forwarded to all the clusters listed in RemoteClusters and returns the first successful result.

create

Create a new Collection.

Arguments:

Argument Type Description Location Example
collection object query
replace_files object Initialize files and directories using content from other collections query

The new collection’s content can be initialized by providing a manifest_text key in the provided collection object, or by using the replace_files option (see replace_files below).

delete

Put a Collection in the trash. This sets the trash_at field to now and delete_at field to now + token TTL. A trashed collection is invisible to most API calls unless the include_trash parameter is true.

Arguments:

Argument Type Description Location Example
uuid string The UUID of the Collection in question. path

get

Gets a Collection’s metadata by UUID or portable data hash. When making a request by portable data hash, attributes other than portable_data_hash and manifest_text are not returned, even when requested explicitly using the select parameter.

Arguments:

Argument Type Description Location Example
uuid string The UUID or portable data hash of the Collection in question. path

list

List collections.

See common resource list method.

Argument Type Description Location Example
include_trash boolean (default false) Include trashed collections. query
include_old_versions boolean (default false) Include past versions of the collection(s) being listed, if any. query

Note: Because adding access tokens to manifests can be computationally expensive, the manifest_text field is not included in results by default. If you need it, pass a select parameter that includes manifest_text.

Searching Collections for names of file or directories

You can search collections for specific file or directory names (whole or part) using the following filter in a list query.

filters: [["file_names", "ilike", "%sample1234.fastq%"]]

Note: file_names is a hidden field used for indexing. It is not returned by any API call. On the client, you can programmatically enumerate all the files in a collection using arv-ls, the Python SDK Collection class, Go SDK FileSystem struct, the WebDAV API, or the S3-compatible API.

As of this writing (Arvados 2.4), you can also search for directory paths, but not complete file paths.

In other words, this will work (when dir3 is a directory):

filters: [["file_names", "ilike", "%dir1/dir2/dir3%"]]

However, this will not return the desired results (where sample1234.fastq is a file):

filters: [["file_names", "ilike", "%dir1/dir2/dir3/sample1234.fastq%"]]

As a workaround, you can search for both the directory path and file name separately, and then filter on the client side.

filters: [["file_names", "ilike", "%dir1/dir2/dir3%"], ["file_names", "ilike", "%sample1234.fastq%"]]

update

Update attributes of an existing Collection.

Arguments:

Argument Type Description Location Example
uuid string The UUID of the Collection in question. path
collection object query
replace_files object Delete and replace files and directories using content from other collections query

The collection’s content can be updated by providing a manifest_text key in the provided collection object, or by using the replace_files option (see replace_files below).

untrash

Remove a Collection from the trash. This sets the trash_at and delete_at fields to null.

Arguments:

Argument Type Description Location Example
uuid string The UUID of the Collection to untrash. path
ensure_unique_name boolean (default false) Rename collection uniquely if untrashing it would fail with a unique name conflict. query

provenance

Returns a list of objects in the database that directly or indirectly contributed to producing this collection, such as the container request that produced this collection as output.

The general algorithm is:

  1. Visit the container request that produced this collection (via output_uuid or log_uuid attributes of the container request)
  2. Visit the input collections to that container request (via mounts and container_image of the container request)
  3. Iterate until there are no more objects to visit

Arguments:

Argument Type Description Location Example
uuid string The UUID of the Collection to get provenance. path

used_by

Returns a list of objects in the database this collection directly or indirectly contributed to, such as containers that takes this collection as input.

The general algorithm is:

  1. Visit containers that take this collection as input (via mounts or container_image of the container)
  2. Visit collections produced by those containers (via output or log of the container)
  3. Iterate until there are no more objects to visit

Arguments:

Argument Type Description Location Example
uuid string The UUID of the Collection to get usage. path

Using “replace_files” to create/update collections

The replace_files option can be used with the create and update APIs to efficiently copy individual files and directory trees from other collections, and copy/rename/delete items within an existing collection, without transferring any file data.

replace_files keys indicate target paths in the new collection, and values specify sources that should be copied to the target paths.

  • Each target path must be an absolute canonical path beginning with /. It must not contain . or .. components, consecutive / characters, or a trailing / after the final component.
  • Each source must be either an empty string (signifying that the target path is to be deleted), or PDH/path where PDH is the portable data hash of a collection on the cluster and /path is a file or directory in that collection.
  • In an update request, sources may reference the current portable data hash of the collection being updated.

Example: delete foo.txt from a collection

"replace_files": {
  "/foo.txt": ""
}

Example: rename foo.txt to bar.txt in a collection with portable data hash fa7aeb5140e2848d39b416daeef4ffc5+45

"replace_files": {
  "/foo.txt": "",
  "/bar.txt": "fa7aeb5140e2848d39b416daeef4ffc5+45/foo.txt"
}

Example: delete current contents, then add content from multiple collections

"replace_files": {
  "/": "",
  "/copy of collection 1": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/",
  "/copy of collection 2": "ea10d51bcf88862dbcc36eb292017dfd+45/"
}

Example: replace entire collection with a copy of a subdirectory from another collection

"replace_files": {
  "/": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/subdir"
}

A target path with a non-empty source cannot be the ancestor of another target path in the same request. For example, the following request is invalid:

"replace_files": {
  "/foo": "fa7aeb5140e2848d39b416daeef4ffc5+45/",
  "/foo/this_will_return_an_error": ""
}

Previous: Projects and filter groups Next: repositories

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.