API endpoint base: https://pirca.arvadosapi.com/arvados/v1/collections
Object type: 4zz18
Example UUID: zzzzz-4zz18-0123456789abcde
Collections describe sets of files in terms of data blocks stored in Keep. See Keep – Content-Addressable Storage for details.
Each collection has, in addition to the Common resource fields:
Attribute | Type | Description | Example |
---|---|---|---|
name | string | ||
description | text | ||
properties | hash | User-defined metadata, may be used in queries using subproperty filters | |
portable_data_hash | string | The MD5 sum of the manifest text stripped of block hints other than the size hint. | |
manifest_text | text | ||
replication_desired | number | Minimum storage replication level desired for each data block referenced by this collection. A value of null signifies that the site default replication level (typically 2) is desired. | 2 |
replication_confirmed | number | Replication level most recently confirmed by the storage system. This field is null when a collection is first created, and is reset to null when the manifest_text changes in a way that introduces a new data block. An integer value indicates the replication level of the least replicated data block in the collection. | 2 , null |
replication_confirmed_at | datetime | When replication_confirmed was confirmed. If replication_confirmed is null, this field is also null. | |
storage_classes_desired | list | An optional list of storage class names where the blocks should be saved. If not provided, the cluster’s default storage class(es) will be set. | ['archival'] |
storage_classes_confirmed | list | Storage classes most recently confirmed by the storage system. This field is an empty list when a collection is first created. | ['archival'] , [] |
storage_classes_confirmed_at | datetime | When storage_classes_confirmed was confirmed. If storage_classes_confirmed is [] , this field is null. | |
trash_at | datetime | If trash_at is non-null and in the past, this collection will be hidden from API calls. May be untrashed. | |
delete_at | datetime | If delete_at is non-null and in the past, the collection may be permanently deleted. | |
is_trashed | boolean | True if trash_at is in the past, false if not. | |
current_version_uuid | string | UUID of the collection’s current version. On new collections, it’ll be equal to the uuid attribute. | |
version | number | Version number, starting at 1 on new collections. This attribute is read-only. | |
preserve_version | boolean | When set to true on a current version, it will be persisted. When passing true as part of a bigger update call, both current and newly created versions are persisted. | |
file_count | number | The total number of files in the collection. This attribute is read-only. | |
file_size_total | number | The sum of the file sizes in the collection. This attribute is read-only. | |
If a new portable_data_hash is specified when creating or updating a Collection, it must match the cryptographic digest of the supplied manifest_text.
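To illustrate the relationship between manifest_text and portable_data_hash described above, here is a minimal sketch that strips every block hint except the size hint and takes the MD5 of the result. The helper names are hypothetical. The `+N` suffix is assumed to be the byte length of the stripped manifest, which is consistent with example values such as fa7aeb5140e2848d39b416daeef4ffc5+45 but is not stated in this section, so treat it as an assumption rather than a specification.

```python
import hashlib
import re

# A block locator is a 32-hex-digit MD5 followed by "+"-separated hints.
_LOCATOR = re.compile(r"^[0-9a-f]{32}(\+\S+)*$")


def strip_hints(manifest_text: str) -> str:
    """Drop every block hint except the (numeric) size hint from each locator."""
    out_lines = []
    for line in manifest_text.split("\n"):
        tokens = []
        for token in line.split(" "):
            if _LOCATOR.match(token):
                md5, *hints = token.split("+")
                # Keep only the first all-digit hint, which is the size hint.
                size = [h for h in hints if h.isdigit()][:1]
                token = "+".join([md5] + size)
            tokens.append(token)
        out_lines.append(" ".join(tokens))
    return "\n".join(out_lines)


def portable_data_hash(manifest_text: str) -> str:
    """MD5 of the stripped manifest; the "+length" suffix is an assumption."""
    stripped = strip_hints(manifest_text).encode("utf-8")
    return f"{hashlib.md5(stripped).hexdigest()}+{len(stripped)}"
```

Under these assumptions, a manifest carrying extra hints (for example a permission signature) and the same manifest without them yield the same value, which is what makes the hash "portable".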
Referenced blocks are protected from garbage collection in Keep.
Data can be shared with other users via the Arvados permission model.
See Common resource methods for more information about create, delete, get, list, and update.
Required arguments are displayed in green.
Supports federated get only, which may be called with either a uuid or a portable data hash. When requesting a portable data hash which is not available on the home cluster, the query is forwarded to all the clusters listed in RemoteClusters and returns the first successful result.
Create a new Collection.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
collection | object | | query | |
replace_files | object | Initialize files and directories using content from other collections | query | |
The new collection’s content can be initialized by providing a manifest_text key in the provided collection object, or by using the replace_files option (see replace_files below).
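As a rough illustration of the two ways to initialize content, the sketch below builds (but does not send) a create request against the endpoint base given at the top of this page. The function name is hypothetical. Passing the arguments as a JSON body and authenticating with a Bearer token are assumptions of this sketch; the argument table lists both arguments with location "query", so check the common API documentation for the encodings your cluster accepts.

```python
import json
import urllib.request

API_BASE = "https://pirca.arvadosapi.com/arvados/v1/collections"


def build_create_request(token, name, manifest_text=None, replace_files=None):
    """Build (without sending) a POST request that creates a collection."""
    collection = {"name": name}
    if manifest_text is not None:
        # Option 1: supply the content directly as a manifest.
        collection["manifest_text"] = manifest_text
    payload = {"collection": collection}
    if replace_files is not None:
        # Option 2: initialize content from other collections.
        payload["replace_files"] = replace_files
    return urllib.request.Request(
        API_BASE,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            # Assumed auth scheme and body encoding; see the lead-in.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object could then be passed to `urllib.request.urlopen`; keeping construction separate from sending makes the payload easy to inspect first.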
Put a Collection in the trash. This sets the trash_at field to now and delete_at field to now + token TTL. A trashed collection is invisible to most API calls unless the include_trash parameter is true.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection in question. | path |
Gets a Collection’s metadata by UUID or portable data hash. When making a request by portable data hash, attributes other than portable_data_hash and manifest_text are not returned, even when requested explicitly using the select parameter.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID or portable data hash of the Collection in question. | path |
List collections.
See common resource list method.
Argument | Type | Description | Location | Example |
---|---|---|---|---|
include_trash | boolean (default false) | Include trashed collections. | query | |
include_old_versions | boolean (default false) | Include past versions of the collection(s) being listed, if any. | query |
Note: Because adding access tokens to manifests can be computationally expensive, the manifest_text field is not included in results by default. If you need it, pass a select parameter that includes manifest_text.
You can search collections for specific file or directory names (whole or part) using the following filter in a list query.
filters: [["file_names", "ilike", "%sample1234.fastq%"]]
Note: file_names is a hidden field used for indexing. It is not returned by any API call. On the client, you can programmatically enumerate all the files in a collection using arv-ls, the Python SDK Collection class, Go SDK FileSystem struct, the WebDAV API, or the S3-compatible API.
As of this writing (Arvados 2.4), you can also search for directory paths, but not complete file paths.
In other words, this will work (when dir3 is a directory):
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"]]
However, this will not return the desired results (where sample1234.fastq is a file):
filters: [["file_names", "ilike", "%dir1/dir2/dir3/sample1234.fastq%"]]
As a workaround, you can search for both the directory path and file name separately, and then filter on the client side.
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"], ["file_names", "ilike", "%sample1234.fastq%"]]
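The workaround above can be sketched as two small helpers: one splits a full path into the directory and file-name filters and encodes them into a list URL, and one does the client-side check. Both function names are hypothetical. The sketch assumes filters are passed as a JSON-encoded `filters` query parameter, and that the caller has already enumerated each candidate collection's file paths by one of the means listed earlier (for example arv-ls).

```python
import json
from urllib.parse import urlencode

API_BASE = "https://pirca.arvadosapi.com/arvados/v1/collections"


def list_url_for_path(path):
    """Build a list URL that filters on the directory and file name separately."""
    directory, _, filename = path.rpartition("/")
    filters = []
    if directory:
        filters.append(["file_names", "ilike", f"%{directory}%"])
    filters.append(["file_names", "ilike", f"%{filename}%"])
    # Assumption: the API accepts the filter list as JSON in a query parameter.
    return API_BASE + "?" + urlencode({"filters": json.dumps(filters)})


def contains_path(file_paths, path):
    """Client-side check that a collection really contains the complete path.

    Unlike the server-side ilike match, this comparison is case-sensitive.
    """
    wanted = path.strip("/")
    for candidate in file_paths:
        candidate = candidate.strip("/")
        if candidate == wanted or candidate.endswith("/" + wanted):
            return True
    return False
```

The server-side filters only narrow the candidates; a collection can match both filters while holding the file in a different directory, which is why the second step is needed.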
Update attributes of an existing Collection.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection in question. | path | |
collection | object | | query | |
replace_files | object | Delete and replace files and directories using content from other collections | query | |
The collection’s content can be updated by providing a manifest_text key in the provided collection object, or by using the replace_files option (see replace_files below).
Remove a Collection from the trash. This sets the trash_at and delete_at fields to null.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection to untrash. | path | |
ensure_unique_name | boolean (default false) | Rename collection uniquely if untrashing it would fail with a unique name conflict. | query |
Returns a list of objects in the database that directly or indirectly contributed to producing this collection, such as the container request that produced this collection as output.
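The general algorithm is:
1. Visit the container request that produced this collection (via the output_uuid or log_uuid attributes of the container request).
2. Visit the input collections to that container request (via mounts and container_image of the container request).
3. Iterate until there are no more objects to visit.
Arguments: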
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection to get provenance. | path |
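The provenance walk can be pictured as a breadth-first traversal: from a collection to the container requests that name it as output or log, and from each of those to its input collections. The sketch below runs on hypothetical in-memory records, not on the API. The function name and the flattened `input_uuids` field (standing in for whatever mounts and container_image reference) are assumptions made for illustration, and whether the real endpoint includes the starting collection in its result is not stated here.

```python
from collections import deque


def provenance(collection_uuid, container_requests):
    """Collect everything that directly or indirectly contributed to a collection.

    container_requests: hypothetical records, each a dict with "uuid",
    "output_uuid", "log_uuid", and "input_uuids" (the collections referenced
    through mounts and container_image, already flattened into a list).
    """
    visited = set()
    queue = deque([collection_uuid])
    while queue:
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        for cr in container_requests:
            # A container request that produced this collection as output or log.
            if current in (cr["output_uuid"], cr["log_uuid"]):
                queue.append(cr["uuid"])
            # The input collections of a container request being visited.
            if current == cr["uuid"]:
                queue.extend(cr["input_uuids"])
    return visited
```

The `visited` set is what guarantees termination when the same input collection is reachable through several container requests.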
Returns a list of objects in the database that this collection directly or indirectly contributed to, such as containers that take this collection as input.
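The general algorithm is:
1. Visit containers that take this collection as input (via mounts or container_image of the container).
2. Visit collections produced by those containers (via output or log of the container).
3. Iterate until there are no more objects to visit.
Arguments: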
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection to get usage. | path |
The replace_files option can be used with the create and update APIs to efficiently copy individual files and directory trees from other collections, and copy/rename/delete items within an existing collection, without transferring any file data.
replace_files keys indicate target paths in the new collection, and values specify sources that should be copied to the target paths.
- Each target path must be an absolute canonical path beginning with /. It must not contain . or .. components, consecutive / characters, or a trailing / after the final component.
- Each source must be either an empty string (signifying that the target path is to be deleted), or PDH/path where PDH is the portable data hash of a collection on the cluster and /path is a file or directory in that collection.
- In an update request, sources may reference the current portable data hash of the collection being updated.
Example: delete foo.txt from a collection
"replace_files": { "/foo.txt": "" }
Example: rename foo.txt to bar.txt in a collection with portable data hash fa7aeb5140e2848d39b416daeef4ffc5+45
"replace_files": { "/foo.txt": "", "/bar.txt": "fa7aeb5140e2848d39b416daeef4ffc5+45/foo.txt" }
Example: delete current contents, then add content from multiple collections
"replace_files": { "/": "", "/copy of collection 1": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/", "/copy of collection 2": "ea10d51bcf88862dbcc36eb292017dfd+45/" }
Example: replace entire collection with a copy of a subdirectory from another collection
"replace_files": { "/": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/subdir" }
A target path with a non-empty source cannot be the ancestor of another target path in the same request. For example, the following request is invalid:
"replace_files": { "/foo": "fa7aeb5140e2848d39b416daeef4ffc5+45/", "/foo/this_will_return_an_error": "" }
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.