API endpoint base: https://pirca.arvadosapi.com/arvados/v1/collections
Object type: 4zz18
Example UUID: zzzzz-4zz18-0123456789abcde
Collections describe sets of files in terms of data blocks stored in Keep. See Keep – Content-Addressable Storage and using collection versioning for details.
Each collection has, in addition to the Common resource fields:
Attribute | Type | Description | Example |
---|---|---|---|
name | string | ||
description | text | Free text description of the collection. May be HTML formatted, must be appropriately sanitized before display. | |
properties | hash | User-defined metadata, may be used in queries using subproperty filters | |
portable_data_hash | string | The MD5 sum of the manifest text stripped of block hints other than the size hint. | |
manifest_text | text | The manifest describing how to assemble blocks into files, in the Arvados manifest format | |
replication_desired | number | Minimum storage replication level desired for each data block referenced by this collection. A value of null signifies that the site default replication level (typically 2) is desired. | 2 |
replication_confirmed | number | Replication level most recently confirmed by the storage system. This field is null when a collection is first created, and is reset to null when the manifest_text changes in a way that introduces a new data block. An integer value indicates the replication level of the least replicated data block in the collection. | 2 , null |
replication_confirmed_at | datetime | When replication_confirmed was confirmed. If replication_confirmed is null, this field is also null. | |
storage_classes_desired | list | An optional list of storage class names where the blocks should be saved. If not provided, the cluster’s default storage class(es) will be set. | ['archival'] |
storage_classes_confirmed | list | Storage classes most recently confirmed by the storage system. This field is an empty list when a collection is first created. | ['archival'] , [] |
storage_classes_confirmed_at | datetime | When storage_classes_confirmed was confirmed. If storage_classes_confirmed is [] , this field is null. | |
trash_at | datetime | If trash_at is non-null and in the past, this collection will be hidden from API calls. May be untrashed. | |
delete_at | datetime | If delete_at is non-null and in the past, the collection may be permanently deleted. | |
is_trashed | boolean | True if trash_at is in the past, false if not. | |
current_version_uuid | string | UUID of the collection’s current version. On new collections, it’ll be equal to the uuid attribute. | |
version | number | Version number, starting at 1 on new collections. This attribute is read-only. | |
preserve_version | boolean | When set to true on a current version, it will be persisted. When passing true as part of a bigger update call, both current and newly created versions are persisted. | |
file_count | number | The total number of files in the collection. This attribute is read-only. | |
file_size_total | number | The sum of the file sizes in the collection. This attribute is read-only. | |
If a new portable_data_hash is specified when creating or updating a Collection, it must match the cryptographic digest of the supplied manifest_text.
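As a rough illustration of the matching rule above, the following Python sketch strips every block hint except the size hint and then computes a digest. It assumes the portable data hash is rendered as the MD5 hex digest followed by + and the byte length of the stripped manifest, which matches the shape of the hashes shown elsewhere on this page. The helper names strip_hints and portable_data_hash are hypothetical and are not part of any Arvados SDK.

```python
import hashlib
import re

# A block locator: 32 hex digits, a size hint, then optional extra hints
# such as "+A<signature>@<expiry>".
_LOCATOR = re.compile(r"([0-9a-f]{32}\+\d+)(\+\S+)*")


def strip_hints(manifest_text):
    """Drop every block hint except the size hint (hypothetical helper)."""
    out_lines = []
    for line in manifest_text.split("\n"):
        tokens = line.split(" ")
        for i, token in enumerate(tokens):
            match = _LOCATOR.fullmatch(token)
            # Token 0 is the stream name, so only later tokens can be locators.
            if i > 0 and match:
                tokens[i] = match.group(1)
        out_lines.append(" ".join(tokens))
    return "\n".join(out_lines)


def portable_data_hash(manifest_text):
    """Assumed format: <md5 of stripped manifest>+<stripped length in bytes>."""
    stripped = strip_hints(manifest_text).encode("utf-8")
    return "{}+{}".format(hashlib.md5(stripped).hexdigest(), len(stripped))
```

Under these assumptions, a signed manifest and its unsigned equivalent produce the same value, which is why the hash is called portable.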
Referenced blocks are protected from garbage collection in Keep.
Data can be shared with other users via the Arvados permission model.
Collections can be trashed by updating the record and setting the trash_at field, or with the delete method. The delete method sets trash_at to “now”.
Setting trash_at to a time in the future makes the collection expire automatically at that time.
When trash_at is set, delete_at will also be set. Normally delete_at = trash_at + Collections.DefaultTrashLifetime. When the trash_at time is past but delete_at is in the future, the trashed collection is invisible to most API calls unless the include_trash parameter is true. Collections in the trashed state can be untrashed so long as delete_at has not passed. Collections are also trashed if they are contained in a trashed group.
Once delete_at is past, the collection and all of its previous versions will be deleted permanently and can no longer be untrashed.
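The lifecycle above can be summarized with a small Python sketch. It is only an illustration: the 14-day DEFAULT_TRASH_LIFETIME is a made-up stand-in for the cluster's Collections.DefaultTrashLifetime setting, and the function names and state labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for the cluster's Collections.DefaultTrashLifetime.
DEFAULT_TRASH_LIFETIME = timedelta(days=14)


def schedule_trash(trash_at, lifetime=DEFAULT_TRASH_LIFETIME):
    """Return (trash_at, delete_at) using delete_at = trash_at + lifetime."""
    return trash_at, trash_at + lifetime


def collection_state(trash_at, delete_at, now):
    """Classify a collection record according to the rules described above."""
    if trash_at is None or trash_at > now:
        return "visible"    # not trashed yet (possibly scheduled to expire)
    if delete_at is not None and delete_at <= now:
        return "deletable"  # may be permanently deleted, cannot be untrashed
    return "trashed"        # hidden unless include_trash is true, can be untrashed


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
trash_at, delete_at = schedule_trash(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(collection_state(trash_at, delete_at, now))
```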
The replace_files option can be used with the create and update APIs to efficiently and atomically copy individual files and directory trees from other collections, copy/rename/delete items within an existing collection, and add new items to a collection.
replace_files keys indicate target paths in the new collection, and values specify sources that should be copied to the target paths.
- Each target path must begin with /. It must not contain . or .. components, consecutive / characters, or a trailing / after the final component.
- Each source must be one of the following:
  - an empty string, indicating that the target path is to be deleted,
  - <PDH>/<path> where <PDH> is the portable data hash of a collection on the cluster and <path> is a file or directory in that collection,
  - manifest_text/<path> where <path> is an existing file or directory in a collection supplied in the manifest_text attribute in the request, or
  - current/<path> where <path> is an existing file or directory in the collection being updated.
In an update request, sources may reference the current portable data hash of the collection being updated. However, in many cases it is more appropriate to use a current/<path> source instead, to ensure the latest content is used even if the collection has been updated since the PDH was last retrieved.
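A client may want to check a replace_files mapping before sending it. The sketch below is one possible pre-flight check in Python, written from the rules above rather than from the server's implementation. The function names are hypothetical, and treating the bare / target as valid is an assumption based on the examples that follow.

```python
import re

# A source of the form <PDH>/<path>, where the PDH looks like <32 hex digits>+<size>.
_PDH_SOURCE = re.compile(r"[0-9a-f]{32}\+\d+/.*")


def is_valid_target(path):
    """Check a replace_files key against the target path rules."""
    if path == "/":
        return True  # the collection root, as used in the examples
    if not path.startswith("/") or path.endswith("/"):
        return False
    components = path[1:].split("/")
    # An empty component would mean consecutive "/" characters.
    return all(c not in ("", ".", "..") for c in components)


def classify_source(source):
    """Return the kind of source, or None if it matches no documented form."""
    if source == "":
        return "delete"
    if source.startswith("manifest_text/"):
        return "manifest_text"
    if source.startswith("current/"):
        return "current"
    if _PDH_SOURCE.fullmatch(source):
        return "pdh"
    return None
```

This only checks the syntax of each entry. Whether a source actually exists and is readable can only be determined by the API server.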
Delete foo.txt.
"replace_files": { "/foo.txt": "" }
Rename foo.txt to bar.txt.
"replace_files": { "/foo.txt": "", "/bar.txt": "current/foo.txt" }
Swap contents of files foo and bar.
"replace_files": { "/foo": "current/bar", "/bar": "current/foo" }
"replace_files": { "/new_directory/new_file.txt": "manifest_text/new_file.txt" }, "collection": { "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n" }
Replace all content with the supplied manifest_text. Note this is equivalent to omitting the replace_files argument.
"replace_files": { "/": "manifest_text/" }, "collection": { "manifest_text": "./new_directory acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n" }
Rename current_file.txt to old_file.txt and replace current_file.txt with new content, all in a single atomic operation.
"replace_files": { "/current_file.txt": "manifest_text/new_file.txt", "/old_file.txt": "current/current_file.txt" }, "collection": { "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n" }
Delete all current content, then copy content from other collections into new subdirectories.
"replace_files": { "/": "", "/copy of collection 1": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/", "/copy of collection 2": "ea10d51bcf88862dbcc36eb292017dfd+45/" }
Replace all current content with a copy of a subdirectory from another collection.
"replace_files": { "/": "1f4b0bc7583c2a7f9102c395f4ffc5e3+45/subdir" }
A target path with a non-empty source cannot be the ancestor of another target path in the same request. For example, the following request is invalid:
"replace_files": { "/foo": "fa7aeb5140e2848d39b416daeef4ffc5+45/", "/foo/this_will_return_an_error": "" }
It is an error to supply a non-empty manifest_text that is unused, i.e., the replace_files argument does not contain any values beginning with "manifest_text/". For example, the following request is invalid:
"replace_files": { "/foo": "current/bar" }, "collection": { "manifest_text": ". acbd18db4cc2f85cedef654fccc4a4d8+3+A82740cd577ff5745925af5780de5992cbb25d937@668efec4 0:3:new_file.txt\n" }
Collections on other clusters in a federation cannot be used as sources. Each source must exist on the current cluster and be readable by the current user.
Similarly, if manifest_text is provided, it must only reference data blocks that are stored on the current cluster. This API does not copy data from other clusters in a federation.
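Two of the constraints above, ancestor conflicts between target paths and an unused manifest_text, can be detected on the client before a request is sent. The following Python sketch shows one way to do so. Both function names are hypothetical, and the server remains the authority on what is accepted.

```python
def find_ancestor_conflicts(replace_files):
    """Return (ancestor, descendant) pairs where the ancestor has a non-empty source."""
    conflicts = []
    for target, source in replace_files.items():
        if source == "":
            continue  # deleting a path does not conflict with targets beneath it
        prefix = target.rstrip("/") + "/"
        for other in replace_files:
            if other != target and other.startswith(prefix):
                conflicts.append((target, other))
    return conflicts


def has_unused_manifest_text(replace_files, manifest_text):
    """True if a non-empty manifest_text is supplied but no source refers to it."""
    if not manifest_text:
        return False
    return not any(s.startswith("manifest_text/") for s in replace_files.values())
```

Applied to the two invalid requests shown above, the first function reports the /foo conflict and the second reports the unused manifest_text.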
See Common resource methods for more information about create, delete, get, list, and update.
Supports federated get only, which may be called with either a uuid or a portable data hash. When requesting a portable data hash which is not available on the home cluster, the query is forwarded to all the clusters listed in RemoteClusters and returns the first successful result.
Create a new Collection.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
collection | object | | query | |
replace_files | object | Initialize files and directories with new content and/or content from other collections | query | |
The new collection’s content can be initialized by providing a manifest_text key in the provided collection object, or by using the replace_files option.
Put a Collection in the trash. This sets the trash_at field to now and delete_at field to now + token TTL. A trashed collection is invisible to most API calls unless the include_trash parameter is true.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection in question. | path | |
Gets a Collection’s metadata by UUID or portable data hash. When making a request by portable data hash, attributes other than portable_data_hash, manifest_text, and trash_at are not returned, even when requested explicitly using the select parameter.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID or portable data hash of the Collection in question. | path | |
List collections.
See common resource list method.
Argument | Type | Description | Location | Example |
---|---|---|---|---|
include_trash | boolean (default false) | Include trashed collections. | query | |
include_old_versions | boolean (default false) | Include past versions of the collection(s) being listed, if any. | query | |
Note: Because adding access tokens to manifests can be computationally expensive, the manifest_text field is not included in results by default. If you need it, pass a select parameter that includes manifest_text.
You can search collections for specific file or directory names (whole or part) using the following filter in a list query.
filters: [["file_names", "ilike", "%sample1234.fastq%"]]
Note: file_names is a hidden field used for indexing. It is not returned by any API call. On the client, you can programmatically enumerate all the files in a collection using arv-ls, the Python SDK Collection class, Go SDK FileSystem struct, the WebDAV API, or the S3-compatible API.
As of this writing (Arvados 2.4), you can also search for directory paths, but not complete file paths.
In other words, this will work (when dir3 is a directory):
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"]]
However, this will not return the desired results (where sample1234.fastq is a file):
filters: [["file_names", "ilike", "%dir1/dir2/dir3/sample1234.fastq%"]]
As a workaround, you can search for both the directory path and file name separately, and then filter on the client side.
filters: [["file_names", "ilike", "%dir1/dir2/dir3%"], ["file_names", "ilike", "%sample1234.fastq%"]]
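The client-side half of this workaround might look like the following Python sketch. It assumes the caller has already enumerated each candidate collection's file paths by one of the means listed above (for example arv-ls). The function names are hypothetical, and the matching imitates the case-insensitive substring behavior of ilike.

```python
def build_filters(dir_path, file_name):
    """Build the two separate file_names filters described above."""
    return [
        ["file_names", "ilike", "%{}%".format(dir_path)],
        ["file_names", "ilike", "%{}%".format(file_name)],
    ]


def contains_full_path(file_paths, dir_path, file_name):
    """Client-side check: does any enumerated path contain dir_path/file_name?"""
    wanted = "{}/{}".format(dir_path.strip("/"), file_name).lower()
    return any(wanted in path.lower() for path in file_paths)


filters = build_filters("dir1/dir2/dir3", "sample1234.fastq")
# file_paths would come from enumerating one collection returned by the list query.
file_paths = ["./dir1/dir2/dir3/sample1234.fastq", "./dir1/readme.txt"]
print(contains_full_path(file_paths, "dir1/dir2/dir3", "sample1234.fastq"))
```

The second step is needed because a collection can match both filters while holding the file in a different directory.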
Update attributes of an existing Collection.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection in question. | path | |
collection | object | | query | |
replace_files | object | Add, delete, and replace files and directories with new content and/or content from other collections | query | |
The collection’s existing content can be replaced entirely by providing a manifest_text key in the provided collection object, or updated in place by using the replace_files option.
Remove a Collection from the trash. This sets the trash_at and delete_at fields to null.
Arguments:
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection to untrash. | path | |
ensure_unique_name | boolean (default false) | Rename collection uniquely if untrashing it would fail with a unique name conflict. | query | |
Returns a list of objects in the database that directly or indirectly contributed to producing this collection, such as the container request that produced this collection as output.
The general algorithm is:
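- Visit the container requests that produced this collection (via the output_uuid or log_uuid attributes of the container request)
- Visit the input collections of those container requests (via the mounts and container_image of the container request)
Arguments: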
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection whose provenance is requested. | path | |
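The provenance walk described above can be pictured as a graph traversal. The Python sketch below runs over in-memory records instead of the database and is not the server's implementation. The record layout, in particular the input_uuids key standing in for the collections referenced by mounts and container_image, and the short identifiers are hypothetical.

```python
from collections import deque


def provenance(collection_uuid, container_requests):
    """Collect everything that directly or indirectly produced a collection."""
    seen = set()
    queue = deque([collection_uuid])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        for cr in container_requests:
            # Step 1: container requests whose output or log is this collection.
            if current in (cr.get("output_uuid"), cr.get("log_uuid")):
                if cr["uuid"] not in seen:
                    seen.add(cr["uuid"])
                    # Step 2: that request's input collections are visited next.
                    queue.extend(cr.get("input_uuids", []))
    seen.discard(collection_uuid)
    return seen


requests_table = [
    {"uuid": "cr1", "output_uuid": "c3", "log_uuid": "c4", "input_uuids": ["c1", "c2"]},
    {"uuid": "cr0", "output_uuid": "c1", "log_uuid": None, "input_uuids": ["c0"]},
]
print(sorted(provenance("c3", requests_table)))
```

The seen set keeps the walk from revisiting objects, so the traversal terminates even when several requests share inputs.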
Returns a list of objects in the database that this collection directly or indirectly contributed to, such as containers that take this collection as input.
The general algorithm is:
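- Visit the containers that take this collection as input (via the mounts or container_image of the container)
- Visit the collections produced by those containers (via the output or log of the container)
Arguments: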
Argument | Type | Description | Location | Example |
---|---|---|---|---|
uuid | string | The UUID of the Collection whose usage is requested. | path | |
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.