Storage classes (also known as “storage tiers”) let you control which volumes store the data blocks of a particular collection. You can use them to implement data storage policies, such as moving data to archival storage.
In the default Arvados configuration, with no storage classes specified in the configuration file, all volumes belong to a single implicit storage class called “default”. Apart from that implicit class, Arvados does not currently define any standard storage class names: names are internal to the cluster and chosen by the administrator.
To use multiple storage classes, update the StorageClasses and Volumes sections of your configuration file.
- Define each storage class you use in the StorageClasses section.
- The StorageClasses section must use Default: true to indicate at least one default storage class. When a client or user does not specify storage classes when creating a new collection, the default storage classes are used implicitly.
- If some storage classes are faster than others, assign a higher Priority to the faster ones. When reading data, volumes with high priority storage classes are searched first.

Example:
    StorageClasses:

      default:
        # When reading a block that is stored on multiple volumes,
        # prefer a volume with this class.
        Priority: 20

        # When a client does not specify a storage class when saving a
        # new collection, use this one.
        Default: true

      archival:
        Priority: 10

    Volumes:

      ClusterID-nyw5e-000000000000000:
        # This volume is in the "default" storage class.
        StorageClasses:
          default: true

      ClusterID-nyw5e-000000000000001:
        # This volume is in the "archival" storage class.
        StorageClasses:
          archival: true
Refer to the configuration reference for more details.
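The rules above can be sketched as a small consistency check. This is a minimal illustration, not part of Arvados: the CONFIG dictionary is a hand-written copy of the example, and the function names check_storage_classes and read_order are hypothetical. The check that every class named on a volume is defined in StorageClasses is an assumption, and the ordering only models "high priority storage classes are searched first"; the real search logic may differ.

```python
# Hypothetical in-memory copy of the example configuration; a real
# deployment would load this from the cluster configuration file.
CONFIG = {
    "StorageClasses": {
        "default": {"Priority": 20, "Default": True},
        "archival": {"Priority": 10},
    },
    "Volumes": {
        "ClusterID-nyw5e-000000000000000": {"StorageClasses": {"default": True}},
        "ClusterID-nyw5e-000000000000001": {"StorageClasses": {"archival": True}},
    },
}


def check_storage_classes(config):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    classes = config.get("StorageClasses", {})
    # At least one class must be marked Default: true.
    if not any(c.get("Default") for c in classes.values()):
        problems.append("no storage class has Default: true")
    # Assumption: every class enabled on a volume should be defined above.
    for vol_id, vol in config.get("Volumes", {}).items():
        for name, enabled in vol.get("StorageClasses", {}).items():
            if enabled and name not in classes:
                problems.append(f"{vol_id} uses undefined class {name!r}")
    return problems


def read_order(config):
    """List volume IDs, highest storage class priority first."""
    classes = config.get("StorageClasses", {})
    vols = config.get("Volumes", {})

    def best_priority(vol):
        # A volume in several classes is ranked by its highest priority class.
        names = [n for n, on in vol.get("StorageClasses", {}).items() if on]
        return max((classes.get(n, {}).get("Priority", 0) for n in names), default=0)

    return sorted(vols, key=lambda v: best_priority(vols[v]), reverse=True)
```

With the example configuration, check_storage_classes(CONFIG) reports no problems and read_order(CONFIG) lists the "default" volume before the "archival" one. The check only applies once storage classes are configured explicitly; it does not model the implicit "default" class of an unconfigured cluster.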
When uploading data, if a data block cannot be uploaded to all desired storage classes, the upload fails with a fatal error. Data blocks are never uploaded to volumes that lack the desired storage class.
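The upload rule can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the actual client or keepstore logic: volumes are represented as a plain mapping from a volume ID to its set of storage class names, and the names StorageClassError and volumes_for_upload are hypothetical.

```python
class StorageClassError(Exception):
    """Raised when a desired storage class cannot be satisfied (hypothetical name)."""


def volumes_for_upload(volume_classes, desired_classes):
    """Map each desired class to the volumes that may receive the block.

    volume_classes: dict of volume ID -> set of storage class names.
    desired_classes: iterable of storage class names requested for the block.
    """
    targets = {}
    for cls in desired_classes:
        # Only volumes that carry the desired class are eligible.
        matching = sorted(v for v, names in volume_classes.items() if cls in names)
        if not matching:
            # Models the fatal error: one unsatisfied class fails the upload.
            raise StorageClassError(f"no volume offers storage class {cls!r}")
        targets[cls] = matching
    return targets
```

For example, with volume_classes = {"vol0": {"default"}, "vol1": {"archival"}}, a request for ["archival"] yields only "vol1", while a request that includes a class no volume offers raises StorageClassError instead of falling back to another volume.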
If you change the storage classes for a collection, the data is not moved immediately. The keep-balance service is responsible for deciding which blocks should be placed on which keepstore volumes. As part of the rebalancing behavior, it will determine where a block should go in order to satisfy the desired storage classes, and issue pull requests to copy the block from its original volume to the desired volume. The block will subsequently be moved to trash on the original volume.
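The rebalancing decision for a single block can be sketched as follows. This is a toy model, not the keep-balance implementation: it ignores replication counts, picks source and destination volumes arbitrarily (first in sorted order), and the function name plan_rebalance and its data layout are hypothetical.

```python
def plan_rebalance(current_volumes, desired_classes, volume_classes):
    """Return (pulls, trashes) for one block.

    current_volumes: set of volume IDs that hold the block now.
    desired_classes: set of storage class names the block should be in.
    volume_classes:  dict of volume ID -> set of storage class names.
    """
    if not current_volumes:
        return [], []  # nothing to copy from
    pulls = []
    for cls in sorted(desired_classes):
        if any(cls in volume_classes[v] for v in current_volumes):
            continue  # an existing copy already satisfies this class
        candidates = sorted(v for v, names in volume_classes.items() if cls in names)
        if not candidates:
            return [], []  # class unavailable: leave the block where it is
        source = sorted(current_volumes)[0]
        pulls.append((source, candidates[0]))
    # Copies on volumes with none of the desired classes become trash
    # candidates, to be trashed only after the pulls have completed.
    trashes = sorted(
        v for v in current_volumes if not (volume_classes[v] & desired_classes)
    )
    return pulls, trashes
```

With volumes {"vol0": {"default"}, "vol1": {"archival"}}, a block held on "vol0" whose collection changes to "archival" produces one pull from "vol0" to "vol1" and marks the "vol0" copy for trash. The ordering matters: the original copy is only removed after the new copy exists.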
If a block is assigned to multiple storage classes, the block will be stored on desired_replication number of volumes for each storage class, even if that results in overreplication.
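As a worked example of the arithmetic, the sketch below counts the copies needed when a block belongs to two storage classes. It assumes that no volume is shared between classes, so no single copy can count toward more than one class; the function name copies_required is hypothetical.

```python
def copies_required(desired_classes, desired_replication):
    """Copies needed per class and in total, assuming classes share no volumes."""
    per_class = {cls: desired_replication for cls in desired_classes}
    return per_class, sum(per_class.values())


per_class, total = copies_required(["default", "archival"], 2)
print(per_class, total)  # {'default': 2, 'archival': 2} 4
```

Here a desired replication of 2 leads to 4 stored copies, which is the overreplication described above.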
If a collection has a desired storage class which is not available in any keepstore volume, the collection’s blocks will remain in place, and an error will appear in the keep-balance logs.
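An administrator could catch this situation before it reaches the logs with a check along these lines. It is a minimal sketch using a hypothetical helper name, unavailable_classes, and a plain mapping of volume IDs to class names rather than any Arvados API.

```python
def unavailable_classes(desired_classes, volume_classes):
    """Return the desired classes that no volume provides, in sorted order.

    desired_classes: iterable of storage class names wanted by a collection.
    volume_classes:  dict of volume ID -> set of storage class names.
    """
    # Union of every class offered by at least one volume.
    offered = set().union(*volume_classes.values())
    return sorted(set(desired_classes) - offered)
```

A non-empty result means the collection's blocks would stay where they are until a volume with the missing class is added or the desired classes are changed.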
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.