Configuring storage classes

Storage classes (alternately known as “storage tiers”) allow you to control which volumes should be used to store particular collection data blocks. This can be used to implement data storage policies such as moving data to archival storage.

In the default Arvados configuration, with no storage classes specified in the configuration file, all volumes belong to a single implicit storage class called “default”. Apart from that, names of storage classes are internal to the cluster and decided by the administrator. Other than the implicit “default” class, Arvados currently does not define any standard storage class names.

To use multiple storage classes, update the StorageClasses and Volumes sections of your configuration file.

  • Every storage class you use (including “default”) must be defined in the StorageClasses section.
  • The StorageClasses section must use Default: true to indicate at least one default storage class. When a client/user does not specify storage classes when creating a new collection, the default storage classes are used implicitly.
  • If some storage classes are faster or cheaper to access than others, assign a higher Priority to the faster ones. When reading data, volumes with high priority storage classes are searched first.

Example:

    StorageClasses:

      default:
        # When reading a block that is stored on multiple volumes,
        # prefer a volume with this class.
        Priority: 20

        # When a client does not specify a storage class when saving a
        # new collection, use this one.
        Default: true

      archival:
        Priority: 10

    Volumes:

      ClusterID-nyw5e-000000000000000:
        # This volume is in the "default" storage class.
        StorageClasses:
          default: true

      ClusterID-nyw5e-000000000000001:
        # This volume is in the "archival" storage class.
        StorageClasses:
          archival: true

Refer to the configuration reference for more details.

Using storage classes

Discussed in the user guide

Storage management notes

When uploading data, if a data block cannot be uploaded to all desired storage classes, it will result in a fatal error. Data blocks will not be uploaded to volumes that do not have the desired storage class.

If you change the storage classes for a collection, the data is not moved immediately. The keep-balance service is responsible for deciding which blocks should be placed on which keepstore volumes. As part of the rebalancing behavior, it will determine where a block should go in order to satisfy the desired storage classes, and issue pull requests to copy the block from its original volume to the desired volume. The block will subsequently be moved to trash on the original volume.

If a block is assigned to multiple storage classes, the block will be stored on desired_replication number of volumes for storage class, even if that results in overreplication.

If a collection has a desired storage class which is not available in any keepstore volume, the collection’s blocks will remain in place, and an error will appear in the keep-balance logs.


Previous: Metadata vocabulary Next: Recovering data

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.