When there is a large number of unneeded blocks stored in an S3 bucket, particularly when using PrefixLength: 0
, the speed of garbage collection can be severely limited by AWS API rate limits and Arvados’s multi-step trash/delete process.
The multi-step trash/delete process can be short-circuited by setting BlobTrashLifetime
to zero and enabling UnsafeDelete
on S3-backed volumes. However, on an actively used cluster such a configuration can result in data loss in the rare case where a given block is trashed and then rewritten soon afterward, and S3 processes the write and delete requests in the opposite order.
The following steps can be used to temporarily disable writes on an S3 bucket to enable faster garbage collection without data loss or service interruption. Note that garbage collection on other S3 volumes will be temporarily disabled during this procedure.
PrefixLength: 3
for the new volume because this results in a much higher rate limit for I/O and garbage collection operations compared to the default PrefixLength: 0
. If the target volume configuration specifies StorageClasses
, use the same values for the new volume.keep-balance
service.
Collections:
BlobTrashLifetime: 0
BalancePullLimit: 0
[...]
Volumes:
target-volume-uuid:
ReadOnly: true
AllowTrashWhenReadOnly: true
DriverParameters:
UnsafeDelete: true
Note that BlobTrashLifetime: 0
instructs keepstore to delete unneeded blocks outright (bypassing the recoverable trash phase); however, in this mode it will normally not trash any blocks at all on an S3 volume due to the safety issue mentioned above, unless the volume is configured with UnsafeDelete: true
.keepstore
services with the updated configuration.keep-balance
service.keep-balance
logs and metrics. When garbage collection is complete, keep-balance logs will show an empty changeset: zzzzz-bi6l4-0123456789abcdef (keep0.zzzzz.arvadosapi.com:25107, disk): ChangeSet{Pulls:0, Trashes:0}
UnsafeDelete
configuration entry on the target volume.BlobTrashLifetime
configuration entry (or restore it to its previous value).PrefixLength: 0
and the new volume has PrefixLength: 3
, skip the next two steps: new data will be stored on the new volume, some existing data will be moved automatically to other volumes, and some will be left on the target volume as long as it’s needed.ReadOnly: false
and AllowTrashWhenReadOnly: false
on the target volume.ReadOnly: true
and AllowTrashWhenReadOnly: true
on the new volume.BalancePullLimit
configuration entry (or restore its previous value), and restart keep-balance
.keepstore
services with the updated configuration.
The content of this documentation is licensed under the
Creative
Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the
Apache License, Version 2.0.