This page describes how to balance keepstore servers using keep-balance. Keep-balance creates new copies of under-replicated blocks, deletes excess copies of over-replicated and unreferenced blocks, and moves blocks to better positions (e.g. after adding new keepstore servers) so clients find them faster.
See the Keep-balance install docs for installation instructions.
The keep-balance service determines which blocks are candidates for deletion and instructs the keepstore to move those blocks to the trash. When a block is newly written, it is protected from deletion for the duration in BlobSigningTTL. During this time, it cannot be trashed or deleted.
If keep-balance instructs keepstore to trash a block which is older than BlobSigningTTL, and BlobTrashLifetime is non-zero, the block will be moved to “trash”. A block which is in the trash is no longer accessible by read requests, but has not yet been permanently deleted. Blocks which are in the trash may be recovered using the “untrash” API endpoint. Blocks are permanently deleted after they have been in the trash for the duration in BlobTrashLifetime.
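The timing rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not keepstore's implementation: the function names, parameter names, and the two-week durations are assumptions chosen for the example, and the boundary case where a block's age exactly equals BlobSigningTTL is treated as no longer protected.

```python
from datetime import datetime, timedelta

def is_protected(written_at, now, blob_signing_ttl):
    """True while a newly written block is younger than BlobSigningTTL.

    Hypothetical helper: a protected block cannot be trashed or deleted.
    """
    return now - written_at < blob_signing_ttl

def permanent_deletion_time(trashed_at, blob_trash_lifetime):
    """Time at which a trashed block becomes due for permanent deletion.

    Hypothetical helper: assumes a non-zero BlobTrashLifetime, which is the
    only case the text above describes.
    """
    return trashed_at + blob_trash_lifetime

# Example values only; real settings come from the cluster configuration.
ttl = timedelta(weeks=2)
trash_lifetime = timedelta(weeks=2)
written = datetime(2024, 1, 1)

print(is_protected(written, datetime(2024, 1, 10), ttl))   # True: 9 days old
print(is_protected(written, datetime(2024, 1, 20), ttl))   # False: 19 days old
print(permanent_deletion_time(datetime(2024, 1, 20), trash_lifetime))  # 2024-02-03 00:00:00
```

Between the trash time and the computed deletion time, the block is unreadable but can still be recovered with the “untrash” endpoint.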
Keep-balance is also responsible for balancing the distribution of blocks across keepstore servers by asking servers to pull blocks from other servers (as determined by their storage class and rendezvous hashing order). Pulling a block makes a copy. If a block is over-replicated (i.e. there are excess copies) after pulling, it will subsequently be trashed and deleted on the original server, subject to the BlobTrash and BlobTrashLifetime settings.
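For readers unfamiliar with rendezvous (highest-random-weight) hashing, the following is a minimal generic sketch of the idea. It is not keep-balance's actual code: the choice of MD5, the way the block hash and server identifier are combined, and the identifiers themselves are assumptions, and storage classes are not modelled.

```python
import hashlib

def rendezvous_order(block_hash, server_ids):
    """Return server_ids sorted from most to least preferred for one block.

    Each server gets a score derived from the block hash and its own
    identifier; the block belongs on the highest-scoring servers.
    """
    def score(server_id):
        return hashlib.md5((block_hash + server_id).encode()).hexdigest()
    return sorted(server_ids, key=score, reverse=True)

# Hypothetical identifiers, for illustration only.
servers = ["keep0", "keep1", "keep2", "keep3"]
block = "acbd18db4cc2f85cedef654fccc4a4d8"

order = rendezvous_order(block, servers)
print(order)

# Adding a server never reorders the existing ones relative to each other,
# so only blocks whose best position is on the new server need to move.
bigger = rendezvous_order(block, servers + ["keep4"])
print([s for s in bigger if s != "keep4"] == order)
```

Because every client computes the same order independently, a block that sits on its highest-ranked servers is found on the first probe, which is why moving blocks to better positions helps clients find them faster.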
By default, keep-balance operates periodically: it performs a scan/balance operation, sleeps, and repeats.
The Collections.BalancePeriod value in /etc/arvados/config.yml determines the interval between start times of successive scan/balance operations. If an operation takes longer than the Collections.BalancePeriod, the next operation will follow it immediately. If SIGUSR1 is received during an idle period between operations, the next operation will start immediately.
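The scheduling rule can be expressed as a small function. This is a sketch of the timing described above, not the service's own scheduler: the function and parameter names are hypothetical, times are plain numbers of seconds, and a signal that arrives while an operation is still running is simply ignored because the text does not describe that case.

```python
def next_start(prev_start, prev_end, balance_period, sigusr1_at=None):
    """Return the start time of the next scan/balance operation.

    prev_start, prev_end: start and end time of the previous operation
    balance_period:       interval between successive start times
    sigusr1_at:           time a SIGUSR1 was received, or None
    """
    # Normally the next run starts one period after the previous start,
    # but never before the previous run has finished.
    scheduled = max(prev_start + balance_period, prev_end)
    # A signal during the idle gap starts the next run right away.
    if sigusr1_at is not None and prev_end <= sigusr1_at < scheduled:
        return sigusr1_at
    return scheduled

print(next_start(0, 100, 600))                  # 600: short run, wait out the period
print(next_start(0, 900, 600))                  # 900: long run, next follows immediately
print(next_start(0, 100, 600, sigusr1_at=250))  # 250: signal received while idle
```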
Keep-balance can also be run with the -once flag to do a single scan/balance operation and then exit. The exit code will be zero if the operation was successful.
To tune resource usage and configure lost block reporting, see the Collections.BlobMissingReport, Collections.BalanceCollectionBatch, and Collections.BalanceCollectionBuffers options in the default config.yml file.
The Collections.BalancePullLimit and Collections.BalanceTrashLimit configuration entries determine the maximum number of pull and trash operations keep-balance will attempt to apply on each keepstore server. If both values are zero, keep-balance will operate in “dry run” mode, where all changes are computed but none are committed.
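As a rough illustration of how these two limits interact, consider the sketch below. The function names and data shapes are hypothetical, and the choice to keep the first operations in each list when a limit is exceeded is an assumption made for the example; the text above does not say which operations keep-balance prefers.

```python
def is_dry_run(pull_limit, trash_limit):
    """Dry run mode applies only when both limits are zero."""
    return pull_limit == 0 and trash_limit == 0

def cap_per_server(planned, limit):
    """Keep at most `limit` planned operations for each server.

    planned: dict mapping a server name to its list of planned operations
    """
    return {server: ops[:limit] for server, ops in planned.items()}

# Hypothetical plan: block identifiers to pull, grouped by destination server.
planned_pulls = {
    "keep0": ["block-a", "block-b", "block-c"],
    "keep1": ["block-d"],
}

print(is_dry_run(0, 0))    # True: changes are computed but not committed
print(is_dry_run(2, 0))    # False: only one of the limits is zero
print(cap_per_server(planned_pulls, 2))
# {'keep0': ['block-a', 'block-b'], 'keep1': ['block-d']}
```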
Keep-balance does not attempt to discover whether committed pull and trash requests ever get carried out, only whether they are accepted by the Keep services. If some services are full, new copies of under-replicated blocks might never get made, only repeatedly requested.
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.