This page describes how to balance keepstore servers using keep-balance. Keep-balance creates new copies of under-replicated blocks, deletes excess copies of over-replicated and unreferenced blocks, and moves blocks to better positions (e.g. after adding new keepstore servers) so clients find them faster.
See the Keep-balance install docs for installation instructions.
The keep-balance service determines which blocks are candidates for deletion and instructs keepstore to move those blocks to the trash. When a block is newly written, it is protected from deletion for the duration in BlobSigningTTL. During this time, it cannot be trashed or deleted.
If keep-balance instructs keepstore to trash a block which is older than BlobSigningTTL, and BlobTrashLifetime is non-zero, the block will be moved to “trash”. A block which is in the trash is no longer accessible by read requests, but has not yet been permanently deleted. Blocks which are in the trash may be recovered using the “untrash” API endpoint. Blocks are permanently deleted after they have been in the trash for the duration in BlobTrashLifetime.
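The trash rules above can be read as a small decision function. The following is a minimal sketch of that reading, not keepstore's actual implementation: the function names, the enum, and the exact boundary comparisons are assumptions made for illustration, and the case where BlobTrashLifetime is zero is left unspecified because the text above does not describe it.

```python
from datetime import timedelta
from enum import Enum


class TrashOutcome(Enum):
    PROTECTED = "protected"      # block is still inside the BlobSigningTTL window
    TRASHED = "trashed"          # moved to trash; recoverable via "untrash"
    UNSPECIFIED = "unspecified"  # zero trash lifetime: not covered by the text above


def trash_outcome(block_age, blob_signing_ttl, blob_trash_lifetime):
    """Classify a trash request for a block of the given age (all timedeltas)."""
    if block_age <= blob_signing_ttl:
        # Newly written blocks cannot be trashed or deleted.
        return TrashOutcome.PROTECTED
    if blob_trash_lifetime > timedelta(0):
        return TrashOutcome.TRASHED
    return TrashOutcome.UNSPECIFIED


def is_deletion_due(time_in_trash, blob_trash_lifetime):
    """True once a trashed block has been in the trash for the full lifetime."""
    return time_in_trash >= blob_trash_lifetime
```

For example, with both settings at two weeks, a one-hour-old block is protected, while a three-week-old block would be moved to the trash and remain recoverable until it has spent two weeks there.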
Keep-balance is also responsible for balancing the distribution of blocks across keepstore servers by asking servers to pull blocks from other servers (as determined by their storage class and rendezvous hashing order). Pulling a block makes a copy. If a block is over-replicated (i.e. there are excess copies) after pulling, it will be subsequently trashed and deleted on the original server, subject to the trash settings described above, such as BlobTrashLifetime.
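To illustrate what a rendezvous hashing order is, here is a generic sketch: each server gets a per-block weight derived from a hash of the block identifier and the server identifier, and servers are ranked by that weight. The weight function and the server names (`keep0` and so on) are assumptions chosen for illustration; the exact function used by Keep may differ.

```python
import hashlib


def rendezvous_order(block_hash, server_ids):
    """Return server_ids ranked for this block, best position first.

    Generic rendezvous (highest-random-weight) hashing: the weight of a
    server for a block is a digest of the two identifiers combined.
    """
    def weight(server_id):
        combined = (block_hash + server_id).encode("utf-8")
        return hashlib.md5(combined).hexdigest()

    return sorted(server_ids, key=weight, reverse=True)


servers = ["keep0", "keep1", "keep2", "keep3"]  # hypothetical server IDs
order = rendezvous_order("acbd18db4cc2f85cedef654fccc4a4d8", servers)
```

A useful property of this scheme is that adding or removing a server does not change the relative order of the remaining servers for a given block, so only blocks whose best positions involve the changed server need to move.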
By default, keep-balance operates periodically: it performs a scan/balance operation, sleeps, and repeats. The Collections.BalancePeriod value in /etc/arvados/config.yml determines the interval between start times of successive scan/balance operations. If an operation takes longer than Collections.BalancePeriod, the next operation will follow it immediately. If SIGUSR1 is received during an idle period between operations, the next operation will start immediately.
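The timing rules above can be sketched as a function that computes when the next operation starts. This is an illustrative model only: the function name and parameters are hypothetical, times are plain seconds, and a signal that arrives while an operation is still running is simply ignored here, which is an assumption since the text only covers signals received during an idle period.

```python
def next_start(prev_start, prev_duration, balance_period, sigusr1_at=None):
    """Return the start time (in seconds) of the next scan/balance operation."""
    finished = prev_start + prev_duration
    # The period is measured between start times, but an operation that
    # overruns the period is followed immediately by the next one.
    scheduled = max(prev_start + balance_period, finished)
    # A SIGUSR1 received while idle cuts the sleep short.
    if sigusr1_at is not None and finished <= sigusr1_at < scheduled:
        return sigusr1_at
    return scheduled
```

With a 600-second period, an operation that starts at 0 and takes 100 seconds is followed by one at 600; if it takes 900 seconds instead, the next one starts at 900.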
Keep-balance can also be run with the -once flag to do a single scan/balance operation and then exit. The exit code will be zero if the operation was successful.
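A wrapper script can use the exit code to decide whether a single run succeeded. The sketch below assumes that `keep-balance` is on the PATH and that it finds its configuration on its own; the helper name `run_once` is hypothetical, and only the `-once` flag comes from the text above.

```python
import subprocess
import sys


def run_once(command=("keep-balance", "-once")):
    """Run one scan/balance operation; return True if the exit code is zero."""
    result = subprocess.run(list(command))
    if result.returncode != 0:
        print(f"{command[0]} exited with code {result.returncode}", file=sys.stderr)
    return result.returncode == 0
```

Calling `run_once()` from a scheduled job, for example, lets the job report a failed balance run instead of silently continuing.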
To tune resource usage and configure lost block reporting, see the Collections.BalanceCollectionBuffers option and the entries around it in the default config.yml file.
The Collections.BalancePullLimit and Collections.BalanceTrashLimit configuration entries determine the maximum number of pull and trash operations keep-balance will attempt to apply on each keepstore server. If both values are zero, keep-balance will operate in “dry run” mode, where all changes are computed but none are committed.
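The effect of the two limits can be sketched as follows. This is a simplified model, not keep-balance's code: `plan_for_server` and its parameter names are hypothetical, and the lists stand in for whatever pull and trash requests were computed for one keepstore server.

```python
def is_dry_run(pull_limit, trash_limit):
    """Dry run mode applies only when both limits are zero."""
    return pull_limit == 0 and trash_limit == 0


def plan_for_server(pulls, trashes, pull_limit, trash_limit):
    """Return the (pulls, trashes) that would be sent to one server."""
    if is_dry_run(pull_limit, trash_limit):
        # Changes are still computed by the caller, but none are committed.
        return [], []
    # Each limit caps the number of operations of its kind per server.
    return pulls[:pull_limit], trashes[:trash_limit]
```

For instance, with a pull limit of 2 and a trash limit of 1, a server with three computed pulls and three computed trashes would receive two pulls and one trash in that run.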
Keep-balance does not attempt to discover whether committed pull and trash requests ever get carried out, only whether they are accepted by the Keep services. If some services are full, new copies of under-replicated blocks might never get made, only repeatedly requested.
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.