Balancing Keep servers

This page describes how to balance keepstore servers using keep-balance. Keep-balance creates new copies of under-replicated blocks, deletes excess copies of over-replicated and unreferenced blocks, and moves blocks to better positions (e.g. after adding new keepstore servers) so clients find them faster.

See the Keep-balance install docs for installation instructions.

Data deletion

The keep-balance service determines which blocks are candidates for deletion and instructs the keepstore servers to move those blocks to the trash. When a block is newly written, it is protected from deletion for the duration in BlobSigningTTL. During this time, it cannot be trashed or deleted.

If keep-balance instructs keepstore to trash a block which is older than BlobSigningTTL, and BlobTrashLifetime is non-zero, the block will be moved to “trash”. A block which is in the trash is no longer accessible by read requests, but has not yet been permanently deleted. Blocks which are in the trash may be recovered using the “untrash” API endpoint. Blocks are permanently deleted after they have been in the trash for the duration in BlobTrashLifetime.
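The rules above can be summarized as a small decision function. This is only an illustrative sketch, not keepstore's code: the function names and return strings are hypothetical, the handling of a zero BlobTrashLifetime (immediate deletion) is an assumption not stated above, and so is the treatment of a block whose age exactly equals BlobSigningTTL.

```python
from datetime import timedelta


def on_trash_request(block_age, blob_signing_ttl, blob_trash_lifetime):
    """Outcome of a trash request for a block of the given age (hypothetical helper)."""
    # Newly written blocks are protected for the duration of BlobSigningTTL.
    # Assumption: a block exactly as old as the TTL still counts as protected.
    if block_age <= blob_signing_ttl:
        return "protected"
    # Older blocks go to the trash when a trash lifetime is configured.
    if blob_trash_lifetime > timedelta(0):
        return "moved to trash"
    # Assumption: with a zero BlobTrashLifetime there is no trash stage.
    return "deleted immediately"


def trashed_block_status(time_in_trash, blob_trash_lifetime):
    """Status of a block that is already in the trash (hypothetical helper)."""
    if time_in_trash < blob_trash_lifetime:
        return "recoverable via untrash"
    return "eligible for permanent deletion"
```

For example, with both settings at two weeks, a trash request for a one-hour-old block is refused, while a block that has spent one day in the trash can still be recovered.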

Keep-balance is also responsible for balancing the distribution of blocks across keepstore servers by asking servers to pull blocks from other servers (as determined by their storage class and rendezvous hashing order). Pulling a block makes a copy. If a block is over-replicated (i.e. there are excess copies) after pulling, the copy on the original server will subsequently be trashed and deleted, subject to the BlobTrash and BlobTrashLifetime settings.
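To illustrate what a rendezvous hashing order is, the sketch below ranks servers for a block by hashing the block identifier together with each server identifier. It shows the general technique only: the choice of MD5, the string concatenation, and the names used here are assumptions, and the weight function keep-balance actually uses may differ.

```python
import hashlib


def rendezvous_order(block_hash, server_ids):
    """Return server_ids ranked for block_hash, best position first (generic sketch)."""

    def weight(server_id):
        # Assumed weight function: a hash of the block and server identifiers.
        return hashlib.md5((block_hash + server_id).encode()).hexdigest()

    return sorted(server_ids, key=weight, reverse=True)
```

A useful property of this scheme is that adding or removing a server does not change the relative order of the remaining servers for a given block, which is why only some blocks need to move after the set of servers changes.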

Scanning

By default, keep-balance operates periodically: it does a scan/balance operation, sleeps, and repeats.

The Collections.BalancePeriod value in /etc/arvados/config.yml determines the interval between start times of successive scan/balance operations. If an operation takes longer than the Collections.BalancePeriod, the next operation will follow it immediately. If SIGUSR1 is received during an idle period between operations, the next operation will start immediately.
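The timing rules in the previous paragraph can be expressed as a short function. This is a sketch of the described behavior rather than keep-balance's implementation; the function and parameter names are hypothetical, times are plain numbers of seconds, and signals arriving outside the idle period are ignored here because the text does not describe that case.

```python
def next_start(prev_start, prev_end, balance_period, sigusr1_at=None):
    """Start time of the next scan/balance operation (illustrative only)."""
    scheduled = prev_start + balance_period
    # The previous operation overran the period: the next one follows immediately.
    if prev_end >= scheduled:
        return prev_end
    # SIGUSR1 received during the idle period starts the next operation right away.
    if sigusr1_at is not None and prev_end <= sigusr1_at < scheduled:
        return sigusr1_at
    # Otherwise wait until one full period after the previous start.
    return scheduled
```

With a period of 600 seconds, an operation that starts at 0 and ends at 100 is followed by one at 600; if it ends at 700 instead, the next one starts at 700.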

Keep-balance can also be run with the -once flag to do a single scan/balance operation and then exit. The exit code will be zero if the operation was successful.

Additional configuration

To tune resource usage and configure lost block reporting, see the Collections.BlobMissingReport, Collections.BalanceCollectionBatch, and Collections.BalanceCollectionBuffers options in the default config.yml file.

The Collections.BalancePullLimit and Collections.BalanceTrashLimit configuration entries determine the maximum number of pull and trash operations keep-balance will attempt to apply on each keepstore server. If both values are zero, keep-balance will operate in “dry run” mode, where all changes are computed but none are committed.
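As a rough illustration of how these two limits interact, the sketch below checks for "dry run" mode and caps the operations sent to each server. The function names and data shapes are hypothetical, and which operations are kept when a list exceeds the limit (here, the first ones) is an assumption; the text above only states the maximum.

```python
def is_dry_run(balance_pull_limit, balance_trash_limit):
    """True when both limits are zero, so nothing would be committed."""
    return balance_pull_limit == 0 and balance_trash_limit == 0


def cap_per_server(ops_by_server, limit):
    """Keep at most `limit` operations for each keepstore server.

    Assumption: the first `limit` operations in each list are the ones kept.
    """
    return {server: ops[:limit] for server, ops in ops_by_server.items()}
```

For example, `cap_per_server({"keep0": ["a", "b", "c"]}, 2)` keeps only two operations for `keep0`, and a limit of zero keeps none.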

Limitations

Keep-balance does not attempt to discover whether committed pull and trash requests are ever carried out; it only checks that they are accepted by the Keep services. If some services are full, new copies of under-replicated blocks might never be made, only repeatedly requested.



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.