Keep-balance deletes unreferenced and overreplicated blocks from Keep servers, makes additional copies of underreplicated blocks, and moves blocks into optimal locations as needed (e.g., after adding new servers).
If you are installing keep-balance on an existing system with valuable data, you can run keep-balance in “dry run” mode first and review its logs as a precaution. To do this, edit your keep-balance startup script to use the flags -commit-pulls=false -commit-trash=false.
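As a rough sketch of how a startup script could switch between dry-run and normal operation, the shell function below prints the flags to pass to keep-balance. The DRY_RUN variable and the keep_balance_flags name are assumptions made for illustration and are not part of keep-balance; only the flag spellings themselves come from this page.

```shell
#!/bin/sh
# Hypothetical helper for a keep-balance startup script.
# DRY_RUN is an illustrative environment variable, not a keep-balance option.
# It defaults to 1 (dry run) so that nothing is changed by accident.
keep_balance_flags() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: changes are computed and logged, but no pull or
        # trash requests are sent.
        printf '%s\n' '-commit-pulls=false -commit-trash=false'
    else
        # Normal operation: send both pull and trash requests.
        printf '%s\n' '-commit-pulls -commit-trash'
    fi
}

# Example use inside a run script (word splitting of the output is intended):
#   exec keep-balance $(keep_balance_flags) 2>&1
```

Defaulting to dry-run mode is a design choice of this sketch: the operator has to set DRY_RUN=0 explicitly before any blocks are moved or trashed.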
Keep-balance can be installed anywhere with network access to Keep services. Typically it runs on the same host as keepproxy.
A cluster should have only one keep-balance process running at a time.
On Debian-based systems:
~$ sudo apt-get install keep-balance
On Red Hat-based systems:
~$ sudo yum install keep-balance
Verify that keep-balance is functional:
~$ keep-balance -h
...
Usage: keep-balance [options]
Options:
-commit-pulls
send pull requests (make more replicas of blocks that are underreplicated or are not in optimal rendezvous probe order)
-commit-trash
send trash requests (delete unreferenced old blocks, and excess replicas of overreplicated blocks)
...
Create an Arvados superuser token for use by keep-balance. This step is performed on the API server.
On each node that runs keepstore, save the token you generated in the previous step in a text file like /etc/arvados/keepstore/system-auth-token.txt and then create or update /etc/arvados/keepstore/keepstore.yml with the following key:
SystemAuthTokenFile: /etc/arvados/keepstore/system-auth-token.txt
Restart all keepstore services to apply the updated configuration.
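The token-file step could be scripted along these lines. This is a minimal sketch: the write_token_file name is hypothetical, and creating the file with owner-only permissions is an added precaution rather than a requirement stated in this guide. If keepstore runs as a non-root user, the file's owner may need adjusting so that keepstore can still read it.

```shell
#!/bin/sh
# Hypothetical helper: write a token to a file readable only by its owner.
# Usage: write_token_file TOKEN PATH
write_token_file() {
    (
        # Create the parent directory with the default umask first,
        # so only the token file itself gets the restrictive mode.
        mkdir -p "$(dirname "$2")" || exit 1
        umask 077                  # a newly created file gets mode 0600
        printf '%s\n' "$1" > "$2"
    )
}

# Example (run on each keepstore node; the token value is a placeholder):
#   write_token_file "zzzz..." /etc/arvados/keepstore/system-auth-token.txt
```

The subshell keeps the umask change from leaking into the calling script. Note that the umask only affects a file that does not exist yet; an existing file keeps its current mode.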
On the host running keep-balance, create /etc/arvados/keep-balance/keep-balance.yml using the token you generated above. Follow this YAML format:
Client:
  APIHost: uuid_prefix.your.domain:443
  AuthToken: zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
KeepServiceTypes:
  - disk
RunPeriod: 10m
CollectionBatchSize: 100000
CollectionBuffers: 1000
If your API server’s SSL certificate is not signed by a recognized CA, add the Insecure option to the Client section:
Client:
  Insecure: true
  APIHost: ...
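For repeatable installs, the configuration file can be generated rather than typed by hand. The function below is a sketch under stated assumptions: render_keep_balance_yml and its optional third argument are hypothetical names, and the remaining values are simply copied from the example configuration in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print a keep-balance.yml to standard output.
# Usage: render_keep_balance_yml APIHOST TOKEN [insecure]
render_keep_balance_yml() {
    printf 'Client:\n'
    # Only for API servers whose certificate is not signed by a recognized CA.
    if [ "${3:-}" = "insecure" ]; then
        printf '  Insecure: true\n'
    fi
    printf '  APIHost: %s\n' "$1"
    printf '  AuthToken: %s\n' "$2"
    # The values below are taken from the example configuration in this guide.
    cat <<'EOF'
KeepServiceTypes:
  - disk
RunPeriod: 10m
CollectionBatchSize: 100000
CollectionBuffers: 1000
EOF
}

# Example (host and token are placeholders):
#   render_keep_balance_yml uuid_prefix.your.domain:443 "zzzz..." \
#     | sudo tee /etc/arvados/keep-balance/keep-balance.yml >/dev/null
```

Printing to standard output instead of writing the file directly makes it easy to review the result before installing it.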
If your system does not use systemd, skip this section and follow the runit instructions instead.
If your system uses systemd, the keep-balance service should already be set up. Start it and check its status:
~$ sudo systemctl restart keep-balance
~$ sudo systemctl status keep-balance
● keep-balance.service - Arvados Keep Balance
Loaded: loaded (/lib/systemd/system/keep-balance.service; enabled)
Active: active (running) since Tue 2017-02-14 18:46:01 UTC; 3 days ago
Docs: https://doc.arvados.org/
Main PID: 541 (keep-balance)
CGroup: /system.slice/keep-balance.service
└─541 /usr/bin/keep-balance -commit-pulls -commit-trash
Feb 14 18:46:01 zzzzz.arvadosapi.com keep-balance[541]: 2017/02/14 18:46:01 starting up: will scan every 10m0s and on SIGUSR1
Feb 14 18:56:01 zzzzz.arvadosapi.com keep-balance[541]: 2017/02/14 18:56:01 Run: start
Feb 14 18:56:01 zzzzz.arvadosapi.com keep-balance[541]: 2017/02/14 18:56:01 skipping zzzzz-bi6l4-rbtrws2jxul6i4t with service type "proxy"
Feb 14 18:56:01 zzzzz.arvadosapi.com keep-balance[541]: 2017/02/14 18:56:01 clearing existing trash lists, in case the new rendezvous order differs from previous run
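When reviewing these logs, it can help to confirm which Keep services keep-balance decided to skip. The filter below is a sketch that assumes the "skipping ... with service type" message format shown in the sample output above; the skipped_services name is hypothetical, and other keep-balance versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical log filter: read keep-balance log lines on standard input
# and print "UUID TYPE" for every service that was skipped.
# Assumes the message format shown in the sample output in this guide.
skipped_services() {
    sed -n 's/.* skipping \([^ ]*\) with service type "\([^"]*\)".*/\1 \2/p'
}

# Example under systemd:
#   journalctl -u keep-balance | skipped_services
#
# The startup message says a scan also starts on SIGUSR1. Under systemd,
# one way to send that signal is:
#   sudo systemctl kill --signal=SIGUSR1 keep-balance
```

With KeepServiceTypes set to disk only, a proxy entry in this list is expected.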
Install runit to supervise the keep-balance daemon.
On Debian-based systems:
~$ sudo apt-get install runit
On Red Hat-based systems:
~$ sudo yum install runit
Create a supervised service.
~$ sudo mkdir /etc/service/keep-balance
~$ cd /etc/service/keep-balance
~$ sudo mkdir log log/main
~$ printf '#!/bin/sh\nexec keep-balance -commit-pulls -commit-trash 2>&1\n' | sudo tee run
~$ printf '#!/bin/sh\nexec svlogd main\n' | sudo tee log/run
~$ sudo chmod +x run log/run
~$ sudo sv exit .
~$ cd -
Use sv stat and check the log file to verify the service is running.
~$ sudo sv stat /etc/service/keep-balance
run: /etc/service/keep-balance: (pid 12520) 2s; run: log: (pid 12519) 2s
~$ tail /etc/service/keep-balance/log/main/current
2017/02/14 18:46:01 starting up: will scan every 10m0s and on SIGUSR1
2017/02/14 18:56:01 Run: start
2017/02/14 18:56:01 skipping zzzzz-bi6l4-rbtrws2jxul6i4t with service type "proxy"
2017/02/14 18:56:01 clearing existing trash lists, in case the new rendezvous order differs from previous run
Ensure your keepstore services have the “delete” operation enabled. If it is disabled (which is the default), unneeded blocks will be identified by keep-balance, but will never be deleted from the underlying storage devices.
Add the -never-delete=false command line flag to your keepstore run script:
keepstore -never-delete=false -volume=...
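Because a missing flag fails silently (blocks are identified but never deleted), it may be worth checking each keepstore run script after editing it. The function below is a sketch; the deletes_enabled name and the example path are hypothetical, and the check only looks for the flag text on non-comment lines.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the given run script passes
# -never-delete=false on a line that is not a shell comment.
# Usage: deletes_enabled /path/to/keepstore/run
deletes_enabled() {
    grep -v '^[[:space:]]*#' "$1" | grep -q -e '-never-delete=false'
}

# Example (the run script path is a placeholder for your own):
#   deletes_enabled /path/to/keepstore/run || echo "delete is still disabled"
```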
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.