Keep-balance deletes unreferenced and overreplicated blocks from Keep servers, makes additional copies of underreplicated blocks, and moves blocks into optimal locations as needed (e.g., after adding new servers).
If you are installing keep-balance on an existing system with valuable data, you can run keep-balance in “dry run” mode first and review its logs as a precaution. To do this, edit your keep-balance startup script to use the flags -commit-pulls=false -commit-trash=false.
Keep-balance can be installed anywhere with network access to Keep services. Typically it runs on the same host as keepproxy.
A cluster should have only one keep-balance process running at a time.
On Debian-based systems:
~$ sudo apt-get install keep-balance
On Red Hat-based systems:
~$ sudo yum install keep-balance
Verify that keep-balance is functional:
~$ keep-balance -h
...
Usage: keep-balance [options]

Options:
  -commit-pulls
        send pull requests (make more replicas of blocks that are underreplicated or are not in optimal rendezvous probe order)
  -commit-trash
        send trash requests (delete unreferenced old blocks, and excess replicas of overreplicated blocks)
...
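As a rough illustration of how the two commit flags relate to a “dry run”, the sketch below picks the flag set from a switch. `DRY_RUN` and `keep_balance_flags` are hypothetical names used only in this example, not keep-balance settings, and the `=false` spelling assumes the flags accept the usual `-flag=false` boolean form.

```shell
#!/bin/sh
# Sketch: choose keep-balance flags from a hypothetical DRY_RUN variable.
# DRY_RUN and keep_balance_flags are illustration-only names.
keep_balance_flags() {
    if [ "${DRY_RUN:-yes}" = "no" ]; then
        # Commit mode: pull and trash requests are sent to the Keep servers.
        printf '%s\n' "-commit-pulls -commit-trash"
    else
        # Dry run (the default here): both kinds of requests are disabled.
        printf '%s\n' "-commit-pulls=false -commit-trash=false"
    fi
}

# Print the command line that would be used, without running it.
echo "keep-balance $(keep_balance_flags)"
```

Defaulting to the dry-run branch means a forgotten setting cannot cause blocks to be trashed by accident.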
Create an Arvados superuser token for use by keep-balance.
On the API server, use the following commands:
~$ cd /var/www/arvados-api/current
$ sudo -u webserver-user RAILS_ENV=production bundle exec script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
On each node that runs keepstore, save the token you generated in the previous step in a text file like /etc/arvados/keepstore/system-auth-token.txt and then create or update /etc/arvados/keepstore/keepstore.yml with the following key:
Restart all keepstore services to apply the updated configuration.
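Because the token file holds a superuser credential, it may be worth creating it with owner-only permissions. The following is a minimal sketch; `write_token_file` is a hypothetical helper, and it assumes the account that runs keepstore is the one that owns the file.

```shell
#!/bin/sh
# Sketch: write a token to a file that only its owner can read or write.
# write_token_file is a hypothetical helper, not part of Arvados.
write_token_file() {
    token_path=$1
    token_value=$2
    (
        # Restrict permissions before the file is created.
        umask 077
        printf '%s\n' "$token_value" > "$token_path"
    )
    # Tighten permissions explicitly in case the file already existed.
    chmod 600 "$token_path"
}
```

For example, run as the file's intended owner: `write_token_file /etc/arvados/keepstore/system-auth-token.txt "$TOKEN"`, where `TOKEN` holds the value printed by the superuser token script.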
On the host running keep-balance, create /etc/arvados/keep-balance/keep-balance.yml using the token you generated above. Follow this YAML format:
Listen: :9005
Client:
  APIHost: uuid_prefix.your.domain:443
  AuthToken: zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
KeepServiceTypes:
  - disk
ManagementToken: xyzzy
RunPeriod: 10m
CollectionBatchSize: 100000
CollectionBuffers: 1000
LostBlocksFile: /tmp/keep-balance-lost-blocks.txt    # If given, this file will be updated atomically during each successful run.
If your API server’s SSL certificate is not signed by a recognized CA, add the Insecure option to the Client section:
Client:
  Insecure: true
  APIHost: ...
If your system does not use systemd, skip this section and follow the runit instructions instead.
If your system uses systemd, the keep-balance service should already be set up. Start it and check its status:
~$ sudo systemctl restart keep-balance
~$ sudo systemctl status keep-balance
● keep-balance.service - Arvados Keep Balance
   Loaded: loaded (/lib/systemd/system/keep-balance.service; enabled)
   Active: active (running) since Sat 2017-02-14 18:46:01 UTC; 3 days ago
     Docs: https://doc.arvados.org/
 Main PID: 541 (keep-balance)
   CGroup: /system.slice/keep-balance.service
           └─541 /usr/bin/keep-balance -commit-pulls -commit-trash

Feb 14 18:46:01 zzzzz.arvadosapi.com keep-balance: 2017/02/14 18:46:01 starting up: will scan every 10m0s and on SIGUSR1
Feb 14 18:56:01 zzzzz.arvadosapi.com keep-balance: 2017/02/14 18:56:01 Run: start
Feb 14 18:56:01 zzzzz.arvadosapi.com keep-balance: 2017/02/14 18:56:01 skipping zzzzz-bi6l4-rbtrws2jxul6i4t with service type "proxy"
Feb 14 18:56:01 zzzzz.arvadosapi.com keep-balance: 2017/02/14 18:56:01 clearing existing trash lists, in case the new rendezvous order differs from previous run
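The startup line in the log says keep-balance scans "every 10m0s and on SIGUSR1", which suggests a scan can be requested without waiting for the next period. A minimal sketch of sending that signal follows; `request_scan` is a hypothetical helper, and the PID would come from the "Main PID" field of the status output.

```shell
#!/bin/sh
# Sketch: ask a running keep-balance process to scan now by sending SIGUSR1.
# request_scan is a hypothetical helper; pass it the keep-balance PID.
request_scan() {
    pid=$1
    # kill -0 sends no signal; it only checks that the process can be signalled.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "cannot signal process: $pid" >&2
        return 1
    fi
    kill -USR1 "$pid"
}
```

With the status output shown above this would be `sudo`-run as `request_scan 541`. On systemd hosts, `sudo systemctl kill -s USR1 keep-balance` should have the same effect without looking up the PID.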
Install runit to supervise the keep-balance daemon.
On Debian-based systems:
~$ sudo apt-get install runit
On Red Hat-based systems:
~$ sudo yum install runit
Create a supervised service.
~$ sudo mkdir /etc/service/keep-balance
~$ cd /etc/service/keep-balance
~$ sudo mkdir log log/main
~$ printf '#!/bin/sh\nexec keep-balance -commit-pulls -commit-trash 2>&1\n' | sudo tee run
~$ printf '#!/bin/sh\nexec svlogd main\n' | sudo tee log/run
~$ sudo chmod +x run log/run
~$ sudo sv exit .
~$ cd -
Use sv stat and check the log file to verify the service is running.
~$ sudo sv stat /etc/service/keep-balance
run: /etc/service/keep-balance: (pid 12520) 2s; run: log: (pid 12519) 2s
~$ tail /etc/service/keep-balance/log/main/current
2017/02/14 18:46:01 starting up: will scan every 10m0s and on SIGUSR1
2017/02/14 18:56:01 Run: start
2017/02/14 18:56:01 skipping zzzzz-bi6l4-rbtrws2jxul6i4t with service type "proxy"
2017/02/14 18:56:01 clearing existing trash lists, in case the new rendezvous order differs from previous run
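To confirm that scans are actually starting, rather than just that the process is up, one option is to look for the most recent "Run: start" line in the log. The sketch below assumes the runit log path used above and the log wording shown in the sample output; `last_run_start` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the most recent "Run: start" line from a keep-balance log.
# last_run_start is a hypothetical helper; the default path is the runit
# log location created in the steps above.
last_run_start() {
    log_file=${1:-/etc/service/keep-balance/log/main/current}
    line=$(grep 'Run: start' "$log_file" | tail -n 1)
    # Fail if no run has been logged yet.
    [ -n "$line" ] || return 1
    printf '%s\n' "$line"
}
```

A non-zero exit status right after startup is expected, since the first scan only begins once the first run period has elapsed or a SIGUSR1 arrives.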
Ensure your keepstore services have the “delete” operation enabled. If it is disabled (which is the default), unneeded blocks will be identified by keep-balance, but will never be deleted from the underlying storage devices.
To enable it, add the -never-delete=false command line flag to your keepstore run script:
keepstore -never-delete=false -volume=...
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.