Keep-balance deletes unreferenced and overreplicated blocks from Keep servers, makes additional copies of underreplicated blocks, and moves blocks into optimal locations as needed (e.g., after adding new servers). See Balancing Keep servers for usage details.
Keep-balance can be installed anywhere with network access to Keep services, arvados-controller, and PostgreSQL. Typically it runs on the same host as keepproxy.
A cluster should have only one instance of keep-balance running at a time.
If you are installing keep-balance on an existing system with valuable data, you can run keep-balance in “dry run” mode first and review its logs as a precaution. To do this, edit your keep-balance startup script to use the flags -commit-pulls=false -commit-trash=false -commit-confirmed-fields=false.
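As a quick sanity check before starting the service in dry-run mode, the sketch below reports which of the three flags named above are missing from a command line. The function name `missing_dry_run_flags` is hypothetical, and it only matches the exact single-dash spelling used in this document; paste in the command line from your own startup script.

```shell
# missing_dry_run_flags (hypothetical helper): prints each dry-run flag that
# is absent from the command line passed as arguments. Prints nothing when
# all three flags are present.
missing_dry_run_flags() {
  cmdline=" $* "
  for flag in -commit-pulls=false -commit-trash=false -commit-confirmed-fields=false; do
    case "$cmdline" in
      *" $flag "*) ;;                 # flag present, nothing to report
      *) printf '%s\n' "$flag" ;;     # flag missing
    esac
  done
}
```

For example, `missing_dry_run_flags keep-balance -commit-pulls=false` lists the two flags that still need to be added.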
Edit the cluster config at config.yml and set Services.Keepbalance.InternalURLs. This port is only used to publish metrics.
    Services:
      Keepbalance:
        InternalURLs:
          "http://keep.ClusterID.example.com:9005/": {}
Ensure your cluster configuration has Collections.BlobTrash: true (this is the default).
# arvados-server config-dump | grep BlobTrash:
      BlobTrash: true
If BlobTrash is false, unneeded blocks will be counted and logged by keep-balance, but they will not be deleted.
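The check above can also be scripted so that it fails loudly when BlobTrash is not enabled. This is a minimal sketch: the function name `blob_trash_enabled` is hypothetical, and it assumes the config dump contains a line of the form `BlobTrash: true`, as in the sample output shown above.

```shell
# blob_trash_enabled (hypothetical helper): reads a config dump on stdin and
# succeeds only if it contains a line reading "BlobTrash: true".
blob_trash_enabled() {
  grep -Eq '^[[:space:]]*BlobTrash:[[:space:]]*true[[:space:]]*$'
}

# Intended usage, run as root on a host with the cluster config:
#   arvados-server config-dump | blob_trash_enabled || echo "BlobTrash is not enabled"
```

The function only reads standard input, so it can be tried against a saved copy of the config dump as well.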
Install the keep-balance package. On Red Hat-based distributions:
# yum install keep-balance
On Debian-based distributions:
# apt-get install keep-balance
Enable and start the service, then check its status:
# systemctl enable --now keep-balance
# systemctl status keep-balance
[...]
If systemctl status indicates that the service is not running, use journalctl to check the logs for errors:
# journalctl -n12 --unit keep-balance
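The status check and the log lookup can be combined into one step. The sketch below uses a hypothetical function name, `check_keep_balance`, and assumes the unit is named keep-balance as in the commands above; it relies on `systemctl is-active --quiet`, which reports the unit state through its exit code.

```shell
# check_keep_balance (hypothetical helper): prints "running" if the
# keep-balance unit is active; otherwise shows the last 12 journal lines
# for the unit and returns non-zero.
check_keep_balance() {
  if systemctl is-active --quiet keep-balance; then
    echo "running"
  else
    journalctl -n12 --unit keep-balance
    return 1
  fi
}
```

Run `check_keep_balance` as root after starting the service; a non-zero exit status means the journal output should be reviewed for errors.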
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.