The Keepproxy server is a gateway into your Keep storage. Unlike the Keepstore servers, which are only accessible on the local LAN, Keepproxy is suitable for clients located elsewhere on the internet. Specifically, in contrast to Keepstore:
By convention, we use the following hostname for the Keepproxy server:
Hostname |
---|
keep.ClusterID.example.com |
This hostname should resolve from anywhere on the internet.
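As a quick sanity check of the DNS requirement, the following is a minimal Python sketch that reports whether a hostname resolves from the machine it runs on. The function name `resolves` and its injectable `resolver` parameter are illustrative, not part of Arvados. The result only reflects the resolver of the machine running it, so run it from a host outside your private network.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address."""
    try:
        # Port 443 matches the HTTPS port used by the reverse proxy.
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # Raised by socket.getaddrinfo when the name cannot be resolved.
        return False


# Example (placeholder hostname; substitute your own cluster ID and domain):
#   resolves("keep.ClusterID.example.com")
```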
Edit the cluster config at config.yml and set Services.Keepproxy.ExternalURL and Services.Keepproxy.InternalURLs.
    Services:
      Keepproxy:
        ExternalURL: https://keep.ClusterID.example.com
        InternalURLs:
          "http://localhost:25107": {}
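The two settings can be sanity-checked before restarting anything. The sketch below assumes the Services section has already been loaded into a Python dict (for example with a YAML parser of your choice). The function name `check_keepproxy_urls` and the specific checks are illustrative assumptions, not rules enforced by Arvados.

```python
from urllib.parse import urlsplit


def check_keepproxy_urls(services):
    """Return a list of problems found in the Keepproxy URL settings."""
    problems = []
    keepproxy = services.get("Keepproxy", {})

    # The external URL is what clients on the internet connect to.
    external = urlsplit(keepproxy.get("ExternalURL", ""))
    if external.scheme != "https" or not external.hostname:
        problems.append("ExternalURL should be an https:// URL with a hostname")

    # The internal URLs are the addresses Keepproxy itself listens on.
    internal_urls = keepproxy.get("InternalURLs") or {}
    if not internal_urls:
        problems.append("InternalURLs should list at least one URL")
    for url in internal_urls:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https") or parts.port is None:
            problems.append(f"InternalURL {url!r} should include a scheme and port")
    return problems


# Mirrors the example configuration above (placeholder hostname).
example = {
    "Keepproxy": {
        "ExternalURL": "https://keep.ClusterID.example.com",
        "InternalURLs": {"http://localhost:25107": {}},
    }
}
print(check_keepproxy_urls(example))
```

An empty list means none of these illustrative checks found a problem.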
Put a reverse proxy with SSL support in front of Keepproxy. Keepproxy itself listens on port 25107 (or whatever is specified in Services.Keepproxy.InternalURLs), while the reverse proxy listens on port 443 and forwards requests to Keepproxy.
Use a text editor to create a new file /etc/nginx/conf.d/keepproxy.conf with the following configuration. Replace the placeholder values (the server_name hostname and the ssl_certificate paths) with your own.
upstream keepproxy {
  server 127.0.0.1:25107;
}

server {
  listen 443 ssl;
  server_name keep.ClusterID.example.com;

  proxy_connect_timeout 90s;
  proxy_read_timeout 300s;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_http_version 1.1;
  proxy_request_buffering off;
  proxy_max_temp_file_size 0;

  ssl_certificate /YOUR/PATH/TO/cert.pem;
  ssl_certificate_key /YOUR/PATH/TO/cert.key;

  # Clients need to be able to upload blocks of data up to 64MiB in size.
  client_max_body_size 64m;

  location / {
    proxy_pass http://keepproxy;
  }
}
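To make the 64 MiB requirement concrete, here is a small Python sketch that converts an nginx size value to bytes and checks that it admits a full-size block. The helper names are hypothetical. The unit handling assumes nginx's usual `k`, `m`, and `g` suffixes, and relies on nginx treating a `client_max_body_size` of 0 as "no limit".

```python
MAX_BLOCK_BYTES = 64 * 1024 * 1024  # 64 MiB, per the comment in the config


def nginx_size_to_bytes(value):
    """Convert an nginx size such as '64m' or '512k' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = value.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # a bare number is already in bytes


def body_limit_allows_full_block(client_max_body_size):
    """Return True if the limit lets a 64 MiB request body through."""
    limit = nginx_size_to_bytes(client_max_body_size)
    # nginx disables the body size check when the value is 0.
    return limit == 0 or limit >= MAX_BLOCK_BYTES


print(nginx_size_to_bytes("64m"))            # 67108864
print(body_limit_allows_full_block("64m"))   # True
print(body_limit_allows_full_block("1m"))    # False
```

nginx's default `client_max_body_size` is 1m, which is why the directive needs to be set explicitly here.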
Note: if the Web uploader is failing to upload data and there are no logs from keepproxy, be sure to check the nginx proxy logs. In addition to “GET” and “PUT”, the nginx proxy must pass “OPTIONS” requests to keepproxy, which should respond with the appropriate cross-origin resource sharing (CORS) headers. If the CORS headers are not present, browser security policy will cause the upload request to fail silently. The CORS headers are generated by keepproxy and should not be set in nginx.
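One way to confirm that OPTIONS requests reach keepproxy is to send one through the reverse proxy and inspect the response headers. The following is a minimal sketch using only the Python standard library. The list of expected headers, the request path `/`, and the function names are assumptions made for illustration; adjust them to what your browser's developer tools report as missing.

```python
import http.client

# Assumption: these two standard CORS response headers are the ones to look for.
EXPECTED_CORS_HEADERS = (
    "Access-Control-Allow-Origin",
    "Access-Control-Allow-Methods",
)


def missing_cors_headers(headers):
    """Return the expected CORS headers absent from a header mapping."""
    present = {name.lower() for name in headers}  # header names are case-insensitive
    return [h for h in EXPECTED_CORS_HEADERS if h.lower() not in present]


def probe_options(host, path="/", timeout=10):
    """Send an OPTIONS request over HTTPS; return (status, missing headers)."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("OPTIONS", path)
        resp = conn.getresponse()
        return resp.status, missing_cors_headers(dict(resp.getheaders()))
    finally:
        conn.close()


# Example (placeholder hostname; run from outside the private network):
#   status, missing = probe_options("keep.ClusterID.example.com")
```

If headers are reported missing, check whether nginx is answering the OPTIONS request itself instead of forwarding it.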
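Install the keepproxy package with your distribution's package manager. On Red Hat-based systems:
# yum install keepproxy
On Debian-based systems:
# apt-get install keepproxy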
# systemctl enable --now keepproxy
# systemctl status keepproxy
[...]
If systemctl status indicates it is not running, use journalctl to check logs for errors:
# journalctl -n12 --unit keepproxy
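The same status and log checks can be scripted. The sketch below wraps `systemctl is-active --quiet` and `journalctl` with Python's `subprocess` module. The function names and the injectable `run` parameter are illustrative choices, and reading the journal may require root privileges.

```python
import subprocess


def unit_is_active(unit, run=subprocess.run):
    """Return True if systemd reports the unit as active."""
    # `systemctl is-active --quiet UNIT` exits with status 0 only when active.
    result = run(["systemctl", "is-active", "--quiet", unit])
    return result.returncode == 0


def recent_logs(unit, lines=12, run=subprocess.run):
    """Return the last `lines` journal entries for the unit as text."""
    result = run(
        ["journalctl", "-n", str(lines), "--unit", unit],
        capture_output=True,
        text=True,
    )
    return result.stdout


# Example:
#   if not unit_is_active("keepproxy"):
#       print(recent_logs("keepproxy"))
```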
Make sure the cluster config file is up to date on the API server host, then restart the API server and controller processes so that the configuration changes are visible to the whole cluster.
# systemctl restart nginx arvados-controller
# arvados-server check
We recommend using the Cluster diagnostics tool. Because Keepproxy is specifically a gateway used by outside clients, you should run the diagnostics for this test from a client machine outside the Arvados private network and provide the -external-client parameter.
Here are some other checks you can perform manually.
Log into a host that is on a network external to your private Arvados network. The host should be able to contact your keepproxy server (e.g. keep.ClusterID.example.com), but not your keepstore servers (e.g. keep[0-9].ClusterID.example.com).
ARVADOS_API_HOST and ARVADOS_API_TOKEN must be set in the environment. ARVADOS_API_HOST should be the hostname of the API server. ARVADOS_API_TOKEN should be the system root token.
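Before running the checks that follow, it can help to confirm that both variables are actually set in the environment. This is a minimal sketch; the function name `missing_arvados_env` is illustrative and not part of any Arvados SDK.

```python
import os

REQUIRED_VARS = ("ARVADOS_API_HOST", "ARVADOS_API_TOKEN")


def missing_arvados_env(environ=None):
    """Return the required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_arvados_env()
if missing:
    print("Not set:", ", ".join(missing))
```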
Install the Command line SDK
Check that the keepproxy server is in the keep_service “accessible” list:
$ arv keep_service accessible
[...]
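If you capture the output of the command above, a short script can pick out the proxy entries. This sketch assumes the command prints a JSON object with an `items` list whose entries carry `service_type`, `service_host`, and `service_port` fields; those field names, the `proxy` type value, and the sample data are assumptions for illustration, so compare them against your actual output.

```python
import json


def proxy_services(accessible_json):
    """Return (host, port) for each listed service of type 'proxy'."""
    response = json.loads(accessible_json)
    return [
        (item.get("service_host"), item.get("service_port"))
        for item in response.get("items", [])
        if item.get("service_type") == "proxy"
    ]


# Illustrative sample only; not real output from a cluster.
sample = json.dumps({
    "items": [
        {
            "service_type": "proxy",
            "service_host": "keep.ClusterID.example.com",
            "service_port": 443,
        }
    ]
})
print(proxy_services(sample))
```

From an external client, the list is expected to contain the keepproxy host rather than individual keepstore servers.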
If keepstore does not show up in the “accessible” list, and you are accessing it from within the private network, check that you have properly configured the geo block for the API server.
Install the Python SDK
You should now be able to use arv-put to upload collections and arv-get to fetch collections. Be sure to execute this from outside the cluster’s private network.
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.