Install the Git server

Arvados allows users to create their own private and public git repositories, and to clone and push them using SSH and HTTPS.

The git hosting setup involves three components.

  • The “arvados-git-sync.rb” script polls the API server for the current list of repositories, creates bare repositories, and updates the local permission cache used by gitolite.
  • Gitolite provides SSH access.
  • arvados-git-http provides HTTPS access.

It is not strictly necessary to deploy both SSH and HTTPS access, but we recommend deploying both:

  • SSH is a more appropriate way to authenticate from a user’s workstation because it does not require managing tokens on the client side.
  • HTTPS is a more appropriate way to authenticate from a shell VM because it does not depend on SSH agent forwarding (SSH clients’ agent forwarding features tend to behave as if the remote machine is fully trusted).
  • HTTPS is also used by Arvados Composer to access git repositories from the browser.

The HTTPS instructions given below will not work if you skip the SSH setup steps.

Set up DNS

By convention, we use the following hostname for the git service:

git.uuid_prefix.your.domain

Note:

Here, we show how to install the git hosting services on the same host as your API server; on this page we refer to that host as your git server. Using a different host is not yet fully supported.

DNS and network configuration should be set up so port 443 reaches your HTTPS proxy, and port 22 reaches the OpenSSH service on your git server.
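To check the naming convention and name resolution before going further, you could use the following POSIX shell sketch. It builds the conventional hostname and tests whether that hostname resolves. The function names `git_hostname` and `resolves` are hypothetical. The sketch assumes `getent` is available, which is true on most glibc-based Linux systems. A successful lookup says nothing about whether ports 22 and 443 are routed to the right services.

```shell
#!/bin/sh
# Hypothetical helper: build the conventional git hostname,
# git.<uuid_prefix>.<your domain>, from two values you supply.
git_hostname() {
    printf 'git.%s.%s\n' "$1" "$2"
}

# Hypothetical helper: succeed if the name resolves on this machine.
# Uses getent (assumed present); it checks resolution only, not whether
# ports 22 and 443 reach the right services.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

git_hostname uuid_prefix your.domain   # -> git.uuid_prefix.your.domain
```

For example, run `resolves "$(git_hostname zzzzz your.domain)" && echo "DNS is ready"`, substituting your own prefix and domain.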

Generate an API token

Use the following command to generate an API token.

Change webserver-user to the user that runs your web server process. If you install Phusion Passenger as we recommend, this is www-data on Debian-based systems, and nginx on Red Hat-based systems.
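The choice of webserver-user depends on the distribution family. The hypothetical `guess_webserver_user` function below applies that rule by looking for `/etc/debian_version` or `/etc/redhat-release`. It is a rough sketch, not part of Arvados. It only encodes the defaults described above, so check your actual Nginx/Passenger configuration if you have changed the user.

```shell
#!/bin/sh
# Hypothetical helper: guess the web server user for a Passenger+Nginx
# install, following the defaults described above:
#   Debian-based  -> www-data
#   Red Hat-based -> nginx
# The optional first argument is a filesystem root (default: the real
# root), which lets the function be pointed at a test directory.
guess_webserver_user() {
    root=${1:-}
    if [ -e "$root/etc/debian_version" ]; then
        echo www-data
    elif [ -e "$root/etc/redhat-release" ]; then
        echo nginx
    else
        echo "unknown distribution; set webserver-user by hand" >&2
        return 1
    fi
}
```

For example: `sudo -u "$(guess_webserver_user)" RAILS_ENV=production bundle exec ./script/create_superuser_token.rb`.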

Using RVM:

gitserver:~$ cd /var/www/arvados-api/current
gitserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production `which rvm-exec` default bundle exec ./script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

Not using RVM:

gitserver:~$ cd /var/www/arvados-api/current
gitserver:/var/www/arvados-api/current$ sudo -u webserver-user RAILS_ENV=production bundle exec ./script/create_superuser_token.rb
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz

Copy that token; you’ll need it in a minute.

Install git and other dependencies

On Debian-based systems:

gitserver:~$ sudo apt-get install git openssh-server

On Red Hat-based systems:

gitserver:~$ sudo yum install git perl-Data-Dumper openssh-server

Note:

The Arvados API and Git servers require Git 1.7.10 or later.
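To confirm that the installed git meets this minimum, compare version numbers component by component. A plain string comparison would wrongly rank 1.7.9 above 1.7.10. The sketch below assumes `git --version` prints the version as its third word, as in `git version 2.39.2`. The helper names `num` and `version_ge` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the leading digits of a version component,
# or 0 if there are none ("1" -> 1, "0-rc1" -> 0, "" -> 0).
num() {
    n=${1%%[!0-9]*}
    echo "${n:-0}"
}

# Hypothetical helper: succeed if dotted version $1 >= dotted version $2,
# comparing the first three numeric components; the rest is ignored.
version_ge() {
    IFS=. read -r a1 a2 a3 rest <<EOF
$1
EOF
    IFS=. read -r b1 b2 b3 rest <<EOF
$2
EOF
    for pair in "$(num "$a1") $(num "$b1")" \
                "$(num "$a2") $(num "$b2")" \
                "$(num "$a3") $(num "$b3")"; do
        set -- $pair
        [ "$1" -gt "$2" ] && return 0
        [ "$1" -lt "$2" ] && return 1
    done
    return 0
}

# Example: check the git on this machine against the 1.7.10 minimum.
if command -v git >/dev/null 2>&1; then
    installed=$(git --version | awk '{print $3}')
    if version_ge "$installed" 1.7.10; then
        echo "git $installed meets the 1.7.10 minimum"
    else
        echo "git $installed is older than 1.7.10" >&2
    fi
fi
```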

Create a “git” user and a storage directory

Gitolite and some additional scripts will be installed in /var/lib/arvados/git, which means hosted repository data will be stored in /var/lib/arvados/git/repositories. If you choose to install gitolite in a different location, make sure to update the git_repositories_dir entry in your API server’s application.yml file accordingly: for example, if you install gitolite at /data/gitolite then your git_repositories_dir will be /data/gitolite/repositories.
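The relationship between the gitolite location and `git_repositories_dir` is simply "append `/repositories`". The hypothetical helper below prints the corresponding line. It is a sketch only. The printed entry is assumed to go inside your environment's section of `application.yml` (for example under `production:`), indented to match.

```shell
#!/bin/sh
# Hypothetical helper: print the application.yml entry that matches a
# gitolite installation directory (the git user's home directory).
# Hosted repositories live in its "repositories" subdirectory.
repositories_dir_entry() {
    gitolite_home=${1%/}   # drop one trailing slash, if present
    printf 'git_repositories_dir: %s/repositories\n' "$gitolite_home"
}

repositories_dir_entry /var/lib/arvados/git   # default layout on this page
repositories_dir_entry /data/gitolite/        # the alternative example above
```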

A new UNIX account called “git” will own the files. This makes git URLs look familiar to users (git@[...]:username/reponame.git).

On Debian- or Red Hat-based systems:

gitserver:~$ sudo mkdir -p /var/lib/arvados/git
gitserver:~$ sudo useradd --comment git --home-dir /var/lib/arvados/git git
gitserver:~$ sudo chown -R git:git ~git

The git user needs its own SSH key. (It must be able to run ssh git@localhost from scripts.)

gitserver:~$ sudo -u git -i bash
git@gitserver:~$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
git@gitserver:~$ cp .ssh/id_rsa.pub .ssh/authorized_keys
git@gitserver:~$ ssh -o stricthostkeychecking=no localhost cat .ssh/id_rsa.pub
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7aBIDAAgMQN16Pg6eHmvc+D+6TljwCGr4YGUBphSdVb25UyBCeAEgzqRiqy0IjQR2BLtSirXr+1SJAcQfBgI/jwR7FG+YIzJ4ND9JFEfcpq20FvWnMMQ6XD3y3xrZ1/h/RdBNwy4QCqjiXuxDpDB7VNP9/oeAzoATPZGhqjPfNS+RRVEQpC6BzZdsR+S838E53URguBOf9yrPwdHvosZn7VC0akeWQerHqaBIpSfDMtaM4+9s1Gdsz0iP85rtj/6U/K/XOuv2CZsuVZZ52nu3soHnEX2nx2IaXMS3L8Z+lfOXB2T6EaJgXF7Z9ME5K1tx9TSNTRcYCiKztXLNLSbp git@gitserver
git@gitserver:~$ rm .ssh/authorized_keys

Install gitolite

Check https://github.com/sitaramc/gitolite/tags for the latest stable version. This guide was tested with v3.6.4. Versions below 3.0 are missing some features needed by Arvados, and should not be used.

Download and install the version you selected.

git@gitserver:~$ echo 'PATH=$HOME/bin:$PATH' >.profile
git@gitserver:~$ source .profile
git@gitserver:~$ git clone --branch v3.6.4 https://github.com/sitaramc/gitolite
...
Note: checking out '5d24ae666bfd2fa9093d67c840eb8d686992083f'.
...
git@gitserver:~$ mkdir bin
git@gitserver:~$ gitolite/install -ln ~git/bin
git@gitserver:~$ bin/gitolite setup -pk .ssh/id_rsa.pub
Initialized empty Git repository in /var/lib/arvados/git/repositories/gitolite-admin.git/
Initialized empty Git repository in /var/lib/arvados/git/repositories/testing.git/
WARNING: /var/lib/arvados/git/.ssh/authorized_keys missing; creating a new one
    (this is normal on a brand new install)

If this did not go well, see the gitolite home page for more detail about installing gitolite and how it works.

Clone the gitolite-admin repository. The arvados-git-sync.rb script works by editing the files in this working directory and pushing them to gitolite. Here we make sure “git push” won’t produce any errors or warnings.

git@gitserver:~$ git clone git@localhost:gitolite-admin
Cloning into 'gitolite-admin'...
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 6 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (6/6), done.
Checking connectivity... done.
git@gitserver:~$ cd gitolite-admin
git@gitserver:~/gitolite-admin$ git config user.email arvados
git@gitserver:~/gitolite-admin$ git config user.name arvados
git@gitserver:~/gitolite-admin$ git config push.default simple
git@gitserver:~/gitolite-admin$ git push
Everything up-to-date

Configure gitolite

Configure gitolite to look up a repository name like username/reponame.git and find the appropriate bare repository storage directory.

Add the following lines to the top of ~git/.gitolite.rc:

my $repo_aliases;
my $aliases_src = "$ENV{HOME}/.gitolite/arvadosaliases.pl";
if ($ENV{HOME} && (-e $aliases_src)) {
    $repo_aliases = do $aliases_src;
}
$repo_aliases ||= {};

Add the following lines inside the section that begins %RC = (:

    REPO_ALIASES => $repo_aliases,

Inside that section, adjust the ‘UMASK’ setting to 022, to ensure the API server has permission to read repositories:

    UMASK => 022,
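The reason for 022 is ordinary umask arithmetic. New directories start from mode 0777 and new files from 0666, and the bits set in the umask are cleared. The worked example below computes the resulting modes with shell arithmetic. The function name `umask_result` is illustrative only.

```shell
#!/bin/sh
# Illustrative helper: compute the mode left after applying a umask.
# Both arguments are written with a leading 0 so the shell's arithmetic
# treats them as octal constants.
umask_result() {
    base=$1   # 0777 for new directories, 0666 for new files
    mask=$2   # e.g. 022
    printf '%o\n' $(( $base & ~$mask ))
}

echo "directories: $(umask_result 0777 022)"   # group/other may read and enter
echo "files:       $(umask_result 0666 022)"   # group/other may read
```

A stricter mask gives a different result. For example, `umask_result 0777 077` gives 700, which would leave repositories unreadable by any user other than git, including the one that runs the API server.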

Uncomment the ‘Alias’ line in the section that begins ENABLE => [:

            # access a repo by another (possibly legacy) name
            'Alias',
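After editing `~git/.gitolite.rc`, a quick grep-based check can catch a missed step. The `check_gitolite_rc` function below is a hypothetical sketch, not part of gitolite or Arvados. It only looks for the four settings described in this section and does not validate the Perl syntax.

```shell
#!/bin/sh
# Hypothetical sanity check for the ~git/.gitolite.rc edits described above.
# It greps for each expected setting and prints the ones it cannot find.
# This is text matching, not Perl parsing: apart from the 'Alias' check,
# a commented-out line would still count as present.
check_gitolite_rc() {
    rc=${1:-$HOME/.gitolite.rc}
    status=0
    grep -q 'arvadosaliases\.pl' "$rc" ||
        { echo "missing: arvadosaliases.pl loader at the top"; status=1; }
    grep -qE 'REPO_ALIASES[[:space:]]*=>' "$rc" ||
        { echo "missing: REPO_ALIASES => \$repo_aliases"; status=1; }
    grep -qE 'UMASK[[:space:]]*=>[[:space:]]*0+22([^0-9]|$)' "$rc" ||
        { echo "missing: UMASK => 022"; status=1; }
    # An enabled feature line starts with the quoted name, not with '#'.
    grep -qE "^[[:space:]]*'Alias'," "$rc" ||
        { echo "missing: uncommented 'Alias' in ENABLE"; status=1; }
    return $status
}
```

Run it as the git user, for example `check_gitolite_rc && echo ok`. It prints one line per setting it could not find and returns non-zero in that case.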

Configure git synchronization

Create a configuration file /var/www/arvados-api/current/config/arvados-clients.yml using the following template, filling in the appropriate values for your system.

  • For arvados_api_token, use the token you generated above.
  • For gitolite_arvados_git_user_key, provide the public key you generated above, i.e., the contents of ~git/.ssh/id_rsa.pub.

production:
  gitolite_url: /var/lib/arvados/git/repositories/gitolite-admin.git
  gitolite_tmp: /var/lib/arvados/git
  arvados_api_host: uuid_prefix.your.domain
  arvados_api_token: "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz"
  arvados_api_host_insecure: false
  gitolite_arvados_git_user_key: "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7aBIDAAgMQN16Pg6eHmvc+D+6TljwCGr4YGUBphSdVb25UyBCeAEgzqRiqy0IjQR2BLtSirXr+1SJAcQfBgI/jwR7FG+YIzJ4ND9JFEfcpq20FvWnMMQ6XD3y3xrZ1/h/RdBNwy4QCqjiXuxDpDB7VNP9/oeAzoATPZGhqjPfNS+RRVEQpC6BzZdsR+S838E53URguBOf9yrPwdHvosZn7VC0akeWQerHqaBIpSfDMtaM4+9s1Gdsz0iP85rtj/6U/K/XOuv2CZsuVZZ52nu3soHnEX2nx2IaXMS3L8Z+lfOXB2T6EaJgXF7Z9ME5K1tx9TSNTRcYCiKztXLNLSbp git@gitserver"
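Filling in this template by hand is error-prone because the key line is long. The sketch below prints the same YAML from three inputs: API host, token, and public key file. The function name `render_arvados_clients_yml` is hypothetical. The fixed paths are copied from the template above, so adjust them if you installed gitolite elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print arvados-clients.yml in the layout of the
# template above. API host and token are arguments; the public key is
# read from a file (normally ~git/.ssh/id_rsa.pub).
render_arvados_clients_yml() {
    api_host=$1
    api_token=$2
    pubkey_file=$3
    pubkey=$(cat "$pubkey_file") || return 1   # $(...) drops the trailing newline
    cat <<EOF
production:
  gitolite_url: /var/lib/arvados/git/repositories/gitolite-admin.git
  gitolite_tmp: /var/lib/arvados/git
  arvados_api_host: $api_host
  arvados_api_token: "$api_token"
  arvados_api_host_insecure: false
  gitolite_arvados_git_user_key: "$pubkey"
EOF
}
```

For example: `render_arvados_clients_yml uuid_prefix.your.domain "$token" ~git/.ssh/id_rsa.pub | sudo tee /var/www/arvados-api/current/config/arvados-clients.yml >/dev/null`, where `$token` holds the API token generated earlier. The file contains a superuser token, so keep it readable only by the accounts that need it. Those include the git user that runs the sync job.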

Enable the synchronization script

The API server package includes a script that retrieves the current set of repository names and permissions from the API, writes them to arvadosaliases.pl in a format usable by gitolite, and triggers gitolite hooks which create new empty repositories if needed. This script should run every 2 to 5 minutes.

If you are using RVM, create /etc/cron.d/arvados-git-sync with the following content:

*/5 * * * * git cd /var/www/arvados-api/current && /usr/local/rvm/bin/rvm-exec default bundle exec script/arvados-git-sync.rb production

Otherwise, create /etc/cron.d/arvados-git-sync with the following content:

*/5 * * * * git cd /var/www/arvados-api/current && bundle exec script/arvados-git-sync.rb production
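Unlike a personal crontab, `/etc/cron.d` entries carry a user field (here `git`) between the schedule and the command. The hypothetical helpers below split such a line so you can see which fields cron will use. They illustrate the format only; cron itself does not need them.

```shell
#!/bin/sh
# /etc/cron.d lines have six leading fields: minute, hour, day of month,
# month, day of week, then the user to run as; the rest is the command.
# "*/5" in the minute field means "every 5 minutes".
#
# Hypothetical helpers. The bodies run in a subshell ( ... ) so that
# "set -f" (which stops "*" being expanded as a filename glob) and
# "set --" do not leak into the caller.
cron_d_user() (
    set -f
    set -- $1
    echo "$6"
)

cron_d_command() (
    set -f
    set -- $1
    shift 6
    echo "$*"
)

cron_d_user '*/5 * * * * git cd /var/www/arvados-api/current && bundle exec script/arvados-git-sync.rb production'
```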

Configure the API server to advertise the correct SSH URLs

In your API server’s application.yml file, add the following entry:

git_repo_ssh_base: "git@git.uuid_prefix.your.domain:"

Make sure to include the trailing colon.
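The trailing colon matters because in the scp-like SSH form `user@host:path`, the colon separates the host from the repository path. This sketch assumes, as the trailing-colon requirement suggests, that the advertised URL is the base followed directly by the repository name and `.git`. `ssh_clone_url` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper, assuming the advertised SSH clone URL is
# git_repo_ssh_base with "owner/reponame.git" appended.
ssh_clone_url() {
    base=$1   # value of git_repo_ssh_base
    repo=$2   # repository name, e.g. username/reponame
    printf '%s%s.git\n' "$base" "$repo"
}

ssh_clone_url "git@git.uuid_prefix.your.domain:" username/reponame
# -> git@git.uuid_prefix.your.domain:username/reponame.git
ssh_clone_url "git@git.uuid_prefix.your.domain" username/reponame
# -> git@git.uuid_prefix.your.domainusername/reponame.git
#    (no colon, so git would read it as a local path)
```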

Install the arvados-git-httpd package

This is needed only for HTTPS access.

The arvados-git-httpd package provides HTTP access, using Arvados authentication tokens instead of passwords. It is intended to be installed on the system where your git repositories are stored, and accessed through a web proxy that provides SSL support.

On Debian-based systems:

~$ sudo apt-get install git arvados-git-httpd

On Red Hat-based systems:

~$ sudo yum install git arvados-git-httpd
~$ sudo systemctl enable arvados-git-httpd

Verify that arvados-git-httpd and git-http-backend can be run:

~$ arvados-git-httpd -h
[...]
Usage: arvados-git-httpd [-config path/to/arvados/git-httpd.yml]
[...]
~$ git http-backend
Status: 500 Internal Server Error
Expires: Fri, 01 Jan 1980 00:00:00 GMT
Pragma: no-cache
Cache-Control: no-cache, max-age=0, must-revalidate

fatal: No REQUEST_METHOD from server

Enable arvados-git-httpd

Note:

The arvados-git-httpd package includes configuration files for systemd. If you’re using a different init system, you’ll need to configure a service to start and stop an arvados-git-httpd process as desired.

Create the configuration file /etc/arvados/git-httpd/git-httpd.yml. Run arvados-git-httpd -h to learn more about configuration entries.

Client:
  APIHost: uuid_prefix.your.domain
  Insecure: false
GitCommand: /var/lib/arvados/git/gitolite/src/gitolite-shell
GitoliteHome: /var/lib/arvados/git
Listen: :9001
RepoRoot: /var/lib/arvados/git/repositories

Restart the systemd service to ensure the new configuration is used.

~$ sudo systemctl restart arvados-git-httpd

Set up a reverse proxy to provide SSL service

The arvados-git-httpd service will be accessible from anywhere on the internet, so we recommend using SSL.

This is best achieved by putting a reverse proxy with SSL support in front of arvados-git-httpd, running on port 443 and passing requests to arvados-git-httpd on port 9001 (or whichever port you set with Listen in git-httpd.yml).

Add the following configuration to the http section of your Nginx configuration:

upstream arvados-git-httpd {
  server                  127.0.0.1:9001;
}
server {
  listen                  [your public IP address]:443 ssl;
  server_name             git.uuid_prefix.your.domain;
  proxy_connect_timeout   90s;
  proxy_read_timeout      300s;

  ssl_certificate         /YOUR/PATH/TO/cert.pem;
  ssl_certificate_key     /YOUR/PATH/TO/cert.key;

  # The server needs to accept potentially large refpacks from push clients.
  client_max_body_size 50m;

  location  / {
    proxy_pass            http://arvados-git-httpd;
  }
}
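The port in the `upstream` block must match the `Listen` setting in `git-httpd.yml`. The sketch below extracts both ports with `sed` and compares them. The function names are hypothetical, and the sketch only understands the simple one-line forms shown on this page.

```shell
#!/bin/sh
# Hypothetical helper: port from a git-httpd.yml "Listen:" line.
#   "Listen: :9001" or "Listen: 127.0.0.1:9001" -> 9001
listen_port() {
    sed -n 's/^Listen:[[:space:]]*.*:\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Hypothetical helper: port from the first Nginx "server host:port;" line.
#   "  server   127.0.0.1:9001;" -> 9001  ("server_name" lines do not match)
upstream_port() {
    sed -n 's/^[[:space:]]*server[[:space:]][[:space:]]*[^;]*:\([0-9][0-9]*\);.*/\1/p' "$1" | head -n 1
}

# Succeed if both files name the same, non-empty port.
ports_match() {
    a=$(listen_port "$1")
    b=$(upstream_port "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: `ports_match /etc/arvados/git-httpd/git-httpd.yml /etc/nginx/conf.d/arvados-git.conf && echo ok`. The Nginx file name here is only an example; use whichever file holds your `upstream arvados-git-httpd` block.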

Configure the API server to advertise the correct HTTPS URLs

In your API server’s application.yml file, add the following entry:

git_repo_https_base: https://git.uuid_prefix.your.domain/

Make sure to include the trailing slash.
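As with the SSH base, this sketch assumes the advertised URL is the base followed by the repository name and `.git`. A missing slash would therefore join the hostname and the repository name. The hypothetical `https_clone_url` refuses a base without the trailing slash. The `arvados` example produces the same URL that appears in the push command further down this page.

```shell
#!/bin/sh
# Hypothetical helper, assuming the advertised HTTPS clone URL is
# git_repo_https_base followed by "reponame.git".
https_clone_url() {
    base=$1   # value of git_repo_https_base
    repo=$2   # repository name
    case $base in
        */) printf '%s%s.git\n' "$base" "$repo" ;;
        *)  echo "git_repo_https_base should end with '/': $base" >&2
            return 1 ;;
    esac
}

https_clone_url https://git.uuid_prefix.your.domain/ arvados
# -> https://git.uuid_prefix.your.domain/arvados.git
```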

Reload Nginx

Reload Nginx so that the Nginx and API server configuration changes take effect.

gitserver:~$ sudo nginx -s reload

Clone Arvados repository

Here we create a repository object which will be used to set up a hosted clone of the arvados repository on this cluster.

~$ uuid_prefix=`arv --format=uuid user current | cut -d- -f1`
~$ echo "Site prefix is '$uuid_prefix'"
~$ all_users_group_uuid="$uuid_prefix-j7d0g-fffffffffffffff"
~$ repo_uuid=`arv --format=uuid repository create --repository "{\"owner_uuid\":\"$uuid_prefix-tpzed-000000000000000\", \"name\":\"arvados\"}"`
~$ echo "Arvados repository uuid is '$repo_uuid'"
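These commands rely on the layout of Arvados UUIDs: a five-character cluster prefix, a five-character type code, and a fifteen-character identifier, separated by hyphens. The "All users" group and the system user have fixed identifiers on every cluster, so both can be derived from the prefix. The sketch below does the same derivation with shell parameter expansion instead of `cut`. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. ${1%%-*} keeps everything before the first "-",
# which is what `cut -d- -f1` does above.
uuid_prefix_of() {
    printf '%s\n' "${1%%-*}"
}

# "All users" group: type j7d0g, identifier of fifteen f's.
all_users_group_uuid() {
    printf '%s-j7d0g-fffffffffffffff\n' "$(uuid_prefix_of "$1")"
}

# System user: type tpzed, identifier of fifteen zeros.
system_user_uuid() {
    printf '%s-tpzed-000000000000000\n' "$(uuid_prefix_of "$1")"
}
```

For example, `all_users_group_uuid "$(arv --format=uuid user current)"` yields the same value as `$all_users_group_uuid` above.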

Create a link object to make the repository object readable by the “All users” group, and therefore by every active user. This makes it possible for users to run the bundled Crunch scripts by specifying "script_version":"master","repository":"arvados" rather than pulling the Arvados source tree into their own repositories.

~$ read -rd $'\000' newlink <<EOF; arv link create --link "$newlink"
{
 "tail_uuid":"$all_users_group_uuid",
 "head_uuid":"$repo_uuid",
 "link_class":"permission",
 "name":"can_read"
}
EOF
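The heredoc above builds a permission link with three parts:

  • tail_uuid is who receives access (here the "All users" group).
  • head_uuid is the object being shared (the repository).
  • name is the permission level.

If the `read -rd $'\000'` idiom is unfamiliar, the hypothetical helper below produces the same JSON with `printf`. It does no JSON escaping. That is acceptable only because UUIDs and permission names contain no quotes or backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print a permission link object as one-line JSON.
# No escaping is done; safe only for UUIDs and plain permission names.
permission_link_json() {
    tail_uuid=$1   # who gets access, e.g. the "All users" group UUID
    head_uuid=$2   # what they get access to, e.g. the repository UUID
    perm=$3        # permission level, e.g. can_read
    printf '{"tail_uuid":"%s","head_uuid":"%s","link_class":"permission","name":"%s"}\n' \
        "$tail_uuid" "$head_uuid" "$perm"
}
```

For example: `arv link create --link "$(permission_link_json "$all_users_group_uuid" "$repo_uuid" can_read)"`.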

In a couple of minutes, your arvados-git-sync cron job will create an empty repository on your git server. Seed it with the real arvados repository. If your git credential helpers were configured correctly when you set up your shell server, the “git push” command will use your API token instead of prompting you for a username and password.

~$ cd /tmp
/tmp$ git clone --bare https://github.com/curoverse/arvados.git
/tmp$ git --git-dir arvados.git push https://git.uuid_prefix.your.domain/arvados.git '*:*'

If you did not set up an HTTPS service, you can push to git@git.uuid_prefix.your.domain:arvados.git using your SSH key, or by logging in to your git server and using sudo.

gitserver:~$ sudo -u git -i bash
git@gitserver:~$ git clone --bare https://github.com/curoverse/arvados.git /tmp/arvados.git
git@gitserver:~$ cd /tmp/arvados.git
git@gitserver:/tmp/arvados.git$ gitolite push /var/lib/arvados/git/repositories/your_arvados_repo_uuid.git '*:*'


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.