Running web services in Arvados containers

Introduction to service containers

Arvados 3.2.0 introduces service containers. If your container runs a web service, you can have Arvados expose that service at a dynamically assigned hostname+port combination. After you run a workflow in Arvados, you can run interactive data analysis and visualization tools as service containers to explore the results.

Services can have different access levels:

  • If you expose a public service, Arvados will let anyone connect to it.
  • If you expose a private service, Arvados will require the initial connection to provide an Arvados API token from a user who requested this container. Arvados will refuse any other connections.

Limitations

Unfortunately, not every interactive container can be run as an Arvados service container. Here are some limitations you should be aware of before you start. These limitations may be lifted in a future release of Arvados.

Service containers only work if your Arvados administrator has configured the hostname(s) and port(s) to use for them. If you’re not sure whether your Arvados cluster supports service containers, check with your administrator.

Arvados can only expose services that use plain HTTP in the container. Other protocols are currently not supported. For example, Jupyter notebooks require websockets to retrieve the results of running code. If you try to run a Jupyter notebook as an Arvados service container, you’ll be able to connect and start a notebook, but you won’t be able to run any code inside of it.

If you run a service container as part of a larger workflow, the workflow supervisor will wait for the service container to finish before it considers the workflow finished. This is okay if the service reports progress or provides debugging for a main process that takes a limited time, but you probably don’t want this if your service container waits for user input to quit or runs indefinitely. To run dedicated interactive services, consider starting them separately from your main workflow, and launching them with arvados-cwl-runner --local to avoid the overhead of a separate supervisor container.

Example service container: nginx web server

This page demonstrates different ways to launch and access a service container, using the nginx web server as an example. It assumes you are already familiar with basic data and workflow management in Arvados.

Running a service container

Users who want to start service containers on demand can do so using CWL. If you’re writing automation for Arvados, you can also submit service container requests directly through the Arvados API.

Running with CWL

This CWL runs nginx, exposes port 80 through Arvados, and takes one input: the directory of files to serve. Save this content as nginx.cwl:

#!/usr/bin/env cwl-runner
# nginx.cwl
cwlVersion: v1.2
$namespaces:
  arv: "http://arvados.org/cwl#"
  cwltool: "http://commonwl.org/cwltool#"

class: CommandLineTool
inputs:
  siteFiles:
    type: Directory
requirements:
  DockerRequirement:
    dockerPull: library/nginx:1.29
  InitialWorkDirRequirement:
    listing:
      - entry: "$(inputs.siteFiles)"
        entryname: /usr/share/nginx/html
hints:
  ResourceRequirement:
    # You may adjust these values as desired.
    coresMin: 2
    coresMax: 8
    ramMin: 128
    ramMax: 1024
  arv:PublishPorts:
    publishPorts:
      "80":
        serviceAccess: public
        label: Web Server
baseCommand: nginx
arguments:
  - "-g"
  - "daemon off;"
outputs: {}

Next, you’ll need a directory on your system that contains some HTML and supporting content you want to serve from the container. You can use any web content you like. If you don’t have anything handy, create a new directory and save this web page as index.html inside it:

<!-- Example index.html -->
<!doctype html>
<html lang=en>
  <head>
    <meta charset=utf-8>
    <title>Test Page</title>
  </head>
  <body>
    <p>Hello from inside an Arvados service workflow!</p>
  </body>
</html>

Write an input file nginx-in.yml that points siteFiles to the directory with your HTML content:

# nginx-in.yml
siteFiles:
  class: Directory
  path: "/home/you/arvados-nginx-site"

Launch this CWL with your input using arvados-cwl-runner:

$ arvados-cwl-runner --local nginx.cwl nginx-in.yml

The tool will upload all the dependencies to Arvados, then report:

INFO [container nginx.cwl] zzzzz-xvhdp-abcde12345fghij state is Committed

After Arvados launches the container, you’ll be able to connect to your web server.

Running with an Arvados API client

If you are comfortable writing your own Arvados tools, you can submit service container requests directly to the Arvados API. Before you start, you must make sure the container image and data you want to use are already in Arvados.

For this example, save the official nginx Docker image to Arvados using arv-keepdocker, then get the portable data hash of the resulting image collection:

$ arv-keepdocker library/nginx 1.29
[…]
zzzzz-4zz18-wyfcjuvi7ankgp2
$ arv collection get --select='["portable_data_hash"]' --uuid=zzzzz-4zz18-wyfcjuvi7ankgp2
{
 "etag":"",
 "kind":"arvados#collection",
 "portable_data_hash":"23a275c1761b645edb84355a91702cee+219"
}

Next, create a collection that contains some HTML and supporting content you want to serve from the container. You can use any web content you like. If you don’t have anything handy, create a new directory and save this web page as index.html inside it:

<!-- Example index.html -->
<!doctype html>
<html lang=en>
  <head>
    <meta charset=utf-8>
    <title>Test Page</title>
  </head>
  <body>
    <p>Hello from inside an Arvados service container!</p>
  </body>
</html>

Create a collection from your content directory:

$ arv-put /home/you/arvados-nginx-site
[…]
zzzzz-4zz18-5jh66sx5f2xo6kd

Now you are ready to submit your container request. Below is a body you could use with either the command-line tool arv container_request create --container-request=… or an SDK like:

arv_client.container_requests().create(
    body={'container_request': ...}
).execute()

In the container request body below:

  • The value of container_image is the portable data hash of the nginx Docker image you uploaded.
  • In the mounts value for /usr/share/nginx/html, the value of uuid is the UUID of your site content collection. You could alternatively specify a portable_data_hash.
  • The runtime constraint API must be set to true for the container to be accessible over the network. You can adjust the other runtime constraints as you like.
  • command, cwd, and output_path are based on the Docker image we’re using.
  • From state on, you can modify these fields as desired, or add others like owner_uuid.

{
  "container_image": "23a275c1761b645edb84355a91702cee+219",
  "service": true,
  "use_existing": false,
  "published_ports": {
    "80": {
      "access": "public",
      "label": "Web Server",
      "initial_path": ""
    }
  },
  "mounts": {
    "/usr/share/nginx/html": {
      "kind": "collection",
      "uuid": "zzzzz-4zz18-5jh66sx5f2xo6kd"
    },
    "/run/nginx.out": {
      "kind": "collection",
      "writable": true
    }
  },
  "runtime_constraints": {
    "API": true,
    "ram": 209715200,
    "vcpus": 2
  },

  "command": [
    "nginx",
    "-g",
    "daemon off;"
  ],
  "cwd": ".",
  "output_path": "/run/nginx.out",

  "state": "Committed",
  "name": "nginx server",
  "priority": 500
}
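
If you submit requests like this from a script, it can help to build the body in code. The following is a minimal sketch, not part of the Arvados SDK: the function name nginx_service_request and its parameters are hypothetical, and the field values are copied from the JSON example above.

```python
import json


def nginx_service_request(image_pdh, site_collection_uuid, access="public"):
    """Build a container request body like the JSON example above.

    Hypothetical helper for illustration; only the field names and values
    come from the example body.
    """
    if access not in ("public", "private"):
        raise ValueError("access must be 'public' or 'private'")
    return {
        "container_image": image_pdh,
        "service": True,
        "use_existing": False,
        "published_ports": {
            "80": {"access": access, "label": "Web Server", "initial_path": ""},
        },
        "mounts": {
            "/usr/share/nginx/html": {
                "kind": "collection",
                "uuid": site_collection_uuid,
            },
            "/run/nginx.out": {"kind": "collection", "writable": True},
        },
        # API must be true for the service to be reachable over the network.
        "runtime_constraints": {"API": True, "ram": 200 * 1024 * 1024, "vcpus": 2},
        "command": ["nginx", "-g", "daemon off;"],
        "cwd": ".",
        "output_path": "/run/nginx.out",
        "state": "Committed",
        "name": "nginx server",
        "priority": 500,
    }


body = nginx_service_request(
    "23a275c1761b645edb84355a91702cee+219",
    "zzzzz-4zz18-5jh66sx5f2xo6kd",
)
# Print the body as JSON, for example to pass to the command-line tool.
print(json.dumps(body, indent=2))
```

With the Python SDK, the same dictionary could be passed as body={'container_request': body} in the create call shown earlier.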

Connecting to a service container

After Arvados starts a service container, the container record includes the URL users should use to access the service(s). You can open the service URL through Workbench or retrieve it from the Arvados API.

Connecting through Workbench

When you view the process page for a running service container, a blue button appears next to the process name to connect to that service. If the container runs multiple services, you’ll be able to select the one you want to connect to from a pulldown.

Screenshot from the top of an Arvados Workbench process page showing a running nginx service container with a "Connect to web server" button.

Getting a connection URL from the Arvados API

This section shows how to get this information from Arvados API records using the CLI tools. You can follow the same process to make analogous API calls with any SDK. After you submit your container request, get its corresponding container_uuid:

$ arv container_request get --select='["container_uuid","state"]' --uuid=zzzzz-xvhdp-abcde12345fghij
{
 "container_uuid":"zzzzz-dz642-y87pdppv4afp4da",
 "etag":"",
 "kind":"arvados#containerRequest",
 "state":"Committed"
}

If state is Committed but container_uuid is null, then your request has not been dispatched yet. Wait a little bit and try again. Once you have a container_uuid, request the published_ports field of that container record:

$ arv container get --select='["published_ports"]' --uuid=zzzzz-dz642-y87pdppv4afp4da
{
 "published_ports":{
  "80":{
   "label":"Web Server",
   "access":"public",
   "base_url":"https://zzzzz.arvados.example:8900/",
   "initial_url":"https://zzzzz.arvados.example:8900/",
   "initial_path":"",
   "external_port":8900
  }
 }
}
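
The two lookups above can be combined in a script. This is a rough sketch that assumes client behaves like the object returned by arvados.api() in the Python SDK. The function name find_published_ports, the polling interval, and the retry limit are illustrative choices, not Arvados defaults.

```python
import time


def find_published_ports(client, request_uuid, poll_seconds=5.0, max_tries=60):
    """Return published_ports for the container behind a container request.

    Hypothetical helper: `client` is assumed to expose container_requests()
    and containers() resources like the Arvados Python SDK client does.
    """
    for _ in range(max_tries):
        request = client.container_requests().get(uuid=request_uuid).execute()
        container_uuid = request.get("container_uuid")
        if container_uuid:
            container = client.containers().get(uuid=container_uuid).execute()
            return container.get("published_ports") or {}
        # No container_uuid yet: the request has not been dispatched.
        time.sleep(poll_seconds)
    raise TimeoutError(f"{request_uuid} was not assigned a container in time")


# Example use (requires the Arvados Python SDK and API credentials):
#   import arvados
#   ports = find_published_ports(arvados.api("v1"), "zzzzz-xvhdp-abcde12345fghij")
```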

For each service, initial_url is the URL where you can connect to that service over HTTP. If access is private, add an arvados_api_token query parameter whose value is an API token for a user who requested this container. For example, if the container’s published port above had "access":"private", the full URL to connect to it would look like:

https://zzzzz.arvados.example:8900/?arvados_api_token=v2/zzzzz-gj3su-y2tncmjag9gajm8/1234567890
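
The rule above can be sketched as a small function that takes one entry from published_ports and returns the URL to open. The function name service_url is hypothetical, and the token is the placeholder from the example above. The sketch leaves the slashes in the token unescaped to match that example URL.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def service_url(port_info, api_token=None):
    """Return the URL to open for one published_ports entry.

    Hypothetical helper: public services use initial_url as-is, private
    services get an arvados_api_token query parameter appended.
    """
    url = port_info["initial_url"]
    if port_info.get("access") != "private":
        return url
    if not api_token:
        raise ValueError("a private service needs an API token")
    parts = urlsplit(url)
    # safe="/" keeps the slashes in the token, as in the example URL.
    token_param = urlencode({"arvados_api_token": api_token}, safe="/")
    query = f"{parts.query}&{token_param}" if parts.query else token_param
    return urlunsplit(parts._replace(query=query))


private_port = {
    "access": "private",
    "initial_url": "https://zzzzz.arvados.example:8900/",
}
print(service_url(private_port, "v2/zzzzz-gj3su-y2tncmjag9gajm8/1234567890"))
```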

Stopping a service container

Cancel a service container just like any other running container. You can use the red ⏹ Cancel button that appears next to the process name in Workbench or the ⏹ Cancel button in the action toolbar:

Screenshot from the top of an Arvados Workbench process page showing a running nginx service container with a "Cancel" button.

Or if you’re using the Arvados API directly, update your container request to set its priority to 0:

$ arv container_request update --container-request='{"priority":0}' --uuid=zzzzz-xvhdp-abcde12345fghij
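
The same update can be made from a script. This sketch assumes client behaves like the object returned by arvados.api() in the Python SDK. The wrapper function cancel_container_request is a hypothetical name used for illustration.

```python
def cancel_container_request(client, request_uuid):
    """Set a container request's priority to 0 to stop its container.

    Hypothetical wrapper: `client` is assumed to expose a
    container_requests() resource like the Arvados Python SDK client does.
    """
    return client.container_requests().update(
        uuid=request_uuid,
        body={"container_request": {"priority": 0}},
    ).execute()


# Example use (requires the Arvados Python SDK and API credentials):
#   import arvados
#   cancel_container_request(arvados.api("v1"), "zzzzz-xvhdp-abcde12345fghij")
```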



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.