WebDAV

Web Distributed Authoring and Versioning is an IETF standard set of extensions to HTTP to manipulate and retrieve hierarchical web resources, similar to directories in a file system. Arvados supports accessing files in Keep using WebDAV.

Most major operating systems include built-in support for mounting WebDAV resources as network file systems; see the user guide sections for Windows, macOS, and Linux. WebDAV is also supported by various standalone storage browser applications such as Cyberduck, and client libraries exist in many languages for programmatic access.

Keep-web provides read/write HTTP (WebDAV) access to files stored in Keep. It serves public data to anonymous and unauthenticated clients, and serves private data to clients that supply Arvados API tokens.

Supported Operations

Keep-web supports the WebDAV HTTP methods GET, PUT, DELETE, PROPFIND, COPY, and MOVE.

Keep-web does not support LOCK or UNLOCK. These methods are accepted, but are no-ops.
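As a rough illustration of how a client issues the less familiar methods, the Python sketch below builds (but does not send) a request with the standard library's urllib.request. COPY and MOVE name their target in a Destination header, as defined by the WebDAV specification (RFC 4918). The host name, collection UUID, and token are placeholders, and webdav_request is a hypothetical helper, not part of any Arvados SDK.

```python
import urllib.request
from urllib.parse import quote


def webdav_request(method, base_url, path, token, destination=None, data=None):
    """Build (but do not send) a WebDAV request against keep-web.

    Hypothetical helper: base_url and token are supplied by the caller.
    """
    root = base_url.rstrip("/")
    url = root + "/" + quote(path.lstrip("/"))
    headers = {"Authorization": "Bearer " + token}
    if destination is not None:
        # COPY and MOVE carry the target location in the Destination header.
        headers["Destination"] = root + "/" + quote(destination.lstrip("/"))
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Placeholder host and collection UUID; substitute real values.
base = "https://collections.example.com/c=zzzzz-4zz18-0123456789abcde"
req = webdav_request("MOVE", base, "old/name.txt", "TOKEN", destination="new/name.txt")
```

Passing the returned object to urllib.request.urlopen sends it. A PUT would pass the file contents as data, and a DELETE needs neither data nor a Destination header.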

Browsing

Requests can be authenticated in a variety of ways, as described below in Authentication mechanisms. An unauthenticated request will receive a 401 Unauthorized response with a WWW-Authenticate header indicating support for RFC 7617 Basic Authentication.

Getting a listing from keep-web starting at the root path / will return two folders, by_id and users.

The by_id folder will return an empty listing. However, a path which starts with /by_id/ followed by a collection uuid, portable data hash, or project uuid will return the listing of that object.

The users folder will return a listing of the users for whom the client has permission to read the “home” project of that user. Browsing an individual user will return the collections and projects directly owned by that user. Browsing those collections and projects returns listings of the files, directories, collections, and subprojects they contain, and so forth.

In addition to the /by_id/ path prefix, the collection or project can be specified using a path prefix of /c=<uuid or pdh>/ or (if the cluster is properly configured) as a virtual host. This is described in Keep-web URLs.
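A small sketch of the two path-prefix forms, assuming a collection UUID, portable data hash, or project UUID is already known. keep_web_paths is a hypothetical helper and the UUID is a placeholder; the virtual-host form is left out because it depends on how the cluster is configured.

```python
from urllib.parse import quote


def keep_web_paths(object_id, file_path):
    """Return the /by_id/ and /c= path forms for a file inside a collection or project.

    Hypothetical helper. object_id may be a collection UUID, a portable data hash,
    or a project UUID; the virtual-host form is not covered here.
    """
    rel = quote(file_path.lstrip("/"))  # percent-encode spaces etc., keep "/"
    return {
        "by_id": f"/by_id/{object_id}/{rel}",
        "c": f"/c={object_id}/{rel}",
    }


paths = keep_web_paths("zzzzz-4zz18-0123456789abcde", "reads/sample 1.fastq")
```

Either path is appended to the keep-web base URL; both refer to the same file.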

It is possible for a project or a filter group to appear as its own descendant in the by_id and users trees (a filter group may match itself, its own ancestor, another filter group that matches its ancestor, etc.). When this happens, the descendant appears as an empty read-only directory. For example, if filter group f matches its own parent p:

  • /users/example/p/f will show the filter group’s contents (matched projects and collections).
  • /users/example/p/f/p will appear as an empty directory.
  • /by_id/uuid_of_f/p will show the parent project’s contents, including f.
  • /by_id/uuid_of_f/p/f will appear as an empty directory.

Authentication mechanisms

A token can be provided in an Authorization header as a Bearer token:

Authorization: Bearer o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK

A token can also be provided with RFC 7617 Basic Authentication. In this case, the payload is formatted as username:token and encoded with base64. The username must be non-empty, but is ignored. In this example, the username is “user”:

Authorization: Basic dXNlcjpvMDdqNHB4N1JsSks0Q3VNWXA3QzBMRFQ0Q3pSMUoxcUJFNUF2bzdlQ2NVak9UaWt4Sw==

A base64-encoded token can be provided in a cookie named “api_token”:

Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
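The three header-based mechanisms above can be sketched in Python as follows; the helper names are hypothetical. The Basic payload must be encoded without a trailing newline: shell commands such as echo add one unless told not to, and that changes the encoded value.

```python
import base64


def bearer_header(token):
    return {"Authorization": "Bearer " + token}


def basic_header(token, username="user"):
    # The username must be non-empty; keep-web otherwise ignores it.
    if not username:
        raise ValueError("username must be non-empty")
    payload = f"{username}:{token}".encode()  # note: no trailing newline
    return {"Authorization": "Basic " + base64.b64encode(payload).decode("ascii")}


def cookie_header(token):
    # The api_token cookie carries the base64-encoded token.
    return {"Cookie": "api_token=" + base64.b64encode(token.encode()).decode("ascii")}


example_token = "o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK"
headers = basic_header(example_token)
```

With the example token, these helpers reproduce the Basic and Cookie values shown above.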

A token can be provided in a URL-encoded query string:

GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK

A token can be provided in a URL-encoded path (as described in the previous section):

GET /t=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK/_/foo/bar.txt

A suitably encoded token can be provided in a POST body if the request has a content type of application/x-www-form-urlencoded or multipart/form-data:

POST /foo/bar.txt
Content-Type: application/x-www-form-urlencoded
[...]
api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
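A minimal sketch of producing the query-string, path, and form-body variants in Python. The helper names are hypothetical, and the "v2/..." token in the usage line is an invented value chosen only to show why URL-encoding matters when a token contains characters such as "/".

```python
import urllib.request
from urllib.parse import quote, urlencode


def query_token_url(path, token):
    # urlencode percent-encodes reserved characters, including "/".
    return path + "?" + urlencode({"api_token": token})


def path_token_url(path, token):
    # Mirrors the /t=TOKEN/_/path example above; safe="" encodes "/" too.
    return f"/t={quote(token, safe='')}/_/{path.lstrip('/')}"


def form_token_body(token):
    return urlencode({"api_token": token}).encode("ascii")


def form_post_request(url, token):
    # Keep-web answers this POST with a 303 redirect to an equivalent GET.
    return urllib.request.Request(
        url,
        data=form_token_body(token),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


encoded_path = path_token_url("/foo/bar.txt", "v2/example/secret")  # invented token
```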

If a token is provided in a query string or in a POST request, the response is an HTTP 303 redirect to an equivalent GET request, with the token stripped from the query string and added to a cookie instead.

Indexes

Keep-web returns a generic HTML index listing when a directory is requested with the GET method. It does not serve a default file like “index.html”. Directory listings are also returned for WebDAV PROPFIND requests.
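For programmatic listings, a PROPFIND request is usually more convenient than parsing the HTML index. The sketch below uses Python's urllib.request and xml.etree.ElementTree; the requested properties, the Depth value, and the helper names are illustrative choices based on the general WebDAV protocol (RFC 4918), not a documented keep-web client API.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Ask only for the properties needed to tell files from directories.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:"><D:prop><D:resourcetype/><D:getcontentlength/></D:prop></D:propfind>"""


def propfind_request(url, token, depth="1"):
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={
            "Authorization": "Bearer " + token,
            "Depth": depth,  # "1": the resource itself plus its immediate children
            "Content-Type": "application/xml",
        },
    )


def parse_multistatus(xml_bytes):
    """Return (href, is_directory) pairs from a 207 Multi-Status response body."""
    ns = {"D": "DAV:"}
    root = ET.fromstring(xml_bytes)
    entries = []
    for resp in root.findall("D:response", ns):
        href = resp.findtext("D:href", default="", namespaces=ns)
        # Directories (WebDAV "collections") report <D:collection/> in resourcetype.
        is_dir = resp.find(".//D:resourcetype/D:collection", ns) is not None
        entries.append((href, is_dir))
    return entries
```

In use, something like `parse_multistatus(urllib.request.urlopen(propfind_request(url, token)).read())` lists a directory; the first entry is normally the directory itself.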

Range requests

Keep-web supports partial resource reads using the HTTP Range header as specified in RFC 7233.
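A sketch of a byte-range read in Python, assuming a placeholder URL and token. It builds a request for an inclusive byte range and parses the Content-Range header that a 206 Partial Content response carries, using the RFC 7233 syntax; the helper names are hypothetical.

```python
import re
import urllib.request


def range_request(url, token, start, end):
    """Request bytes start..end, inclusive (RFC 7233 byte-range syntax)."""
    return urllib.request.Request(url, headers={
        "Authorization": "Bearer " + token,
        "Range": f"bytes={start}-{end}",
    })


# Matches e.g. "bytes 0-99/1234"; the total may be "*" if unknown.
_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Parse a 206 response's Content-Range header into (start, end, total)."""
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end, total = m.groups()
    return int(start), int(end), None if total == "*" else int(total)
```

After `resp = urllib.request.urlopen(range_request(url, token, 0, 99))`, a status of 206 means only the requested range was returned, and `parse_content_range(resp.headers["Content-Range"])` reports which bytes.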

Compatibility

Client-provided authorization tokens are ignored if the client does not provide a Host header.

In order to use the query string or POST form authorization mechanisms, the client must follow 303 redirects; the client must accept cookies with a 303 response and send those cookies when performing the redirect; and either the client or an intervening proxy must resolve a scheme-relative URL (“//host/path”) if one is given in a response Location header.
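One way to meet these requirements in Python is a urllib opener with a cookie jar, sketched below under the assumption that the URL is a placeholder for a real keep-web file. The default redirect handler follows 303 responses (switching to GET) and resolves relative or scheme-relative Location values against the request URL, while HTTPCookieProcessor stores the cookie set on the 303 response and sends it with the redirected request. fetch_with_query_token is a hypothetical helper.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode


def fetch_with_query_token(url, token):
    """GET url?api_token=... and follow keep-web's 303 redirect, keeping cookies.

    Hypothetical helper: a fresh cookie jar is used for each call.
    """
    jar = http.cookiejar.CookieJar()
    # build_opener adds the default redirect handler alongside the cookie processor.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url + "?" + urlencode({"api_token": token})) as resp:
        return resp.read()
```

Reusing one opener across requests would also keep sending the cookie, so later requests would not need the token in the query string.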

Intranet mode

Normally, Keep-web accepts requests for multiple collections using the same host name, provided the client’s credentials are not being used. This provides insufficient XSS protection in an installation where the “anonymously accessible” data is not truly public, but merely protected by network topology.

In such cases (for example, a site that is not reachable from the internet, where some data is world-readable from Arvados’s perspective but is intended to be available only to users within the local network), the downstream proxy should be configured to return 401 for all paths beginning with “/c=”.
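The proxy rule itself is simple. As an illustration only, here it is expressed as a Python WSGI middleware; in a real deployment the same check would normally be configured in whatever proxy sits in front of keep-web, and block_c_paths is a hypothetical name.

```python
def block_c_paths(app):
    """Wrap a WSGI app so that any path beginning with /c= gets 401 Unauthorized."""
    def wrapped(environ, start_response):
        # PATH_INFO is already percent-decoded by the WSGI server.
        if environ.get("PATH_INFO", "").startswith("/c="):
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"Unauthorized\n"]
        return app(environ, start_response)
    return wrapped
```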

Same-origin URLs

Without the same-origin protection outlined above, a web page stored in collection X could execute JavaScript code that uses the current viewer’s credentials to download additional data from collection Y (data which is accessible to the current viewer, but not to the author of collection X) from the same origin (https://collections.example.com/) and upload it to some other site chosen by the author of collection X.



The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.