Web Distributed Authoring and Versioning (WebDAV) is an IETF standard set of extensions to HTTP to manipulate and retrieve hierarchical web resources, similar to directories in a file system. Arvados supports accessing files in Keep using WebDAV.
Most major operating systems include built-in support for mounting WebDAV resources as network file systems; see the user guide sections for Windows, macOS, and Linux. WebDAV is also supported by various standalone storage browser applications such as Cyberduck, and client libraries exist in many languages for programmatic access.
Keep-web provides read/write HTTP (WebDAV) access to files stored in Keep. It serves public data to anonymous and unauthenticated clients, and serves private data to clients that supply Arvados API tokens.
Supports WebDAV HTTP methods GET, PUT, DELETE, PROPFIND, COPY, and MOVE.
Does not support LOCK or UNLOCK. These methods will be accepted, but are no-ops.
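As a minimal sketch of how a script might issue one of the supported methods, the following builds (without sending) a PROPFIND request using Python's standard library. The host name collections.example.com, the token value, and the helper name build_propfind are assumptions made for illustration; they are not part of keep-web.

```python
import urllib.request

# Hypothetical keep-web base URL; substitute your cluster's WebDAV endpoint.
BASE_URL = "https://collections.example.com"


def build_propfind(path: str, token: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a WebDAV PROPFIND request for `path`."""
    return urllib.request.Request(
        BASE_URL + path,
        method="PROPFIND",
        headers={
            "Authorization": f"Bearer {token}",
            # Depth: 1 asks for the resource and its immediate children.
            "Depth": depth,
        },
    )


req = build_propfind("/by_id/", "example-token")  # placeholder token
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is omitted here so the sketch runs without a server.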
Requests can be authenticated in a variety of ways, as described below in Authentication mechanisms. An unauthenticated request will return a 401 Unauthorized response with a WWW-Authenticate header indicating support for RFC 7617 Basic Authentication.
Getting a listing from keep-web starting at the root path / will return two folders, by_id and users.
The by_id folder will return an empty listing. However, a path that starts with /by_id/ followed by a collection UUID, portable data hash, or project UUID will return the listing of that object.
The users folder will return a listing of the users whose “home” project the client has permission to read. Browsing an individual user will return the collections and projects directly owned by that user. Browsing those collections and projects returns listings of the files, directories, collections, and subprojects they contain, and so forth.
In addition to the /by_id/ path prefix, the collection or project can be specified using a path prefix of /c=<uuid or pdh>/ or (if the cluster is properly configured) as a virtual host. This is described on Keep-web URLs.
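The two path prefixes can be sketched as small helpers that build request paths. The function names and the identifier "some-id" are hypothetical placeholders; only the /by_id/ and /c= prefixes come from the text above. File paths are percent-encoded so that names containing spaces remain valid in a URL.

```python
from urllib.parse import quote


def by_id_path(object_id: str, file_path: str = "") -> str:
    """Path under /by_id/ for a collection UUID, portable data hash, or project UUID."""
    return f"/by_id/{object_id}/{quote(file_path)}"


def c_path(collection_id: str, file_path: str = "") -> str:
    """Path using the /c=<uuid or pdh>/ prefix."""
    return f"/c={collection_id}/{quote(file_path)}"


# "some-id" stands in for a real UUID or portable data hash.
print(by_id_path("some-id", "dir/my file.txt"))
print(c_path("some-id"))
```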
It is possible for a project or a filter group to appear as its own descendant in the by_id and users tree (a filter group may match itself, its own ancestor, another filter group that matches its ancestor, etc.). When this happens, the descendant appears as an empty read-only directory. For example, if filter group f matches its own parent p:
/users/example/p/f will show the filter group’s contents (matched projects and collections).
/users/example/p/f/p will appear as an empty directory.
/by_id/uuid_of_f/p will show the parent project’s contents, including f.
/by_id/uuid_of_f/p/f will appear as an empty directory.
A token can be provided in an Authorization header as a Bearer token:
Authorization: Bearer o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
A token can also be provided with RFC 7617 Basic Authentication. In this case, the payload is formatted as username:token and encoded with base64. The username must be non-empty, but is ignored. In this example, the username is “user”:
Authorization: Basic dXNlcjpvMDdqNHB4N1JsSks0Q3VNWXA3QzBMRFQ0Q3pSMUoxcUJFNUF2bzdlQ2NVak9UaWt4Sw==
A base64-encoded token can be provided in a cookie named “api_token”:
Cookie: api_token=bzA3ajRweDdSbEpLNEN1TVlwN0MwTERUNEN6UjFKMXFCRTVBdm83ZUNjVWpPVGlreEs=
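The three header-based mechanisms above can be sketched with Python's standard library as follows. The helper names and the token value "example-token" are placeholders chosen for illustration; the header formats follow the descriptions above.

```python
import base64


def bearer_header(token: str) -> str:
    """Authorization header value carrying the token as a Bearer token."""
    return f"Bearer {token}"


def basic_header(token: str, username: str = "user") -> str:
    """Authorization header value for RFC 7617 Basic Authentication."""
    if not username:
        # The username is ignored by the server, but it must not be empty.
        raise ValueError("username must be non-empty")
    payload = f"{username}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(payload).decode("ascii")


def api_token_cookie(token: str) -> str:
    """Cookie header value with the base64-encoded token."""
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return f"api_token={encoded}"


token = "example-token"  # placeholder, not a real Arvados token
print("Authorization:", bearer_header(token))
print("Authorization:", basic_header(token))
print("Cookie:", api_token_cookie(token))
```

Note that base64.b64encode encodes exactly the bytes it is given. Shell pipelines that feed echo output into a base64 tool also encode a trailing newline unless it is suppressed.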
A token can be provided in a URL-encoded query string:
GET /foo/bar.txt?api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
A token can be provided in a URL-encoded path (as described in the previous section):
GET /t=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK/_/foo/bar.txt
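URL encoding matters when a token contains characters that are reserved in URLs, such as "/". A minimal sketch of both URL-based forms follows; the helper names and the token "tok/with/slashes" are hypothetical, and only the api_token parameter and the /t=<token>/_/ layout are taken from the examples above.

```python
from urllib.parse import quote, urlencode


def query_token_url(path: str, token: str) -> str:
    """Request target with the token in a URL-encoded query string."""
    return f"{path}?{urlencode({'api_token': token})}"


def path_token_url(path: str, token: str) -> str:
    """Request target with the token in a URL-encoded path segment."""
    # safe="" makes quote() encode "/" too, so the token stays one segment.
    return f"/t={quote(token, safe='')}/_/{path.lstrip('/')}"


print(query_token_url("/foo/bar.txt", "tok/with/slashes"))
print(path_token_url("/foo/bar.txt", "tok/with/slashes"))
```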
A suitably encoded token can be provided in a POST body if the request has a content type of application/x-www-form-urlencoded or multipart/form-data:
POST /foo/bar.txt
Content-Type: application/x-www-form-urlencoded
[...]
api_token=o07j4px7RlJK4CuMYp7C0LDT4CzR1J1qBE5Avo7eCcUjOTikxK
If a token is provided in a query string or in a POST request, the response is an HTTP 303 redirect to an equivalent GET request, with the token stripped from the query string and added to a cookie instead.
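A form-encoded POST of this kind could be constructed as in the sketch below. The URL and the helper name build_token_post are assumptions for illustration, and the request is only built, not sent.

```python
import urllib.request
from urllib.parse import urlencode


def build_token_post(url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the token as a form field."""
    body = urlencode({"api_token": token}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Hypothetical host and placeholder token.
req = build_token_post("https://collections.example.com/foo/bar.txt", "example-token")
print(req.get_method(), req.data)
```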
Keep-web returns a generic HTML index listing when a directory is requested with the GET method. It does not serve a default file like “index.html”. Directory listings are also returned for WebDAV PROPFIND requests.
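A PROPFIND response is a WebDAV multistatus XML document, which a client can reduce to a simple listing. The sketch below parses a hand-written sample body; the sample paths and the helper name list_entries are illustrative assumptions, and an actual keep-web response may include additional properties.

```python
import xml.etree.ElementTree as ET

DAV_NS = {"D": "DAV:"}

# Hand-written sample for illustration only; not captured from keep-web.
SAMPLE = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/by_id/example-collection/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>/by_id/example-collection/foo.txt</D:href>
    <D:propstat>
      <D:prop><D:resourcetype/></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
"""


def list_entries(multistatus_xml: str) -> list[tuple[str, bool]]:
    """Return (href, is_directory) pairs from a multistatus document."""
    root = ET.fromstring(multistatus_xml)
    entries = []
    for resp in root.findall("D:response", DAV_NS):
        href = resp.findtext("D:href", default="", namespaces=DAV_NS)
        # A <D:collection/> inside <D:resourcetype> marks a directory.
        is_dir = resp.find(".//D:resourcetype/D:collection", DAV_NS) is not None
        entries.append((href, is_dir))
    return entries


for href, is_dir in list_entries(SAMPLE):
    print("dir " if is_dir else "file", href)
```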
Keep-web supports partial resource reads using the HTTP Range header as specified in RFC 7233.
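As a small sketch, a byte-range request can be built as follows. The URL and the helper name range_request are assumptions for illustration. In RFC 7233 byte ranges, both the first and last positions are inclusive.

```python
import urllib.request


def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build (but do not send) a GET for bytes start..end, both inclusive."""
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


# Hypothetical URL; asks for the first 100 bytes of the file.
req = range_request("https://collections.example.com/foo/bar.txt", 0, 99)
print(req.get_header("Range"))
```

A server that honors the range replies with status 206 Partial Content and a Content-Range header describing the bytes returned.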
Client-provided authorization tokens are ignored if the client does not provide a Host header.
In order to use the query string or POST form authorization mechanisms, the client must follow 303 redirects; the client must accept cookies with a 303 response and send those cookies when performing the redirect; and either the client or an intervening proxy must resolve a relative URL (“//host/path”) if given in a response Location header.
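A client meeting these requirements can be sketched with Python's standard library: an opener with a cookie jar stores cookies from responses and sends them on later requests, including the requests made when following redirects. The helper names and host are assumptions for illustration. The second helper shows how a scheme-relative Location value is resolved against the original request URL.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urljoin


def make_opener() -> urllib.request.OpenerDirector:
    """Opener that keeps cookies across requests and follows redirects."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a possibly relative Location header against the request URL."""
    return urljoin(request_url, location)


opener = make_opener()
# Hypothetical host; the scheme is inherited from the original request URL.
print(resolve_location(
    "https://collections.example.com/foo/bar.txt?api_token=example-token",
    "//collections.example.com/foo/bar.txt",
))
```

Calling opener.open on a URL would perform the request; it is omitted here so the sketch runs without network access.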
Normally, Keep-web accepts requests for multiple collections using the same host name, provided the client’s credentials are not being used. This provides insufficient XSS protection in an installation where the “anonymously accessible” data is not truly public, but merely protected by network topology.
In such cases (for example, a site which is not reachable from the internet, where some data is world-readable from Arvados’s perspective but is intended to be available only to users within the local network), the downstream proxy should be configured to return 401 for all paths beginning with “/c=”.
Without the same-origin protection outlined above, a web page stored in collection X could execute JavaScript code that uses the current viewer’s credentials to download additional data from collection Y (data which is accessible to the current viewer, but not to the author of collection X) from the same origin (“https://collections.example.com/”) and upload it to some other site chosen by the author of collection X.
The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.