Organizing data

Projects and Collections

In Arvados, files are organized into “collections”, and collections are organized into “projects”.

Only collections can contain files. A collection is a distinct database record identified by a universally unique identifier (UUID). Arvados maintains a history of changes to the collection. Every collection version has an immutable identifier called a “portable data hash”, which is computed from the file content of the collection. The portable data hash can be used to refer to that immutable file content independently of the collection UUID. If two collections have the same portable data hash, they have the same file content.
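The relationship between a collection's UUID and its content-derived identifier can be illustrated with a toy model. The sketch below is not the format Arvados uses for portable data hashes; `toy_content_hash` and the example records are hypothetical, and SHA-256 is chosen only for illustration. It shows the property described above: records with different UUIDs but identical file content get the same content hash, and any change to the content changes the hash.

```python
import hashlib


def toy_content_hash(files):
    """Return a content-derived identifier for a mapping of file name -> bytes.

    Toy illustration only: this does not reproduce Arvados's real
    portable data hash format.
    """
    digest = hashlib.sha256()
    # Sort by name so the result does not depend on insertion order.
    for name in sorted(files):
        data = files[name]
        # Length-prefix each field so distinct file sets cannot produce
        # the same byte stream by concatenation.
        digest.update(f"{len(name.encode())}:{name}:{len(data)}:".encode())
        digest.update(data)
    return digest.hexdigest()


# Hypothetical collection records: different UUIDs, some with identical files.
collection_a = {"uuid": "example-uuid-a",
                "files": {"reads.fastq": b"ACGT\n", "notes.txt": b"run 1\n"}}
collection_b = {"uuid": "example-uuid-b",
                "files": {"notes.txt": b"run 1\n", "reads.fastq": b"ACGT\n"}}
collection_c = {"uuid": "example-uuid-c",
                "files": {"reads.fastq": b"ACGT\n", "notes.txt": b"run 2\n"}}

hash_a = toy_content_hash(collection_a["files"])
hash_b = toy_content_hash(collection_b["files"])
hash_c = toy_content_hash(collection_c["files"])

print(hash_a == hash_b)  # same content, different UUIDs
print(hash_a == hash_c)  # content differs in notes.txt
```

In this model the UUID names the record, which can change over time, while the content hash names one fixed set of file contents.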

Projects contain collections, workflows and workflow runs, and other projects (subprojects). Both collections and projects can have user-provided metadata.

Projects are the main unit of organization and sharing. See Sharing collections for information about sharing projects and collections with other users.

Creating a project

When you have navigated to any existing project, clicking + NEW and then New project will prompt you to create a new subproject under the current project.

If you’re at the top-level Home Projects, a new top-level project will be created.

Alternatively, you can right-click on the link to an existing project to bring up a context menu, and select New project.
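Projects can also be created programmatically. The following is a minimal sketch, assuming the Arvados Python SDK, in which a project is a group record with `group_class` set to `project`. The helper `new_project_body` and the UUID shown are hypothetical placeholders. The helper only builds the request body, so it runs without a cluster; the commented lines indicate how the body would typically be submitted.

```python
def new_project_body(name, parent_uuid=None):
    """Build the request body for creating a project.

    parent_uuid is the UUID of an existing project when creating a
    subproject. When it is omitted, no owner is specified and the new
    project is expected to be owned by the current user.
    """
    group = {"group_class": "project", "name": name}
    if parent_uuid is not None:
        group["owner_uuid"] = parent_uuid
    return {"group": group}


# Placeholder UUID for illustration; substitute a real project UUID.
parent = "zzzzz-j7d0g-0123456789abcde"

top_level = new_project_body("Sequencing runs")
subproject = new_project_body("Run 2024-01", parent_uuid=parent)

# Submitting the request requires the SDK and a configured cluster:
#   import arvados
#   client = arvados.api('v1')
#   project = client.groups().create(body=subproject).execute()
print(subproject)
```

Leaving out the parent corresponds to creating a top-level project from Home Projects, and passing one corresponds to creating a subproject from within an existing project.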

Sharing projects

Projects can be shared with other users on the Arvados cluster. First, locate the collection or project using any available means (for instance, by manually navigating in the Workbench, or using the Search bar). Then right-click on its link in a listing, or click on the triple-dot button on its details page. Either menu contains the item Share, which opens the Sharing settings dialog box.

To share with other Arvados users, select the WITH USERS/GROUPS tab in the Sharing settings dialog box. Under Add people and groups, use the input field to search for user or group names. Select the user or group you want to share with, choose the Authorization level (Read/Write/Manage) in the drop-down menu, and click on the plus sign (+) on the right. Repeat these steps for other users or groups, each with their own Authorization level. The selected users and groups will appear under People with access. You can revisit the Sharing settings dialog box to modify the users or their access levels at a later time.
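The same kind of sharing can be expressed through the API. The following is a minimal sketch, assuming the Arvados Python SDK's permission links, where a link with `link_class` set to `permission` grants a user or group (`tail_uuid`) access to an object (`head_uuid`). The mapping from the Workbench labels Read/Write/Manage to the names `can_read`/`can_write`/`can_manage` is an assumption made here for illustration, and the helper name and UUIDs are hypothetical placeholders.

```python
# Assumed mapping from Workbench Authorization levels to permission names.
PERMISSION_NAMES = {
    "Read": "can_read",
    "Write": "can_write",
    "Manage": "can_manage",
}


def permission_link_body(target_uuid, grantee_uuid, level):
    """Build the request body for sharing one object with one user or group."""
    if level not in PERMISSION_NAMES:
        raise ValueError(f"unknown authorization level: {level!r}")
    return {
        "link": {
            "link_class": "permission",
            "name": PERMISSION_NAMES[level],
            "head_uuid": target_uuid,   # the project or collection being shared
            "tail_uuid": grantee_uuid,  # the user or group receiving access
        }
    }


# Placeholder UUIDs for illustration only.
project_uuid = "zzzzz-j7d0g-0123456789abcde"
user_uuid = "zzzzz-tpzed-0123456789abcde"

body = permission_link_body(project_uuid, user_uuid, "Write")

# Submitting the request requires the SDK and a configured cluster:
#   import arvados
#   client = arvados.api('v1')
#   link = client.links().create(body=body).execute()
print(body)
```

Each user or group added in the dialog would correspond to one such link, which is why different grantees can hold different access levels on the same project.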

The General access drop-down menu controls the default sharing setting, with the following choices:

  • Private: This is the initial state when no users or groups have been selected for sharing. Setting General access to Private at any time clears the current sharing settings, and any users or groups that formerly had access will lose it.
  • Public: This means the list of People with access will include Anonymous users, even if they are not users of the current cluster. You can further set their access level using the Authorization level drop-down menu.
  • All users: This means sharing with other users who are logged in on the current cluster.
  • Shared: When you choose to share with specific people or groups, General access will be set to Shared. From this state, you can further specify the default sharing settings for Public and All users.


The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.