Synchronizing from external sources

The arvados-sync-users and arvados-sync-groups tools allow to manage Arvados users & groups from external sources.

These tools are designed to be run periodically reading a file created by a remote auth system (ie: LDAP) dump script, applying what’s included on the file as the source of truth.

NOTE: Both tools need to perform several administrative tasks on Arvados, so must be run using a superuser token via ARVADOS_API_HOST and ARVADOS_API_TOKEN environment variables or ~/.config/arvados/settings.conf file.

Using arvados-sync-users

This tool reads a CSV (comma-separated values) file having information about user accounts and their expected state on Arvados.

Every line on the file should have 5 fields:

  1. A user identifier: it could be an email address (default) or a username.
  2. The user’s first name.
  3. The user’s last name.
  4. The intended user’s active state.
  5. The intended user’s admin state: will always be read as false when active=false.

The last 2 fields should be represented as true/false, yes/no, or 1/0 values.

Options

The following command line options are supported:

Option Description
--help This list of options
--case-insensitive Uses case-insensitive username matching
--deactivate-unlisted Deactivate users that aren’t listed on the input file. (Current & system users won’t be affected)
--user-id Identifier to use in looking up user. One of ‘email’ or ‘username’ (Default: ‘email’)
--verbose Log informational messages
--version Print version and exit

The tool will create users when needed, and update those existing records to match the desired state described by the fields on the CSV file.
System users like the root and anonymous are unaffected by this tool.
In the case of a LoginCluster federation, this tool should be run on the cluster that manages the user accounts, and will fail otherwise.

Example

To sync users using the username to identify every account, reading from some external_users.csv file and deactivating existing users that aren’t included in it, the command should be called as follows:

~$ arvados-sync-users --deactivate-unlisted --user-id username /path/to/external_users.csv 

Using arvados-sync-groups

This tool reads a CSV (comma-separated values) file having information about external groups and their members. When running it for the first time, it’ll create a special group named ‘Externally synchronized groups’ meant to be the parent of all the remote groups.

Every line on the file should have 3 values: a group name, a local user identifier and a permission level, meaning that the named user is a member of the group with the provided permission. The tool will create the group if it doesn’t exist, and add the user to it. If any group member is not present on the input file, it will be removed from the group.

Users can be identified by their email address or username: the tool will check if every user exist on the system, and report back when not found. Groups on the other hand, are identified by their name.

Permission level can be one of the following: can_read, can_write or can_manage, giving the group member read, read/write or managing privileges on the group. For backwards compatibility purposes, if any record omits the third (permission) field, it will default to can_write permission. You can read more about permissions on the group management admin guide.

When using arvados-sync-groups, consider setting Users.CanCreateRoleGroups: false in your cluster configuration to prevent users from creating additional groups.

Options

The following command line options are supported:

Option Description
--help This list of options
--case-insensitive Uses case-insensitive username matching
--parent-group-uuid UUID of group to own all the externally synchronized groups
--user-id Identifier to use in looking up user. One of ‘email’ or ‘username’ (Default: ‘email’)
--verbose Log informational messages (Default: False)
--version Print version and exit

Examples

To sync groups using the username to identify every account, reading from some external_groups.csv file, the command should be called as follows:

~$ arvados-sync-groups --user-id username /path/to/external_groups.csv 

If you want to use a specific preexisting group as the parent of all the remote groups, you can do it this way:

~$ arvados-sync-groups --parent-group-uuid <preexisting group UUID> --user-id username /path/to/external_groups.csv 

Previous: Changing upstream login providers Next: Securing API access with scoped tokens

The content of this documentation is licensed under the Creative Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the Apache License, Version 2.0.