Analyzing workflow cost (cloud only)

Note:

This tutorial assumes that you have access to Arvados command line tools, configured your API token, and confirmed a working environment.

Note:

Cost information is generally only available when Arvados runs in a cloud environment and arvados-dispatch-cloud is used to dispatch containers. The price per node-hour for each defined InstanceType must be supplied in config.yml.
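
For example, a cluster configuration might define an instance type with its hourly price like this (the cluster ID, instance type, and values shown here are only illustrative):

Clusters:
  zzzzz:
    InstanceTypes:
      m5.large:
        ProviderType: m5.large
        VCPUs: 2
        RAM: 8GiB
        IncludedScratch: 32GB
        Price: 0.096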

The arv-cluster-activity program can be used to analyze cluster usage and cost over a time period.

Installation

The arv-cluster-activity tool can be installed from a distribution package or PyPI.

Option 1: Install from distribution packages

First, add the appropriate package repository for your distribution.

Install python3-arvados-cluster-activity

Red Hat, AlmaLinux, and Rocky Linux

# dnf install python3-arvados-cluster-activity

Debian and Ubuntu

# apt install python3-arvados-cluster-activity

Option 2: Install with pip

Run pip install arvados-cluster-activity[prometheus] in an appropriate installation environment, such as a virtualenv.
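
For example, to install into a dedicated virtualenv (the directory name here is only an example):

~$ python3 -m venv ~/arv-cluster-activity
~$ ~/arv-cluster-activity/bin/pip install 'arvados-cluster-activity[prometheus]'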

Note:

Support for fetching Prometheus metrics depends on Pandas and NumPy. If these dependencies pose a problem, you can install the cluster activity tool without Prometheus support by omitting the [prometheus] extra from the pip install command.
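
For example:

~$ pip install arvados-cluster-activity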

The Cluster Activity report uses the Arvados Python SDK, which uses pycurl, a Python module built on the libcurl C library. Building pycurl may require additional development packages. On Debian-based distributions you can install them by running:

# apt install git build-essential python3-dev libcurl4-openssl-dev libssl-dev

Syntax

The arv-cluster-activity tool has a number of command line arguments:

~$ arv-cluster-activity --help
usage: arv-cluster-activity [-h] [--start START] [--end END] [--days DAYS] [--cost-report-file COST_REPORT_FILE] [--include-workflow-steps] [--columns COLUMNS] [--exclude EXCLUDE]
                            [--html-report-file HTML_REPORT_FILE] [--version] [--cluster CLUSTER] [--prometheus-auth PROMETHEUS_AUTH]

options:
  -h, --help            show this help message and exit
  --start START         Start date for the report in YYYY-MM-DD format (UTC) (or use --days)
  --end END             End date for the report in YYYY-MM-DD format (UTC), default "now"
  --days DAYS           Number of days before "end" to start the report (or use --start)
  --cost-report-file COST_REPORT_FILE
                        Export cost report to specified CSV file
  --include-workflow-steps
                        Include individual workflow steps (optional)
  --columns COLUMNS     Cost report columns (optional), must be comma separated with no spaces between column names. Available columns are:
                        Project, ProjectUUID, Workflow,
                        WorkflowUUID, Step, StepUUID, Sample, SampleUUID, User, UserUUID, Submitted, Started, Runtime, Cost
  --exclude EXCLUDE     Exclude workflows containing this substring (may be a regular expression)
  --html-report-file HTML_REPORT_FILE
                        Export HTML report to specified file
  --version             Print version and exit.
  --cluster CLUSTER     Cluster to query for prometheus stats
  --prometheus-auth PROMETHEUS_AUTH
                        Authorization file with prometheus info

Credentials

To access the Arvados host, the tool will read default credentials from ~/.config/arvados/settings.conf or use the standard ARVADOS_API_HOST and ARVADOS_API_TOKEN environment variables.
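
For example, to supply credentials through the environment variables (the host and token values below are placeholders):

~$ export ARVADOS_API_HOST=zzzzz.arvadosapi.com
~$ export ARVADOS_API_TOKEN=v2/zzzzz-gj3su-0123456789abcdef/thisisnotarealtoken0123456789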

The cluster report tool will also fetch metrics from Prometheus, if available. The Prometheus connection details can be supplied in an environment file passed with --prometheus-auth, or set as environment variables:

PROMETHEUS_HOST=https://your.prometheus.server.example.com
PROMETHEUS_USER=admin
PROMETHEUS_PASSWORD=password

PROMETHEUS_USER and PROMETHEUS_PASSWORD will be passed in an Authorization header using HTTP Basic authentication.

Alternately, instead of PROMETHEUS_USER and PROMETHEUS_PASSWORD you can provide PROMETHEUS_APIKEY. This will be passed in as a Bearer token (Authorization: Bearer <APIKEY>).
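
For example, an authorization file using an API key would look like this (values are placeholders):

PROMETHEUS_HOST=https://your.prometheus.server.example.com
PROMETHEUS_APIKEY=your-api-key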

Example usage

~$ arv-cluster-activity \
    --days 90 \
    --include-workflow-steps \
    --prometheus-auth prometheus.env \
    --cost-report-file report.csv \
    --html-report-file report.html
INFO:root:Exporting workflow runs 0 - 5
INFO:root:Getting workflow steps
INFO:root:Got workflow steps 0 - 2
INFO:root:Getting container hours time series
INFO:root:Getting data usage time series
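
The report can also be limited to specific columns, and workflows can be filtered out by name. For example, this invocation (the output file name and filter pattern are only examples) exports a smaller CSV and skips workflows whose names contain "test":

~$ arv-cluster-activity \
    --days 30 \
    --cost-report-file monthly-costs.csv \
    --columns Project,Workflow,User,Started,Runtime,Cost \
    --exclude test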


