Analyzing workflow cost (cloud only)

Note:

This tutorial assumes that you have access to the Arvados command line tools, have configured your API token, and have confirmed a working environment.

Note:

Cost information is generally only available when Arvados runs in a cloud environment and arvados-dispatch-cloud is used to dispatch containers. The per node-hour price for each defined InstanceType must be supplied in config.yml.
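
For reference, a minimal sketch of an InstanceTypes entry in config.yml; the cluster ID zzzzz, the instance name m5large, and the price of 0.096 per node-hour are placeholder values, and the full schema is described in the cloud installation documentation:

Clusters:
  zzzzz:
    InstanceTypes:
      m5large:
        ProviderType: m5.large
        VCPUs: 2
        RAM: 8GiB
        IncludedScratch: 100GB
        Price: 0.096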

The arv-cluster-activity program can be used to analyze cluster usage and cost over a time period.

Installation

The arv-cluster-activity tool can be installed from a distribution package or PyPI.

Option 1: Install from distribution packages

First, add the appropriate package repository for your distribution.

Install python3-arvados-cluster-activity

Red Hat, AlmaLinux, and Rocky Linux

# dnf install python3-arvados-cluster-activity

Debian and Ubuntu

# apt install python3-arvados-cluster-activity

Option 2: Install with pip

Run pip install "arvados-cluster-activity[prometheus]" in an appropriate installation environment, such as a virtualenv.
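
For example, a minimal sketch using a virtualenv; the directory name cluster-activity is arbitrary:

~$ python3 -m venv cluster-activity
~$ cluster-activity/bin/pip install "arvados-cluster-activity[prometheus]"
~$ cluster-activity/bin/arv-cluster-activity --version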

Note:

Support for fetching Prometheus metrics depends on Pandas and NumPy. If these dependencies pose a problem, you can install the cluster activity tool without Prometheus support by running pip install arvados-cluster-activity instead.

The Cluster Activity report uses the Arvados Python SDK, which relies on pycurl and therefore on the libcurl C library. To build the pycurl module you may first have to install additional development packages. On Debian-based distributions you can install them by running:

# apt install git build-essential python3-dev libcurl4-openssl-dev libssl-dev

Syntax

The arv-cluster-activity tool has a number of command line arguments:

~$ arv-cluster-activity --help
usage: arv-cluster-activity [-h] [--start START] [--end END] [--days DAYS] [--cost-report-file COST_REPORT_FILE] [--include-workflow-steps] [--columns COLUMNS] [--exclude EXCLUDE]
                            [--html-report-file HTML_REPORT_FILE] [--version] [--cluster CLUSTER] [--prometheus-auth PROMETHEUS_AUTH]

options:
  -h, --help            show this help message and exit
  --start START         Start date for the report in YYYY-MM-DD format (UTC) (or use --days)
  --end END             End date for the report in YYYY-MM-DD format (UTC), default "now"
  --days DAYS           Number of days before "end" to start the report (or use --start)
  --cost-report-file COST_REPORT_FILE
                        Export cost report to specified CSV file
  --include-workflow-steps
                        Include individual workflow steps (optional)
  --columns COLUMNS     Cost report columns (optional), must be comma separated with no spaces between column names. Available columns are:
                        Project, ProjectUUID, Workflow, WorkflowUUID, Step, StepUUID, Sample, SampleUUID, User, UserUUID, Submitted, Started, Runtime, Cost
  --exclude EXCLUDE     Exclude workflows containing this substring (may be a regular expression)
  --html-report-file HTML_REPORT_FILE
                        Export HTML report to specified file
  --version             Print version and exit.
  --cluster CLUSTER     Cluster to query for prometheus stats
  --prometheus-auth PROMETHEUS_AUTH
                        Authorization file with prometheus info

Credentials

To access the Arvados host, the tool will read default credentials from ~/.config/arvados/settings.conf or use the standard ARVADOS_API_HOST and ARVADOS_API_TOKEN environment variables.
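
For example, a minimal sketch of supplying credentials through the environment; the host name and token below are placeholder values:

~$ export ARVADOS_API_HOST=zzzzz.example.com
~$ export ARVADOS_API_TOKEN=exampletokenvalue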

The cluster report tool will also fetch metrics from Prometheus, if available. Prometheus connection details can be supplied in an environment file passed with --prometheus-auth, or set as environment variables:

PROMETHEUS_HOST=https://your.prometheus.server.example.com
PROMETHEUS_USER=admin
PROMETHEUS_PASSWORD=password

PROMETHEUS_USER and PROMETHEUS_PASSWORD will be passed in an Authorization header using HTTP Basic authentication.

Alternatively, instead of PROMETHEUS_USER and PROMETHEUS_PASSWORD you can provide PROMETHEUS_APIKEY. This will be passed as a Bearer token (Authorization: Bearer <APIKEY>).
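
For example, a minimal sketch of an equivalent environment file that uses a Bearer token instead of Basic authentication; the host and key values are placeholders:

PROMETHEUS_HOST=https://your.prometheus.server.example.com
PROMETHEUS_APIKEY=exampleapikeyvalue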

Example usage

~$ arv-cluster-activity \
    --days 90 \
    --include-workflow-steps \
    --prometheus-auth prometheus.env \
    --cost-report-file report.csv \
    --html-report-file report.html
INFO:root:Exporting workflow runs 0 - 5
INFO:root:Getting workflow steps
INFO:root:Got workflow steps 0 - 2
INFO:root:Getting container hours time series
INFO:root:Getting data usage time series

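As a further sketch, the report can be limited to selected columns and can exclude matching workflows; the column list, exclude pattern, and output file name below are illustrative:

~$ arv-cluster-activity \
    --days 30 \
    --columns Project,Workflow,User,Started,Runtime,Cost \
    --exclude "test" \
    --cost-report-file monthly-costs.csv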

