This tutorial assumes that you have access to Arvados command line tools, configured your API token, and confirmed a working environment.
Note: Starting from Arvados 2.7.2, these reports are generated automatically by arvados-cwl-runner
and can be found as usage_report.html
in a container request’s log collection.
The crunchstat-summary
tool can be used to analyze workflow and container performance. It can be installed from packages (apt install python3-crunchstat-summary
or dnf install python3-crunchstat-summary
), or in a Python virtualenv (pip install crunchstat_summary
). crunchstat-summary
analyzes the crunchstat lines from the logs of a container or workflow and generates a report in text or html format.
The crunchstat-summary
tool has a number of command line arguments:
~$ crunchstat-summary -h
usage: crunchstat-summary [-h]
[--job UUID | --container UUID | --pipeline-instance UUID | --log-file LOG_FILE]
[--skip-child-jobs] [--format {html,text}]
[--threads THREADS] [--verbose]
Summarize resource usage of an Arvados Crunch job
optional arguments:
-h, --help show this help message and exit
--job UUID, --container-request UUID
Look up the specified job or container request and
read its log data from Keep (or from the Arvados event
log, if the job is still running)
--container UUID [Deprecated] Look up the specified container find its
container request and read its log data from Keep (or
from the Arvados event log, if the job is still
running)
--pipeline-instance UUID
[Deprecated] Summarize each component of the given
pipeline instance (historical pre-1.4)
--log-file LOG_FILE Read log data from a regular file
--skip-child-jobs Do not include stats from child jobs/containers
--format {html,text} Report format
--threads THREADS Maximum worker threads to run
--verbose, -v Log more information (once for progress, twice for
debug)
When crunchstat-summary
is given a container or container request uuid for a toplevel workflow runner container, it will generate a report for the whole workflow. If the workflow is big, it can take a long time to generate the report.
crunchstat-summary
prints to stdout. The html report, in particular, should be redirected to a file and then loaded in a browser.
The html report can be generated as follows:
~$ crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format html > report.html
When loaded in a browser:
Using --format text
will print detailed usage and summary:
~$ crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format text
category metric task_max task_max_rate job_total
blkio:0:0 read 63067755822 53687091.20 63067755822
blkio:0:0 write 64484253320 16376234.80 64484253320
cpu cpus 16 - -
cpu sys 2147.29 0.60 2147.29
cpu user 549046.22 15.99 549046.22
cpu user+sys 551193.51 16.00 551193.51
fuseop:create count 1 0.10 1
fuseop:create time 0.01 0.00 0.01
fuseop:destroy count 0 0 0
fuseop:destroy time 0 0 0.00
fuseop:flush count 12 0.70 12
fuseop:flush time 0.00 0.00 0.00
fuseop:forget count 0 0 0
fuseop:forget time 0 0 0.00
fuseop:getattr count 40 2.70 40
fuseop:getattr time 0.00 0.00 0.00
fuseop:lookup count 36 2.90 36
fuseop:lookup time 0.67 0.07 0.67
fuseop:mkdir count 0 0 0
fuseop:mkdir time 0 0 0.00
fuseop:on_event count 0 0 0
fuseop:on_event time 0 0 0.00
fuseop:open count 9 0.30 9
fuseop:open time 0.00 0.00 0.00
fuseop:opendir count 0 0 0
fuseop:opendir time 0 0 0.00
fuseop:read count 481185 409.60 481185
fuseop:read time 370.11 2.14 370.11
fuseop:readdir count 0 0 0
fuseop:readdir time 0 0 0.00
fuseop:release count 7 0.30 7
fuseop:release time 0.00 0.00 0.00
fuseop:rename count 0 0 0
fuseop:rename time 0 0 0.00
fuseop:rmdir count 0 0 0
fuseop:rmdir time 0 0 0.00
fuseop:setattr count 0 0 0
fuseop:setattr time 0 0 0.00
fuseop:statfs count 0 0 0
fuseop:statfs time 0 0 0.00
fuseop:unlink count 0 0 0
fuseop:unlink time 0 0 0.00
fuseop:write count 5414406 1123.00 5414406
fuseop:write time 475.04 0.11 475.04
fuseops read 481185 409.60 481185
fuseops write 5414406 1123.00 5414406
keepcache hit 961402 819.20 961402
keepcache miss 946 0.90 946
keepcalls get 962348 820.00 962348
keepcalls put 961 0.30 961
mem cache 22748987392 - -
mem pgmajfault 0 - 0
mem rss 27185491968 - -
net:docker0 rx 0 - 0
net:docker0 tx 0 - 0
net:docker0 tx+rx 0 - 0
net:ens5 rx 1100398604 - 1100398604
net:ens5 tx 1445464 - 1445464
net:ens5 tx+rx 1101844068 - 1101844068
net:keep0 rx 63086467386 53687091.20 63086467386
net:keep0 tx 64482237590 20131128.60 64482237590
net:keep0 tx+rx 127568704976 53687091.20 127568704976
statfs available 398721179648 - 398721179648
statfs total 400289181696 - 400289181696
statfs used 1568198656 0 1568002048
time elapsed 34820 - 34820
# Elapsed time: 9h 40m 20s
# Assigned instance type: m5.4xlarge
# Instance hourly price: $0.768
# Max CPU usage in a single interval: 1599.52%
# Overall CPU usage: 1582.98%
# Requested CPU cores: 16
# Instance VCPUs: 16
# Max memory used: 25926.11MB
# Requested RAM: 50000.00MB
# Maximum RAM request for this instance type: 61736.70MB
# Max network traffic: 127.57GB
# Max network speed in a single interval: 53.69MB/s
# Keep cache miss rate: 0.10%
# Keep cache utilization: 99.97%
# Temp disk utilization: 0.39%
The content of this documentation is licensed under the
Creative
Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the
Apache License, Version 2.0.