Arvados system services (other than the Rails API server) have an option to provide live profiling data on an HTTP endpoint. This can be analyzed with the go tool pprof program from the Go runtime to help identify memory and CPU usage issues. The go tool pprof program can either connect directly to the profiling endpoint, or read a snapshot from disk.
Enable profiling by choosing a listening address and adding -pprof <address> to the EXTRA_OPTS environment variable in the systemd service.
Example:
$ sudo systemctl edit keep-balance
### Editing /etc/systemd/system/keep-balance.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file
[Service]
Environment="EXTRA_OPTS=-pprof 127.0.0.1:3333"
### Lines below this comment will be discarded
[...]
Restart the service.
$ sudo systemctl restart keep-balance
Save a snapshot of the program’s active memory usage after garbage collection.
$ curl 'http://localhost:3333/debug/pprof/heap?gc=1' > /tmp/pprof.gz
The following analysis steps can be done on the server itself or on a different machine.
To get the go tool pprof command, install the Go runtime from OS packages or from the Go download page.
$ sudo apt install golang
Run the go tool pprof command to summarize the snapshot.
$ go tool pprof -top /tmp/pprof.gz
File: keep-balance
Build ID: edd0405c97f4235473dba21b7c7fd52c8f755cde
Type: inuse_space
Time: Nov 3, 2025 at 11:12am (EST)
Showing nodes accounting for 443.71MB, 98.02% of 452.67MB total
Dropped 35 nodes (cum <= 2.26MB)
flat flat% sum% cum cum%
217.54MB 48.06% 48.06% 217.54MB 48.06% git.arvados.org/arvados.git/services/keep-balance.(*BlockState).increaseDesired (inline)
85.46MB 18.88% 66.94% 85.46MB 18.88% git.arvados.org/arvados.git/services/keep-balance.(*BlockStateMap).get (inline)
66.07MB 14.60% 81.53% 76.58MB 16.92% git.arvados.org/arvados.git/sdk/go/arvados.(*Collection).SizedDigests
19.87MB 4.39% 85.92% 20.87MB 4.61% github.com/lib/pq.textDecode
13MB 2.87% 88.79% 13MB 2.87% git.arvados.org/arvados.git/services/keep-balance.(*BlockState).addReplica (inline)
10.51MB 2.32% 91.12% 10.51MB 2.32% bytes.genSplit
10.25MB 2.26% 93.38% 10.25MB 2.26% github.com/lib/pq.(*conn).recvMessage
7.50MB 1.66% 95.04% 53.63MB 11.85% git.arvados.org/arvados.git/services/keep-balance.EachCollection
5MB 1.10% 96.14% 13.50MB 2.98% encoding/json.Unmarshal
4.50MB 0.99% 97.14% 4.50MB 0.99% encoding/json.(*scanner).pushParseState
4MB 0.88% 98.02% 4MB 0.88% encoding/json.(*decodeState).literalStore
0 0% 98.02% 10.51MB 2.32% bytes.Split (inline)
0 0% 98.02% 31.13MB 6.88% database/sql.(*Rows).Next
0 0% 98.02% 31.13MB 6.88% database/sql.(*Rows).Next.func1
0 0% 98.02% 31.13MB 6.88% database/sql.(*Rows).nextLocked
0 0% 98.02% 31.13MB 6.88% database/sql.withLock
0 0% 98.02% 4MB 0.88% encoding/json.(*decodeState).array
0 0% 98.02% 4MB 0.88% encoding/json.(*decodeState).unmarshal
0 0% 98.02% 4MB 0.88% encoding/json.(*decodeState).value
0 0% 98.02% 4.50MB 0.99% encoding/json.checkValid
0 0% 98.02% 4.50MB 0.99% encoding/json.stateBeginValue
0 0% 98.02% 2.46MB 0.54% git.arvados.org/arvados.git/lib/service.(*command).RunCommand.ifCollectionInHost.func9
0 0% 98.02% 2.46MB 0.54% git.arvados.org/arvados.git/sdk/go/httpserver.(*metrics).ServeAPI.RequireLiteralToken.func3
...
The go tool pprof command can also connect directly to the profiling endpoint. In this mode, by default it will also save a snapshot in $HOME/pprof/.
To connect directly to the profiling endpoint and display a sampling of CPU usage over a 2-second interval:
$ go tool pprof -top 'http://localhost:3333/debug/pprof/profile?seconds=2'
Fetching profile over HTTP from http://localhost:3333/debug/pprof/profile?seconds=2
Saved profile in /home/username/pprof/pprof.keep-balance.samples.cpu.001.pb.gz
File: keep-balance
Build ID: edd0405c97f4235473dba21b7c7fd52c8f755cde
Type: cpu
Time: Nov 3, 2025 at 11:12am (EST)
Duration: 2.19s, Total samples = 2.98s (136.17%)
Showing nodes accounting for 2.57s, 86.24% of 2.98s total
Dropped 73 nodes (cum <= 0.01s)
flat flat% sum% cum cum%
0.34s 11.41% 11.41% 0.36s 12.08% runtime.findObject
0.15s 5.03% 16.44% 0.32s 10.74% regexp.(*Regexp).doOnePass
0.13s 4.36% 20.81% 0.13s 4.36% runtime.(*mspan).heapBitsSmallForAddr
0.12s 4.03% 24.83% 0.12s 4.03% runtime.(*gcBits).bitp (inline)
0.11s 3.69% 28.52% 0.11s 3.69% regexp/syntax.(*Inst).MatchRunePos
0.08s 2.68% 31.21% 0.09s 3.02% runtime.(*mspan).writeHeapBitsSmall
0.07s 2.35% 33.56% 0.07s 2.35% runtime.nextFreeFast (inline)
0.06s 2.01% 35.57% 0.06s 2.01% runtime.futex
0.06s 2.01% 37.58% 0.06s 2.01% runtime.memclrNoHeapPointers
0.05s 1.68% 39.26% 0.05s 1.68% indexbytebody
0.05s 1.68% 40.94% 0.05s 1.68% internal/runtime/syscall.Syscall6
0.05s 1.68% 42.62% 0.43s 14.43% runtime.mallocgc
0.05s 1.68% 44.30% 0.05s 1.68% runtime.memmove
0.05s 1.68% 45.97% 0.71s 23.83% runtime.scanobject
0.05s 1.68% 47.65% 0.05s 1.68% runtime.usleep
0.04s 1.34% 48.99% 0.26s 8.72% runtime.mallocgcSmallScanNoHeader
0.04s 1.34% 50.34% 0.04s 1.34% runtime.rand
0.03s 1.01% 51.34% 0.03s 1.01% crypto/internal/fips140/aes/gcm.gcmAesDec
0.03s 1.01% 52.35% 0.57s 19.13% database/sql.(*Rows).nextLocked
0.03s 1.01% 53.36% 0.13s 4.36% database/sql.convertAssignRows
0.03s 1.01% 54.36% 0.56s 18.79% git.arvados.org/arvados.git/sdk/go/arvados.(*Collection).SizedDigests
0.03s 1.01% 55.37% 0.18s 6.04% git.arvados.org/arvados.git/services/keep-balance.(*BlockStateMap).GetConfirmedReplication
...
http://localhost:3333/debug/pprof/ serves an HTML page with a list of available profiles:
allocs — A sampling of all past memory allocationsblock — Stack traces that led to blocking on synchronization primitivescmdline — The command line invocation of the current programgoroutine — Stack traces of all current goroutines. Use debug=2 as a query parameter to export in the same format as an unrecovered panic.heap — A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample.mutex — Stack traces of holders of contended mutexesprofile — CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile.symbol — Maps given program counters to function names. Counters can be specified in a GET raw query or POST body, multiple counters are separated by ‘+’.threadcreate — Stack traces that led to the creation of new OS threadstrace — A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace.Additional resources:
The content of this documentation is licensed under the
Creative
Commons Attribution-Share Alike 3.0 United States licence.
Code samples in this documentation are licensed under the
Apache License, Version 2.0.