Metrics
Some Arvados services publish Prometheus/OpenMetrics-compatible metrics at /metrics, and some provide additional runtime status at /status.json. Metrics can help you understand how components perform under load, find performance bottlenecks, and detect and diagnose problems.
To access metrics endpoints, services must be configured with a management token. When accessing a metrics endpoint, prefix the management token with "Bearer " and supply it in the Authorization request header.
curl -sfH "Authorization: Bearer your_management_token_goes_here" "https://0.0.0.0:25107/status.json"
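The same request can be made from a script. The following is a minimal Python sketch using only the standard library; `build_status_request` and `fetch_json` are hypothetical helper names, and the URL and token are the placeholders from the curl example above.

```python
import json
import urllib.request


def build_status_request(url: str, management_token: str) -> urllib.request.Request:
    """Build a GET request carrying the management token as a Bearer credential."""
    request = urllib.request.Request(url)
    # The token is prefixed with "Bearer " and sent in the Authorization header.
    request.add_header("Authorization", "Bearer " + management_token)
    return request


def fetch_json(url: str, management_token: str, timeout: float = 10.0) -> dict:
    """Fetch a JSON endpoint such as /status.json.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    for example when the token is missing or wrong.
    """
    request = build_status_request(url, management_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `fetch_json("https://0.0.0.0:25107/status.json", "your_management_token_goes_here")`. A /metrics endpoint returns text rather than JSON, so for those the response body would be read and decoded instead of passed to `json.load`.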
Keep-web
Keep-web exports metrics at /metrics — e.g., https://collections.zzzzz.arvadosapi.com/metrics.
	
| Name | Type | Description |
| request_duration_seconds | summary | elapsed time between receiving a request and sending the last byte of the response body (segmented by HTTP request method and response status code) |
| time_to_status_seconds | summary | elapsed time between receiving a request and sending the HTTP response status code (segmented by HTTP request method and response status code) |
	
Metrics in the arvados_keepweb_collectioncache namespace report keep-web’s internal cache of Arvados collection metadata.
	
| Name | Type | Description |
| arvados_keepweb_collectioncache_requests | counter | cache lookups |
| arvados_keepweb_collectioncache_api_calls | counter | outgoing API calls |
| arvados_keepweb_collectioncache_permission_hits | counter | collection-to-permission cache hits |
| arvados_keepweb_collectioncache_pdh_hits | counter | UUID-to-PDH cache hits |
| arvados_keepweb_collectioncache_hits | counter | PDH-to-manifest cache hits |
| arvados_keepweb_collectioncache_cached_manifests | gauge | number of collections in the cache |
| arvados_keepweb_collectioncache_cached_manifest_bytes | gauge | memory consumed by cached collection manifests |
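One way to use these counters is to derive a manifest cache hit ratio (PDH-to-manifest hits divided by cache lookups). The sketch below is illustrative only: it assumes the counters appear in the /metrics text output without labels and under exactly the names listed in the table, the helper names are hypothetical, and the sample values are made up.

```python
def parse_unlabeled_samples(exposition_text: str) -> dict:
    """Return {metric_name: value} for samples that carry no labels.

    Comment lines (# HELP, # TYPE), blank lines, and labeled samples are skipped.
    """
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            samples[parts[0]] = float(parts[1])
    return samples


def manifest_hit_ratio(samples: dict) -> float:
    """Fraction of cache lookups answered from the PDH-to-manifest cache."""
    lookups = samples.get("arvados_keepweb_collectioncache_requests", 0.0)
    hits = samples.get("arvados_keepweb_collectioncache_hits", 0.0)
    return hits / lookups if lookups else 0.0


# Made-up sample values, for illustration only.
SAMPLE = """\
# TYPE arvados_keepweb_collectioncache_requests counter
arvados_keepweb_collectioncache_requests 200
# TYPE arvados_keepweb_collectioncache_hits counter
arvados_keepweb_collectioncache_hits 150
"""

print(manifest_hit_ratio(parse_unlabeled_samples(SAMPLE)))  # prints 0.75
```

Because these are counters, the ratio covers the whole lifetime of the process; comparing two scrapes and dividing the differences would give the ratio for the interval between them.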
	
Keepstore
Keepstore exports metrics at /status.json — e.g., http://keep0.zzzzz.arvadosapi.com:25107/status.json.
Root
volumeStatusEnt
VolumeStatus
	
| Attribute | Type | Description |
| MountPoint | string | |
| DeviceNum | uint64 | |
| BytesFree | uint64 | |
| BytesUsed | uint64 | |
	
ioStats
	
| Attribute | Type | Description |
| Errors | uint64 | |
| Ops | uint64 | |
| CompareOps | uint64 | |
| GetOps | uint64 | |
| PutOps | uint64 | |
| TouchOps | uint64 | |
| InBytes | uint64 | |
| OutBytes | uint64 | |
	
PoolStatus
	
| Attribute | Type | Description |
| BytesAllocatedCumulative | uint64 | |
| BuffersMax | int | |
| BuffersInUse | int | |
	
WorkQueueStatus
	
| Attribute | Type | Description |
| InProgress | int | |
| Queued | int | |
	
Example response
{
  "Volumes": [
    {
      "Label": "[UnixVolume /var/lib/arvados/keep0]",
      "Status": {
        "MountPoint": "/var/lib/arvados/keep0",
        "DeviceNum": 65029,
        "BytesFree": 222532972544,
        "BytesUsed": 435456679936
      },
      "InternalStats": {
        "Errors": 0,
        "InBytes": 1111,
        "OutBytes": 0,
        "OpenOps": 1,
        "StatOps": 4,
        "FlockOps": 0,
        "UtimesOps": 0,
        "CreateOps": 0,
        "RenameOps": 0,
        "UnlinkOps": 0,
        "ReaddirOps": 0
      }
    }
  ],
  "BufferPool": {
    "BytesAllocatedCumulative": 67108864,
    "BuffersMax": 20,
    "BuffersInUse": 0
  },
  "PullQueue": {
    "InProgress": 0,
    "Queued": 0
  },
  "TrashQueue": {
    "InProgress": 0,
    "Queued": 0
  },
  "RequestsCurrent": 1,
  "RequestsMax": 40,
  "Version": "dev"
}
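A response like the one above can be reduced to a few utilization figures. The following is a rough Python sketch; the function names are hypothetical, and defining utilization as BytesUsed / (BytesUsed + BytesFree) is an assumption made for illustration, not something the status report itself defines.

```python
import json


def volume_utilization(status: dict) -> dict:
    """Map each volume label to BytesUsed / (BytesUsed + BytesFree)."""
    result = {}
    for volume in status.get("Volumes", []):
        used = volume["Status"]["BytesUsed"]
        free = volume["Status"]["BytesFree"]
        total = used + free
        result[volume["Label"]] = used / total if total else 0.0
    return result


def request_load(status: dict) -> float:
    """Fraction of the request limit currently in use."""
    limit = status["RequestsMax"]
    return status["RequestsCurrent"] / limit if limit else 0.0


# Trimmed copy of the example response above.
EXAMPLE = json.loads("""
{
  "Volumes": [
    {
      "Label": "[UnixVolume /var/lib/arvados/keep0]",
      "Status": {
        "MountPoint": "/var/lib/arvados/keep0",
        "DeviceNum": 65029,
        "BytesFree": 222532972544,
        "BytesUsed": 435456679936
      }
    }
  ],
  "RequestsCurrent": 1,
  "RequestsMax": 40
}
""")

for label, fraction in volume_utilization(EXAMPLE).items():
    print(label, round(fraction, 3))
print("request load:", request_load(EXAMPLE))
```

For the example data the single volume comes out at roughly two thirds full, and one of 40 request slots is in use.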
Keep-balance
Keep-balance exports metrics at /metrics — e.g., http://keep.zzzzz.arvadosapi.com:9005/metrics.
	
| Name | Type | Description |
| arvados_keep_total_{replicas,blocks,bytes} | gauge | stored data (stored in backend volumes, whether referenced or not) |
| arvados_keep_garbage_{replicas,blocks,bytes} | gauge | garbage data (unreferenced, and old enough to trash) |
| arvados_keep_transient_{replicas,blocks,bytes} | gauge | transient data (unreferenced, but too new to trash) |
| arvados_keep_overreplicated_{replicas,blocks,bytes} | gauge | overreplicated data (more replicas exist than are needed) |
| arvados_keep_underreplicated_{replicas,blocks,bytes} | gauge | underreplicated data (fewer replicas exist than are needed) |
| arvados_keep_lost_{replicas,blocks,bytes} | gauge | lost data (referenced by collections, but not found on any backend volume) |
| arvados_keep_dedup_block_ratio | gauge | deduplication ratio (block references in collections ÷ distinct blocks referenced) |
| arvados_keep_dedup_byte_ratio | gauge | deduplication ratio (block references in collections ÷ distinct blocks referenced, weighted by block size) |
| arvados_keepbalance_get_state_seconds | summary | time to get all collections and keepstore volume indexes for one iteration |
| arvados_keepbalance_changeset_compute_seconds | summary | time to compute changesets for one iteration |
| arvados_keepbalance_send_pull_list_seconds | summary | time to send pull lists to all keepstore servers for one iteration |
| arvados_keepbalance_send_trash_list_seconds | summary | time to send trash lists to all keepstore servers for one iteration |
| arvados_keepbalance_sweep_seconds | summary | time to complete one iteration |
	
Each arvados_keep_ storage state statistic above is presented as a set of three metrics:
	
| *_blocks | distinct block hashes |
| *_bytes | bytes stored on backend volumes |
| *_replicas | objects/files stored on backend volumes |
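The brace notation used for the storage state statistics stands for three separate metric names per state. As a small illustration, this Python sketch expands the six states and three units from the tables above into the full list of names; `storage_metric_names` is a hypothetical helper, useful for example when building a dashboard or alert list.

```python
from itertools import product

# Storage states and units, taken from the tables above.
STATES = ["total", "garbage", "transient", "overreplicated", "underreplicated", "lost"]
UNITS = ["replicas", "blocks", "bytes"]


def storage_metric_names() -> list:
    """Expand arvados_keep_<state>_{replicas,blocks,bytes} into concrete names."""
    return [f"arvados_keep_{state}_{unit}" for state, unit in product(STATES, UNITS)]


for name in storage_metric_names():
    print(name)
```

This yields 18 names, from arvados_keep_total_replicas through arvados_keep_lost_bytes.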
	
Node manager
The node manager status endpoint provides a snapshot of internal status at the time of the most recent wishlist update.
	
| Attribute | Type | Description |
| nodes_booting | int | Number of nodes in booting state |
| nodes_unpaired | int | Number of nodes in unpaired state |
| nodes_busy | int | Number of nodes in busy state |
| nodes_idle | int | Number of nodes in idle state |
| nodes_fail | int | Number of nodes in fail state |
| nodes_down | int | Number of nodes in down state |
| nodes_shutdown | int | Number of nodes in shutdown state |
| nodes_wish | int | Number of nodes in the current wishlist |
| node_quota | int | Current node count ceiling due to cloud quota limits |
| config_max_nodes | int | Configured max node count |
	
Example
{
  "actor_exceptions": 0,
  "idle_times": {
    "compute1": 0,
    "compute3": 0,
    "compute2": 0,
    "compute4": 0
  },
  "create_node_errors": 0,
  "destroy_node_errors": 0,
  "nodes_idle": 0,
  "config_max_nodes": 8,
  "list_nodes_errors": 0,
  "node_quota": 8,
  "Version": "1.1.4.20180719160944",
  "nodes_wish": 0,
  "nodes_unpaired": 0,
  "nodes_busy": 4,
  "boot_failures": 0
}
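The example response omits several of the nodes_* attributes listed in the table, so a consumer has to tolerate missing keys. The Python sketch below summarizes a status snapshot under two assumptions that are not stated by the source: an absent nodes_* attribute is treated as zero, and the effective node ceiling is taken as the smaller of node_quota and config_max_nodes. The function name is hypothetical.

```python
NODE_STATES = ["booting", "unpaired", "busy", "idle", "fail", "down", "shutdown"]


def summarize_nodes(status: dict) -> dict:
    """Collect per-state node counts and compare them with the node ceiling."""
    # Assumption: a state that is absent from the response has zero nodes.
    counts = {state: status.get(f"nodes_{state}", 0) for state in NODE_STATES}
    # Assumption: the effective ceiling is the lower of the two limits.
    ceiling = min(status["node_quota"], status["config_max_nodes"])
    return {
        "counts": counts,
        "total_nodes": sum(counts.values()),
        "wishlist": status.get("nodes_wish", 0),
        "ceiling": ceiling,
    }


# Subset of the example response above.
EXAMPLE_STATUS = {
    "nodes_idle": 0,
    "config_max_nodes": 8,
    "node_quota": 8,
    "nodes_wish": 0,
    "nodes_unpaired": 0,
    "nodes_busy": 4,
}

summary = summarize_nodes(EXAMPLE_STATUS)
print(summary["total_nodes"], "nodes counted, ceiling", summary["ceiling"])
```

With the example data this counts 4 nodes (all busy) against a ceiling of 8.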