arvados.retry
Utilities to retry operations.
The core of this module is `RetryLoop`, a utility class to retry operations
that might fail. It can distinguish between temporary and permanent failures;
provide exponential backoff; and save a series of results.
It also provides utility functions for common operations with `RetryLoop`:
- `check_http_response_success` can be used as a `RetryLoop` `success_check`
  for HTTP response codes from the Arvados API server.
- `retry_method` can decorate methods to provide a default `num_retries`
  keyword argument.
    """Utilities to retry operations.

    The core of this module is `RetryLoop`, a utility class to retry operations
    that might fail. It can distinguish between temporary and permanent failures;
    provide exponential backoff; and save a series of results.

    It also provides utility functions for common operations with `RetryLoop`:

    * `check_http_response_success` can be used as a `RetryLoop` `success_check`
      for HTTP response codes from the Arvados API server.
    * `retry_method` can decorate methods to provide a default `num_retries`
      keyword argument.
    """
    # Copyright (C) The Arvados Authors. All rights reserved.
    #
    # SPDX-License-Identifier: Apache-2.0

    import functools
    import inspect
    import pycurl
    import time

    from collections import deque
    from typing import (
        Callable,
        Generic,
        Optional,
        TypeVar,
    )

    import arvados.errors

    _HTTP_SUCCESSES = set(range(200, 300))
    _HTTP_CAN_RETRY = set([408, 409, 423, 500, 502, 503, 504])

    CT = TypeVar('CT', bound=Callable)
    T = TypeVar('T')

    class RetryLoop(Generic[T]):
        """Coordinate limited retries of code.

        `RetryLoop` coordinates a loop that runs until it records a
        successful result or tries too many times, whichever comes first.
        Typical use looks like:

            loop = RetryLoop(num_retries=2)
            for tries_left in loop:
                try:
                    result = do_something()
                except TemporaryError as error:
                    log("error: {} ({} tries left)".format(error, tries_left))
                else:
                    loop.save_result(result)
            if loop.success():
                return loop.last_result()

        Arguments:

        * num_retries: int --- The maximum number of times to retry the loop if
          it doesn't succeed. This means the loop body could run at most
          `num_retries + 1` times.

        * success_check: Callable[[T], bool | None] --- This is a function that
          will be called each time the loop saves a result. The function should
          return `True` if the result indicates the code succeeded, `False` if
          it represents a permanent failure, and `None` if it represents a
          temporary failure. If no function is provided, the loop will end
          after any result is saved.

        * backoff_start: float --- The number of seconds that must pass before
          the loop's second iteration. Default 0, which disables all waiting.

        * backoff_growth: float --- The wait time multiplier after each
          iteration. Default 2 (i.e., double the wait time each time).

        * save_results: int --- Specify a number to store that many saved
          results from the loop. These are available through the `results`
          attribute, oldest first. Default 1.

        * max_wait: float --- Maximum number of seconds to wait between
          retries. Default 60.
        """
        def __init__(
                self,
                num_retries: int,
                success_check: Callable[[T], Optional[bool]]=lambda r: True,
                backoff_start: float=0,
                backoff_growth: float=2,
                save_results: int=1,
                max_wait: float=60
        ) -> None:
            self.tries_left = num_retries + 1
            self.check_result = success_check
            self.backoff_wait = backoff_start
            self.backoff_growth = backoff_growth
            self.max_wait = max_wait
            self.next_start_time = 0
            self.results = deque(maxlen=save_results)
            self._attempts = 0
            self._running = None
            self._success = None

        def __iter__(self) -> 'RetryLoop':
            """Return an iterator of retries."""
            return self

        def running(self) -> Optional[bool]:
            """Return whether this loop is running.

            Returns `None` if the loop has never run, `True` if it is still running,
            or `False` if it has stopped—whether that's because it has saved a
            successful result, a permanent failure, or has run out of retries.
            """
            return self._running and (self._success is None)

        def __next__(self) -> int:
            """Record a loop attempt.

            If the loop is still running, decrements the number of tries left and
            returns it. Otherwise, raises `StopIteration`.
            """
            if self._running is None:
                self._running = True
            if (self.tries_left < 1) or not self.running():
                self._running = False
                raise StopIteration
            else:
                wait_time = max(0, self.next_start_time - time.time())
                time.sleep(wait_time)
                self.backoff_wait *= self.backoff_growth
                if self.backoff_wait > self.max_wait:
                    self.backoff_wait = self.max_wait
            self.next_start_time = time.time() + self.backoff_wait
            self.tries_left -= 1
            return self.tries_left

        def save_result(self, result: T) -> None:
            """Record a loop result.

            Save the given result, and end the loop if it indicates
            success or permanent failure. See documentation for the `__init__`
            `success_check` argument to learn how that's indicated.

            Raises `arvados.errors.AssertionError` if called after the loop has
            already ended.

            Arguments:

            * result: T --- The result from this loop attempt to check and save.
            """
            if not self.running():
                raise arvados.errors.AssertionError(
                    "recorded a loop result after the loop finished")
            self.results.append(result)
            self._success = self.check_result(result)
            self._attempts += 1

        def success(self) -> Optional[bool]:
            """Return the loop's end state.

            Returns `True` if the loop recorded a successful result, `False` if it
            recorded permanent failure, or else `None`.
            """
            return self._success

        def last_result(self) -> T:
            """Return the most recent result the loop saved.

            Raises `arvados.errors.AssertionError` if called before any result has
            been saved.
            """
            try:
                return self.results[-1]
            except IndexError:
                raise arvados.errors.AssertionError(
                    "queried loop results before any were recorded")

        def attempts(self) -> int:
            """Return the number of results that have been saved.

            This count includes all kinds of results: success, permanent failure,
            and temporary failure.
            """
            return self._attempts

        def attempts_str(self) -> str:
            """Return a human-friendly string counting saved results.

            This method returns '1 attempt' or 'N attempts', where the number
            in the string is the number of saved results.
            """
            if self._attempts == 1:
                return '1 attempt'
            else:
                return '{} attempts'.format(self._attempts)


    def check_http_response_success(status_code: int) -> Optional[bool]:
        """Convert a numeric HTTP status code to a loop control flag.

        This method takes a numeric HTTP status code and returns `True` if
        the code indicates success, `None` if it indicates temporary
        failure, and `False` otherwise. You can use this as the
        `success_check` for a `RetryLoop` that queries the Arvados API server.
        Specifically:

        * Any 2xx result returns `True`.

        * A select few status codes, or any malformed responses, return `None`.

        * Everything else returns `False`. Note that this includes 1xx and
          3xx status codes. They don't indicate success, and you can't
          retry those requests verbatim.

        Arguments:

        * status_code: int --- A numeric HTTP response code
        """
        if status_code in _HTTP_SUCCESSES:
            return True
        elif status_code in _HTTP_CAN_RETRY:
            return None
        elif 100 <= status_code < 600:
            return False
        else:
            return None  # Get well soon, server.

    def retry_method(orig_func: CT) -> CT:
        """Provide a default value for a method's num_retries argument.

        This is a decorator for instance and class methods that accept a
        `num_retries` keyword argument, with a `None` default. When the method
        is called without a value for `num_retries`, this decorator will set it
        from the `num_retries` attribute of the underlying instance or class.

        Arguments:

        * orig_func: Callable --- A class or instance method that accepts a
          `num_retries` keyword argument
        """
        @functools.wraps(orig_func)
        def num_retries_setter(self, *args, **kwargs):
            if kwargs.get('num_retries') is None:
                kwargs['num_retries'] = self.num_retries
            return orig_func(self, *args, **kwargs)
        return num_retries_setter
Coordinate limited retries of code.
`RetryLoop` coordinates a loop that runs until it records a
successful result or tries too many times, whichever comes first.
Typical use looks like:
    loop = RetryLoop(num_retries=2)
    for tries_left in loop:
        try:
            result = do_something()
        except TemporaryError as error:
            log("error: {} ({} tries left)".format(error, tries_left))
        else:
            loop.save_result(result)
    if loop.success():
        return loop.last_result()
Arguments:
- num_retries: int --- The maximum number of times to retry the loop if it doesn't succeed. This means the loop body could run at most `num_retries + 1` times.
- success_check: Callable[[T], bool | None] --- This is a function that will be called each time the loop saves a result. The function should return `True` if the result indicates the code succeeded, `False` if it represents a permanent failure, and `None` if it represents a temporary failure. If no function is provided, the loop will end after any result is saved.
- backoff_start: float --- The number of seconds that must pass before the loop's second iteration. Default 0, which disables all waiting.
- backoff_growth: float --- The wait time multiplier after each iteration. Default 2 (i.e., double the wait time each time).
- save_results: int --- Specify a number to store that many saved results from the loop. These are available through the `results` attribute, oldest first. Default 1.
- max_wait: float --- Maximum number of seconds to wait between retries. Default 60.
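To make the interaction of `backoff_start`, `backoff_growth`, and `max_wait` concrete, here is a minimal sketch that computes the wait schedule without sleeping. `backoff_gaps` is a hypothetical helper, not part of `arvados.retry`. It assumes the arithmetic in `__next__` as it reads in the module source: the wait is multiplied by `backoff_growth` and capped at `max_wait` before the next start time is scheduled, so the first gap appears to be `backoff_start * backoff_growth`.

```python
def backoff_gaps(num_retries, backoff_start=0, backoff_growth=2, max_wait=60):
    """Return the minimum gap, in seconds, scheduled before each retry.

    Hypothetical helper that mirrors the arithmetic in RetryLoop.__next__:
    multiply the current wait by backoff_growth, cap it at max_wait, then
    schedule the next attempt that far after the current one starts.
    """
    gaps = []
    wait = backoff_start
    for _ in range(num_retries):
        wait = min(wait * backoff_growth, max_wait)
        gaps.append(wait)
    return gaps


# One initial try plus up to seven retries, starting from a 1-second wait.
schedule = backoff_gaps(num_retries=7, backoff_start=1)
print(schedule)  # [2, 4, 8, 16, 32, 60, 60]
```

Each gap is measured from the start of one attempt to the earliest start of the next, so time spent inside the loop body counts toward the wait. With the default `backoff_start=0`, every gap is zero.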
Return whether this loop is running.
Returns `None` if the loop has never run, `True` if it is still running,
or `False` if it has stopped, whether that's because it has saved a
successful result, a permanent failure, or has run out of retries.
Record a loop result.
Save the given result, and end the loop if it indicates
success or permanent failure. See documentation for the `__init__`
`success_check` argument to learn how that's indicated.
Raises `arvados.errors.AssertionError` if called after the loop has
already ended.
Arguments:
- result: T — The result from this loop attempt to check and save.
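The way a saved result steers the loop can be traced with a small stand-in. `MiniRetryLoop` below is a hypothetical, simplified copy of the state logic from the module source, written so the example runs without the Arvados SDK: it leaves out backoff waits and the attempt counter, and it raises the built-in `AssertionError` rather than `arvados.errors.AssertionError`. The `check` function and the canned status codes are likewise illustrative assumptions.

```python
from collections import deque


class MiniRetryLoop:
    """Simplified stand-in for RetryLoop: same state logic, no backoff."""

    def __init__(self, num_retries, success_check=lambda r: True, save_results=1):
        self.tries_left = num_retries + 1
        self.check_result = success_check
        self.results = deque(maxlen=save_results)
        self._running = None
        self._success = None

    def __iter__(self):
        return self

    def running(self):
        return self._running and (self._success is None)

    def __next__(self):
        if self._running is None:
            self._running = True
        if self.tries_left < 1 or not self.running():
            self._running = False
            raise StopIteration
        self.tries_left -= 1
        return self.tries_left

    def save_result(self, result):
        if not self.running():
            raise AssertionError("recorded a loop result after the loop finished")
        self.results.append(result)
        self._success = self.check_result(result)

    def success(self):
        return self._success


def check(status):
    # True ends the loop with success, False ends it with permanent
    # failure, None asks for another try.
    if status == 200:
        return True
    return None if status == 503 else False


responses = iter([503, 503, 200])  # canned results standing in for real requests
loop = MiniRetryLoop(num_retries=3, success_check=check, save_results=2)
seen = []
for tries_left in loop:
    loop.save_result(next(responses))
    seen.append((tries_left, loop.success()))

print(seen)                # [(3, None), (2, None), (1, True)]
print(list(loop.results))  # [503, 200]
```

The loop stops after the third attempt even though one try remains, because the saved `200` made `success_check` return `True`. With `save_results=2`, only the two most recent results are kept.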
Return the loop’s end state.
Returns `True` if the loop recorded a successful result, `False` if it
recorded permanent failure, or else `None`.
Return the most recent result the loop saved.
Raises `arvados.errors.AssertionError` if called before any result has
been saved.
Return the number of results that have been saved.
This count includes all kinds of results: success, permanent failure, and temporary failure.
Return a human-friendly string counting saved results.
This method returns '1 attempt' or 'N attempts', where the number in the string is the number of saved results.
Convert a numeric HTTP status code to a loop control flag.
This function takes a numeric HTTP status code and returns `True` if
the code indicates success, `None` if it indicates temporary
failure, and `False` otherwise. You can use this as the
`success_check` for a `RetryLoop` that queries the Arvados API server.
Specifically:
- Any 2xx result returns `True`.
- A select few status codes (408, 409, 423, 500, 502, 503, and 504, per `_HTTP_CAN_RETRY` in the module source), or any malformed responses (codes outside the range 100 to 599), return `None`.
- Everything else returns `False`. Note that this includes 1xx and 3xx status codes. They don't indicate success, and you can't retry those requests verbatim.
Arguments:
- status_code: int — A numeric HTTP response code
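As a quick check of these rules, the sketch below classifies a handful of status codes. `classify_status` is a hypothetical local mirror of the function body from the module source, copied here only so the example runs without importing `arvados`. The sample codes are arbitrary.

```python
# Local copies of the sets defined in the module source.
HTTP_SUCCESSES = set(range(200, 300))
HTTP_CAN_RETRY = {408, 409, 423, 500, 502, 503, 504}


def classify_status(status_code):
    """Hypothetical mirror of check_http_response_success."""
    if status_code in HTTP_SUCCESSES:
        return True    # success: stop the loop
    elif status_code in HTTP_CAN_RETRY:
        return None    # temporary failure: try again
    elif 100 <= status_code < 600:
        return False   # permanent failure: stop the loop
    else:
        return None    # malformed code: treated as temporary


flags = {code: classify_status(code) for code in (200, 301, 404, 503, 0)}
print(flags)  # {200: True, 301: False, 404: False, 503: None, 0: None}
```

Note that `301` and `404` both map to `False`, so a loop using this check gives up immediately on redirects and client errors instead of retrying them.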
Provide a default value for a method’s num_retries argument.
This is a decorator for instance and class methods that accept a
`num_retries` keyword argument, with a `None` default. When the method
is called without a value for `num_retries`, this decorator will set it
from the `num_retries` attribute of the underlying instance or class.
Arguments:
- orig_func: Callable --- A class or instance method that accepts a `num_retries` keyword argument
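A short sketch of the decorator in use follows. The `retry_method` defined here is a local copy of the decorator body from the module source, included so the example runs without the Arvados SDK. `ExampleClient` and its `fetch` method are hypothetical names invented for illustration.

```python
import functools


def retry_method(orig_func):
    # Local copy of the decorator body, so this sketch is self-contained.
    @functools.wraps(orig_func)
    def num_retries_setter(self, *args, **kwargs):
        if kwargs.get('num_retries') is None:
            kwargs['num_retries'] = self.num_retries
        return orig_func(self, *args, **kwargs)
    return num_retries_setter


class ExampleClient:
    """Hypothetical client whose methods share one default retry count."""

    def __init__(self, num_retries=3):
        self.num_retries = num_retries

    @retry_method
    def fetch(self, locator, num_retries=None):
        # A real method would pass num_retries to a RetryLoop; this one
        # returns its arguments so the substituted default is visible.
        return (locator, num_retries)


client = ExampleClient(num_retries=3)
print(client.fetch('abc'))                 # ('abc', 3)
print(client.fetch('abc', num_retries=0))  # ('abc', 0)
```

An explicit `num_retries=0` is kept because only `None` triggers the default. As written, the wrapper inspects keyword arguments only, so callers should pass `num_retries` by keyword; passing it positionally would collide with the injected keyword.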