arvados.retry

Utilities to retry operations.

The core of this module is RetryLoop, a utility class to retry operations that might fail. It can distinguish between temporary and permanent failures; provide exponential backoff; and save a series of results.

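It also provides utility functions for common operations with RetryLoop:

  • check_http_response_success can be used as a RetryLoop success_check for HTTP response codes from the Arvados API server.

  • retry_method can decorate methods to provide a default num_retries keyword argument.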

"""Utilities to retry operations.

The core of this module is `RetryLoop`, a utility class to retry operations
that might fail. It can distinguish between temporary and permanent failures;
provide exponential backoff; and save a series of results.

It also provides utility functions for common operations with `RetryLoop`:

* `check_http_response_success` can be used as a `RetryLoop` `success_check`
  for HTTP response codes from the Arvados API server.
* `retry_method` can decorate methods to provide a default `num_retries`
  keyword argument.
"""
# Copyright (C) The Arvados Authors. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0

import functools
import inspect
import pycurl
import time

from collections import deque
from typing import (
    Callable,
    Generic,
    Optional,
    TypeVar,
)

import arvados.errors

_HTTP_SUCCESSES = set(range(200, 300))
_HTTP_CAN_RETRY = set([408, 409, 423, 500, 502, 503, 504])

CT = TypeVar('CT', bound=Callable)
T = TypeVar('T')

class RetryLoop(Generic[T]):
    """Coordinate limited retries of code.

    `RetryLoop` coordinates a loop that runs until it records a
    successful result or tries too many times, whichever comes first.
    Typical use looks like:

        loop = RetryLoop(num_retries=2)
        for tries_left in loop:
            try:
                result = do_something()
            except TemporaryError as error:
                log("error: {} ({} tries left)".format(error, tries_left))
            else:
                loop.save_result(result)
        if loop.success():
            return loop.last_result()

    Arguments:

    * num_retries: int --- The maximum number of times to retry the loop if
      it doesn't succeed.  This means the loop body could run at most
      `num_retries + 1` times.

    * success_check: Callable[[T], bool | None] --- This is a function that
      will be called each time the loop saves a result.  The function should
      return `True` if the result indicates the code succeeded, `False` if
      it represents a permanent failure, and `None` if it represents a
      temporary failure.  If no function is provided, the loop will end
      after any result is saved.

    * backoff_start: float --- The number of seconds that must pass before
      the loop's second iteration.  Default 0, which disables all waiting.

    * backoff_growth: float --- The wait time multiplier after each
      iteration.  Default 2 (i.e., double the wait time each time).

    * save_results: int --- Specify a number to store that many saved
      results from the loop.  These are available through the `results`
      attribute, oldest first.  Default 1.

    * max_wait: float --- Maximum number of seconds to wait between
      retries. Default 60.
    """
    def __init__(
            self,
            num_retries: int,
            success_check: Callable[[T], Optional[bool]]=lambda r: True,
            backoff_start: float=0,
            backoff_growth: float=2,
            save_results: int=1,
            max_wait: float=60
    ) -> None:
        self.tries_left = num_retries + 1
        self.check_result = success_check
        self.backoff_wait = backoff_start
        self.backoff_growth = backoff_growth
        self.max_wait = max_wait
        self.next_start_time = 0
        self.results = deque(maxlen=save_results)
        self._attempts = 0
        self._running = None
        self._success = None

    def __iter__(self) -> 'RetryLoop':
        """Return an iterator of retries."""
        return self

    def running(self) -> Optional[bool]:
        """Return whether this loop is running.

        Returns `None` if the loop has never run, `True` if it is still running,
        or `False` if it has stopped—whether that's because it has saved a
        successful result, a permanent failure, or has run out of retries.
        """
        return self._running and (self._success is None)

    def __next__(self) -> int:
        """Record a loop attempt.

        If the loop is still running, decrements the number of tries left and
        returns it. Otherwise, raises `StopIteration`.
        """
        if self._running is None:
            self._running = True
        if (self.tries_left < 1) or not self.running():
            self._running = False
            raise StopIteration
        else:
            wait_time = max(0, self.next_start_time - time.time())
            time.sleep(wait_time)
            self.backoff_wait *= self.backoff_growth
            if self.backoff_wait > self.max_wait:
                self.backoff_wait = self.max_wait
        self.next_start_time = time.time() + self.backoff_wait
        self.tries_left -= 1
        return self.tries_left

    def save_result(self, result: T) -> None:
        """Record a loop result.

        Save the given result, and end the loop if it indicates
        success or permanent failure. See documentation for the `__init__`
        `success_check` argument to learn how that's indicated.

        Raises `arvados.errors.AssertionError` if called after the loop has
        already ended.

        Arguments:

        * result: T --- The result from this loop attempt to check and save.
        """
        if not self.running():
            raise arvados.errors.AssertionError(
                "recorded a loop result after the loop finished")
        self.results.append(result)
        self._success = self.check_result(result)
        self._attempts += 1

    def success(self) -> Optional[bool]:
        """Return the loop's end state.

        Returns `True` if the loop recorded a successful result, `False` if it
        recorded permanent failure, or else `None`.
        """
        return self._success

    def last_result(self) -> T:
        """Return the most recent result the loop saved.

        Raises `arvados.errors.AssertionError` if called before any result has
        been saved.
        """
        try:
            return self.results[-1]
        except IndexError:
            raise arvados.errors.AssertionError(
                "queried loop results before any were recorded")

    def attempts(self) -> int:
        """Return the number of results that have been saved.

        This count includes all kinds of results: success, permanent failure,
        and temporary failure.
        """
        return self._attempts

    def attempts_str(self) -> str:
        """Return a human-friendly string counting saved results.

        This method returns '1 attempt' or 'N attempts', where the number
        in the string is the number of saved results.
        """
        if self._attempts == 1:
            return '1 attempt'
        else:
            return '{} attempts'.format(self._attempts)


def check_http_response_success(status_code: int) -> Optional[bool]:
    """Convert a numeric HTTP status code to a loop control flag.

    This method takes a numeric HTTP status code and returns `True` if
    the code indicates success, `None` if it indicates temporary
    failure, and `False` otherwise.  You can use this as the
    `success_check` for a `RetryLoop` that queries the Arvados API server.
    Specifically:

    * Any 2xx result returns `True`.

    * A select few status codes, or any malformed responses, return `None`.

    * Everything else returns `False`.  Note that this includes 1xx and
      3xx status codes.  They don't indicate success, and you can't
      retry those requests verbatim.

    Arguments:

    * status_code: int --- A numeric HTTP response code
    """
    if status_code in _HTTP_SUCCESSES:
        return True
    elif status_code in _HTTP_CAN_RETRY:
        return None
    elif 100 <= status_code < 600:
        return False
    else:
        return None  # Get well soon, server.

def retry_method(orig_func: CT) -> CT:
    """Provide a default value for a method's num_retries argument.

    This is a decorator for instance and class methods that accept a
    `num_retries` keyword argument, with a `None` default.  When the method
    is called without a value for `num_retries`, this decorator will set it
    from the `num_retries` attribute of the underlying instance or class.

    Arguments:

    * orig_func: Callable --- A class or instance method that accepts a
    `num_retries` keyword argument
    """
    @functools.wraps(orig_func)
    def num_retries_setter(self, *args, **kwargs):
        if kwargs.get('num_retries') is None:
            kwargs['num_retries'] = self.num_retries
        return orig_func(self, *args, **kwargs)
    return num_retries_setter

class RetryLoop(typing.Generic[~T]):

Coordinate limited retries of code.

RetryLoop coordinates a loop that runs until it records a successful result or tries too many times, whichever comes first. Typical use looks like:

loop = RetryLoop(num_retries=2)
for tries_left in loop:
    try:
        result = do_something()
    except TemporaryError as error:
        log("error: {} ({} tries left)".format(error, tries_left))
    else:
        loop.save_result(result)
if loop.success():
    return loop.last_result()

Arguments:

  • num_retries: int — The maximum number of times to retry the loop if it doesn’t succeed. This means the loop body could run at most num_retries + 1 times.

  • success_check: Callable[[T], bool | None] — This is a function that will be called each time the loop saves a result. The function should return True if the result indicates the code succeeded, False if it represents a permanent failure, and None if it represents a temporary failure. If no function is provided, the loop will end after any result is saved.

  • backoff_start: float — The number of seconds that must pass before the loop’s second iteration. Default 0, which disables all waiting.

  • backoff_growth: float — The wait time multiplier after each iteration. Default 2 (i.e., double the wait time each time).

  • save_results: int — Specify a number to store that many saved results from the loop. These are available through the results attribute, oldest first. Default 1.

  • max_wait: float — Maximum number of seconds to wait between retries. Default 60.
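
The way backoff_start, backoff_growth, and max_wait combine can be worked out without sleeping. The sketch below is a standalone illustration, not part of arvados.retry: the helper name backoff_delays is hypothetical, and it repeats the arithmetic from the __next__ source in this module, which multiplies the wait by backoff_growth and caps it at max_wait before scheduling each delay. Read that way, the minimum delay between the start of the first and second iterations appears to be backoff_start * backoff_growth.

```python
from typing import List


def backoff_delays(
        num_retries: int,
        backoff_start: float = 0,
        backoff_growth: float = 2,
        max_wait: float = 60,
) -> List[float]:
    """Return the minimum delay scheduled before each retry.

    Hypothetical helper: it repeats the multiply-then-cap arithmetic from
    RetryLoop.__next__ without actually sleeping.
    """
    delays = []
    wait = backoff_start
    for _ in range(num_retries):
        # __next__ grows the wait first, then caps it at max_wait.
        wait = min(wait * backoff_growth, max_wait)
        delays.append(wait)
    return delays


print(backoff_delays(num_retries=4, backoff_start=1))
print(backoff_delays(num_retries=4, backoff_start=10, max_wait=60))
print(backoff_delays(num_retries=3))  # default backoff_start=0 never waits
```

Each delay is measured from the start of the previous iteration, so a loop body that already takes longer than the scheduled delay does not wait at all.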

RetryLoop(num_retries: int, success_check: Callable[[~T], Optional[bool]] = lambda r: True, backoff_start: float = 0, backoff_growth: float = 2, save_results: int = 1, max_wait: float = 60)
tries_left
check_result
backoff_wait
backoff_growth
max_wait
next_start_time
results
def running(self) -> Optional[bool]:

Return whether this loop is running.

Returns None if the loop has never run, True if it is still running, or False if it has stopped—whether that’s because it has saved a successful result, a permanent failure, or has run out of retries.
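
The three return values follow from how Python's `and` operator short-circuits. A minimal standalone sketch, in which the function running_state and its two parameters are hypothetical stand-ins for the loop's private _running and _success fields:

```python
from typing import Optional


def running_state(running: Optional[bool], success: Optional[bool]) -> Optional[bool]:
    """Mirror the expression `self._running and (self._success is None)`."""
    # `and` returns its left operand when that operand is falsy, so a
    # never-started loop (None) and a stopped loop (False) pass straight through.
    return running and (success is None)


print(running_state(None, None))   # loop has never run
print(running_state(True, None))   # started, no final result yet
print(running_state(True, True))   # a successful result was saved
print(running_state(True, False))  # a permanent failure was saved
print(running_state(False, None))  # ran out of tries
```

Since both None and False are falsy, a caller that needs to tell "never started" from "stopped" has to compare with `is None` rather than rely on truthiness.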

def save_result(self, result: ~T) -> None:

Record a loop result.

Save the given result, and end the loop if it indicates success or permanent failure. See documentation for the __init__ success_check argument to learn how that’s indicated.

Raises arvados.errors.AssertionError if called after the loop has already ended.

Arguments:

  • result: T — The result from this loop attempt to check and save.
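
How many saved results remain available depends on save_results, because the __init__ source in this module stores them in a collections.deque created with that maxlen. The standalone sketch below shows only that retention behaviour; the status codes are made-up results, not output from a real request.

```python
from collections import deque

# The same kind of container RetryLoop builds when save_results=2.
results = deque(maxlen=2)

# Hypothetical results from three attempts: two temporary failures, then success.
for status_code in (503, 502, 200):
    results.append(status_code)

print(list(results))  # oldest first; the earliest result has been dropped
print(results[-1])    # the entry last_result() would return
```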
def success(self) -> Optional[bool]:

Return the loop’s end state.

Returns True if the loop recorded a successful result, False if it recorded permanent failure, or else None.

def last_result(self) -> ~T:

Return the most recent result the loop saved.

Raises arvados.errors.AssertionError if called before any result has been saved.

def attempts(self) -> int:

Return the number of results that have been saved.

This count includes all kinds of results: success, permanent failure, and temporary failure.

def attempts_str(self) -> str:

Return a human-friendly string counting saved results.

This method returns ‘1 attempt’ or ‘N attempts’, where the number in the string is the number of saved results.

def check_http_response_success(status_code: int) -> Optional[bool]:

Convert a numeric HTTP status code to a loop control flag.

This method takes a numeric HTTP status code and returns True if the code indicates success, None if it indicates temporary failure, and False otherwise. You can use this as the success_check for a RetryLoop that queries the Arvados API server. Specifically:

  • Any 2xx result returns True.

  • A select few status codes, or any malformed responses, return None.

  • Everything else returns False. Note that this includes 1xx and 3xx status codes. They don’t indicate success, and you can’t retry those requests verbatim.

Arguments:

  • status_code: int — A numeric HTTP response code
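
To make the three outcomes concrete, the sketch below restates the rule as a standalone function and applies it to a few sample codes. classify_status is a hypothetical local copy of the logic in the module source, written so the example runs without the Arvados SDK installed; real code would import check_http_response_success from arvados.retry instead.

```python
from typing import Optional

# Copied from the module source: the codes treated as temporary failures.
HTTP_CAN_RETRY = {408, 409, 423, 500, 502, 503, 504}


def classify_status(status_code: int) -> Optional[bool]:
    """Hypothetical stand-in for check_http_response_success."""
    if 200 <= status_code < 300:
        return True    # success: the loop stops
    if status_code in HTTP_CAN_RETRY:
        return None    # temporary failure: the loop may try again
    if 100 <= status_code < 600:
        return False   # permanent failure: the loop stops
    return None        # not a valid HTTP status: treated as temporary


for code in (200, 204, 301, 404, 409, 503, 0):
    print(code, classify_status(code))
```

Note that 500 is retried while 501 is not: only the listed codes count as temporary, and every other value in the 100 to 599 range ends the loop.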
def retry_method(orig_func: ~CT) -> ~CT:

Provide a default value for a method’s num_retries argument.

This is a decorator for instance and class methods that accept a num_retries keyword argument, with a None default. When the method is called without a value for num_retries, this decorator will set it from the num_retries attribute of the underlying instance or class.

Arguments:

  • orig_func: Callable — A class or instance method that accepts a num_retries keyword argument
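
A short sketch of the decorator in use. The decorator body is repeated from the module source (without type annotations) so the example runs without the Arvados SDK; the ExampleClient class and its fetch method are hypothetical.

```python
import functools


def retry_method(orig_func):
    # Same wrapper as in the module source, minus the type annotations.
    @functools.wraps(orig_func)
    def num_retries_setter(self, *args, **kwargs):
        if kwargs.get('num_retries') is None:
            kwargs['num_retries'] = self.num_retries
        return orig_func(self, *args, **kwargs)
    return num_retries_setter


class ExampleClient:
    """Hypothetical class; only the num_retries attribute matters here."""

    def __init__(self, num_retries: int = 3) -> None:
        self.num_retries = num_retries

    @retry_method
    def fetch(self, locator: str, num_retries=None) -> str:
        return "{} with num_retries={}".format(locator, num_retries)


client = ExampleClient(num_retries=3)
print(client.fetch("abc"))                 # falls back to client.num_retries
print(client.fetch("abc", num_retries=0))  # an explicit value wins, even 0
```

Because the wrapper only inspects keyword arguments, num_retries should be passed by keyword. Passing it positionally would collide with the default the wrapper injects and raise a TypeError.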