Circuit Breakers

Introduction

When a downstream microservice has been failing for some time, retries may not be the best approach. Retries will keep sending requests to the microservice, trying to succeed no matter what.

That's a rather selfish strategy: it can further overload the microservice and waste the time and resources of every upstream microservice waiting on the failing one.

What if we could fail fast as soon as we understood that the odds of getting a successful response were small? Circuit breakers are designed exactly for this purpose.

Generally, circuit breakers track statistics about failed responses. Once failures cross a threshold, the breaker blocks all actions or requests passing through it for a period of time, in the hope that this gives the downstream service enough time to recover.

Use Cases

  • Monitor and isolate subsystems. Breakers are a great way to implement effective white-box monitoring, as they divide the system into subsystems. If one subsystem is failing, breakers dispatch the metrics needed to efficiently locate the problem.
  • Fail fast and efficiently if the failure persists. Improve request latency during failures.
  • Shed load from the downstream subsystem in case of failure.

States

Circuit breakers are implemented as state machines. The following states are supported:

  • Working (a.k.a. the closed state) - the system is healthy. Actions are executed.
  • Failing (a.k.a. the open state) - the system is failing. No actions are executed.
  • Recovering (a.k.a. the half-open state) - the recovery delay is over and the system is being probed.
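
As a rough illustration, the three states can be modeled as an enum plus a table of allowed transitions (the transitions are the ones shown for the consecutive breaker further down this page). The names `BreakerState`, `ALLOWED_TRANSITIONS`, `can_transition`, and `executes_actions` are hypothetical and not part of Hyx; this is a sketch of the idea, not the library's internals.

```python
from enum import Enum


class BreakerState(Enum):
    WORKING = "working"        # closed: actions are executed
    FAILING = "failing"        # open: no actions are executed
    RECOVERING = "recovering"  # half-open: actions are executed as probes


# Which state may follow which (hypothetical table for illustration).
ALLOWED_TRANSITIONS = {
    BreakerState.WORKING: {BreakerState.FAILING},
    BreakerState.FAILING: {BreakerState.RECOVERING},
    BreakerState.RECOVERING: {BreakerState.WORKING, BreakerState.FAILING},
}


def can_transition(current: BreakerState, target: BreakerState) -> bool:
    """Return True if the state machine allows moving from current to target."""
    return target in ALLOWED_TRANSITIONS[current]


def executes_actions(state: BreakerState) -> bool:
    """Only the failing state blocks actions outright."""
    return state is not BreakerState.FAILING
```

Note that a failing breaker never jumps straight back to working: it always passes through the recovering state first.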

Note

Hyx doesn't follow the traditional state names inspired by electrical circuit breakers. We believe you can find more intuitive names if you look outside that analogy.

Usage

Breakers come in two flavors: a decorator and a context manager. Both are shown below, the decorator first.

import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


@breaker
async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    async with httpx.AsyncClient() as client:
        response = await client.get(f"http://inventory.shop/{product_sku}/")

        if response.status_code >= 500:
            raise InventoryTemporaryError

        return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))

The same function, with the breaker used as a context manager:

import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    async with breaker:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"http://inventory.shop/{product_sku}/")

            if response.status_code >= 500:
                raise InventoryTemporaryError

            return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))

Note

Breakers are stateful components. The typical usage is to create a single breaker instance and use or inject it wherever you interact with the underlying subsystem that may fail.
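
One way to follow this advice is to pass the breaker into whatever object talks to the subsystem. The sketch below is only an assumption about how such wiring might look: `InventoryClient`, `Fetch`, `get_product_price`, and the URL paths are hypothetical, and the breaker is typed as a generic async context manager, which matches the `async with breaker:` usage shown above.

```python
from contextlib import AbstractAsyncContextManager
from typing import Any, Awaitable, Callable

# Hypothetical transport signature: takes a URL path, returns decoded JSON.
Fetch = Callable[[str], Awaitable[dict[str, Any]]]


class InventoryClient:
    """Every call to the inventory service goes through one injected breaker."""

    def __init__(self, breaker: AbstractAsyncContextManager[Any], fetch: Fetch) -> None:
        self._breaker = breaker  # created once elsewhere, injected here
        self._fetch = fetch

    async def get_product_qty_left(self, product_sku: str) -> dict[str, Any]:
        async with self._breaker:
            return await self._fetch(f"/{product_sku}/")

    async def get_product_price(self, product_sku: str) -> dict[str, Any]:
        # Same instance, so both methods share one view of the subsystem's health.
        async with self._breaker:
            return await self._fetch(f"/{product_sku}/price/")
```

With a real breaker, you would build it once (for example with `consecutive_breaker(...)` as in the usage examples) and hand that single instance to the client.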

Warning

For the sake of simplicity, Hyx assumes that you follow AsyncIO best practices and do not run CPU-intensive operations in the main thread. Otherwise, breaker delays may fire late, only once the thread is unblocked.
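
The effect is easy to reproduce with plain asyncio, without Hyx. In the sketch below, a 50 ms `call_later` timer stands in for a breaker delay and `time.sleep(0.3)` stands in for blocking work; `timer_lateness` is a hypothetical helper name. It measures how late the timer fires when the work blocks the event loop thread versus when it is offloaded with `asyncio.to_thread`.

```python
import asyncio
import time


async def timer_lateness(block_event_loop: bool) -> float:
    """Return how many seconds late a 50 ms timer fires during 300 ms of work."""
    loop = asyncio.get_running_loop()
    fired: asyncio.Future[float] = loop.create_future()
    started = loop.time()

    # Stand-in for a breaker's delay timer.
    loop.call_later(0.05, lambda: fired.set_result(loop.time()))

    if block_event_loop:
        time.sleep(0.3)  # blocks the event loop thread; no timer can run
    else:
        await asyncio.to_thread(time.sleep, 0.3)  # the loop stays free meanwhile

    fired_at = await fired
    return (fired_at - started) - 0.05


if __name__ == "__main__":
    print("blocked:  ", asyncio.run(timer_lateness(block_event_loop=True)))
    print("offloaded:", asyncio.run(timer_lateness(block_event_loop=False)))
```

In the blocked case the timer can only fire after the sleep returns, so it is late by roughly the remaining blocking time. Note that `time.sleep` releases the GIL; truly CPU-bound Python code may still contend with the event loop when run in a thread, in which case a process pool is the usual alternative.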

Breakers

Consecutive Breaker

class hyx.circuitbreaker.consecutive_breaker(exceptions=, failure_threshold=5, recovery_time_secs=30, recovery_threshold=3, listeners=None, name=None, event_manager=None)

The consecutive breaker is the most basic implementation of the circuit breaker pattern. It counts how many times in a row the action has failed and moves into the failing state once the failure threshold is exceeded.

After the recovery delay, the breaker moves into the recovering state and lets actions through again as probes. Once the number of consecutive successes passes the recovery threshold, the breaker returns to the working state. If any action fails while recovering, it goes back to the failing state and waits again.

Graphically, these transitions look like this:

stateDiagram
    [*] --> Working: start from
    Working --> Failing: failure threshold is exceeded
    Failing --> Recovering: after the recovery delay
    Recovering --> Working: after the recovery threshold is passed
    Recovering --> Failing: at least one failing result

Parameters

  • exceptions - Exception or list of exceptions that count as failures
  • failure_threshold - Number of consecutive failures that moves the breaker into the failing state
  • recovery_time_secs - Time in seconds the breaker stays in the failing state before it moves to the recovering state
  • recovery_threshold - Number of consecutive successes needed to move the breaker back to the working state
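
To make the transitions and parameters concrete, here is a toy model of a consecutive breaker. It is not Hyx's implementation: `ToyConsecutiveBreaker`, its methods, and the injectable `clock` are hypothetical, and it assumes the breaker trips as soon as the failure streak reaches `failure_threshold` (Hyx's exact boundary behavior may differ).

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    WORKING = "working"
    FAILING = "failing"
    RECOVERING = "recovering"


class ToyConsecutiveBreaker:
    """Illustrative model only; not Hyx's implementation."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_time_secs: float = 30,
        recovery_threshold: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_time_secs = recovery_time_secs
        self.recovery_threshold = recovery_threshold
        self._clock = clock
        self._state = State.WORKING
        self._failures = 0   # current streak of failures while working
        self._successes = 0  # current streak of successes while recovering
        self._failing_since = 0.0

    @property
    def state(self) -> State:
        # Failing -> Recovering once the recovery delay has elapsed.
        if (
            self._state is State.FAILING
            and self._clock() - self._failing_since >= self.recovery_time_secs
        ):
            self._state = State.RECOVERING
            self._successes = 0
        return self._state

    def allows_action(self) -> bool:
        """Actions are rejected only while the breaker is failing."""
        return self.state is not State.FAILING

    def record_success(self) -> None:
        state = self.state
        if state is State.WORKING:
            self._failures = 0  # a success breaks the failure streak
        elif state is State.RECOVERING:
            self._successes += 1
            if self._successes >= self.recovery_threshold:
                self._state = State.WORKING
                self._failures = 0

    def record_failure(self) -> None:
        state = self.state
        if state is State.RECOVERING:
            self._trip()  # one failing probe sends the breaker back
        elif state is State.WORKING:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._trip()

    def _trip(self) -> None:
        self._state = State.FAILING
        self._failing_since = self._clock()
```

Passing the clock in as a parameter is a design choice for this sketch: it lets you step through the recovery delay in tests without actually waiting.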