Circuit Breakers

Introduction

When a downstream microservice has been failing for some time, retrying may not be the best thing to do. Retries keep sending requests to the microservice, trying to succeed no matter what.

That is a rather selfish strategy: it overloads the failing microservice even further and wastes the time and resources of all the upstream microservices that are waiting for it.

What if we could fail as soon as we knew that the odds of getting a successful response were small? Circuit breakers are designed for exactly this.

Generally, circuit breakers collect statistics about failed responses. Once failures reach a certain point, the breaker blocks all actions or requests that go through it for some time, in the hope that this is enough for the downstream service to recover.

Use Cases

  • Monitor and isolate subsystems. Breakers are a great way to implement effective white-box monitoring because they divide the whole system into subsystems. If one of the subsystems is failing, breakers emit the metrics needed to locate the problem efficiently.
  • Fail fast and efficiently when a failure persists for a long time, which improves request latency during failures.
  • Shed load from the downstream subsystem while it is failing.

States

Circuit breakers are implemented as state machines. The following states are supported:

  • Working (a.k.a. the closed state) - the system is healthy. Actions are executed.
  • Failing (a.k.a. the open state) - the system is failing. No actions are executed.
  • Recovering (a.k.a. the half-open state) - the recovery delay is over and the system is being probed.
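
The states above, together with their traditional names, can be written down as a small enumeration. This is only an illustrative sketch: `BreakerState`, `TRADITIONAL_NAMES`, and `executes_actions` are hypothetical names and are not part of Hyx. It also assumes that probing in the recovering state means actions are executed.

```python
from enum import Enum


class BreakerState(Enum):
    """Hypothetical enumeration of the states listed above (not a Hyx class)."""

    WORKING = "working"        # healthy: actions are executed
    FAILING = "failing"        # unhealthy: no actions are executed
    RECOVERING = "recovering"  # recovery delay is over: the system is probed


# Mapping to the traditional names taken from the electrical circuit breaker
TRADITIONAL_NAMES = {
    BreakerState.WORKING: "closed",
    BreakerState.FAILING: "open",
    BreakerState.RECOVERING: "half-open",
}


def executes_actions(state: BreakerState) -> bool:
    """Assumption of this sketch: only the failing state blocks actions."""
    return state is not BreakerState.FAILING
```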

Note

Hyx doesn't follow the traditional state names inspired by the electrical circuit breaker. We believe that more straightforward names can be found outside that analogy.

Usage

Breakers come in two flavors: a decorator and a context manager. As a decorator:

import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


@breaker
async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    async with httpx.AsyncClient() as client:
        response = await client.get(f"http://inventory.shop/{product_sku}/")

        if response.status_code >= 500:
            raise InventoryTemporaryError

        return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))

As a context manager:

import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    async with breaker:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"http://inventory.shop/{product_sku}/")

            if response.status_code >= 500:
                raise InventoryTemporaryError

            return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))

Note

Breakers are stateful components. The usual approach is to create a single breaker instance and use or inject it everywhere your code works with the underlying subsystem that you expect may fail.
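
One way to picture this is a client class that receives the breaker through its constructor, so that every call site feeds the same state. The sketch below is self-contained and does not use Hyx: `CountingBreaker` is a hypothetical stand-in that only counts failures, and `InventoryClient` with its methods is an assumed example, not an API from this documentation.

```python
import asyncio


class CountingBreaker:
    """Hypothetical stand-in for a real breaker: it only counts failures.

    It supports `async with`, like the context manager usage shown earlier.
    """

    def __init__(self) -> None:
        self.failures = 0

    async def __aenter__(self) -> "CountingBreaker":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            self.failures += 1
        return False  # never swallow the exception


class InventoryClient:
    """Every method that talks to the inventory service shares one breaker."""

    def __init__(self, breaker: CountingBreaker) -> None:
        self._breaker = breaker

    async def get_qty(self, sku: str) -> int:
        async with self._breaker:
            raise ConnectionError("inventory is down")  # simulated failure

    async def reserve(self, sku: str) -> None:
        async with self._breaker:
            raise ConnectionError("inventory is down")  # simulated failure


async def main() -> int:
    breaker = CountingBreaker()  # created once, injected everywhere
    client = InventoryClient(breaker)
    for call in (client.get_qty("sku-1"), client.reserve("sku-1")):
        try:
            await call
        except ConnectionError:
            pass
    return breaker.failures


failures = asyncio.run(main())  # both failures land in the same breaker
```

If each method created its own breaker instead, the failures would be split between separate counters and neither breaker would see the full picture of the subsystem's health.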

Warning

For the sake of simplicity, Hyx assumes that you follow AsyncIO best practices and do not run CPU-intensive operations in the main thread. Otherwise, breaker delays may fire late, only after the thread is unblocked.
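
A common way to keep the event loop responsive is to hand blocking work over to a worker thread with the standard library's `asyncio.to_thread` (Python 3.9+). The sketch below is a general AsyncIO illustration, not something specific to Hyx; `expensive_hash` is a made-up example of CPU-bound work.

```python
import asyncio
import hashlib


def expensive_hash(data: bytes, rounds: int = 50_000) -> str:
    """CPU-bound work: it would block the event loop if called directly."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def main() -> str:
    # Run the blocking function in a worker thread so that the event loop
    # can keep processing timers and other tasks in the meantime.
    return await asyncio.to_thread(expensive_hash, b"payload")


result = asyncio.run(main())
```

For heavy pure-Python computation, a process pool is usually a better fit than a thread because of the GIL.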

Breakers

Consecutive Breaker

class hyx.circuitbreaker.consecutive_breaker(exceptions=, failure_threshold=5, recovery_time_secs=30, recovery_threshold=3, listeners=None, name=None, event_manager=None)

The consecutive breaker is the most basic implementation of the circuit breaker pattern. It counts how many times in a row the system has failed and switches to the failing state if the failure threshold is exceeded.

The breaker then waits for the recovery delay and moves into the recovering state. Once the recovery threshold of consecutive successful actions is passed, the breaker returns to the working state. If at least one action fails in the meantime, it goes back to the failing state and waits again.

Graphically, these transitions look like this:

stateDiagram
    [*] --> Working: start from
    Working --> Failing: failure threshold is exceeded
    Failing --> Recovering: after the recovery delay
    Recovering --> Working: after the recovery threshold is passed
    Recovering --> Failing: at least one failing result
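
The transitions in the diagram can be traced with a toy state machine. This is a minimal sketch for illustration and is not Hyx's implementation: `ToyConsecutiveBreaker`, its methods, and the injectable `clock` are hypothetical, and the sketch assumes the breaker trips as soon as the failure count reaches the threshold.

```python
import time
from typing import Callable


class ToyConsecutiveBreaker:
    """Illustrative state machine only; this is NOT Hyx's implementation."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_time_secs: float = 30,
        recovery_threshold: int = 3,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_time_secs = recovery_time_secs
        self.recovery_threshold = recovery_threshold
        self._clock = clock
        self.state = "working"
        self._failures = 0
        self._successes = 0
        self._failing_until = 0.0

    def allows_action(self) -> bool:
        """Return False while failing; start recovering once the delay is over."""
        if self.state == "failing":
            if self._clock() < self._failing_until:
                return False
            self.state = "recovering"  # Failing --> Recovering
            self._successes = 0
        return True

    def record_success(self) -> None:
        if self.state == "recovering":
            self._successes += 1
            if self._successes >= self.recovery_threshold:
                self.state = "working"  # Recovering --> Working
        self._failures = 0  # a success breaks the run of consecutive failures

    def record_failure(self) -> None:
        if self.state == "recovering":
            self._trip()  # Recovering --> Failing: at least one failing result
            return
        self._failures += 1
        # Assumption: trip when the count reaches the threshold
        if self._failures >= self.failure_threshold:
            self._trip()  # Working --> Failing

    def _trip(self) -> None:
        self.state = "failing"
        self._failures = 0
        self._failing_until = self._clock() + self.recovery_time_secs


# Usage with a fake clock so that no real waiting is needed
fake_now = [0.0]
breaker = ToyConsecutiveBreaker(
    failure_threshold=2,
    recovery_time_secs=10,
    recovery_threshold=2,
    clock=lambda: fake_now[0],
)
breaker.record_failure()
breaker.record_failure()               # second consecutive failure trips it
blocked = not breaker.allows_action()  # still inside the recovery delay
fake_now[0] = 10.0                     # the recovery delay has passed
breaker.allows_action()                # the breaker starts recovering
breaker.record_success()
breaker.record_success()               # second consecutive success closes it
```

Passing the clock in as a parameter is a design choice of this sketch: it lets the recovery delay be simulated instantly instead of sleeping.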

Parameters

  • exceptions - Exception or list of exceptions that are considered a failure
  • failure_threshold - Number of consecutive failures that turns the breaker into the failing state
  • recovery_time_secs - Time in seconds the breaker waits in the failing state before it starts recovering
  • recovery_threshold - Number of consecutive successes needed to turn the breaker back to the working state