Circuit Breakers
Introduction
When a downstream microservice has been failing for some time, retries may not be the best approach. Retries will keep sending requests to the microservice, trying to succeed no matter what.
That's a rather selfish strategy that can further overload the microservice and waste time and resources of all upstream microservices waiting for the failed one.
What if we could fail fast as soon as we understood that the odds of getting a successful response were small? Circuit breakers are designed exactly for this purpose.
Generally, circuit breakers track statistics about failed responses. Once failures cross a threshold, they block all actions or requests passing through them for a period of time, in the hope that the pause is long enough for the downstream service to recover.
Use Cases
- Monitor and isolate subsystems. Breakers are a great way to implement effective white-box monitoring, as they divide the system into subsystems. If one subsystem is failing, breakers dispatch the metrics needed to efficiently locate the problem.
- Fail fast and efficiently if the failure persists. Improve request latency during failures.
- Shed load from the downstream subsystem in case of failure.
States
Circuit breakers are implemented as state machines. The following states are supported:
- Working (a.k.a. the closed state) - the system is healthy. Actions are executed.
- Failing (a.k.a. the open state) - the system is failing. No actions are executed.
- Recovering (a.k.a. the half-open state) - the recovery delay is over and the system is being probed.
Note
Hyx doesn't follow the traditional state names inspired by electrical circuit breakers. We believe you can find more intuitive names if you look outside that analogy.
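To keep the two vocabularies straight, the mapping can be written down as a small enum. This is only an illustrative sketch: `BreakerState` and `executes_actions` are hypothetical names, not part of Hyx.

```python
from enum import Enum


class BreakerState(Enum):
    """Hyx-style state names; each value is the traditional electrical name."""

    WORKING = "closed"
    FAILING = "open"
    RECOVERING = "half-open"


def executes_actions(state: BreakerState) -> bool:
    """Only the failing state rejects actions; recovering lets probes through."""
    return state is not BreakerState.FAILING
```

With this mapping, `BreakerState("half-open")` resolves to `BreakerState.RECOVERING`, which can help when reading material that uses the traditional names.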
Usage
Breakers come in two flavors, a decorator and a context manager:
Decorator:

```python
import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


# 5 consecutive failures move the breaker into the failing state for 30 seconds
breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


@breaker
async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    async with httpx.AsyncClient() as client:
        response = await client.get(f"http://inventory.shop/{product_sku}/")

        # Server-side errors count as temporary failures for the breaker
        if response.status_code >= 500:
            raise InventoryTemporaryError

        return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))
```
Context manager:

```python
import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    # Only the code inside this block is guarded by the breaker
    async with breaker:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"http://inventory.shop/{product_sku}/")

            if response.status_code >= 500:
                raise InventoryTemporaryError

            return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))
```
Note
Breakers are stateful components. The typical usage is to create a single breaker instance and use or inject it wherever you interact with the underlying subsystem that may fail.
Warning
For the sake of simplicity, Hyx assumes that you are following AsyncIO best practices and not running CPU-intensive operations in the main thread. Otherwise, breaker delays may fire late, only after the thread is unblocked.
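One way to follow this advice is to move blocking work off the event loop thread. The sketch below uses only the standard library and assumes Python 3.9+ for `asyncio.to_thread`. The names `busy_work`, `timer_lateness` and `measure` are hypothetical, and the timer merely stands in for a time-based delay such as a breaker's recovery delay; it does not reproduce Hyx internals.

```python
import asyncio
import time


def busy_work(duration_secs: float) -> float:
    """Stand-in for blocking work: spins until the duration elapses."""
    deadline = time.monotonic() + duration_secs
    while time.monotonic() < deadline:
        pass
    return duration_secs


async def timer_lateness(delay_secs: float) -> float:
    """Sleep for delay_secs and report how late the wake-up actually was."""
    started = time.monotonic()
    await asyncio.sleep(delay_secs)
    return time.monotonic() - started - delay_secs


async def measure(offload: bool) -> float:
    """Run blocking work next to a 50 ms timer and return the timer's lateness."""
    timer = asyncio.create_task(timer_lateness(0.05))
    await asyncio.sleep(0)  # yield once so the timer task starts its sleep

    if offload:
        # The event loop stays free to fire timers while the work runs elsewhere
        await asyncio.to_thread(busy_work, 0.3)
    else:
        # Runs on the event loop thread: no timer can fire until this returns
        busy_work(0.3)

    return await timer


if __name__ == "__main__":
    print(f"blocked loop lateness:   {asyncio.run(measure(offload=False)):.3f}s")
    print(f"offloaded work lateness: {asyncio.run(measure(offload=True)):.3f}s")
```

When the work blocks the loop, the timer can only fire after the work finishes, which is the effect the warning describes. For heavily CPU-bound work a process pool may be a better fit than a thread, since threads still share the interpreter lock.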
Breakers
Consecutive Breaker
hyx.circuitbreaker.consecutive_breaker

The consecutive breaker is the most basic implementation of the circuit breaker pattern.
It counts how many times in a row the system has failed and
switches to the failing state once the failure threshold is exceeded.
Then the breaker waits for the recovery delay and moves into the recovering state.
If the probing actions succeed, the breaker returns to the working state.
Otherwise, it goes back to the failing state and waits again.
Graphically, these transitions look like this:
```mermaid
stateDiagram
    [*] --> Working: start from
    Working --> Failing: failure threshold is exceeded
    Failing --> Recovering: after the recovery delay
    Recovering --> Working: after the recovery threshold is passed
    Recovering --> Failing: at least one failing result
```
Parameters
- exceptions - Exception or list of exceptions that are treated as failures
- failure_threshold - Number of consecutive failures that moves the breaker into the failing state
- recovery_time_secs - Time in seconds the breaker stays in the failing state to give the system a chance to recover
- recovery_threshold - Number of consecutive successes needed to move the breaker back to the working state
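To make the interplay of these parameters concrete, here is a minimal synchronous sketch of the transitions described above. It is not Hyx's implementation: `ConsecutiveBreakerSketch`, `BreakerOpen`, `State`, the `call` method and the injectable `clock` are all hypothetical names, and the sketch assumes the breaker trips as soon as the failure count reaches `failure_threshold`.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    WORKING = "working"
    FAILING = "failing"
    RECOVERING = "recovering"


class BreakerOpen(RuntimeError):
    """Hypothetical error raised while the sketch rejects actions."""


class ConsecutiveBreakerSketch:
    def __init__(
        self,
        exceptions: tuple[type[BaseException], ...],
        failure_threshold: int,
        recovery_time_secs: float,
        recovery_threshold: int = 1,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.exceptions = exceptions
        self.failure_threshold = failure_threshold
        self.recovery_time_secs = recovery_time_secs
        self.recovery_threshold = recovery_threshold
        self.clock = clock
        self.state = State.WORKING
        self._failures = 0
        self._successes = 0
        self._failing_since = 0.0

    def call(self, action: Callable[[], T]) -> T:
        if self.state is State.FAILING:
            if self.clock() - self._failing_since < self.recovery_time_secs:
                raise BreakerOpen("failing fast: the recovery delay is not over")
            # The recovery delay is over: start probing the system
            self.state, self._successes = State.RECOVERING, 0
        try:
            result = action()
        except self.exceptions:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        # While recovering, a single failure is enough to fail again
        if self.state is State.RECOVERING or self._failures >= self.failure_threshold:
            self.state, self._failures = State.FAILING, 0
            self._failing_since = self.clock()

    def _on_success(self) -> None:
        self._failures = 0  # a success breaks the failure streak
        if self.state is State.RECOVERING:
            self._successes += 1
            if self._successes >= self.recovery_threshold:
                self.state = State.WORKING
```

Passing a fake `clock` lets the recovery delay be skipped in tests without sleeping. Exceptions that are not listed in `exceptions` propagate without being counted as failures.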