Circuit Breakers¶
Introduction¶
When a downstream microservice has been failing for some time, retries may not be the best thing to do. Retries keep sending requests to the microservice, trying to succeed no matter what.
That's a pretty selfish strategy: it overloads the failing microservice even further and wastes the time and resources of all upstream microservices that are waiting for it.
What if we could fail as soon as we understood that the odds of getting a successful response were low? Circuit breakers are designed exactly for this.
Generally, circuit breakers keep statistics about failed responses. At some point, they block all actions or requests that go through them for some time. The hope is that this pause is long enough for the downstream service to recover.
Use Cases¶
- Monitor and isolate subsystems. Breakers are a great way to implement effective white-box monitoring because they divide the whole system into subsystems. If one of the subsystems is failing, breakers emit the metrics needed to locate the problem efficiently.
- Fail fast and efficiently when a failure persists for a long time. This improves request latency during failures.
- Shed load from the downstream subsystem while it is failing.
States¶
Circuit breakers are implemented as state machines. The following states are supported:
- Working (a.k.a. the closed state) - the system is healthy. Actions are executed.
- Failing (a.k.a. the open state) - the system is failing. No actions are executed.
- Recovering (a.k.a. the half-open state) - the recovery delay is over and the system is now being probed.
Note
Hyx doesn't follow the traditional state names inspired by the electrical circuit breaker. We believe that you could find more straightforward names if you look outside that analogy.
Usage¶
Breakers come in two flavors: as a decorator and as a context manager.
# Flavor 1: the breaker as a decorator
import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


@breaker
async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    async with httpx.AsyncClient() as client:
        response = await client.get(f"http://inventory.shop/{product_sku}/")

        # Treat server-side errors as temporary failures that the breaker counts
        if response.status_code >= 500:
            raise InventoryTemporaryError

        return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))
# Flavor 2: the breaker as a context manager
import asyncio
from typing import Any

import httpx

from hyx.circuitbreaker import consecutive_breaker


class InventoryTemporaryError(RuntimeError):
    """
    Occurs when the inventory microservice is temporarily inaccessible
    """


breaker = consecutive_breaker(
    exceptions=(InventoryTemporaryError,),
    failure_threshold=5,
    recovery_time_secs=30,
)


async def get_product_qty_left(product_sku: str) -> dict[str, Any]:
    # Only the code inside this block is guarded by the breaker
    async with breaker:
        async with httpx.AsyncClient() as client:
            response = await client.get(f"http://inventory.shop/{product_sku}/")

            if response.status_code >= 500:
                raise InventoryTemporaryError

            return response.json()


asyncio.run(get_product_qty_left("guido-van-rossum-portrait"))
Note
Breakers are stateful components. The usual approach is to create one breaker instance and use or inject it everywhere your code talks to the underlying subsystem that you expect may fail.
Warning
For the sake of simplicity, Hyx assumes that you follow AsyncIO best practices and do not run CPU-intensive operations in the main thread. Otherwise, the breaker delays may fire late, only after the thread is unblocked.
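One common way to keep CPU-heavy work off the event loop is to hand it to a worker thread with `asyncio.to_thread`. The snippet below is a minimal sketch of that idea and is not taken from Hyx; `expensive_digest` and `handler` are hypothetical names, and the hashing loop simply stands in for any blocking computation.

```python
import asyncio
import hashlib


def expensive_digest(payload: bytes, rounds: int = 10_000) -> str:
    """Hypothetical CPU-bound workload: hash the payload repeatedly."""
    digest = payload
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def handler(payload: bytes) -> str:
    # Run the blocking function in a worker thread so that the event loop
    # remains free to run timers and other coroutines in the meantime.
    return await asyncio.to_thread(expensive_digest, payload)


if __name__ == "__main__":
    print(asyncio.run(handler(b"hyx")))
```

Threads keep the loop responsive, but pure-Python computation still competes for the interpreter lock, so very heavy workloads may be better served by a process pool.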
Breakers¶
Consecutive Breaker¶
hyx.circuitbreaker.consecutive_breaker(exceptions=…)

The consecutive breaker is the most basic implementation of the circuit breaker pattern.
It counts how many times in a row the system has failed and moves into the failing state if the threshold is exceeded.
Then the breaker waits for the recovery delay and moves into the recovering state.
If the probed actions succeed and the recovery threshold is passed, the breaker gets back to the working state.
Otherwise, it goes back to the failing state and waits again.
Graphically, these transitions look like this:
stateDiagram
[*] --> Working: start from
Working --> Failing: failure threshold is exceeded
Failing --> Recovering: after the recovery delay
Recovering --> Working: after the recovery threshold is passed
Recovering --> Failing: at least one failing result
Parameters
- exceptions - Exception or list of exceptions that are considered a failure
- failure_threshold - Number of consecutive failures that turns the breaker into the failing state
- recovery_time_secs - Time in seconds we give the breaker to recover from the failing state
- recovery_threshold - Number of consecutive successes needed to turn the breaker back to the working state
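To make the transitions and parameters above concrete, here is a minimal, synchronous sketch of a consecutive breaker. It is an illustration only and not Hyx's implementation: `ToyConsecutiveBreaker`, `BreakerOpenError`, `FakeClock`, and the `clock` argument are all hypothetical names, and the sketch assumes the breaker trips as soon as the failure count reaches the threshold.

```python
import time
from enum import Enum
from typing import Any, Callable


class State(Enum):
    WORKING = "working"        # a.k.a. closed
    FAILING = "failing"        # a.k.a. open
    RECOVERING = "recovering"  # a.k.a. half-open


class BreakerOpenError(RuntimeError):
    """Hypothetical error raised while the toy breaker rejects calls."""


class FakeClock:
    """Manually advanced clock, so the demo does not have to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class ToyConsecutiveBreaker:
    """Illustrative state machine only; not Hyx's implementation."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_time_secs: float = 30.0,
        recovery_threshold: int = 1,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_time_secs = recovery_time_secs
        self.recovery_threshold = recovery_threshold
        self._clock = clock
        self.state = State.WORKING
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, action: Callable[[], Any]) -> Any:
        if self.state is State.FAILING:
            if self._clock() - self._opened_at < self.recovery_time_secs:
                raise BreakerOpenError("failing fast, action not executed")
            # Failing --> Recovering: the recovery delay is over, start probing
            self.state = State.RECOVERING
            self._successes = 0
        try:
            result = action()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        # Recovering --> Failing on any failure;
        # Working --> Failing once enough consecutive failures pile up
        if self.state is State.RECOVERING or self._failures >= self.failure_threshold:
            self.state = State.FAILING
            self._failures = 0
            self._opened_at = self._clock()

    def _on_success(self) -> None:
        self._failures = 0  # a success breaks the run of consecutive failures
        if self.state is State.RECOVERING:
            self._successes += 1
            if self._successes >= self.recovery_threshold:
                # Recovering --> Working
                self.state = State.WORKING


if __name__ == "__main__":
    clock = FakeClock()
    breaker = ToyConsecutiveBreaker(failure_threshold=2, clock=clock)

    def failing_action() -> None:
        raise ValueError("inventory is down")

    for _ in range(2):
        try:
            breaker.call(failing_action)
        except ValueError:
            pass
    print(breaker.state)  # the breaker has tripped into the failing state

    clock.now = 31.0  # pretend the recovery delay has passed
    breaker.call(lambda: "ok")
    print(breaker.state)  # one successful probe closes the breaker again
```

Injecting the clock is a design choice made for the sketch: it lets the recovery delay be exercised in tests without waiting for real time to pass.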